Hibernate MassIndexer on Unstable DB Server

We are developing a system that allows users to search within an unstable DB located on a remote server that we do not control. For this task we use Hibernate and Spring Boot.
When our server application starts, it launches the indexing task, which is then supposed to run every day at 12:00. The purpose of the task is to index the unstable remote DB. For that purpose we use the MassIndexer as follows:

FullTextEntityManager fullTextEntityManagerL = Search.getFullTextEntityManager(em);
fullTextEntityManagerL.createIndexer(classes)
        .batchSizeToLoadObjects(50)
        .threadsToLoadObjects(5)
        .typesToIndexInParallel(2)
        .idFetchSize(15)
        //.progressMonitor(monitor)
        .transactionTimeout(1800)
        .startAndWait();

The problem is that the server disconnects frequently during indexing, and the resulting index is broken. This causes issues with the search: we cannot execute searches against the broken index, and our server stops working properly. We would like a way to check that the index we are using was built completely and is not broken. If it is broken, we would like to keep using the previous successfully built index and retry the task 1 hour later.
The database is a legacy one: our application must not change it, and the tables we need have no primary keys.

Another problem we face is that the server sometimes disconnects while our application is running a search on the index, and the results cannot be retrieved. We would like the search function to detect that the server is not available, inform the user about it, and properly return empty results. Currently there is no way to check this. For search we use DatabaseService.search(filter, handler) as follows:

@Component
public class AsyncSearchExecutor {

    private Logger logger = LoggerFactory.getLogger(AsyncSearchExecutor.class);

    private final DatabaseService databaseService;

    @Autowired
    public AsyncSearchExecutor(DatabaseService databaseService) {
        this.databaseService = databaseService;
    }

    public List<FullTextSearchSimpleResult> search(FullTextSearchFilter filter, DatabaseSearchHandler... searchHandlers) {

        logger.info("Executed search for {}.", filter);

        List<FullTextSearchSimpleResult> result = new ArrayList<>();

        int index = 0;
        CompletableFuture<?>[] completableFutures = new CompletableFuture<?>[searchHandlers.length];
        for (DatabaseSearchHandler handler : searchHandlers) {
            CompletableFuture<List<FullTextSearchSimpleResult>> results = databaseService.search(filter, handler);
            completableFutures[index] = results;
            index++;
        }

        // Wait until they are all done.
        // Note: join() throws an unchecked CompletionException if any search failed,
        // so such a failure propagates to the caller instead of reaching the catch block below.
        CompletableFuture.allOf(completableFutures).join();

        try {
            for (CompletableFuture<?> future : completableFutures) {
                @SuppressWarnings("unchecked")
                List<FullTextSearchSimpleResult> partial = (List<FullTextSearchSimpleResult>) future.get();
                result.addAll(partial);
            }
        } catch (InterruptedException | ExecutionException e) {
            logger.error("Exception {} occurred on uniting search results.", e.getMessage());
        }

        return result;
    }
}

The executor is then invoked like this:

return asyncSearchExecutor.search(filter, oDatabaseSerachHandler, wDBSearchDBHandler);

There are multiple questions here. I’ll start with the easiest one:

Detecting loading failures during searches

during the search functionality of our application, while running search on the index, the server disconnected and the results can not be retrieved

So from what I understand, the database disconnects while Hibernate Search is retrieving the entities from the database. Hibernate Search is not designed to recover from such failures. Depending on what fails exactly, it will throw an exception. Can’t you just catch this exception and return empty results accordingly?
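As an illustration of catching the exception, here is a minimal sketch (plain Java, not Hibernate Search API) of how the AsyncSearchExecutor above could wait for all of its futures and fall back to empty results when any of them failed. SafeSearchMerger and its SearchOutcome type are hypothetical names introduced for this example; the sketch assumes, as in the question, that DatabaseService.search returns a CompletableFuture that completes exceptionally when loading fails.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SafeSearchMerger {

    /** Merged results plus a flag telling the caller whether any search failed. */
    public static final class SearchOutcome<T> {
        public final List<T> results;
        public final boolean failed;

        SearchOutcome(List<T> results, boolean failed) {
            this.results = results;
            this.failed = failed;
        }
    }

    /**
     * Waits for all searches. If any of them failed (e.g. the database disconnected
     * while entities were being loaded), returns empty results and failed = true.
     */
    public static <T> SearchOutcome<T> mergeOrEmpty(List<CompletableFuture<List<T>>> futures) {
        try {
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
        } catch (RuntimeException e) {
            // join() wraps the failure of any single future in a CompletionException
            return new SearchOutcome<>(Collections.emptyList(), true);
        }
        List<T> merged = new ArrayList<>();
        for (CompletableFuture<List<T>> future : futures) {
            merged.addAll(future.join()); // every future completed normally at this point
        }
        return new SearchOutcome<>(merged, false);
    }

    public static void main(String[] args) {
        CompletableFuture<List<String>> ok = CompletableFuture.completedFuture(List.of("a", "b"));
        CompletableFuture<List<String>> broken = new CompletableFuture<>();
        broken.completeExceptionally(new IllegalStateException("connection lost"));

        System.out.println(mergeOrEmpty(List.of(ok)).results);        // [a, b]
        System.out.println(mergeOrEmpty(List.of(ok, broken)).failed); // true
    }
}
```

The failed flag is what you would use to tell the user that the remote server is unavailable, rather than silently showing an empty list.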

If there’s, somehow, a problem with that strategy, another option is to ask Hibernate Search to only retrieve entity IDs (from the index), then do the loading yourself. Then you’ll be able to handle loading errors as you see fit.
To that end, just use projections (see the projections section of the Hibernate Search 5.11.12.Final Reference Guide):

FullTextQuery query = fullTextEntityManager.createFullTextQuery( luceneQuery, MyEntity.class );
query.setProjection( ProjectionConstants.ID );
@SuppressWarnings("unchecked")
List<Object[]> idProjections = query.getResultList();

// Then load the corresponding entities yourself
List<Serializable> ids = new ArrayList<>( idProjections.size() );
for ( Object[] projection : idProjections ) {
    ids.add( (Serializable) projection[0] );
}
try {
    return entityManager.unwrap( Session.class ).byMultipleIds( MyEntity.class )
        .with( CacheMode.NORMAL ) // Example value; may be omitted
        .withBatchSize( 50 ) // Example value; may be omitted
        .multiLoad( ids );
}
catch (RuntimeException e) {
    // Something went wrong when loading (e.g. the database connection dropped):
    // handle it here, for instance by logging it and returning empty results.
    return Collections.emptyList();
}

I’m not sure why you would need that, though, since Hibernate Search should also throw an exception when loading fails.

Merging results from multiple searches

for (DatabaseSearchHandler handler : searchHandlers)

I see you’re running multiple queries on multiple servers and then trying to merge the results back into a single list. Be warned: if you want to use paging (and you likely will at some point), merging the results back into a single list will not be an easy task. What you’re doing will only be easy if you just want to retrieve the top results.
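To illustrate why paging is the hard part, here is a minimal sketch of merging ranked hits from several sources. The Hit class is hypothetical, and the sketch assumes each source already returns its hits sorted by descending score and that scores from different sources are comparable; with separate Lucene indexes that is not guaranteed, so treat the merged ranking as approximate. Note that to build any page, every source has to return its own top (offset + limit) hits, which gets more expensive the further users page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopResultsMerger {

    /** Hypothetical hit type: an entity id plus its relevance score. */
    public static final class Hit {
        public final String id;
        public final float score;

        public Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id;
        }
    }

    /**
     * Returns hits [offset, offset + limit) of the merged ranking.
     * Each source list must contain at least its own top (offset + limit) hits,
     * otherwise the merged page can be wrong.
     */
    public static List<Hit> mergePage(List<List<Hit>> perSource, int offset, int limit) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perSource) {
            all.addAll(hits.subList(0, Math.min(hits.size(), offset + limit)));
        }
        // Assumes scores from different sources are comparable, which is not guaranteed.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int from = Math.min(offset, all.size());
        int to = Math.min(offset + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    public static void main(String[] args) {
        List<Hit> db1 = List.of(new Hit("a", 9f), new Hit("b", 5f), new Hit("c", 1f));
        List<Hit> db2 = List.of(new Hit("x", 7f), new Hit("y", 6f));
        System.out.println(mergePage(List.of(db1, db2), 0, 2)); // [a, x]
        System.out.println(mergePage(List.of(db1, db2), 2, 2)); // [y, b]
    }
}
```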

Detecting MassIndexing failure

We would like to find a way to identify that the index that we are using completed correctly and not broken

You could use an ErrorHandler to detect any error happening during indexing. The default one just logs errors, but you can plug in your own behavior.

Implement the dedicated interface:

package com.mycompany;

import java.util.concurrent.atomic.LongAdder;

import org.hibernate.search.exception.ErrorContext;
import org.hibernate.search.exception.ErrorHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyErrorHandler implements ErrorHandler {
	private static final Logger logger = LoggerFactory.getLogger( MyErrorHandler.class );

	private static final LongAdder ERROR_COUNT = new LongAdder();

	public static void resetErrorCount() {
		ERROR_COUNT.reset();
	}

	public static long getErrorCount() {
		return ERROR_COUNT.sum();
	}

	@Override
	public void handle(ErrorContext context) {
		// Handle indexing errors, typically I/O errors while writing to the index.

		// Don't forget to log with your logging framework. There's more information in the context if you want a better message.
		logger.error( "Error while indexing", context.getThrowable() );

		// Increment the error count:
		ERROR_COUNT.increment();
	}

	@Override
	public void handleException(String errorMsg, Throwable exception) {
		// Handle other errors, typically errors while getting information from the database.

		// Don't forget to log with your logging framework.
		logger.error( errorMsg, exception );

		// Increment the error count:
		ERROR_COUNT.increment();
	}
}

Then reference your implementation in the configuration (with Spring Boot, prefix the key with spring.jpa.properties.):

hibernate.search.error_handler = com.mycompany.MyErrorHandler

Then in your code:

FullTextEntityManager fullTextEntityManagerL = Search.getFullTextEntityManager(em);
MyErrorHandler.resetErrorCount();
fullTextEntityManagerL.createIndexer(classes)
        .batchSizeToLoadObjects(50)
        .threadsToLoadObjects(5)
        .typesToIndexInParallel(2)
        .idFetchSize(15)
        //.progressMonitor(monitor)
        .transactionTimeout(1800)
        .startAndWait();
if ( MyErrorHandler.getErrorCount() > 0 ) {
    // Something went wrong
}
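For the "try again in 1 hour" part of the question, one possible approach is sketched below with plain java.util.concurrent rather than Spring's scheduling support. RetryingIndexingScheduler is a hypothetical class; it assumes a job that resets the error count, runs the MassIndexer as above, and returns MyErrorHandler.getErrorCount() == 0. The daily 12:00 trigger stays with whatever scheduler you already use.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class RetryingIndexingScheduler {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final BooleanSupplier indexingJob; // hypothetical: true if the index was built without errors
    private final long retryDelay;
    private final TimeUnit retryUnit;
    private final int maxAttempts;

    public RetryingIndexingScheduler(BooleanSupplier indexingJob, long retryDelay, TimeUnit retryUnit, int maxAttempts) {
        this.indexingJob = indexingJob;
        this.retryDelay = retryDelay;
        this.retryUnit = retryUnit;
        this.maxAttempts = maxAttempts;
    }

    /** Runs the job now; on failure, retries after the delay, up to maxAttempts runs in total. */
    public void runWithRetry() {
        scheduler.execute(() -> attempt(1));
    }

    private void attempt(int attemptNumber) {
        boolean succeeded;
        try {
            succeeded = indexingJob.getAsBoolean();
        } catch (RuntimeException e) {
            succeeded = false; // treat unexpected exceptions (e.g. a lost connection) as a failed run
        }
        if (!succeeded && attemptNumber < maxAttempts) {
            scheduler.schedule(() -> attempt(attemptNumber + 1), retryDelay, retryUnit);
        }
    }

    public void shutdown() {
        scheduler.shutdown();
    }
}
```

In the application, this could be started as new RetryingIndexingScheduler(this::reindex, 1, TimeUnit.HOURS, 5).runWithRetry(), where reindex is your (hypothetical) indexing method. Bounding the number of attempts avoids retrying forever while the remote database is down.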

Recovering from MassIndexing failure

In case it is broken we would like to use previous successfully completed index and restart the task to try again in 1 hour.

That’s… tough. The MassIndexer only works on one copy of the index, and it completely purges the existing index when it starts. So if it fails, the previous index is gone.
Your only solution would be to run the mass indexer in a separate application with a separate index location, and copy the index directory to your main application if indexing succeeds.
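A minimal sketch of the "copy the index directory if indexing succeeds" step, using only java.nio.file. The class name IndexDirectorySwapper and the ".staging"/".previous" suffixes are assumptions for this example. It keeps the previously published index as a fallback, which also covers the requirement of falling back to the last successfully built index. It assumes index locks have been released and that the main application is not writing to targetDir while this runs; the two renames are not atomic as a pair, so there is a brief moment where targetDir does not exist. How the main application then reopens the new files (for example by restarting it) is outside this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class IndexDirectorySwapper {

    /**
     * Copies a freshly built index from sourceDir into targetDir, keeping the
     * previously active index in "<targetDir>.previous" as a fallback.
     */
    public static void publish(Path sourceDir, Path targetDir) throws IOException {
        Path staging = targetDir.resolveSibling(targetDir.getFileName() + ".staging");
        Path previous = targetDir.resolveSibling(targetDir.getFileName() + ".previous");

        deleteRecursively(staging);
        copyRecursively(sourceDir, staging); // 1. copy the new index next to the live one

        deleteRecursively(previous);
        if (Files.exists(targetDir)) {
            Files.move(targetDir, previous); // 2. keep the old index as a fallback
        }
        Files.move(staging, targetDir);      // 3. put the new index in place
    }

    private static void copyRecursively(Path from, Path to) throws IOException {
        // Files.walk visits parent directories before their contents
        try (Stream<Path> paths = Files.walk(from)) {
            paths.forEach(p -> {
                try {
                    Files.copy(p, to.resolve(from.relativize(p)), StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        // Reverse order deletes files before the directories that contain them
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```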

Before you copy the index directory, if you’re using the Lucene integration, don’t forget to tell Hibernate Search to release all locks:

EntityManagerFactory entityManagerFactory = ...; // this can be either injected with @PersistenceUnit, or retrieved from an entity manager using .getEntityManagerFactory
SearchIntegrator searchIntegrator = SearchIntegratorHelper.extractFromEntityManagerFactory( entityManagerFactory );
for ( EntityIndexBinding value : searchIntegrator.getIndexBindings().values() ) {
    for ( IndexManager indexManager : value.getIndexManagerSelector().all() ) {
        // Flush pending index changes and release lock
        indexManager.flushAndReleaseResources();
    }
}

Be aware that this code uses SPIs, meaning you might experience incompatible changes in minor releases of Hibernate Search. If you’re ready to update your code when upgrading Hibernate Search, that should not be a problem.

If you run into index corruption, read the forum thread "Lucene index corruption", which is fairly close to your use case and explains a few things.

Hello. I am a colleague of Andrew. Thank you for your answer. It covered pretty much what I thought, but it is always best to ask the professionals. Regarding the search results, we are considering refactoring the search so that it pages through one database after another, step by step.

But because we are all newbies when it comes to full-text search with Hibernate, I have an additional question. When I try to open a particular entity's index folder, Luke won’t open it and reports that it was unable to open it. I am using luke-swing-8.0.0. If I sent you such a folder, would you be able to tell me what we are doing wrong? Here is a link to an archive of a very small entity folder that I have shared on Google Drive: https://drive.google.com/open?id=1Zu01qAnNjl400njSEOeX5UuovnR1eFHV

Hi,

Use an older version of Luke, such as 5 or 6. Hibernate Search 5 uses Lucene 5.5, and Luke 8 doesn’t support it anymore.

Lucene 5.5 is fairly old, but we haven’t been able to upgrade for backward compatibility reasons. We will upgrade to Lucene 8 in Hibernate Search 6.

Hi,

Thank you. I will try with 5 or 6. If there is a problem, I will write again.

Have a really nice and peaceful day.