Faster indexing rebuild

I use Hibernate Search 5.11.11.Final.
My database has around 5,000,000 records.

For re-indexing I have the code below, and it needs around 2 hours to complete the process:

public void initiateIndexing() throws InterruptedException {
	log.info("Initiating indexing...");
	final FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
	fullTextEntityManager.createIndexer().purgeAllOnStart(true);
	fullTextEntityManager.createIndexer().optimizeAfterPurge(true);
	fullTextEntityManager.createIndexer().batchSizeToLoadObjects(30);
	fullTextEntityManager.createIndexer().cacheMode(CacheMode.NORMAL);
	fullTextEntityManager.createIndexer().threadsToLoadObjects(12);
	fullTextEntityManager.createIndexer().typesToIndexInParallel(3);
	fullTextEntityManager.createIndexer().startAndWait();
	fullTextEntityManager.flushToIndexes();
	log.info("All entities indexed");
}

Is it possible to make the re-indexing faster?

Thank you!

Probably? Without spending time on your model (more time than I can reasonably spend on free support forums), it’s hard to say.

You’ll find a section with some advice to tune mass indexing in the documentation: Hibernate Search 6.1.7.Final: Reference Documentation

Upgrading to Hibernate Search 6 might help, though we didn’t work much on improving performance. The Elasticsearch integration should be better, though. See here for a migration guide: Hibernate Search 6.0.10.Final: Migration Guide from 5.11

Apart from the recommendations above, I’d say you should probably use a larger batch size if you can. If you have other, more specific questions, I can try to answer them.

Please also note that in the code example you’ve shared with us, you are creating a new indexer to set each option and then calling startAndWait() on yet another new indexer. As a result, none of the settings are actually applied when the process starts. Instead, create the indexer once and configure that single instance:

public void initiateIndexing() throws InterruptedException {
	log.info("Initiating indexing...");
	final FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
	MassIndexer indexer = fullTextEntityManager.createIndexer();
	indexer.purgeAllOnStart(true);
	indexer.optimizeAfterPurge(true);
	indexer.batchSizeToLoadObjects(30);
	indexer.cacheMode(CacheMode.NORMAL);
	indexer.threadsToLoadObjects(12);
	indexer.typesToIndexInParallel(3);
	indexer.startAndWait();
	fullTextEntityManager.flushToIndexes();
	log.info("All entities indexed");
}

Thank you for your reply! Could you suggest a batchSizeToLoadObjects value to use?

Thank you for your reply. I will try it and come back! :)

Not really, that depends on the size of your object graphs and what your machine is able to handle. Try 50, 100, 200? Higher numbers can be better, but obviously you can’t load the whole database all at once in memory, so going too high will not be good either.

I get the following error:

ERROR- 2022-11-08 12:40:52.120 21732-[ntifierloader-1] o.h.s.e.i.LogErrorHandler : HSEARCH000058: HSEARCH000211: An exception occurred while the MassIndexer was fetching the primary identifiers list

If I were you, I would also have a look at enabling the slow query log for your database (for instance, for PostgreSQL, you have log_min_duration_statement) to check if nothing pops up.

You might be missing an index on some of your columns, and loading all the items might make that extremely visible.

One of the errors is the following:

DEBUG- 2022-11-08 13:25:33.612 15296-[entityloader-11] o.h.SQL : select table0_.fieldid as field from table0_ where table0_.fieldid =?
ERROR- 2022-11-08 13:25:33.611 15296-[ entityloader-8] o.h.s.e.i.LogErrorHandler : HSEARCH000058: HSEARCH000212: An exception occurred while the MassIndexer was transforming identifiers to Lucene Documents

org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:37) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:276) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:284) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.internal.AbstractSharedSessionContract.beginTransaction(AbstractSharedSessionContract.java:503) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.beginTransaction(IdentifierConsumerDocumentProducer.java:194) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
	at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.loadList(IdentifierConsumerDocumentProducer.java:164) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
	at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.loadAllFromQueue(IdentifierConsumerDocumentProducer.java:140) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
	at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.run(IdentifierConsumerDocumentProducer.java:120) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30016ms.
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696) ~[HikariCP-4.0.3.jar:?]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197) ~[HikariCP-4.0.3.jar:?]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) ~[HikariCP-4.0.3.jar:?]
	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128) ~[HikariCP-4.0.3.jar:?]
	at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	... 16 more

But I can’t understand why it happens!

Most likely the underlying DB connection pool (HikariCP) has fewer connections than the number of threads you are using to index in parallel. If loading one batch takes a long time, the other threads cannot acquire a connection and will time out.

I’d suggest checking how many connections are configured in the DB pool and either increasing that number or decreasing the number of threads used to load the objects.
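
To make the mechanism concrete, here is a toy simulation that only uses the JDK. It is a sketch, not how HikariCP or the MassIndexer are implemented: a Semaphore stands in for the connection pool, each worker plays the role of a loading thread, and the class name, method name and timings are all made up for illustration. Workers that wait longer than the timeout give up, much like the "request timed out after 30016ms" in your stack trace.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: a Semaphore stands in for the connection pool.
final class PoolExhaustionDemo {

    // Returns how many workers gave up while waiting for a "connection".
    static int countTimeouts(int poolSize, int workers, long holdMillis, long timeoutMillis) {
        Semaphore pool = new Semaphore(poolSize);
        AtomicInteger timeouts = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(workers);
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            executor.execute(() -> {
                try {
                    // Wait for a free "connection", but only up to the timeout.
                    if (pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                        try {
                            Thread.sleep(holdMillis); // simulate a slow batch load
                        } finally {
                            pool.release();
                        }
                    } else {
                        timeouts.incrementAndGet(); // no connection became available in time
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait until every worker has finished or given up
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        } finally {
            executor.shutdown();
        }
        return timeouts.get();
    }

    public static void main(String[] args) {
        // More workers than "connections", and each load outlasts the timeout.
        System.out.println("pool=2, workers=4: " + countTimeouts(2, 4, 500, 100) + " timeouts");
        // As many "connections" as workers: nobody has to wait.
        System.out.println("pool=4, workers=4: " + countTimeouts(4, 4, 500, 100) + " timeouts");
    }
}
```

In this model the timeouts disappear either when the pool grows to match the workers or when the workers are reduced to match the pool, which are the same two options as above.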

Which is better to use?

indexer.threadsToLoadObjects(4);
indexer.typesToIndexInParallel(2);

“threadsToLoadObjects(8 or bigger) gives me the above error”

or

indexer.threadsToLoadObjects(12);
indexer.typesToIndexInParallel(1);

“typesToIndexInParallel(2 or bigger) gives me the above error”

Read the documentation I mentioned, make an educated guess based on your model and data, and test each alternative (possibly on a small subset of your data). We don’t know any better than you do how indexing will perform in your case; it depends on too many variables.

That documentation also mentions how to determine the maximum number of threads you can use.
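
As a rough illustration of that calculation: as far as I recall, the Hibernate Search 6 reference documentation estimates the number of threads (and therefore JDBC connections) as typesToIndexInParallel * (threadsToLoadObjects + 1). Treat the formula as an assumption, in particular for 5.11, and check it against the documentation. The class and method names below are hypothetical helpers, not part of Hibernate Search.

```java
// Back-of-the-envelope estimate only. The formula is an assumption based on
// the Hibernate Search 6 reference documentation and is not verified for 5.11.
final class MassIndexerConnectionEstimate {

    // Per indexed type: the entity-loading threads plus one identifier-loading thread.
    static int estimatedConnections(int typesToIndexInParallel, int threadsToLoadObjects) {
        return typesToIndexInParallel * (threadsToLoadObjects + 1);
    }

    public static void main(String[] args) {
        // The three configurations mentioned in this thread.
        System.out.println(estimatedConnections(3, 12)); // 3 * 13 = 39
        System.out.println(estimatedConnections(2, 4));  // 2 * 5  = 10
        System.out.println(estimatedConnections(1, 12)); // 1 * 13 = 13
    }
}
```

Compare the result with the maximum size configured for your connection pool (HikariCP's maximumPoolSize, which defaults to 10 when left unset). The estimate is only a starting point: threads can wait for a connection for a while before timing out, so a configuration slightly above the pool size may still work on some runs.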

Thank you both for your time! :)
