Faster indexing rebuild

Tony · November 8, 2022, 7:33am

I use Hibernate Search 5.11.11.Final.
My database has around 5,000,000 records.

For re-indexing I use the code below, and it takes around 2 hours to complete:

public void initiateIndexing() throws InterruptedException {
	log.info("Initiating indexing...");
	final FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
	fullTextEntityManager.createIndexer().purgeAllOnStart(true);
	fullTextEntityManager.createIndexer().optimizeAfterPurge(true);
	fullTextEntityManager.createIndexer().batchSizeToLoadObjects(30);
	fullTextEntityManager.createIndexer().cacheMode(CacheMode.NORMAL);
	fullTextEntityManager.createIndexer().threadsToLoadObjects(12);
	fullTextEntityManager.createIndexer().typesToIndexInParallel(3);
	fullTextEntityManager.createIndexer().startAndWait();
	fullTextEntityManager.flushToIndexes();
	log.info("All entities indexed");
}

Is it possible to make the re-indexing faster?

Thank you !

yrodiere · November 8, 2022, 8:05am

Probably? Without spending time on your model (more time than I can reasonably spend on free support forums), it’s hard to say.

You’ll find a section with some advice to tune mass indexing in the documentation: Hibernate Search 6.1.7.Final: Reference Documentation

Upgrading to Hibernate Search 6 might help, though we didn’t work much on improving performance. The Elasticsearch integration should be better, though. See here for a migration guide: Hibernate Search 6.0.10.Final: Migration Guide from 5.11

Apart from the recommendations above, I’d say you should probably use a larger batch size if you can. If you have other, more specific questions, I can try to answer them.

mbekhta · November 8, 2022, 10:24am

Please also note that in the code example you’ve shared with us, you are creating a new indexer to set each option and then calling startAndWait() on yet another new indexer. As a result, none of the settings are actually applied when the process starts. Instead, you can create an indexer once and configure that single instance:

public void initiateIndexing() throws InterruptedException {
	log.info("Initiating indexing...");
	final FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
	MassIndexer indexer = fullTextEntityManager.createIndexer();
	indexer.purgeAllOnStart(true);
	indexer.optimizeAfterPurge(true);
	indexer.batchSizeToLoadObjects(30);
	indexer.cacheMode(CacheMode.NORMAL);
	indexer.threadsToLoadObjects(12);
	indexer.typesToIndexInParallel(3);
	indexer.startAndWait();
	fullTextEntityManager.flushToIndexes();
	log.info("All entities indexed");
}

Tony · November 8, 2022, 10:36am

Thank you for your reply! Could you suggest a batchSizeToLoadObjects value to use?

Tony · November 8, 2022, 10:36am

Thank you for your reply. I will try it and come back!

yrodiere · November 8, 2022, 10:40am

Not really, that depends on the size of your object graphs and what your machine is able to handle. Try 50, 100, 200? Higher numbers can be better, but obviously you can’t load the whole database all at once in memory, so going too high will not be good either.
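One way to compare candidate batch sizes is to time the same indexing run once per value. The following is only a rough sketch: `BatchSizeTrial`, `timeEach` and the `indexingRun` callback are hypothetical names, and the callback is assumed to wrap your own MassIndexer call (ideally against a small subset of the data).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical helper, not part of Hibernate Search.
public final class BatchSizeTrial {

    private BatchSizeTrial() {
    }

    // Runs the supplied action once per candidate batch size and records
    // the elapsed wall-clock time in milliseconds, in the order tried.
    public static Map<Integer, Long> timeEach(int[] batchSizes, IntConsumer indexingRun) {
        Map<Integer, Long> elapsedMillis = new LinkedHashMap<>();
        for (int batchSize : batchSizes) {
            long start = System.nanoTime();
            indexingRun.accept(batchSize);
            elapsedMillis.put(batchSize, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        Map<Integer, Long> timings = timeEach(new int[] {50, 100, 200}, batchSize -> {
            // Placeholder: a real trial would configure a single MassIndexer with
            // batchSizeToLoadObjects(batchSize) and call startAndWait() here,
            // handling InterruptedException inside the callback.
        });
        timings.forEach((size, millis) ->
                System.out.println("batchSize=" + size + " took " + millis + " ms"));
    }
}
```

A single run per value is noisy (database caches warm up between runs), so repeating each candidate a few times gives a more trustworthy comparison.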

Tony · November 8, 2022, 10:42am

I get the following error:

ERROR- 2022-11-08 12:40:52.120 21732-[ntifierloader-1] o.h.s.e.i.LogErrorHandler : HSEARCH000058: HSEARCH000211: An exception occurred while the MassIndexer was fetching the primary identifiers list

gsmet · November 8, 2022, 10:55am

If I were you, I would also have a look at enabling the slow query log for your database (for instance, for PostgreSQL, you have log_min_duration_statement) to check if nothing pops up.

You might miss an index on some of your columns and loading all the items might make it extremely visible.

Tony · November 8, 2022, 11:50am

One of the errors is the following :

DEBUG- 2022-11-08 13:25:33.612 15296-[entityloader-11] o.h.SQL : select table0_.fieldid as field from table0_ where table0_.fieldid =?
ERROR- 2022-11-08 13:25:33.611 15296-[ entityloader-8] o.h.s.e.i.LogErrorHandler : HSEARCH000058: HSEARCH000212: An exception occurred while the MassIndexer was transforming identifiers to Lucene Documents

org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:37) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:276) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:284) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.internal.AbstractSharedSessionContract.beginTransaction(AbstractSharedSessionContract.java:503) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.beginTransaction(IdentifierConsumerDocumentProducer.java:194) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
	at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.loadList(IdentifierConsumerDocumentProducer.java:164) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
	at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.loadAllFromQueue(IdentifierConsumerDocumentProducer.java:140) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
	at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.run(IdentifierConsumerDocumentProducer.java:120) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30016ms.
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696) ~[HikariCP-4.0.3.jar:?]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197) ~[HikariCP-4.0.3.jar:?]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) ~[HikariCP-4.0.3.jar:?]
	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128) ~[HikariCP-4.0.3.jar:?]
	at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
	... 16 more

But I can’t understand why it happens!

mbekhta · November 8, 2022, 12:17pm

Most likely the underlying DB connection pool (HikariCP) has fewer connections than the number of threads you are using to index in parallel. If loading one batch takes a long time, the other threads won’t be able to acquire a connection and will time out.

I’d suggest checking how many connections are configured in the DB pool and either increasing that number or decreasing the number of threads used to load the objects.
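As a rough way to check this, the sketch below estimates how many connections a mass indexing run may hold at once. It assumes the formula given in the Hibernate Search 6 reference documentation, typesToIndexInParallel * (threadsToLoadObjects + 1), also applies to 5.11, and it assumes the pool is still at HikariCP's default maximumPoolSize of 10; `MassIndexerSizing` and its methods are hypothetical helpers, not part of any library.

```java
// Hypothetical helper, not part of Hibernate Search or HikariCP.
public final class MassIndexerSizing {

    private MassIndexerSizing() {
    }

    // Estimate of concurrently used JDBC connections: for each type indexed in
    // parallel, one thread lists identifiers and threadsToLoadObjects threads
    // load entities. Assumption: the formula from the Hibernate Search 6 docs.
    public static int requiredConnections(int typesToIndexInParallel, int threadsToLoadObjects) {
        if (typesToIndexInParallel < 1 || threadsToLoadObjects < 1) {
            throw new IllegalArgumentException("Both settings must be at least 1");
        }
        return typesToIndexInParallel * (threadsToLoadObjects + 1);
    }

    // True when the estimate fits within the pool's maximum size.
    public static boolean fitsInPool(int typesToIndexInParallel, int threadsToLoadObjects, int maxPoolSize) {
        return requiredConnections(typesToIndexInParallel, threadsToLoadObjects) <= maxPoolSize;
    }

    public static void main(String[] args) {
        int maxPoolSize = 10; // assumption: HikariCP default, adjust to your configuration
        int types = 3;        // typesToIndexInParallel from the original post
        int threads = 12;     // threadsToLoadObjects from the original post
        System.out.println("Estimated connections: " + requiredConnections(types, threads));
        System.out.println("Fits in pool of " + maxPoolSize + ": " + fitsInPool(types, threads, maxPoolSize));
    }
}
```

With the settings from the original post the estimate is 3 * (12 + 1) = 39 connections, well above a pool of 10. Keep in mind that the rest of the application needs connections from the same pool while indexing runs.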

Tony · November 8, 2022, 12:45pm

Which is better to use ?

indexer.threadsToLoadObjects(4);
indexer.typesToIndexInParallel(2);

“threadsToLoadObjects(8 or bigger) give me the above error”

or

indexer.threadsToLoadObjects(12);
indexer.typesToIndexInParallel(1);

“typesToIndexInParallel(2 or bigger) give me the above error”

yrodiere · November 8, 2022, 12:57pm

Read the documentation I mentioned, make an educated guess based on your model and data, and test each alternative (possibly on a small subset of your data). We don’t know any more than you do how indexing will perform in your case, it depends on too many variables.

That documentation also mentions how to determine the maximum number of threads you can use.

Tony · November 8, 2022, 12:58pm

Thank you both for your time !
