Upgrading to Hibernate Search 6 might help, though we didn’t work much on improving performance. The Elasticsearch integration should be better, though. See here for a migration guide: Hibernate Search 6.0.10.Final: Migration Guide from 5.11
Apart from the recommendations above, I’d say you should probably use a larger batch size if you can. If you have other, more specific questions, I can try to answer them.
Please also note that in the code example you’ve shared with us, you are creating a new indexer to set each option and then calling startAndWait() on yet another new indexer. As a result, none of the settings are actually applied when the process starts. Instead, create the indexer once, set all the options on that one instance, and call startAndWait() on it.
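To illustrate the difference, here is a minimal sketch. SketchIndexer and SketchSession are hypothetical stand-ins so the example compiles without Hibernate Search on the classpath; only the method names (createIndexer, batchSizeToLoadObjects, threadsToLoadObjects, startAndWait) mirror the real MassIndexer API, and the default values are placeholders, not the library's actual defaults.

```java
// Hypothetical stand-in for Hibernate Search's MassIndexer.
// Only the method names mirror the real API; defaults are placeholders.
final class SketchIndexer {
    int batchSize = 10;
    int loaderThreads = 2;

    SketchIndexer batchSizeToLoadObjects(int size) {
        this.batchSize = size;
        return this; // fluent: returns the same instance
    }

    SketchIndexer threadsToLoadObjects(int threads) {
        this.loaderThreads = threads;
        return this;
    }

    // Returns the settings that are in effect for this particular indexer.
    String startAndWait() {
        return "batchSize=" + batchSize + ", loaderThreads=" + loaderThreads;
    }
}

// Hypothetical stand-in for the session: every createIndexer() call
// hands back a brand-new indexer with default settings.
final class SketchSession {
    SketchIndexer createIndexer() {
        return new SketchIndexer();
    }
}

class IndexerOnce {
    // The problem: each option is set on an indexer that is then discarded,
    // and startAndWait() runs on a third, unconfigured indexer.
    static String configuredOnThrowawayIndexers(SketchSession session) {
        session.createIndexer().batchSizeToLoadObjects(100);
        session.createIndexer().threadsToLoadObjects(4);
        return session.createIndexer().startAndWait();
    }

    // The fix: one indexer, configured through chained calls, then started.
    static String configuredOnce(SketchSession session) {
        return session.createIndexer()
                .batchSizeToLoadObjects(100)
                .threadsToLoadObjects(4)
                .startAndWait();
    }

    public static void main(String[] args) {
        SketchSession session = new SketchSession();
        System.out.println(configuredOnThrowawayIndexers(session)); // batchSize=10, loaderThreads=2
        System.out.println(configuredOnce(session));                // batchSize=100, loaderThreads=4
    }
}
```

In real code the same chaining would be applied to the MassIndexer returned by a single createIndexer(...) call; the values 100 and 4 are only examples.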
Not really, that depends on the size of your object graphs and what your machine is able to handle. Try 50, 100, 200? Higher numbers can be better, but obviously you can’t load the whole database all at once in memory, so going too high will not be good either.
ERROR- 2022-11-08 12:40:52.120 21732-[ntifierloader-1] o.h.s.e.i.LogErrorHandler : HSEARCH000058: HSEARCH000211: An exception occurred while the MassIndexer was fetching the primary identifiers list
If I were you, I would also have a look at enabling the slow query log for your database (for instance, PostgreSQL has log_min_duration_statement) to see whether anything shows up.
You might be missing an index on some of your columns, and loading all the items would make that extremely visible.
DEBUG- 2022-11-08 13:25:33.612 15296-[entityloader-11] o.h.SQL : select table0_.fieldid as field from table0_ where table0_.fieldid =?
ERROR- 2022-11-08 13:25:33.611 15296-[ entityloader-8] o.h.s.e.i.LogErrorHandler : HSEARCH000058: HSEARCH000212: An exception occurred while the MassIndexer was transforming identifiers to Lucene Documents
org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:37) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:276) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:284) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.internal.AbstractSharedSessionContract.beginTransaction(AbstractSharedSessionContract.java:503) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.beginTransaction(IdentifierConsumerDocumentProducer.java:194) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.loadList(IdentifierConsumerDocumentProducer.java:164) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.loadAllFromQueue(IdentifierConsumerDocumentProducer.java:140) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
at org.hibernate.search.batchindexing.impl.IdentifierConsumerDocumentProducer.run(IdentifierConsumerDocumentProducer.java:120) ~[hibernate-search-orm-5.11.10.Final.jar:5.11.10.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30016ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696) ~[HikariCP-4.0.3.jar:?]
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197) ~[HikariCP-4.0.3.jar:?]
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) ~[HikariCP-4.0.3.jar:?]
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128) ~[HikariCP-4.0.3.jar:?]
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108) ~[hibernate-core-5.6.11.Final.jar:5.6.11.Final]
... 16 more
Most likely the underlying DB connection pool (HikariCP) has fewer connections than the number of threads you are using to index in parallel. If loading one batch takes a long time, the other threads cannot acquire a connection and eventually time out.
I’d suggest checking how many connections are configured in the DB pool, and either increasing that number or decreasing the number of threads used to load the objects.
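As a rough way to reason about that balance, here is a small sketch. The formula typesToIndexInParallel * (threadsToLoadObjects + 1) is an assumption based on how the Hibernate Search reference documentation describes the mass indexer's database connections, so please check it against the documentation for your version. PoolSizing, reservedForApp and the example numbers are hypothetical. For reference, HikariCP's default maximumPoolSize is 10.

```java
// Hypothetical helper for estimating how many pooled connections a mass
// indexing run needs. The formula is an assumption, not a guarantee.
final class PoolSizing {

    // Assumed: per type indexed in parallel, one thread loads identifiers
    // and threadsToLoadObjects threads load entities, each holding a connection.
    static int connectionsNeeded(int typesToIndexInParallel, int threadsToLoadObjects) {
        return typesToIndexInParallel * (threadsToLoadObjects + 1);
    }

    // Largest threadsToLoadObjects that fits in the pool while leaving
    // reservedForApp connections for the rest of the application.
    static int maxLoaderThreads(int poolSize, int reservedForApp, int typesToIndexInParallel) {
        if (typesToIndexInParallel < 1) {
            throw new IllegalArgumentException("typesToIndexInParallel must be >= 1");
        }
        int available = poolSize - reservedForApp;
        int perType = available / typesToIndexInParallel;
        return Math.max(0, perType - 1);
    }

    public static void main(String[] args) {
        // 12 loader threads for one type would need 13 connections,
        // more than a pool of 10 can provide.
        System.out.println(connectionsNeeded(1, 12));   // 13
        // With a pool of 10 and 2 connections kept for the application,
        // at most 7 loader threads fit.
        System.out.println(maxLoaderThreads(10, 2, 1)); // 7
    }
}
```

If the first number is larger than your pool size, some indexing threads will wait for a connection and can hit the pool's connection timeout, which matches the exception above.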
Read the documentation I mentioned, make an educated guess based on your model and data, and test each alternative (possibly on a small subset of your data). We don’t know any better than you do how indexing will perform in your case; it depends on too many variables.
That documentation also mentions how to determine the maximum number of threads you can use.