Hello, I’m new to this mailing list so I hope that I’m sending my question to the right place.
We recently ran into issues with the Lucene mass indexing process (Hibernate Search's MassIndexer) on data stored in MariaDB.
The connection is closed after a few seconds and the indexing is only partially completed.
I've got a piece of code that re-indexes a table containing contacts.
This code ran fine until we executed it on a table containing more than 2 million contacts.
With that data volume, the process does not run to completion and stops with the following exception.
11/15 16:12:32 ERROR rg.hibernate.search.exception.impl.LogErrorHandler - HSEARCH000058: HSEARCH000211: An exception occurred while the MassIndexer was fetching the primary identifiers list
org.hibernate.exception.JDBCConnectionException: could not advance using next()
at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48)
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:111)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:97)
at org.hibernate.internal.ScrollableResultsImpl.convert(ScrollableResultsImpl.java:69)
at org.hibernate.internal.ScrollableResultsImpl.next(ScrollableResultsImpl.java:104)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.loadAllIdentifiers(IdentifierProducer.java:148)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.inTransactionWrapper(IdentifierProducer.java:109)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.run(IdentifierProducer.java:85)
at org.hibernate.search.batchindexing.impl.OptionallyWrapInJTATransaction.runWithErrorHandler(OptionallyWrapInJTATransaction.java:69)
at org.hibernate.search.batchindexing.impl.ErrorHandledRunnable.run(ErrorHandledRunnable.java:32)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.sql.SQLNonTransientConnectionException: (conn=18) Server has closed the connection. If result set contain huge amount of data, Server expects client to read off the result set relatively fast. In this case, please consider increasing net_wait_timeout session variable / processing your result set faster (check Streaming result sets documentation for more information)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:234)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:165)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.handleIoException(SelectResultSet.java:381)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.next(SelectResultSet.java:650)
at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:1160)
at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:1160)
at org.hibernate.internal.ScrollableResultsImpl.next(ScrollableResultsImpl.java:99)
... 10 more
Here is the piece of code:
private void index(EntityManager em, LongConsumer callBack) {
    FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
    MassIndexer indexer = fullTextEntityManager.createIndexer(Contact.class);
    indexer.batchSizeToLoadObjects(BATCH_SIZE);
    indexer.threadsToLoadObjects(NB_THREADS);
    indexer.progressMonitor(new IndexerProgressMonitor(callBack));
    indexer.start();
}
After several unsuccessful tries, the only thing that makes the process run to completion is to edit the MariaDB config file and set: net_write_timeout = 7200
By default this timeout is 60 seconds; here we raised it to 7200 seconds (2 hours), as the process takes about 45 minutes.
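For reference, and if I understand the MariaDB variable scopes correctly, the same change can also be applied at runtime instead of editing the config file (SET GLOBAL requires the SUPER privilege and only affects new connections):

```sql
-- Equivalent of "net_write_timeout = 7200" in the [mysqld] section:
SET GLOBAL net_write_timeout = 7200;   -- value in seconds; default is 60

-- Or for the current connection only:
SET SESSION net_write_timeout = 7200;
```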
Does anyone have an idea of what we did wrong? It does not seem reasonable to have to set such a huge value just to run a full re-index of the table.
Is there a way to ask the MassIndexer to process the data in limited chunks, or some other way to avoid keeping the connection open for so long?
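In case it helps to answer: the MassIndexer interface seems to declare an idFetchSize(int) method controlling the JDBC fetch size of the primary-key scroll. Would tuning it be the right direction here? Something like the sketch below, which is untested on our side (the value 1000 is only a placeholder, the right value for MariaDB is exactly what we don't know):

```java
private void index(EntityManager em, LongConsumer callBack) {
    FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
    MassIndexer indexer = fullTextEntityManager.createIndexer(Contact.class);
    indexer.batchSizeToLoadObjects(BATCH_SIZE);
    indexer.threadsToLoadObjects(NB_THREADS);
    // Hypothetical tuning: fetch size for the identifier scroll that is
    // timing out in our stack trace; 1000 is a guess, not a tested value.
    indexer.idFetchSize(1000);
    indexer.progressMonitor(new IndexerProgressMonitor(callBack));
    indexer.start();
}
```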
Thanks for your time.