Is mergeSegmentsOnFinish in HS6 (6.0.1) performing the same work as optimizeOnFinish in HS5? I'm asking because I am seeing a very large difference in elapsed time. Both tests were run against the exact same PostgreSQL 10 database using a Lucene backend.
HS5: total index rebuild time 2 hours, 8 mins, 27 secs @ ~500 docs/sec. Optimize time ~18 mins.
HS6: total index rebuild time 2 hours, 59 mins, 25 secs @ ~472 docs/sec (just a little slower than HS5). mergeSegments time ~63 mins (this is the pain point).
Any thoughts would be appreciated, as this is not the fastest thing to test iteratively.
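As a sanity check on the timings above, here is a small sketch that splits each total into indexing time and optimize/merge time. It assumes the quoted totals include the final optimize/merge phase; the class and method names are made up for illustration. Under that assumption both runs imply roughly 3.3 million documents, and about 2,700 of the 3,058 extra seconds on HS6 come from the merge step rather than from indexing itself.

```java
public class RebuildTimings {

    /** Converts an hours/minutes/seconds duration into seconds. */
    public static long toSeconds(long hours, long minutes, long seconds) {
        return hours * 3600 + minutes * 60 + seconds;
    }

    /**
     * Seconds spent on indexing alone.
     * Assumption: the reported total includes the final optimize/merge phase.
     */
    public static long indexingSeconds(long totalSeconds, long finishMinutes) {
        return totalSeconds - finishMinutes * 60;
    }

    /** Approximate document count implied by a throughput and an indexing duration. */
    public static long impliedDocs(long indexingSeconds, long docsPerSecond) {
        return indexingSeconds * docsPerSecond;
    }

    public static void main(String[] args) {
        long hs5Total = toSeconds(2, 8, 27);   // HS5 total rebuild time
        long hs6Total = toSeconds(2, 59, 25);  // HS6 total rebuild time

        long hs5Indexing = indexingSeconds(hs5Total, 18); // ~18 min optimize
        long hs6Indexing = indexingSeconds(hs6Total, 63); // ~63 min mergeSegments

        System.out.printf("HS5: indexing %d s, implied docs %d%n",
                hs5Indexing, impliedDocs(hs5Indexing, 500));
        System.out.printf("HS6: indexing %d s, implied docs %d%n",
                hs6Indexing, impliedDocs(hs6Indexing, 472));
        System.out.printf("Extra time on HS6: %d s total, %d s of it in the merge step%n",
                hs6Total - hs5Total, (63 - 18) * 60);
    }
}
```

The two implied document counts agree to within about half a percent, which is what I would expect from the same database, so the indexing phase itself looks comparable and the gap is almost entirely in the finish step.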
Thanks, Keith
These are my settings:
HS5:
spring.jpa.properties.hibernate.search.default.exclusive_index_use=true
spring.jpa.properties.hibernate.search.default.worker.execution=sync
spring.jpa.properties.hibernate.search.default.index_flush_interval=2000
spring.jpa.properties.hibernate.search.default.max_queue_length=2000
int numCores = Runtime.getRuntime().availableProcessors();
int numThreads = Integer.max(numCores, 1);
MassIndexer indexer = fullTextEntityManager.createIndexer(PatentDocDO.class, TpsMainDO.class)
        .typesToIndexInParallel(1)
        .batchSizeToLoadObjects(200)
        .cacheMode(CacheMode.IGNORE)
        .threadsToLoadObjects(numThreads)
        .idFetchSize(5000)
        .progressMonitor(new SimpleIndexingProgressMonitor())
        .optimizeOnFinish(true);
HS6:
spring.jpa.properties.hibernate.search.automatic_indexing.synchronization.strategy=write-sync
int numCores = Runtime.getRuntime().availableProcessors();
int numThreads = Integer.max(numCores, 1);
MassIndexer indexer = searchSession.massIndexer(PatentDocDO.class, TpsMainDO.class)
        .typesToIndexInParallel(1)
        .batchSizeToLoadObjects(200)
        .cacheMode(CacheMode.IGNORE)
        .threadsToLoadObjects(numThreads)
        .idFetchSize(5000)
        .mergeSegmentsOnFinish(true);