Hello, I’m currently refactoring my application to use Hibernate Search 6 Beta8. While working through this, I ran into a change in the documentation regarding the Mass Indexer:
Indexes are purged completely when mass indexing starts.
We currently use this to re-index while the application is live, understanding that results may be out of date until reindexing is complete. I also see the flag to disable the purge, but its documentation mentions duplicates?
Removes all entities from the indexes before indexing.
Only set this to false if you know the index is already empty; otherwise,
you will end up with duplicates in the index.
Is there a preferred method of re-indexing (mass indexing) that doesn’t purge the existing index, along the same lines as in version 5.x?
The behavior is exactly the same as in Hibernate Search 5. In Hibernate Search 5, the purge was executed before mass indexing by default, and could be disabled by calling .purgeAllOnStart(false). In Hibernate Search 5 as well, disabling the purge while there were still documents in the index would lead to duplicates in the index.
EDIT: Actually, with the Elasticsearch backend, you probably won’t get any duplicates. But that’s more a result of how it’s implemented than an actual intended feature. Also, disabling the purge means the mass indexer won’t remove entities from the index if they’ve been removed from the database.
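To make the two failure modes concrete, here is a toy model in plain Java. It is not Hibernate Search, Lucene, or Elasticsearch code: the class and method names are made up, and it assumes that one kind of index simply appends every document it is given, while the other stores documents by ID so that a rewrite replaces the old copy. Treat it as a sketch of the reasoning, not as a description of how either backend is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, no real indexing library involved.
class ReindexWithoutPurgeSketch {

    // Append-only index (assumption): every write adds a new document,
    // so reindexing on top of existing content produces duplicates.
    static List<String> reindexAppendOnly(List<String> existingIndex, List<String> databaseIds) {
        List<String> index = new ArrayList<>(existingIndex);
        index.addAll(databaseIds);
        return index;
    }

    // ID-keyed index (assumption): a write with an existing ID overwrites
    // the previous document, so no duplicates appear. Documents whose
    // entity was deleted from the database are never touched and stay behind.
    static Map<String, String> reindexKeyedById(Map<String, String> existingIndex, List<String> databaseIds) {
        Map<String, String> index = new LinkedHashMap<>(existingIndex);
        for (String id : databaseIds) {
            index.put(id, "fresh");
        }
        return index;
    }

    public static void main(String[] args) {
        // Entity "3" was indexed earlier, then deleted from the database.
        List<String> databaseIds = List.of("1", "2");

        List<String> appended = reindexAppendOnly(List.of("1", "2", "3"), databaseIds);
        System.out.println("append-only index: " + appended);

        Map<String, String> stale = new LinkedHashMap<>();
        stale.put("1", "stale");
        stale.put("2", "stale");
        stale.put("3", "stale");
        System.out.println("ID-keyed index: " + reindexKeyedById(stale, databaseIds));
    }
}
```

In the append-only case, entities 1 and 2 end up in the index twice. In the ID-keyed case there are no duplicates, but entity 3 is still in the index even though it no longer exists in the database, which is the second problem mentioned above.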
If you want to reindex without purging the index, you’re probably after zero-downtime reindexing. There is no built-in feature for this at the moment, be it in Hibernate Search 5 or Hibernate Search 6. You can track HSEARCH-3499 if you’re interested in progress on this feature.
Until then, with the Elasticsearch backend, there is a way to leverage index aliases to direct all read requests to a (static) copy of the index while the new index is being rebuilt, but that requires that you send requests to the Elasticsearch REST API manually. See here for more information. There is no equivalent for the Lucene backend at the moment.
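As a rough illustration of the manual step, the sketch below builds a request for Elasticsearch’s `_aliases` endpoint that moves a read alias from one index to another in a single atomic call. The alias and index names (`book-read`, `book-000001`, `book-000002`), the base URL, and the helper class are hypothetical. How the alias fits in with the index names Hibernate Search itself uses is not covered here, so check that against your own setup. The names are assumed to contain no characters that need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) an alias-swap request.
class AliasSwapSketch {

    // Body for POST /_aliases: remove the alias from the old index and add it
    // to the new one. Both actions in one request are applied atomically.
    static String buildAliasSwapBody(String alias, String oldIndex, String newIndex) {
        return String.format(
                "{\"actions\":[{\"remove\":{\"index\":\"%s\",\"alias\":\"%s\"}},"
                        + "{\"add\":{\"index\":\"%s\",\"alias\":\"%s\"}}]}",
                oldIndex, alias, newIndex, alias);
    }

    // Wraps the body in an HTTP request against the given Elasticsearch base URL.
    static HttpRequest buildAliasSwapRequest(String baseUrl, String alias, String oldIndex, String newIndex) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_aliases"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildAliasSwapBody(alias, oldIndex, newIndex)))
                .build();
    }

    public static void main(String[] args) {
        // Print the body only; sending it is left to the caller.
        System.out.println(buildAliasSwapBody("book-read", "book-000001", "book-000002"));
    }
}
```

The request can then be sent with `java.net.http.HttpClient` or any other HTTP client once the new index has been fully rebuilt. Until that point, reads through the alias keep hitting the old copy.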