Re-indexing on PRODUCTION site

Dimitri · February 26, 2019, 7:48pm

Our mass indexer takes about 24 hours to run, start to finish.
During this time, we still want our existing index to be accessible by our users.
However, the mass indexer seems to purge the index as it re-indexes, so our users get 0 results while re-indexing is in progress.
How can we avoid this?
A brute-force solution is to create a 2nd entity manager factory that sets hibernate.search.default.indexBase to a different folder than the production indexBase folder, and then swap the folders when the re-indexing is done.
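
A minimal sketch of that idea, using only the JDK. The class name `IndexSwap`, the method names, and the folder layout are hypothetical; only the property `hibernate.search.default.indexBase` comes from the setup described above. It assumes both folders sit on the same filesystem (so a directory move is a plain rename) and that nothing holds the live index open while the swap happens, e.g. the production entity manager factory is closed or the application is briefly stopped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Map;

class IndexSwap {

    // Property override for the second (re-indexing) entity manager factory.
    // It would be passed as the second argument of the standard JPA call
    // Persistence.createEntityManagerFactory(unitName, overrides).
    static Map<String, Object> stagingOverrides(Path stagingBase) {
        return Collections.<String, Object>singletonMap(
                "hibernate.search.default.indexBase", stagingBase.toString());
    }

    // Moves the live index aside, then moves the freshly built index into its place.
    // Returns the folder that now holds the old index, so it can be kept as a
    // backup or deleted later.
    // Assumption: same filesystem, and no process has the live index open.
    static Path swap(Path liveBase, Path stagingBase) {
        Path retired = liveBase.resolveSibling(
                liveBase.getFileName() + ".old-" + System.currentTimeMillis());
        try {
            Files.move(liveBase, retired);     // live    -> retired
            Files.move(stagingBase, liveBase); // staging -> live
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return retired;
    }
}
```

Between the two moves the live folder does not exist for a moment, which is one more reason to do the swap while searches are paused.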

yrodiere · February 27, 2019, 7:43am

Yes, the solution you mentioned is basically the only one right now. It has, however, a notable drawback: the new copy of the index will not benefit from automatic indexing, since your writes will presumably be performed using the “normal” entity manager and will therefore be directed to the old copy of the index. So if you go this way, you will have to reindex everything periodically, say every week.

We plan on adding an integrated solution in the next versions (HSEARCH-3499), but we’re not there yet.

Dimitri · February 27, 2019, 10:13pm

If we keep track of what’s been indexed (e.g., in an audit table), we could index those additional entities (to “catch up” to the production index) before swapping the old and new index. That should work, albeit with some extra coding.

yrodiere · February 28, 2019, 7:13am

Yes, that will work. I’d recommend performing the “catch-up” after the swap, not before.

I suppose you already know that, but just in case: for this “catch-up”, you can use FullTextSession.index.
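
For what it’s worth, here is a rough sketch of how such a catch-up loop could be batched. It is plain JDK code so that it stays self-contained: the class `CatchUp` and both callbacks are hypothetical stand-ins. In a real Hibernate Search 5 setup, `indexOne` would presumably load the entity and call `fullTextSession.index(entity)`, and `afterBatch` would call `fullTextSession.flushToIndexes()` followed by `fullTextSession.clear()` to keep memory usage flat.

```java
import java.util.List;
import java.util.function.Consumer;

class CatchUp {

    // Re-indexes the given entity ids in batches; returns how many ids were processed.
    //   indexOne   : hypothetical callback that indexes a single entity by id
    //   afterBatch : hypothetical callback run after each batch (flush and clear)
    static int reindex(List<Long> ids, int batchSize,
                       Consumer<Long> indexOne, Runnable afterBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int done = 0;
        for (Long id : ids) {
            indexOne.accept(id);
            done++;
            if (done % batchSize == 0) {
                afterBatch.run();
            }
        }
        if (done % batchSize != 0) {
            afterBatch.run(); // flush the final, partial batch
        }
        return done;
    }
}
```

The list of ids would come from whatever tracks the changes, such as the audit table mentioned earlier.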

Dimitri · February 28, 2019, 9:29pm

Our catch-up may take a few hours to run (or longer, depending on what’s been updated that day), so we can’t run it on production, post-swap.
Management and users would complain if new data appeared one day (realtime indexing), vanished the next day (post-swap, with a new but slightly out-of-sync index), and then reappeared the day after (after the “catch-up”).

yrodiere · March 1, 2019, 7:43am

Well, in that case you run the risk of losing the data that was indexed in production after you started the catch-up (off production) but before you made the swap. That’s not ideal either, though depending on your use case it may be unlikely.

I suppose you could also run two catch-ups, a large one off-site (catching up ~24 hours) and a small one post-swap (catching up ~2 hours) on production, and then you’d get the best of both worlds. Technically you could use the exact same code, so this shouldn’t be much harder than performing just one catch-up.
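
To illustrate the “same code, two windows” idea, here is a small sketch in plain Java. Everything in it is hypothetical: the `AuditEntry` shape stands in for a row of your audit table, and `changedSince` stands in for the query you would run against it. The only point is that both catch-ups select ids the same way and differ only in the starting timestamp.

```java
import java.time.Instant;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TwoPhaseCatchUp {

    // Hypothetical audit-table row: which entity changed, and when.
    static final class AuditEntry {
        final long entityId;
        final Instant changedAt;

        AuditEntry(long entityId, Instant changedAt) {
            this.entityId = entityId;
            this.changedAt = changedAt;
        }
    }

    // Ids of all entities changed at or after 'since', without duplicates,
    // in the order they first appear in the audit list.
    static Set<Long> changedSince(List<AuditEntry> audit, Instant since) {
        Set<Long> ids = new LinkedHashSet<>();
        for (AuditEntry entry : audit) {
            if (!entry.changedAt.isBefore(since)) {
                ids.add(entry.entityId);
            }
        }
        return ids;
    }
}
```

The large off-production catch-up would use `changedSince(audit, massIndexerStart)`, and the small post-swap one would use `changedSince(audit, firstCatchUpStart)`. The two windows overlap on purpose: indexing an entity a second time should simply replace its document, so the overlap costs a little time but avoids gaps.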
