Re-indexing on PRODUCTION site

Our mass indexer takes about 24 hours to run, start to finish.
During this time, we still want our existing index to be accessible by our users.
But the mass indexer seems to purge data as it re-indexes, resulting in 0 results being returned to our users during these re-indexing runs.
How can we avoid this?
A brute-force solution is to create a second entity manager factory that sets hibernate.search.default.indexBase to a different folder from the production indexBase folder, and then swap the folders when the re-indexing is done.
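For reference, here's a minimal sketch of that approach, assuming Hibernate Search 5 with JPA; the persistence unit name and index paths are just placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;

public class OfflineReindexer {

    public static void reindexToSideFolder() throws InterruptedException {
        // Override only the index location; everything else comes from persistence.xml
        Map<String, Object> overrides = new HashMap<>();
        overrides.put("hibernate.search.default.indexBase", "/var/lucene/indexes-rebuild");

        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("my-persistence-unit", overrides);
        try {
            FullTextEntityManager ftem =
                    Search.getFullTextEntityManager(emf.createEntityManager());
            // Mass-index everything into the side folder; the live index is untouched
            ftem.createIndexer().startAndWait();
            ftem.close();
        } finally {
            emf.close();
        }
        // Once this completes, swap /var/lucene/indexes-rebuild with the live indexBase folder
    }
}
```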

Yes, the solution you mentioned is basically the only one right now. It has, however, a notable drawback: you will not be able to take advantage of automatic indexing, since your writes will presumably be performed using the “normal” entity manager and will therefore be directed to the old copy of the index. So if you go this way, you will have to reindex everything periodically, say every week.

We plan on adding an integrated solution in the next versions (HSEARCH-3499), but we’re not there yet.

If we keep track of what’s been indexed (e.g., in an audit table), we could index those additional entities (to “catch up” to the production index) before swapping the old and new indexes. That should work, albeit with some extra coding.
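For instance (just a sketch, all names hypothetical), the audit table could be a simple entity written from the same transaction that modifies the indexed data:

```java
import java.time.Instant;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class IndexAuditEntry {

    @Id
    @GeneratedValue
    private Long id;

    private String entityType;   // fully-qualified class name of the changed entity
    private Long entityId;       // primary key of the changed entity
    private Instant modifiedAt;  // when the change happened

    // getters/setters omitted
}
```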

Yes, that will work. I’d recommend performing the “catch-up” after the swap, not before.

I suppose you already know that, but just in case: for this “catch-up”, you can use FullTextSession.index.
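Roughly something like this (a sketch only, assuming the hypothetical IndexAuditEntry table from above and the native Session API):

```java
import java.time.Instant;
import java.util.List;
import org.hibernate.Session;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;

public class CatchUpIndexer {

    private static final int BATCH_SIZE = 100;

    // Re-indexes every entity recorded in the audit table since 'since'.
    public void catchUp(Session session, Instant since) throws ClassNotFoundException {
        FullTextSession fts = Search.getFullTextSession(session);

        List<Object[]> rows = session.createQuery(
                        "select a.entityType, a.entityId from IndexAuditEntry a"
                        + " where a.modifiedAt >= :since order by a.modifiedAt",
                        Object[].class)
                .setParameter("since", since)
                .getResultList();

        int processed = 0;
        for (Object[] row : rows) {
            Class<?> type = Class.forName((String) row[0]);
            Object entity = session.get(type, (Long) row[1]);
            if (entity != null) {
                fts.index(entity); // re-index this single entity
            }
            if (++processed % BATCH_SIZE == 0) {
                fts.flushToIndexes(); // push pending work to the index
                fts.clear();          // release memory
            }
        }
        fts.flushToIndexes();
    }
}
```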

Our catch-up may take a few hours to run (or longer, depending on what’s been updated that day), so we can’t run it on production, post-swap.
Management and users would complain if new data appeared one day (realtime indexing), vanished the next day (post-swap, with a new but slightly out-of-sync index), and then re-appeared the next day (after the “catch-up”).

Well, in that case you run the risk of losing the data that was indexed in production after you started the catch-up (off production) but before you made the swap. That’s not ideal either, though depending on your use case it may be unlikely.

I suppose you could also run two catch-ups: a large one off-site (catching up ~24 hours) and a small one on production post-swap (catching up ~2 hours), and then you’d get the best of both worlds. Technically you could use the exact same code, so this shouldn’t be much harder than performing just one catch-up.
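In terms of the hypothetical catchUp method sketched earlier, that would just be two calls with different timestamps; the session and timestamp variables here are placeholders:

```java
// Phase 1, off production, against the rebuilt index (before the swap):
// cover everything modified since the mass indexer started (~24 hours).
catchUpIndexer.catchUp(offlineSession, massIndexerStartTime);

// ... swap the index folders / point production at the new index ...

// Phase 2, on production, right after the swap:
// cover only the short window since phase 1 started (~2 hours).
catchUpIndexer.catchUp(productionSession, phaseOneStartTime);
```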