Re-indexing and overwriting index folder causes errors

Our site’s Hibernate Search index folder is named “production”.

Periodically, we rebuild the site’s search index in a temporary “staging” folder, to pick up changes made by external processes.

When the rebuild is done, we back up the old “production” folder and rename “staging” to “production”.

Unfortunately, our live web app won’t accept the new “production” folder.

It complains with the following:
java.lang.IllegalStateException: same segment _1 has invalid changes; likely you are re-opening a reader after illegally removing index files yourself and building a new index in their place. Use IndexWriter.deleteAll or OpenMode.CREATE instead
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:194)
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:326)
at org.apache.lucene.index.StandardDirectoryReader$2.doBody(StandardDirectoryReader.java:320)
at org.apache.lucene.index.StandardDirectoryReader$2.doBody(StandardDirectoryReader.java:316)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
at org.apache.lucene.index.StandardDirectoryReader.doOpenFromCommit(StandardDirectoryReader.java:316)
at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:312)
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:263)
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:251)
at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:137)
at org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet(SharingBufferReaderProvider.java:240)
at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.openIndexReader(SharingBufferReaderProvider.java:74)
at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.openIndexReader(SharingBufferReaderProvider.java:36)
at org.hibernate.search.reader.impl.ManagedMultiReader.createInstance(ManagedMultiReader.java:70)
at org.hibernate.search.reader.impl.MultiReaderFactory.openReader(MultiReaderFactory.java:49)
at org.hibernate.search.query.engine.impl.LuceneHSQuery.buildSearcher(LuceneHSQuery.java:482)
at org.hibernate.search.query.engine.impl.LuceneHSQuery.queryResultSize(LuceneHSQuery.java:222)
at org.hibernate.search.query.hibernate.impl.FullTextQueryImpl.doGetResultSize(FullTextQueryImpl.java:272)
at org.hibernate.search.query.hibernate.impl.FullTextQueryImpl.getResultSize(FullTextQueryImpl.java:263)

We’ve tried setting hibernate.search.default.exclusive_index_use to false and experimenting with various locking strategies, but nothing seems to work.

The only way we can avoid this error seems to be:

  1. Disable locking altogether (using hibernate.search.default.locking_strategy=none). That’s not an option: we may have multiple users making updates simultaneously, and we don’t want user A overwriting user B’s edits.
  2. Destroy our connection pool and create a new one. That’s not an option either: we can’t know this is needed until AFTER the error happens, and the existing pool may have running transactions that we can’t risk killing, as that would result in critical data loss.
  3. Restart the application. This is not an option, because we can’t randomly blow out users’ sessions every few hours and make the site inaccessible for minutes at a time.

Surely there must be a solution to this problem. Any ideas?

The easiest solution would really be to restart the application at some hour of the day when you know the application is not in use. Admittedly, though, not every developer is lucky enough to have such a time window every day.

If you do not use automatic indexing (i.e. your indexes are only ever updated by your periodic, off-server reindexing process, and never directly by your application on entity changes), you might twist the filesystem-slave directory provider into periodically copying the contents of your “staging” folder into the “production” folder: it will take care of all the low-level Lucene operations that are necessary.
See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#search-configuration-directory for more information about directory providers in general, and filesystem-slave in particular.
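
For reference, here is a minimal sketch of what that configuration could look like when passed as JPA bootstrap properties. The property names are the ones documented for the 5.x filesystem-slave provider; the persistence unit name and paths are placeholders, and whether the slave accepts a manually built source folder is exactly the “twist” in question:

    import java.util.HashMap;
    import java.util.Map;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    public class SlaveDirectoryBootstrap {

        public static EntityManagerFactory create() {
            Map<String, String> props = new HashMap<>();
            // Read the index from a copy of a remote "source" directory instead of owning it.
            props.put("hibernate.search.default.directory_provider", "filesystem-slave");
            // Where the periodic reindexing process publishes its result ("staging").
            props.put("hibernate.search.default.sourceBase", "/var/lucene/staging");
            // The local copy the application actually reads ("production").
            props.put("hibernate.search.default.indexBase", "/var/lucene/production");
            // How often (in seconds) the slave refreshes its local copy from sourceBase.
            props.put("hibernate.search.default.refresh", "3600");
            return Persistence.createEntityManagerFactory("my-persistence-unit", props);
        }
    }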

If you do use automatic indexing, I doubt the solution above will work, because it was designed for read-only indexes. You can try it, though. The most likely outcome is that writes from your application will be ignored after the content of “staging” is copied to “production” for the first time. If that happens, you can try setting hibernate.search.default.exclusive_index_use to false. This will likely lead to terrible performance, because the index writer will have to be reopened for every single transaction, but at least it should work…

If all of the above fails, or performance is not satisfactory, then you will have to ensure the indexes are not being used while you perform the rather brutal swapping of directories.

This implies a short period during which neither search queries nor automatic indexing can be performed. Basically, a lock-down of your application. However, if all you need to do is rename folders, the lock-down should be very short.

Implementing the lock-down will be on you: Hibernate Search does not support that. Essentially, you will need to make HTTP requests either fail or wait while the lock-down is being enforced. Depending on your framework, there should be a number of ways to do that. You might want to exclude technical administration pages from the lock-down, just in case…
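
For example, in a plain servlet environment, a crude gate could be a filter like the following sketch. The class name and the static flag are made up; how and when you flip the flag, and which URLs you exempt, is up to you:

    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicBoolean;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletResponse;

    public class LockDownFilter implements Filter {

        // Flipped to true while the index directories are being swapped.
        public static final AtomicBoolean LOCK_DOWN = new AtomicBoolean(false);

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            if (LOCK_DOWN.get()) {
                // Fail fast during the swap; clients can retry shortly afterwards.
                HttpServletResponse httpResponse = (HttpServletResponse) response;
                httpResponse.setHeader("Retry-After", "30");
                httpResponse.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE, "Maintenance in progress");
                return;
            }
            chain.doFilter(request, response);
        }

        @Override
        public void init(FilterConfig filterConfig) { }

        @Override
        public void destroy() { }
    }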

As to the process, you will have to do this:

  1. Start enforcing the lock-down, preventing further HTTP requests from being processed.
  2. Wait for ongoing HTTP requests to be processed. After that, there should be no read lock on the indexes anymore (at least not with default settings).
  3. Release the write locks on the indexes:
    EntityManagerFactory entityManagerFactory = ...; // this can be either injected with @PersistenceUnit, or retrieved from an entity manager using .getEntityManagerFactory()
    SearchIntegrator searchIntegrator = SearchIntegratorHelper.extractFromEntityManagerFactory( entityManagerFactory );
    for ( EntityIndexBinding value : searchIntegrator.getIndexBindings().values() ) {
        for ( IndexManager indexManager : value.getIndexManagerSelector().all() ) {
            // Flush pending index changes and release lock
            indexManager.flushAndReleaseResources();
        }
    }
    
    Be aware that this code uses SPIs, meaning you might experience incompatible changes in minor releases of Hibernate Search. If you’re ready to update your code when upgrading Hibernate Search, that should not be a problem.
  4. Replace the old index directories with the new ones (a sketch of this swap follows after the list).
  5. Stop enforcing the lock-down, allowing HTTP requests to be processed by your application. Hibernate Search will automatically lock the indexes again.
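
For step 4, assuming the old and new folders live on the same filesystem, the swap can be two directory renames. Here is a rough sketch; the paths are placeholders, and ATOMIC_MOVE will throw if the filesystem cannot rename atomically:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class IndexSwapper {

        public void promoteStagingIndex() throws IOException {
            Path production = Paths.get("/var/lucene/production");
            Path staging = Paths.get("/var/lucene/staging");
            Path backup = Paths.get("/var/lucene/production." + System.currentTimeMillis());

            // Keep the old index around as a backup, then promote the freshly built one.
            Files.move(production, backup, StandardCopyOption.ATOMIC_MOVE);
            Files.move(staging, production, StandardCopyOption.ATOMIC_MOVE);
        }
    }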

Thanks for the options. We do use automatic indexing, so most of your suggestions aren’t feasible. The last one seems the most doable.

But, on a hunch, I explored a different approach: I wondered if I could simply migrate the write.lock files while renaming “staging” to “production”.

So, I wrote this method and call it just before renaming from “staging” to “production”:

	private void migrateWriteLocksIfAny(File fromIndex, File toIndex) throws Exception {
		// Each indexed entity has its own subdirectory under the index root.
		for (File file : fromIndex.listFiles()) {
			if (file.isDirectory()) {
				for (File entityContentsFile : file.listFiles()) {
					// Move any Lucene write.lock to the same relative location under the
					// new root, so the live writer keeps holding its lock after the swap.
					if (entityContentsFile.getName().equals("write.lock")) {
						String newFilePath = entityContentsFile.getAbsolutePath()
								.replace(fromIndex.getAbsolutePath(), toIndex.getAbsolutePath());
						entityContentsFile.renameTo(new File(newFilePath));
					}
				}
			}
		}
	}

It seems to work. I don’t see any data loss, and the live site has stopped rejecting the new index.

What do you think? Am I inviting trouble?

I think you are. I’m just guessing, but here is what I would worry about:

  1. Lock files are there for a reason. If you change the index files while Lucene is writing to the index, at best you’ll lose that data (which might not matter if you do some “catch-up” reindexing afterwards); at worst you might introduce inconsistent data into the index (for example a duplicate document, because the document was marked for deletion in the old files and added to the new ones).
  2. Unless you set hibernate.search.default.exclusive_index_use to false (which is not great for performance), any further write to the index may still be directed to the old index files. That was the point of the loop calling flushAndReleaseResources: making sure the index writers will be re-opened later and will use file descriptors pointing to the new index.

But at that point we’re reaching very low-level parts of the Lucene integration. @Sanne might be of more help, if he’s available.

Calling flushAndReleaseResources, I suspect, may not work if the site has long-running database transactions (sometimes we have to run administrative tasks whose db transactions stay open for 5 minutes or longer).

In such scenarios, a site reboot would result in those database transactions failing (i.e., actual business data loss). And pausing new connections while these long-running transactions slowly finish would result in a lot of frayed nerves among our users.

As crappy as it is, I think copying over the write.lock files may be the least-worst solution in our highly unusual situation. We don’t mind if the search index goes out of sync for a while, since it will self-correct when the next mass indexer cycle completes a few hours later.
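
For reference, a catch-up cycle of that kind is just the standard Hibernate Search 5 mass indexer; a minimal sketch, with illustrative tuning values, looks roughly like this:

    import javax.persistence.EntityManager;
    import org.hibernate.search.jpa.FullTextEntityManager;
    import org.hibernate.search.jpa.Search;

    public class MassIndexerCycle {

        public void reindex(EntityManager entityManager) throws InterruptedException {
            FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
            fullTextEntityManager.createIndexer()   // rebuild the index for all @Indexed entities
                    .batchSizeToLoadObjects(25)     // entities loaded per batch
                    .threadsToLoadObjects(4)        // parallel loader threads
                    .startAndWait();                // block until the rebuild is complete
        }
    }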