Lucene index corruption

Hi.
We are using Hibernate Search in our web app (with connection pooling), as well as separate scripts that run as crons.

We write to the index 3 different ways:

  1. realtime, in the web app (with connection pool), when a user creates or edits an entity, by manually calling
    org.hibernate.search.jpa.Search.getFullTextEntityManager(entityManager).index(fooEntity);

  2. via cron, every 5 minutes, for entities awaiting indexing in our SEARCH_QUEUE_INDEX table;

  3. via cron, every 12 hours, where we rebuild the entire index from scratch, to ensure nothing fell through the cracks.
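
For what it’s worth, method 3 boils down to something like the MassIndexer. A minimal sketch of that call (the tuning values here are illustrative, not necessarily what we run):

    import org.hibernate.search.jpa.FullTextEntityManager;
    import org.hibernate.search.jpa.Search;

    FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
    ftem.createIndexer(FooEntity.class)  // rebuilds the whole FooEntity index
        .batchSizeToLoadObjects(25)      // illustrative tuning values
        .threadsToLoadObjects(4)
        .startAndWait();                 // blocks until the rebuild completes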

To ensure two processes don’t write to the index at once, we use that same SEARCH_QUEUE_INDEX db table to manage our write locking, so that only one process writes to the index at a time.
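
To sketch the idea (everything here except the SEARCH_QUEUE_INDEX table name is hypothetical, but this is the shape of it):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Hypothetical sketch: take a row-level db lock before touching the index.
    // Assumes a JDBC Connection `conn` inside an open transaction.
    try (PreparedStatement ps = conn.prepareStatement(
            "SELECT ID FROM SEARCH_QUEUE_INDEX WHERE ID = ? FOR UPDATE")) {
        ps.setLong(1, 1L);
        try (ResultSet rs = ps.executeQuery()) {
            // blocks here until any other writer releases the lock
            writeToIndex(); // hypothetical helper doing the actual index writes
        }
    } // the lock is released when the surrounding transaction commits or rolls back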

But it seems this isn’t working right: our index is becoming corrupted.

We are seeing 2 types of errors in our logs:

  1. Unable to reopen IndexReader

  2. HSEARCH000058: HSEARCH000117: IOException on the IndexWriter

Here are the stack traces for each:
1)

org.hibernate.search.exception.SearchException: Unable to reopen IndexReader
	at org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet(SharingBufferReaderProvider.java:243)
	at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.openIndexReader(SharingBufferReaderProvider.java:74)
	at org.hibernate.search.indexes.impl.SharingBufferReaderProvider.openIndexReader(SharingBufferReaderProvider.java:36)
	at org.hibernate.search.reader.impl.ManagedMultiReader.createInstance(ManagedMultiReader.java:70)
	at org.hibernate.search.reader.impl.MultiReaderFactory.openReader(MultiReaderFactory.java:49)
	at org.hibernate.search.query.engine.impl.LuceneHSQuery.buildSearcher(LuceneHSQuery.java:482)
	at org.hibernate.search.query.engine.impl.LuceneHSQuery.queryResultSize(LuceneHSQuery.java:222)
	at org.hibernate.search.query.hibernate.impl.FullTextQueryImpl.doGetResultSize(FullTextQueryImpl.java:272)
	at org.hibernate.search.query.hibernate.impl.FullTextQueryImpl.getResultSize(FullTextQueryImpl.java:263)
	at foo.Search.executeSearch(Search.java:248)
Caused by: org.apache.lucene.index.CorruptIndexException: file mismatch, expected id=3vrtzq0x94ltu15kzym458w8p, got=1rows74yewkm4pwz102cqzr70 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/path/to/lucene_index/FooEntity/_3j7.si")))
	at org.apache.lucene.codecs.CodecUtil.checkIndexHeaderID(CodecUtil.java:266)
	at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:256)
	at org.apache.lucene.codecs.lucene50.Lucene50SegmentInfoFormat.read(Lucene50SegmentInfoFormat.java:86)
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:362)
	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:493)
	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:490)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
	at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:490)
	at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:344)
	at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:300)
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:263)
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:251)
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:137)
	at org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet(SharingBufferReaderProvider.java:240)
	... 61 more
	Suppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (9573fec2). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/path/to/lucene_index/FooEntity/_3j7.si")))
		at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:379)
		at org.apache.lucene.codecs.lucene50.Lucene50SegmentInfoFormat.read(Lucene50SegmentInfoFormat.java:117)
		... 73 more

2)

org.hibernate.search.exception.impl.LogErrorHandler - HSEARCH000058: HSEARCH000117: IOException on the IndexWriter
java.nio.file.NoSuchFileException: /path/to/lucene_index/FooEntity/_1s6.cfe
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
	at java.nio.channels.FileChannel.open(FileChannel.java:287)
	at java.nio.channels.FileChannel.open(FileChannel.java:335)
	at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:237)
	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109)
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.readEntries(Lucene50CompoundReader.java:105)
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:69)
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:71)
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:93)
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
	at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
	at org.apache.lucene.index.BufferedUpdatesStream$SegmentState.<init>(BufferedUpdatesStream.java:390)
	at org.apache.lucene.index.BufferedUpdatesStream.openSegmentStates(BufferedUpdatesStream.java:422)
	at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)
	at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3172)
	at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3158)
	at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2814)
	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2970)
	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2935)
	at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.commitIndexWriter(IndexWriterHolder.java:150)
	at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.commitIndexWriter(IndexWriterHolder.java:163)
	at org.hibernate.search.backend.impl.lucene.PerChangeSetCommitPolicy.onChangeSetApplied(PerChangeSetCommitPolicy.java:29)
	at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.afterTransactionApplied(AbstractWorkspaceImpl.java:98)
	at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.applyUpdates(LuceneBackendQueueTask.java:108)
	at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.run(LuceneBackendQueueTask.java:47)
	at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.applyChangesets(SyncWorkProcessor.java:167)
	at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.run(SyncWorkProcessor.java:153)
	at java.lang.Thread.run(Thread.java:745)

The relevant config, shared by the above 3 indexing methods, is:

hibernate.search.indexing_strategy = manual
hibernate.search.default.locking_strategy = none
hibernate.search.exclusive_index_use = true

Any ideas what could be going on here?

Is the index getting corrupted because 2 processes are accidentally writing to the same index (and the subsequent read errors are on account of that corruption)?

Would setting hibernate.search.exclusive_index_use = false fix this problem?

It could, but it would also likely decrease performance, maybe to the point where it’s unusable.

Why are you writing to the same index through different processes (I assume different instances of Hibernate Search)? This whole setup could work fine with a single Hibernate Search instance.
If it’s a problem with the connection pool, connection pools generally have options to configure a minimum and maximum number of connections, so you would not have to use many connections all the time, just when mass indexing.

Hi @yrodiere:
Sorry for the late reply. Since writing this post, I squashed the bug that was allowing 2 processes to accidentally write to the unlocked index at the same time. No more corruptions in the last few days.

Still… my db-table solution for handling locking isn’t very satisfying, as there’s always a slight possibility of index corruption from a misbehaving process.

Performance issues aside, would hibernate.search.exclusive_index_use = false guarantee no index corruption if 150 processes (each with its own instance of Hibernate Search) are simultaneously writing to the same entity class in the same index folder on the filesystem?

By the way, I don’t want to use a db table as my locking strategy. Ideally, I would like to eliminate my 3 different index writing strategies, and just let Hibernate Search handle it.

What settings and configuration do you suggest, if my single-instance JPA/Hibernate web app has these requirements:

  1. a dedicated db connection pool, for hundreds of SIMULTANEOUS read and write operations on the same Hibernate/Lucene index folders.
  2. near-realtime indexing of new entities and IndexedEmbedded/ContainedIn associations.
  3. near-realtime indexing of updated entities and IndexedEmbedded/ContainedIn associations.
  4. users’ write operations mustn’t corrupt the index or cause incomplete search results.
  5. users’ write operations mustn’t cause other users’ write operations to fail or not get indexed fully.
  6. a user’s read or write operation mustn’t lock out other users’ simultaneous read or write operations.
  7. all write operations must be immediately visible to other users querying that entity index, without restarting the web app or destroying the connections in the db pool.

Hi @Dimitri,

First, let’s get this straight: these requirements are not what causes problems in your application; the additional crons that use separate instances of Hibernate Search are. I think you know that, but I prefer to be clear.

For a single-instance application, Hibernate Search should already address all requirements 2 to 7 by default. I’m not sure I understand the first requirement about having a dedicated db connection pool:

  • for on-the-fly indexing, Hibernate Search does not create extra connections: it re-uses connections from the Hibernate ORM session you created yourself.
  • for your “crons”, you should have control over how many connections you use and should be able to size your pool accordingly: if your webapp accepts 20 requests in parallel and your crons require 10 connections, just create a pool that allows up to 30 connections but scales down to 20 or fewer connections when idle. This would achieve the same result as (or better than) having two separate connection pools.
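
For instance, with HikariCP (just one example of a pool with min/max sizing; any pool with equivalent options works, and the URL here is hypothetical):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    HikariConfig cfg = new HikariConfig();
    cfg.setJdbcUrl("jdbc:postgresql://db-host/app"); // hypothetical URL
    cfg.setMaximumPoolSize(30); // 20 for webapp requests + 10 for mass indexing
    cfg.setMinimumIdle(20);     // shrinks back down when the crons are idle
    HikariDataSource dataSource = new HikariDataSource(cfg);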

As stated above:

Why are you writing to the same index through different processes (I assume different instances of Hibernate Search)? This whole setup could work fine with a single Hibernate Search instance.
If it’s a problem with the connection pool, connection pools generally have options to configure a minimum and maximum number of connections, so you would not have to use many connections all the time, just when mass indexing.

If this does not make sense in your case, please explain why. Maybe then I’ll understand and will be able to help.

Ok, I think I’m getting a clearer picture here.

If I had ONE standalone web application, Hibernate Search would support concurrent reading/writing to the lucene index, in near-realtime, without corrupting the index.
I’d simply commit my JPA transaction when I’m done with my creation/edit operations, and Hibernate Search would quietly index the entity in the background, in near-realtime.
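
Something like this, I mean (assuming I also switch hibernate.search.indexing_strategy back to its default, event, instead of our current manual):

    import javax.persistence.EntityManager;

    EntityManager em = entityManagerFactory.createEntityManager();
    em.getTransaction().begin();
    em.persist(fooEntity);        // or em.merge(fooEntity) for an edit
    em.getTransaction().commit(); // Hibernate Search indexes the entity at commit
    em.close();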

So the only thing I’d need to worry about is the periodic rebuilding of the index in a separate cron, so that DB edits (made outside the web app) get added to the index. For THAT to work, I would need to:

  1. run the indexer in a SEPARATE temp folder on the server
  2. briefly shut down the web app (or temporarily block all users from reading/writing to the hibernate search index)
  3. rename the temp index so it’s now the production index
  4. restart the app (or re-allow users to read/write).
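
In code, step 3 might look something like this (paths hypothetical; this assumes both directories sit on the same filesystem, and that nothing is reading or writing the index during the swap):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    Path prod = Paths.get("/path/to/lucene_index/FooEntity");
    Path temp = Paths.get("/path/to/lucene_index_rebuild/FooEntity");
    Path old  = Paths.get("/path/to/lucene_index_old/FooEntity");

    Files.move(prod, old);                                  // park the live index
    Files.move(temp, prod, StandardCopyOption.ATOMIC_MOVE); // promote the rebuilt one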

Is that correct?

Exactly.

One snag: if you don’t restart the app and simply stop writing to the index, then resume afterwards, you will need to make sure Hibernate Search has flushed everything to the index before you start reindexing. That was the point of the flushAndReleaseResources call I mentioned in my answer to your other post.
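
Roughly, that flush looks like this (a hedged sketch against the Hibernate Search 5 SPI; the index name “FooEntity” and the exact way you obtain the SearchIntegrator may differ in your setup and version):

    import org.hibernate.search.indexes.spi.IndexManager;
    import org.hibernate.search.spi.SearchIntegrator;

    SearchIntegrator integrator = fullTextEntityManager.getSearchFactory()
            .unwrap(SearchIntegrator.class);
    IndexManager indexManager = integrator.getIndexManager("FooEntity"); // index name assumed
    indexManager.flushAndReleaseResources(); // flush pending writes, release the IndexWriter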