MassIndexer / purgeAll does not release IndexReaders

Hello Team Hibernate!
I am using Hibernate Search and really like how it integrates into my application!

For some entities, I cannot use and do not need automatic indexing. I get new data delivered weekly, which is why I use the MassIndexer to completely rebuild the index on a weekly basis. I use purgeAllOnStart and optimizeAfterPurge to make sure the index is rebuilt completely fresh.
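For context, this is roughly how I run the rebuild (a simplified sketch of my setup; the class name and the entity type parameter are just illustrations, the MassIndexer calls are the relevant part):

```java
import javax.persistence.EntityManager;

import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;

public class WeeklyIndexRebuild {

    // Rebuilds the index for the given entity type from scratch.
    public void rebuild(EntityManager entityManager, Class<?> entityType)
            throws InterruptedException {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
        ftem.createIndexer(entityType)
                .purgeAllOnStart(true)    // delete all existing documents before reindexing
                .optimizeAfterPurge(true) // merge segments right after the purge
                .startAndWait();          // block until the rebuild is done
    }
}
```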

I ran into a problem where the index folder size increases each time I rebuild, even if there are no additional documents.
It seems to me that, after purging, the to-be-deleted segments do not get deleted from my filesystem. Only when I restart my application does the “cleanup” take place.

I debugged my application and saw that when I call purgeAll, Lucene tries to delete old index segments, but gets an IOException (file is in use).
Only when I shut down my application does Lucene seem to do a “cleanup” and manage to delete those files.

I use the default reader strategy and see that SharingBufferReaderProvider is used by Hibernate.
In the code of SharingBufferReaderProvider, it looks to me like there is at least one IndexReader open at all times, shared across the application.
Could it be that those IndexReaders prevent Lucene from deleting the old segments?

One thing I tried is to set the reader strategy to not-shared:
hibernate.search.reader.strategy = not-shared

If I set this, my problem is solved: the deleted segments get cleaned up correctly.
However, I am not sure how much of a problem this will be for my filesystem, as I understand that with this setting each query would open and close the index.

Is this a known issue and is there a workaround for this?

I use Hibernate Search 5.11.12.Final with Spring Boot.

Thanks in advance!

Rene

That’s right, and from what I can see it’s designed that way for performance reasons.

That’s possible, yes. Though it depends on your operating system… Given the error, I would bet you’re on Windows?

Also, I wouldn’t have expected Lucene to even try to delete these segments until the reader gets closed – which should happen eventually, after you write to the index (e.g. after you perform a purge).

That’s right. This will mostly decrease performance, which is why this isn’t recommended, but depending on your use case the resulting performance might be enough.

This is not a known issue, but I’m afraid you’re using a very old version of Hibernate Search, so it’s unclear whether this would still be a problem with actively maintained versions of Hibernate Search – in particular because we’ve upgraded Lucene several times since then. And if it still is a problem, backporting a fix from recent versions to 5.11 would amount to rewriting the fix completely.

To get proper bugfixes I’d recommend upgrading to 6.0, then to 7.1, using the migration guides. The migration to 6.0 in particular will be a big step, the next versions not so much. Then if the problem still arises, you’ll need to open an issue on Jira and we can have a look.

Of course if you find a proper fix in 5.11 we can consider merging it – but it would need to be quite conservative, to avoid breaking other users currently stuck on 5.11.

You already found one workaround with the not-shared reader strategy, though as I mentioned before this will diminish performance.

Perhaps a better workaround would be to perform a query on the relevant index just after mass indexing (to get the reader closed and reopened, which should release the FS locks on the “deleted” segments), and then to trigger index “optimization” manually with optimize().
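In code, that workaround could look roughly like this (an untested sketch against the Search 5 JPA API; the class name is just an illustration, and you’d pass in your actual indexed entity type):

```java
import javax.persistence.EntityManager;

import org.apache.lucene.search.MatchAllDocsQuery;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;

public class PostMassIndexingCleanup {

    // Refreshes the shared reader, then merges away the stale segments.
    public void refreshAndOptimize(EntityManager entityManager, Class<?> entityType) {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);

        // Any query will do: reading from the index forces the shared
        // IndexReader to be refreshed, closing the old reader that still
        // pins the purged segments.
        ftem.createFullTextQuery(new MatchAllDocsQuery(), entityType)
                .setMaxResults(1)
                .getResultList();

        // With the old reader closed, the merge can delete the stale files.
        ftem.getSearchFactory().optimize(entityType);
    }
}
```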

If that works, it probably means we need adjustments even in Hibernate Search 6/7+: we need to always perform a refresh before an “optimize”/“merge”, especially for mass indexing.

Hello @yrodiere
Thank you very much for the quick and detailed response!

Actually, when looking for a fix I found a lot of posts from you which helped me identify the issue and led me here, ha :slight_smile:

Anyway, I tried your suggestion of performing a query and optimizing afterwards, and it seems to solve my problem! It really makes sense looking at the implementation of SharingBufferReaderProvider: as long as the segment is considered “current”, it will not drop the IndexReader, and that check only happens when reading from the index.

I will probably tune my MassIndexer, set optimizeOnFinish to false, run a query, and then optimize manually, along the lines of the sketch below.
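Something like this (untested; again the entity type is a stand-in for my actual entity):

```java
import javax.persistence.EntityManager;

import org.apache.lucene.search.MatchAllDocsQuery;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;

public class TunedWeeklyRebuild {

    public void rebuild(EntityManager entityManager, Class<?> entityType)
            throws InterruptedException {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);

        ftem.createIndexer(entityType)
                .purgeAllOnStart(true)
                .optimizeAfterPurge(true)
                .optimizeOnFinish(false) // skip the final optimize; it would run while
                                         // the stale reader still pins the old segments
                .startAndWait();

        // Dummy query to force the shared IndexReader to refresh and release
        // the purged segments.
        ftem.createFullTextQuery(new MatchAllDocsQuery(), entityType)
                .setMaxResults(1)
                .getResultList();

        // Manual optimize, now that the old files can actually be deleted.
        ftem.getSearchFactory().optimize(entityType);
    }
}
```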


Thanks for the feedback.

I created HSEARCH-5107 in the Hibernate Jira to address the problem in Hibernate Search 7.2 or later.
