HTTP request timeouts when using mass indexer

buncieboy · February 15, 2022, 1:13am

We are using Hibernate Search 6 and Elasticsearch 7. When we use the mass indexer to rebuild our full index, we get timeouts. Apart from class and method renaming, the code is the same as what we had in Hibernate Search 5.

The timeout occurs when we run the mass indexer while the index already contains data. As expected, the existing data starts being deleted, but since we have around 3M records, this takes about 10 minutes. During that time the request times out, yet the delete still completes and the index ends up empty. If we then run the mass indexer again, it reindexes everything as intended.

It looks like this timeout can be adjusted via the “hibernate.search.backend.read_timeout” property, whose default was lowered from 60 to 30 seconds in Hibernate Search 6. However, increasing it back to 60 seconds still leads to timeouts. It appears that this request (_delete_by_query) previously had a much longer timeout, or none at all, since we never had this issue before. Is there a change we can make to return to the previous behavior, apart from setting read_timeout to a much higher value (say 30 minutes)? If not, is there any risk in increasing that timeout far beyond the default? Thanks in advance for any advice you can give.
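For illustration, here is a minimal sketch of raising this timeout programmatically. It assumes Hibernate ORM is bootstrapped through JPA, so extra properties can be passed to Persistence.createEntityManagerFactory. The class and method names (SearchTimeoutSettings, withReadTimeout) are hypothetical. The value is in milliseconds, so 60 seconds is 60000, and the Hibernate Search 6 default is 30000. The same key can also be set in persistence.xml or hibernate.properties.

```java
import java.util.HashMap;
import java.util.Map;

class SearchTimeoutSettings {

    // Hibernate Search 6 property controlling the Elasticsearch read timeout.
    // Value in milliseconds; Hibernate Search 6 defaults to 30000 (30 seconds).
    static final String READ_TIMEOUT = "hibernate.search.backend.read_timeout";

    // Hypothetical helper: returns a copy of the given bootstrap properties
    // with the read timeout set. The base map is left unmodified.
    static Map<String, Object> withReadTimeout(Map<String, Object> base, long readTimeoutMillis) {
        if (readTimeoutMillis <= 0) {
            // Sketch choice: reject non-positive values rather than guess their meaning.
            throw new IllegalArgumentException("read timeout must be positive");
        }
        Map<String, Object> props = new HashMap<>(base);
        props.put(READ_TIMEOUT, String.valueOf(readTimeoutMillis));
        return props;
    }

    public static void main(String[] args) {
        // e.g. pass the result to Persistence.createEntityManagerFactory(unitName, props)
        Map<String, Object> props = withReadTimeout(new HashMap<>(), 60_000L);
        System.out.println(READ_TIMEOUT + "=" + props.get(READ_TIMEOUT));
    }
}
```

Note that this setting applies to every request sent to Elasticsearch, not only to the delete-by-query. A very high value therefore also delays failure detection for every other request.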

yrodiere · February 16, 2022, 8:00am

I couldn’t say why this used to work in Hibernate Search 5. Maybe the automatic retry code led to the same delete-by-query being sent three times, and the last attempt succeeded because by then the first request had been handled and the index was empty? But this retry mechanism was supposedly deprecated in the REST client (which Hibernate Search depends on), and Hibernate Search no longer uses it.

Having requests fail when they exceed the read timeout is definitely the correct behavior; we don’t want to change that. I don’t know of any way to work around this timeout.

Maybe there should be a separate setting for read timeouts on (very) long-running requests such as this one? But then separating “legitimate” long-running operations from others will be a complex task…

Really, the better option for you here would be to use dropAndCreateSchemaOnStart(true); it will be much faster than a delete-by-query. Can you give it a try?
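For reference, here is a minimal sketch of that option with the Hibernate ORM integration of Hibernate Search 6. It assumes an already-open EntityManager, and the class and method names (FullReindex, rebuildAllIndexes) are hypothetical. It needs the Hibernate Search ORM mapper on the classpath. With dropAndCreateSchemaOnStart(true), the indexes are dropped and recreated before indexing instead of being emptied through a delete-by-query. The indexes are briefly unavailable while this happens, so concurrent operations on them may fail.

```java
// Hibernate Search 6.0/6.1 with Hibernate ORM 5 uses javax.persistence;
// use jakarta.persistence.EntityManager instead in a Jakarta Persistence setup.
import javax.persistence.EntityManager;

import org.hibernate.search.mapper.orm.Search;
import org.hibernate.search.mapper.orm.session.SearchSession;

class FullReindex {

    // Rebuilds the indexes of all indexed entity types, dropping and
    // recreating them first rather than purging existing documents.
    static void rebuildAllIndexes(EntityManager entityManager) throws InterruptedException {
        SearchSession searchSession = Search.session(entityManager);
        searchSession.massIndexer()              // no arguments: all indexed entity types
                .dropAndCreateSchemaOnStart(true)
                .startAndWait();                 // blocks until indexing completes
    }
}
```

Because the indexes start out freshly created and empty, no long-running delete-by-query is sent at the start of mass indexing.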

buncieboy · February 16, 2022, 8:57am

Just tested it using dropAndCreateSchemaOnStart and it worked exactly as intended. Thanks for the fix. We might still have to increase the timeout back to 60 seconds, as we are sometimes timed out on the call that flushes the “write” index, which was leading to some strange behavior.

Thanks again for your help, much appreciated.

Topic | Category | Replies | Views | Activity
MassIndexer SocketTimeOut on purge Index with 25'000'000 entites | Hibernate Search | 1 | 639 | May 30, 2023
The operation failed due to the failure of the call to the bulk REST API | Hibernate Search | 9 | 2100 | July 21, 2022
HSEARCH400590: Request exceeded the timeout of 60s, 0ms and 0ns: - Error | Hibernate Search | 5 | 1760 | June 30, 2020
Request timeout during index validation | Hibernate Search | 26 | 1181 | July 27, 2022
READ_TIMEOUT not working when using Environment Variable | Hibernate Search | 2 | 197 | September 16, 2024
