HTTP request timeouts when using mass indexer

We are using Hibernate Search 6 and Elasticsearch 7. When we use the mass indexer to rebuild our full index, we get timeouts. The code is the same (apart from class and method renames) as what we had with Hibernate Search 5.

The timeout occurs when we call the mass indexer while the index already contains data. As expected, the existing data starts to be deleted, but since we have around 3 million records, this takes about 10 minutes. During this time the request times out; the delete still finishes, leaving the index empty. From there we can run the mass indexer again to re-index, which works as intended.

It looks like this timeout can be adjusted via “hibernate.search.backend.read_timeout”, whose default was lowered from 60 to 30 seconds in Hibernate Search 6. However, increasing it back to 60 seconds still leads to timeouts. It appears that this request (_delete_by_query) previously had a much longer timeout, or none at all, as we had no issues with it before. Is there a change we can make to return to the previous behavior, apart from setting read_timeout to a much higher value (say, 30 minutes)? If not, would there be any risk in increasing the timeout far beyond the default value? Thanks in advance for any advice you can give.
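For reference, here is a minimal sketch of how the read timeout could be raised programmatically. The property key is the one quoted above, and its value is expressed in milliseconds; the class and method names are hypothetical, and the idea of passing the resulting map as overrides to Persistence.createEntityManagerFactory is an assumption about how the application is bootstrapped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hibernate Search.
final class SearchTimeoutSettings {
    // Property key for the Elasticsearch backend's read timeout in Hibernate Search 6.
    static final String READ_TIMEOUT_KEY = "hibernate.search.backend.read_timeout";

    // Builds configuration overrides, e.g. for the second argument of
    // Persistence.createEntityManagerFactory(unitName, overrides).
    static Map<String, Object> withReadTimeoutSeconds(int seconds) {
        Map<String, Object> settings = new HashMap<>();
        // The property expects milliseconds, so convert from seconds.
        settings.put(READ_TIMEOUT_KEY, seconds * 1000);
        return settings;
    }
}
```

Calling SearchTimeoutSettings.withReadTimeoutSeconds(60) would restore the 60-second value mentioned above; the same key can equally be set in persistence.xml or a properties file.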

I couldn’t say why this used to work in Hibernate Search 5. Maybe the automatic retry code led to the same delete-by-query being sent three times, and the last one succeeded because by then the first request was successfully handled and the index was empty? But this retry mechanism was supposedly deprecated in the REST client (which Hibernate Search depends on), and is no longer used.

Having requests fail when they exceed the read timeout is definitely the correct behavior; we don’t want to change that. I don’t know of any way to work around this timeout.

Maybe there should be a separate setting for read timeouts on (very) long-running requests such as this one? But then separating “legitimate” long-running operations from others will be a complex task…

Really, the better option for you here would be to call dropAndCreateSchemaOnStart(true) on the mass indexer. It drops and re-creates the index instead of deleting the documents, so it will be much faster than a delete-by-query. Can you give it a try?
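Something along these lines should do it. This is only a sketch: it assumes a javax.persistence EntityManager is available (as with Hibernate ORM 5.x) and that all indexed types should be rebuilt; the Reindexer class and rebuildIndexes method are hypothetical names, and running it requires the Hibernate Search ORM mapper on the classpath and a reachable Elasticsearch cluster.

```java
import javax.persistence.EntityManager;

import org.hibernate.search.mapper.orm.Search;
import org.hibernate.search.mapper.orm.massindexing.MassIndexer;
import org.hibernate.search.mapper.orm.session.SearchSession;

// Hypothetical wrapper class for illustration.
final class Reindexer {

    // Rebuilds all indexes, dropping and re-creating the schema first
    // instead of purging existing documents with a delete-by-query.
    static void rebuildIndexes(EntityManager entityManager) throws InterruptedException {
        SearchSession searchSession = Search.session(entityManager);

        // No type argument: target every indexed entity type.
        MassIndexer indexer = searchSession.massIndexer()
                .dropAndCreateSchemaOnStart(true);

        // Blocks until mass indexing completes.
        indexer.startAndWait();
    }
}
```

Note that dropping the schema removes the indexes entirely for a short time, so searches against them will fail or return nothing until re-indexing has progressed.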

Just tested it out using dropAndCreateSchemaOnStart(true) and it worked exactly as intended. Thanks for the fix. We might have to increase the read timeout back to 60 seconds, as we are sometimes getting timeouts on the call that flushes the “write” index, which was leading to some strange behavior.

Thanks again for your help, much appreciated.