The error message does not pinpoint the underlying cause of the failure (connection timeout, read timeout, etc.); I get it occasionally when running against AWS RDS. What are the probable reasons for this error?
Please give the version of Hibernate Search you’re using, and copy/paste your logs here. Make sure to format the logs properly, with a line containing only three backticks before and after them:

```
logs go here
```
We are using 6.0.0.Beta7.

```
Unable to perform afterTransactionCompletion callback: HSEARCH800024: Automatic indexing failed after transaction completion: HSEARCH800022: Indexing failure: HSEARCH400588: The operation failed due to the failure of the call to the bulk REST API.. The following entities may not have been updated correctly in the index:
```
I meant the full log, not the exception message. In Beta6 and above, you should get the exception thrown by the bulk work somewhere in your stack trace.
Ah right, I got it in the extended stack trace:

```
org.hibernate.search.util.common.SearchTimeoutException: HSEARCH400590: Request exceeded the timeout of 60s, 0ms and 0ns
```
Well then here is your answer. The indexing request couldn’t be handled in less than 60s, so Hibernate Search gave up.
As to why the request couldn’t be handled in less than 60s… I don’t know.
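In the meantime, if you just need more headroom while investigating, you can raise the client’s read timeout. A minimal sketch, assuming the property name from the Hibernate Search 6 reference documentation (the value is in milliseconds; in Beta releases the prefix may need to name the backend explicitly, e.g. `hibernate.search.backends.<backend-name>.`, so check the documentation matching your version):

```properties
# Assumed property name; default is documented as 60000 (60s).
# Raising it only buys time, it does not fix a slow cluster.
hibernate.search.backend.read_timeout = 120000
```

That said, a timeout increase only hides the symptom, so I’d still look into the causes below.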
Maybe your Elasticsearch cluster is too slow:
- the servers are undersized (not enough CPU, disks too slow, …).
- or your network connection has low bandwidth.
- or you’re just indexing too much at a time.
If the reason is one of the above, you can tune the indexing queues as explained in the documentation. Lowering `queue_count` in particular may help, as you’ll send fewer bulks in parallel for each index. You can start with 1 and increase it as needed for performance, until you hit timeouts again.
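For instance (hypothetical values; the property names follow the Hibernate Search 6 reference documentation, and in Beta releases they may need a named-backend prefix such as `hibernate.search.backends.<backend-name>.` instead):

```properties
# Assumed names/values: one queue per index => at most one bulk
# in flight per index; queued works wait in the application.
hibernate.search.backend.indexing.queue_count = 1
hibernate.search.backend.indexing.queue_size = 1000
```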
Maybe your Elasticsearch cluster is just fine, but you don’t have enough connections to process all the indexing works in parallel.
If that’s the case, and you have more than one Elasticsearch node, I’d recommend letting Hibernate Search know of the other nodes. You can just configure multiple hosts or, if there are too many, enable automatic discovery.
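Both options are just configuration properties. A sketch, assuming the property names from the Hibernate Search 6 reference documentation (hostnames are placeholders; again, Beta releases may require a named-backend prefix):

```properties
# Option 1: list the nodes explicitly.
hibernate.search.backend.hosts = es-node1.example.com:9200,es-node2.example.com:9200

# Option 2: give a single host and let the client discover the others.
hibernate.search.backend.discovery.enabled = true
# Assumed: refresh interval in seconds between discovery runs.
hibernate.search.backend.discovery.refresh_interval = 10
```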
@yrodiere do you reckon we will always have to enable auto discovery when using AWS Elasticsearch (with multiple nodes)?
Also, the IOPS consumed seem low enough, so maybe I will reduce `max_bulk_size` and evaluate.
If you want your requests to be routed to all nodes, I imagine you will? Not sure, though. Maybe the URL AWS assigns to the service points to each node in turn, in some sort of round-robin, or uses some other kind of load-balancing strategy, in which case auto discovery is pointless. I guess you’ll have to refer to the documentation of the AWS Elasticsearch Service, or just try and see whether automatic discovery changes anything.
If the IOPS are low, then the problem might be analysis: analysis of very large documents can be slow. In that case, then yes, lowering the bulk size should help. Depending on how Elasticsearch handles the bulks, lowering the queue count may help too.
Note that all these changes will not really improve performance; they will just shift the queueing of indexing work from Elasticsearch to your application. But that should at least avoid the timeouts.
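To experiment with smaller bulks, something like this should do (hypothetical value; the property name follows the Hibernate Search 6 reference documentation, where the default is documented as 100):

```properties
# Assumed name: smaller bulks mean more, but lighter, requests,
# so each individual request is less likely to hit the timeout.
hibernate.search.backend.indexing.max_bulk_size = 20
```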
Still, reaching the 60s timeout is definitely unexpected, especially if you sized your cluster appropriately and used nodes with decent CPU/RAM/storage.
Please keep me updated on the results, I’m definitely interested.
Yes, I also assumed that the ES domain URL which AWS provides takes care of distributing requests across all the nodes under it.
This issue was happening quite rarely before. Then we made a change to an entity that resulted in nested storage for two of its fields. That drastically reduced the mass indexing time for it. And the timeouts are mostly happening for this entity (they are still rare, but I would still like to avoid them).
Will update with the findings on resolution.