I’m currently running Hibernate Search 5.10.6.Final. When I run the mass indexer, I frequently get timeout exceptions in the error logs. I’ve also occasionally seen very high CPU utilization on the server running Hibernate. Could these be related? I’d also like some clarification on the following property:
max_total_connection_per_route <-- I’m not clear on what this is and I’ve also seen this referenced here:
Here’s one of the few references I’ve found on this:
Here are my current settings (running in AWS, with Elasticsearch also running in AWS):
In my QA and PRE environments, I have a cluster of 1 Elasticsearch server. In Production, I have 3 instances of Elasticsearch. Is the default value of 2 adequate for max_total_connection_per_route? Why is the default 2, and what would be a reasonable value to set it to (assuming this might be a bottleneck), given that search is used a fair amount in my application?
Timeouts may indeed be caused by the low default of 2 for max_total_connection_per_route, as explained in the answer to the Stack Overflow question you linked.
Having high resource utilization on your server while mass indexing is expected: that’s actually one of the reasons we recommend taking your application offline while executing the mass indexer. However, this high CPU utilization is unlikely to be the cause of your timeouts, assuming your application and ES instances live on separate servers. Most of the (potentially) CPU-intensive tasks in Hibernate Search happen before the request timeouts take effect.
As for max_total_connection_per_route: as mentioned in the documentation, it is the “maximum number of simultaneous connections to a single Elasticsearch server”.
So if you have 3 nodes in your cluster, and all three nodes can be reached by Hibernate Search, the default of 2 means that there will be up to 2 connections between your application and each Elasticsearch node, for a total of up to 6. Raising the setting to 10 allows up to 10 connections between your application and each Elasticsearch node, but then max_total_connection (another setting) will start affecting you and limit the total to 20 (which node will have fewer connections is undefined).
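The arithmetic above can be sketched as follows. This is only an illustration of how the two limits combine, not Hibernate Search code: the class and method names are made up for the example, and the total limit of 20 is the value mentioned above.

```java
// Hypothetical helper: illustrates how the per-route and total limits combine.
final class ConnectionBudget {

    // Upper bound on simultaneous connections from one application instance:
    // each reachable node gets at most perRoute connections, and the sum
    // over all nodes is capped by maxTotal.
    static int effectiveConnections(int reachableNodes, int perRoute, int maxTotal) {
        return Math.min(reachableNodes * perRoute, maxTotal);
    }

    public static void main(String[] args) {
        // 3 nodes, per-route limit 2, total limit 20: the per-route limit dominates.
        System.out.println(effectiveConnections(3, 2, 20));  // prints 6
        // 3 nodes, per-route limit 10, total limit 20: the total limit dominates.
        System.out.println(effectiveConnections(3, 10, 20)); // prints 20
    }
}
```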
The default is 2 for historical reasons, and was raised to 10 in Hibernate Search 6 because it was too low. If your application is the only client to your Elasticsearch cluster, you can safely set this to 10.
More generally, set max_total_connection to the maximum number of connections you can afford from your application, and set max_total_connection_per_route to the maximum number of connections you can afford on each of your Elasticsearch instances.
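As a rough sketch, the two limits could be collected into configuration properties like this. It assumes the Hibernate Search 5 property prefix hibernate.search.default.elasticsearch. (please check it against the documentation for your version); the class name and the values passed in main are purely illustrative.

```java
import java.util.Properties;

// Hypothetical helper that assembles the connection-related settings.
final class SearchConnectionSettings {

    // Assumed prefix for the default index manager in Hibernate Search 5.
    static final String PREFIX = "hibernate.search.default.elasticsearch.";

    static Properties build(String hosts, int maxTotal, int maxPerRoute) {
        Properties props = new Properties();
        // Space-separated list of Elasticsearch node URLs.
        props.setProperty(PREFIX + "host", hosts);
        // What the application can afford in total.
        props.setProperty(PREFIX + "max_total_connection", Integer.toString(maxTotal));
        // What each Elasticsearch instance can afford from this application.
        props.setProperty(PREFIX + "max_total_connection_per_route", Integer.toString(maxPerRoute));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only; size them for your own environment.
        Properties props = build("http://es1.mycompany.com:9200", 20, 10);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same keys can of course go directly into persistence.xml or hibernate.properties; building them in code is only convenient when the values come from the environment.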
Note however that, in order for your application to actually connect to all 3 of your nodes, you need to reference them all in the “host” property. So you need to set the ELASTICSEARCH_URL environment variable to something like “http://es1.mycompany.com:9200 http://es2.mycompany.com:9200 http://es3.mycompany.com:9200” (using spaces as separators).
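One quick way to check that the variable really lists every node is to split it on whitespace and count the entries, for example at application startup. This is a minimal sketch: the HostList class is hypothetical, and the fallback string is just the example value from above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical startup check for a space-separated host list.
final class HostList {

    // Splits a space-separated host string into individual node URLs.
    static List<String> parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(value.trim().split("\\s+")).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String raw = System.getenv("ELASTICSEARCH_URL");
        if (raw == null) {
            // Example value only, used when the variable is not set.
            raw = "http://es1.mycompany.com:9200 http://es2.mycompany.com:9200 http://es3.mycompany.com:9200";
        }
        List<String> hosts = parse(raw);
        System.out.println(hosts.size() + " host(s) configured: " + hosts);
    }
}
```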
Excellent - thanks for the explanation and quick response!