We have used the mass indexer for a long time with no issues, using the Elasticsearch implementation. We are now scaling our data and have started to notice problems with the mass indexer. Before the holidays we had around 15 million records, which would index in around 5 hours. Over the holidays we pulled in an additional 10 million records, and now the mass indexer causes out-of-memory exceptions and other issues.
Stepping through with the debugger, it appears to load all primary keys (each a unique 8-12 character string) into memory. To my understanding this should only take about 250 MB, but JConsole shows almost 3 GB of memory used. It takes about 10 minutes to fetch all of the keys, with a lot of GC runs in between. After that it starts indexing with the small amount of memory left. If we run it in our cloud environment it fails to index anything, and normally just crashes after 10-20 minutes with an OOM exception.
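As a rough sanity check on the 250 MB figure, here is a back-of-the-envelope sketch of what the keys might cost as Java `String` objects rather than raw characters. The layout numbers are assumptions, not measurements: a 64-bit HotSpot JVM with compressed object pointers, `char[]`-backed strings (as in Java 8), and one 4-byte reference per key in whatever collection holds them. The class and method names are made up for illustration.

```java
public class KeyMemoryEstimate {

    // Round up to the JVM's 8-byte object alignment.
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Approximate heap used by one String key of the given length.
    // Assumed layout: 24 bytes for the String object itself,
    // a 16-byte array header plus 2 bytes per character for the char[],
    // and a 4-byte reference held by the surrounding collection.
    static long bytesPerKey(int length) {
        long stringObject = 24;
        long charArray = align8(16 + 2L * length);
        long collectionSlot = 4;
        return stringObject + charArray + collectionSlot;
    }

    // Total estimate for keyCount keys that all have the same length.
    static long totalBytes(long keyCount, int length) {
        return keyCount * bytesPerKey(length);
    }

    public static void main(String[] args) {
        long keys = 25_000_000L; // roughly our current record count
        for (int length : new int[] {8, 12}) {
            System.out.printf("length %d: %d bytes per key, %d bytes in total%n",
                    length, bytesPerKey(length), totalBytes(keys, length));
        }
    }
}
```

Under these assumptions each key costs 60 to 68 bytes, so 25 million keys come to roughly 1.5 to 1.7 GB. That is several times the raw character data, although it still does not account for the full 3 GB that JConsole reports.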
Is there any way to avoid loading all of the primary keys into memory? This will only become more of a problem as we scale further.
The call to the MassIndexer, for reference:
We have also tried setting all of the parameters to 1, but the same issue occurred.
Hibernate Search: 5.10.4.Final