MassIndexer issues when using @Fetch(FetchMode.SUBSELECT)

Trying to optimize the time it takes for us to build objects using Hibernate ORM. Moved from using a batch size of 10 on our collections to using fetch mode subselect. Worked great and it about halved our time to display results to the search.

However mass indexing is running at about half speed, and for certain batches can take several minutes, where as before it was usually sub second. One of our indexes fails completely at about 10% due to an SSL exception (SSL peer shutdown incorrectly) which looks like some kind of query timeout (either on our or the DB providers end).

After doing some research it looks like this may be related to an existing bug on hibernate ORM ([HHH-11466] limit not used for subselect - Hibernate JIRA). It appears that hibernate ignores LIMIT and other parts of the first query, thus loading more into memory than it needs. Could this explain the slower performance and eventual timeout?

Whether or not it is related to issues within Hibernate ORM, is there anything we can do to avoid these issues when massindexing, as I would really like to keep the performance benefit on our load times?

Hello,

First, if you’re using FetchMode.SUBSELECT, be sure to upgrade to 6.0.1.Final, where we fixed a bug related to that feature: [HSEARCH-4150] Entity loading fails when using FetchMode.SUBSELECT - Hibernate JIRA

Regarding performance… I’m not overly familiar with SUBSELECT, but from what I understand it’s essentially batch fetching on steroids. Instead of loading collections by batches of 10, ORM will load all collections of the same role [EDIT: that are currently held by entities in the session] in the same query. I can imagine scenarios where this is counter productive. To verify that, maybe try to go back to a batch size, and set it to something like 1000? If you get the same behavior, then the problem is probably that too many collections are being loaded at the same time. Reducing the fetch size in the mass indexer might help.

Regarding LIMIT, we don’t use that in the mass indexer unless you call limitIndexedObjectsTo(), which you probably don’t in a production environment.
The way it works is we continuously retrieve IDs using a scroll, and load entities by IDs, 10 at a time (customizable through batchSizeToLoadObjects()). We don’t need to specify a limit since we’re specifically asking for entities with given IDs.

Maybe in order to debug this, you could enable SQL logging in ORM and compare the queries being executed by the mass indexer before and after switching to FetchMode.SUBSELECT? Ideally you could also compare run time of queries on the database side to identify the problematic queries, but that tooling is vendor-specific.

Thanks for the heads up about 4150, can’t upgrade yet as it hasn’t been cleared yet to use internally, but will do ASAP.

With regards to SUBSELECT I thought it was the same as BatchSize, without the limit imposed. So if you run SELECT … FROM parent WHERE id IN (…), when it goes to select the child entities BatchSize will run SELECT … FROM child WHERE id in (?, ?, ? …), several times (selecting 10 at a time), where as subselect will run SELECT … FROM child WHERE id in (SELECT id FROM parent WHERE id IN (…)). I’m just reading up on this as if it loads all records from the DB that makes a lot more sense. We have several child entities, each with child entities of their own, so if SUBSELECT is loading all of these it would explain the massive drop off in performance. In testing on a smaller database it appears that my understanding was correct, which doesn’t help to explain the performance issues.

LIMIT not being an issue is odd, as I assumed the issue was HSEARCH was calling SELECT * FROM parent with some kind of limit, hence the calls to the child being SELECT * FROM child without a proper limit.

So far testing hasn’t showed any issues, however unfortunately this issue takes about an hour to show up on our fully scaled DB, which is in another region so is hard to test locally. I’ll try and recreate the issue with logging enabled and see what I can catch.

Just after I posted this I think I have noticed the issue, we appear to have a circular mapping in on of attributes. I believe we have set a cut off for this, but it means it creates a monster query where we have 10 queries embedded in the select, which is taking an awfully long time to run.

I’ll remove the bi-directional relations to make to avoid the circular mapping and see if that fixes the issue :slight_smile:

Realised couldn’t remove the bi-directional relations as we needed them to update our parent when the child gets modified, however changing the child side from SUBSELECT back to batch mode seems to have fixed the performance, whilst allowing us to keep some of the decrease in loading times.

@yrodiere Thanks again for you quick help on this issue, really appreciate it!

1 Like