Recently our Hibernate version was updated to 6.2.7.Final. Parameters like hibernate.query.plan_cache_enabled, hibernate.query.plan_cache_max_size, and hibernate.query.plan_parameter_metadata_max_size were not specified, so I assume the defaults were in effect.
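For reference, this is roughly how those settings can be pinned explicitly at bootstrap. A minimal sketch only: the values shown are, to my knowledge, the defaults, and "my-unit" is a placeholder persistence unit name.

```java
import java.util.HashMap;
import java.util.Map;

import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.Persistence;

public class PlanCacheSettings {

    public static EntityManagerFactory build() {
        Map<String, Object> settings = new HashMap<>();
        // Values shown are (to my knowledge) the defaults; lower them to bound heap usage.
        settings.put("hibernate.query.plan_cache_enabled", "true");
        settings.put("hibernate.query.plan_cache_max_size", "2048");
        settings.put("hibernate.query.plan_parameter_metadata_max_size", "128");
        // "my-unit" is a placeholder persistence unit name.
        return Persistence.createEntityManagerFactory("my-unit", settings);
    }
}
```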
We observed higher memory usage leading to OOMs in the production environment. Heap dump analysis indicated that a lot of objects associated with QueryInterpretationCache were not being garbage collected (screenshot #1).
E.g. you can see almost 400k LIRSHashEntry objects consuming an estimated 2 GB of heap. There is also a similar number of AliasToBeanResultTransformer and SelectInterpretationsKey objects.
Further analysis showed that the number of top-level LIRSHashEntry objects (those contained in the table fields of the Segment class) was indeed below the default bound (2048). However, many of them were heads of linked lists containing thousands of entries chained via the next field (screenshot #2).
All of the linked entries had the same hash and contained exactly the same query. Their keys were SelectInterpretationsKey instances. Their state field was set to HIR_NONRESIDENT, which I believe indicates that the entry was evicted from BoundedConcurrentHashMap. Most of the 400k entries are in a similar state (screenshot #3).
I couldn't reproduce this behaviour in a local environment, so it may be linked to the higher volume and variety of queries in the production app.
I'd be grateful for any suggestions on what's happening here and how we can make the cache respect the entry limit.
- The memory dump shows BoundedConcurrentHashMap using 57% of the memory.
- There are seemingly endless chains of nested LIRSHashEntry objects for the same native SQL query.
- The state field is set to HIR_NONRESIDENT.
- We don't have a max size set, but the roughly 80k LIRSHashEntry objects far exceed the default limit of 2048.
The hash code is identical between the cache entries. That is true for the hash of the key (org.hibernate.query.sql.spi.SelectInterpretationsKey) as well as the hash of the cached item itself (org.hibernate.internal.util.collections.BoundedConcurrentHashMap.LIRSHashEntry).
A new query plan was added to the cache on each execution. We observed this while debugging; the root cause is that the equals method evaluates to false between the new key and the existing cache key.
The problem is that aliases is null on the new key, while that.aliases is not null on the existing key.
The aliases are initialised in a call to transformTuple(…), which Hibernate invokes only after the transformer has been added to the cache. This changes the outcome of the equals method after the cache entry has been added.
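To make the mechanism concrete, here is a simplified, self-contained sketch (not the actual Hibernate source) of a transformer whose equals depends on a lazily initialised field: the hash codes of a fresh key and a cached key match, but equals stays false until transformTuple has run on the fresh key.

```java
import java.util.Arrays;

// Simplified sketch of the observed pattern, not the actual Hibernate source:
// equals() depends on a field that is only populated on the first row transform.
class LazyAliasTransformer {
    private final Class<?> resultClass;
    private String[] aliases; // stays null until transformTuple runs

    LazyAliasTransformer(Class<?> resultClass) {
        this.resultClass = resultClass;
    }

    Object transformTuple(Object[] tuple, String[] aliases) {
        this.aliases = aliases; // initialised here, i.e. after the cache put
        return tuple;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LazyAliasTransformer)) {
            return false;
        }
        LazyAliasTransformer that = (LazyAliasTransformer) o;
        // fresh key: aliases == null; cached key: aliases != null -> never equal
        return resultClass.equals(that.resultClass)
                && Arrays.equals(aliases, that.aliases);
    }

    @Override
    public int hashCode() {
        return resultClass.hashCode(); // identical for fresh and cached keys
    }
}
```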
Our long-term solution is to migrate away from AliasToBeanResultTransformer and use org.hibernate.query.spi.AbstractQuery#setTupleTransformer instead.
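A minimal sketch of that migration; PersonDto and the SQL string are hypothetical stand-ins for our real DTOs and queries.

```java
import java.util.List;

import org.hibernate.Session;
import org.hibernate.query.TupleTransformer;

public class NativeQueryDtoExample {

    // Hypothetical DTO; fields mirror the selected columns.
    public static class PersonDto {
        public Long id;
        public String name;
    }

    @SuppressWarnings({ "unchecked", "deprecation" })
    public static List<PersonDto> load(Session session) {
        // A stateless lambda replaces the stateful AliasToBeanResultTransformer.
        TupleTransformer<PersonDto> toDto = (tuple, aliases) -> {
            PersonDto dto = new PersonDto();
            dto.id = ((Number) tuple[0]).longValue();
            dto.name = (String) tuple[1];
            return dto;
        };
        return session.createNativeQuery("select id, name from person")
                .setTupleTransformer(toDto)
                .getResultList();
    }
}
```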
As a short-term fix, we attempted query.setQueryPlanCacheable(false), but this option is ignored for native queries.
Instead, we extend AliasToBeanResultTransformer and override the equals method to ignore the aliases.
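Roughly like this; the class name is ours, and it is only a sketch against the 6.2 API, with equality based on the result type alone:

```java
import org.hibernate.transform.AliasToBeanResultTransformer;

@SuppressWarnings("deprecation") // AliasToBeanResultTransformer is deprecated in 6.x
public class AliasAgnosticTransformer<T> extends AliasToBeanResultTransformer<T> {

    private final Class<T> resultClass;

    public AliasAgnosticTransformer(Class<T> resultClass) {
        super(resultClass);
        this.resultClass = resultClass;
    }

    @Override
    public boolean equals(Object o) {
        // Ignore the lazily-initialised aliases and compare by result type only,
        // so a fresh key stays equal to an already-cached key.
        if (this == o) {
            return true;
        }
        if (!(o instanceof AliasAgnosticTransformer)) {
            return false;
        }
        return resultClass.equals(((AliasAgnosticTransformer<?>) o).resultClass);
    }

    @Override
    public int hashCode() {
        return resultClass.hashCode();
    }
}
```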
To me it seems there are three probably connected problems:
1. the query plan cache growing until an OOM crash
2. the cache allowing duplicate entries for the same query
3. the QueryPlanCacheable flag being ignored for native queries