HQL Hibernate 6.0.Final migration

The job runs just once; it would run again only if the app ends abnormally. The app does not seem any slower.

Eclipse MAT suggests this (I am no expert):

The thread java.lang.Thread @ 0xc00c1680 pool-2-thread-1 keeps local variables with total size 1,043,591,552 (95.31%) bytes.

The memory is accumulated in one instance of java.util.HashMap$Node[], loaded by , which occupies 934,359,304 (85.33%) bytes.

The stacktrace of this Thread is available.

Keywords

  • java.util.HashMap$Node[]
  • org.hibernate.query.sqm.internal.ConcreteSqmSelectQueryPlan.buildCacheableSqmInterpretation(Lorg/hibernate/query/sqm/tree/select/SqmSelectStatement;Lorg/hibernate/query/sqm/internal/DomainParameterXref;Lorg/hibernate/query/spi/DomainQueryExecutionContext;)Lorg/hibernate/query/sqm/internal/ConcreteSqmSelectQueryPlan$CacheableSqmInterpretation;
  • ConcreteSqmSelectQueryPlan.java:339

Tried it on Java 17 (17.32.13-ca-jdk17.0.2-linux_x64), same result: java.lang.OutOfMemoryError: Java heap space.

I have dug a bit more and found the reason. The query above is OK. We loop over all the entries and do another lookup inside the loop, and it is the lookup that consumes the heap:

for (Clog clog : listObj) {
    // ...
    Hclog entry = lookupMgr.getHByClog(clog);
    if (entry != null) {
        // ...
    }
    // ...
}

Here is the lookup code: lookupMgr.getHByClog(clog)

CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<Hclog> criteriaQuery = builder.createQuery(Hclog.class);
Root<Hclog> root = criteriaQuery.from(Hclog.class);
criteriaQuery.select(root);

// Initial
Predicate and = builder.equal(root.get(Hclog_.clog), clog);

Query<Hclog> query = session.createQuery(criteriaQuery.where(and));
Hclog obj = query.uniqueResult(); // returns null when there is no matching row
//Hclog obj = query.getSingleResultOrNull(); // did not help

return obj;

####
Hclog class relation to Clog:

private Clog clog = null;

@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "clogPartnumber")
@NotFound(action = NotFoundAction.IGNORE)
public Clog getClog() {
	return clog;
}
####
metamodel class, Hclog_.clog:

public static volatile SingularAttribute<Hclog, Clog> clog;

This is not new code; it has been running OK since we converted from the legacy Criteria API some years ago.

We use a hibernate.current_session_context_class that follows the session-per-operation anti-pattern (8.5.1 in the docs), i.e. we set the flush mode to manual and then flush the session at the end of the job.
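Roughly, the session handling looks like this (a minimal sketch of the description above; the sessionFactory field and the job body are assumed):

Session session = sessionFactory.getCurrentSession();
session.setHibernateFlushMode(FlushMode.MANUAL); // no automatic flushing
// ... run the whole indexing job against this one session ...
session.flush(); // single flush at the end of the job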

Looking at BatchFetchQueue, there is an additional field, subselectsByEntityKey, which keeps a history of all the selects in this session.

This map will accumulate 50K entries in this session. The MAT leak report above points to this.

/**
 * A map of {@link SubselectFetch subselect-fetch descriptors} keyed by the
 * {@link EntityKey} against which the descriptor is registered.
 */
private Map<EntityKey, SubselectFetch> subselectsByEntityKey;

/**
 * Adds a subselect fetch descriptor for the given entity key.
 *
 * @param key The entity for which to register the subselect fetch.
 * @param subquery The fetch descriptor.
 */
public void addSubselect(EntityKey key, SubselectFetch subquery) {
	if ( subselectsByEntityKey == null ) {
		//subselectsByEntityKey = CollectionHelper.mapOfSize( 12 );
		subselectsByEntityKey = Collections
				.synchronizedMap( new LRUMap<EntityKey, SubselectFetch>( 12 ) );
	}

	final SubselectFetch previous = subselectsByEntityKey.put( key, subquery );
	if ( previous != null && LOG.isDebugEnabled() ) {
		LOG.debugf(
				"SubselectFetch previously registered with BatchFetchQueue for `%s#%s`",
				key.getEntityName(),
				key.getIdentifier()
		);
	}
}

As a test I swapped
CollectionHelper.mapOfSize( 12 )
for a fixed-size org.apache.commons.collections4.map.LRUMap:
new LRUMap<EntityKey, SubselectFetch>( 12 )

Now the job completes normally.
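For illustration, this is the LRUMap behaviour the workaround relies on: the map has a fixed capacity, and each put beyond it evicts the least-recently-used entry (a standalone sketch, not the actual BatchFetchQueue code):

import org.apache.commons.collections4.map.LRUMap;

LRUMap<String, String> map = new LRUMap<>(12);
for (int i = 0; i < 50_000; i++) {
    map.put("key-" + i, "value-" + i); // evicts the LRU entry once size == 12
}
System.out.println(map.size()); // prints 12; the map never grows past capacity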

Also, for some reason, inspecting the subquery during the put always shows com.sun.jdi.InvocationException: Exception occurred in target VM occurred invoking method, and there is never a getSubselect(…) or removeSubselect(…) call on the map either.

That’s very helpful, thanks for the details!

We have to keep track of the source query for every proxy that we create when you use @Fetch(FetchMode.SUBSELECT), so that we can load a proxy by re-running the original query when it is accessed. My guess is that we currently don’t check whether subselect fetching is even enabled for an association and just accumulate queries into a map here. The more you fetch (you were saying 50K individual fetches), the more memory is consumed.
We also can’t simply remove entries from that map, as they are bound to objects loaded within the session; the map only becomes unreachable when the session is closed.
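For context, subselect fetching is opted into per collection roughly like this (a hypothetical Parent/Child mapping, not code from this thread):

@Entity
public class Parent {
    @Id
    private Long id;

    // When any Parent's children are accessed, Hibernate re-runs the original
    // query as a subselect to load the children of all Parents it returned.
    @OneToMany(mappedBy = "parent")
    @Fetch(FetchMode.SUBSELECT)
    private Set<Child> children;
}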

Can you please post the definition of HClog? Also, I created an issue for this problem which you can track if you want: [HHH-15202] - Hibernate JIRA

I analyzed this now and found the issue. We were adding a SubselectFetch without checking whether the entity even uses subselect fetching, and paired with your inefficient getHByClog this leads to massive memory consumption.

The method is inefficient because it creates a criteria query for which no execution plan is cached. Since this is a static query, you could use session.createQuery("from Hclog h where h.clog = :clog").setParameter("clog", clog).getSingleResult() instead, which will properly cache the query plan. That way, the app will consume less memory overall and will also be faster, because it can reuse pre-computed query plans.
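A minimal sketch of that rewrite, assuming the same session field and entity types as the original getHByClog:

public Hclog getHByClog(Clog clog) {
    // Static HQL string: Hibernate caches the translated query plan and
    // reuses it on every call, unlike a freshly built criteria query.
    return session.createQuery("from Hclog h where h.clog = :clog", Hclog.class)
            .setParameter("clog", clog)
            .uniqueResult(); // null when there is no matching row
}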

OK thanks.
For me there is nothing in the array, other than
com.sun.jdi.InvocationException: Exception occurred in target VM occurred invoking method

i.e. no SQL statement.

… it will consume all your heap; if you have 1 million records you will get 1 million entries. For most, this issue would be a deal-breaker: you cannot traverse a large result set flushing on each read, it would take forever. For my 50K, I gave up after 10 minutes.

Here is the HClog class

@Entity
@Table(name = "hclog")
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class HClog implements Serializable {
    private String id = UUIDGenerator.generateUUID();
    private Clog clog = null;
    public HClog() {}

    @Id
    @Column(name = "id", nullable = false)
    public String getId() {
       return id;
    }
    public void setId(String id) {
       this.id = id;
    }
    // ...
    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "clogPartnumber")
    @NotFound(action = NotFoundAction.IGNORE)
    public Clog getClog() {
        return clog;
    }
}

There is a little misconception here. The API you are using is stateful, and that state is bound to a transaction within a session. If what you want to do is not stateful, or doesn’t need a transaction boundary over the whole operation, you should be using the StatelessSession API and/or the scroll/stream API.

The statefulness implies that every entity loaded will stay in the first-level cache until it is somehow detached. It sounds to me like you simply don’t want that, so you should use a different API or clear the cache after the first load.
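One way to keep the first-level cache bounded while staying on the stateful API (a sketch of the "clear the cache" suggestion; the batch size of 1000 is arbitrary):

int processed = 0;
for (Clog clog : listObj) {
    // ... process clog ...
    if (++processed % 1000 == 0) {
        session.flush(); // push any pending updates first
        session.clear(); // then evict everything from the first-level cache
    }
}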
Anyway, this discussion goes beyond the memory issue; we can continue it in a new topic if you like. One last suggestion: instead of executing one query per Clog to fetch the HClog, just join the HClog in the main query: select clog, hclog from Clog clog left join HClog hclog on hclog.clog = clog
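A sketch of what executing that joined query could look like (assuming the entity names from this thread; each row pairs a Clog with its HClog, or null where none exists):

List<Object[]> rows = session.createQuery(
        "select clog, hclog from Clog clog left join HClog hclog on hclog.clog = clog",
        Object[].class)
    .getResultList();
for (Object[] row : rows) {
    Clog clog = (Clog) row[0];
    HClog hclog = (HClog) row[1]; // null when no HClog matches this Clog
    // ... index clog together with hclog ...
}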

The job is building the main search index, so HClog is one of many lookups. Generally it is not possible (or wise) to create huge select statements joining all associations; the data comes from many tables. There are updates in the job, so the StatelessSession API may not be appropriate.

Reading the docs (and your comment above), this is related to the annotation
@Fetch(FetchMode.SUBSELECT)

What we are doing is different: it is an additional query on the open session. Maybe this case was not considered when the change was made?

If this annotation is stored, maybe it could be used to stop the call to BatchFetchQueue.addSubselect(…)

I have noticed that HHH-15202 is set as closed. So what is going to happen now? Will it just be ignored?

The job is building the main search index, so HClog is one of many lookups. Generally it is not possible (or wise) to create huge select statements joining all associations.

Why do you think it is not wise to join all associations? Have you tried it and had issues? Did you compare the result to what you are doing now? In general, I can tell you that joining and selecting all ToOne associations at once is going to be far more efficient, as you only make the roundtrip to the database once and the database can access the data more efficiently. When doing individual lookups you send a fresh SQL statement every time, and the database has to do the same lookup over and over again.

I don’t know what your full code looks like; the StatelessSession API was just a suggestion, as it seemed to me that you could benefit from it when using join fetches.

I have noticed that HHH-15202 is set as closed. So what is going to happen now? Will it just be ignored?

HHH-15202 is not closed, it is marked as resolved as it was fixed in the upstream repository and will be part of the next bug fix release of 6.0.

Our search and suggester indexes include a lot of data, for example past orders, addresses, contacts, etc. Over the years it has grown, so it is probably simpler to add an SQL lookup rather than altering the classes and retesting everything.

I did modify the BatchFetchQueue subselectsByEntityKey to use the LRUMap(…) to avoid any possibility of running out of heap (invoicing :fearful:).

As for HHH-15202, I could not see how it was resolved.

We have converted hundreds of @Entity classes and thousands of org.hibernate.* usages to javax.persistence.criteria.*, not an easy upgrade. It is working perfectly, so good job.

Our search and suggester indexes include a lot of data, for example past orders, addresses, contacts, etc. Over the years it has grown, so it is probably simpler to add an SQL lookup rather than altering the classes and retesting everything.

Join fetching collections is tricky, so I understand that you probably don’t want to do that. I think you are misunderstanding the effects of join fetches, though. The semantics shouldn’t change if you add root.fetch("association"); it’s just a different performance model, because a join fetch is used for fetching that association.
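For example, the criteria lookup from earlier in the thread could fetch the association in the same roundtrip like this (a sketch; only the root.fetch line is new relative to that code):

CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<Hclog> criteriaQuery = builder.createQuery(Hclog.class);
Root<Hclog> root = criteriaQuery.from(Hclog.class);
root.fetch(Hclog_.clog, JoinType.LEFT); // join-fetch the association: same results, one roundtrip
criteriaQuery.select(root);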

HHH-15202, I could not see how it was resolved.

There will be no more SubselectFetch objects created in your case. You can see the fix in the commit linked to the JIRA issue: HHH-15202 Only create SubselectFetch if entity contains subselect fet… · hibernate/hibernate-orm@e5f28ae · GitHub

I tested the fix and it works great (the job completed normally in 92 secs). Thanks.

I guess whatever you do in your SQL is a trade-off: easy maintenance vs complex joins. Having too many @Entity classes has its associated maintenance cost too.

One gotcha we found in our conversion: check that all your HQL statements use the correct camel case and match the field names on your classes exactly.

E.g. for

IllegalArgumentException: org.hibernate.query.SemanticException: Could not interpret path expression ‘invoiceperiod’

‘invoiceperiod’ got missed and should have been ‘invoicePeriod’.

JPAMetaModelEntityProcessor is great for the mappings, but our HQL statements were overlooked.
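To illustrate (Invoice here is a hypothetical entity with an invoicePeriod field, not code from this thread):

// Fails on 6.x with SemanticException: Could not interpret path expression 'invoiceperiod'
session.createQuery("from Invoice i where i.invoiceperiod = :p", Invoice.class);

// Works: the path matches the entity field's exact camel case
session.createQuery("from Invoice i where i.invoicePeriod = :p", Invoice.class);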

Not sure what you want to say here, but if you have a new question or problem that you need help with, please create a new discussion.

Observation: this case-sensitive HQL behaviour was new for v6; we had tested extensively on 5.6.7.Final prior to upgrading. All scripts have run now, so happy days!

Not sure why 5.x accepted attribute names in a case-insensitive fashion, but I’m pretty sure that wasn’t done intentionally. Anyway, happy to hear that everything works fine for you now.