Difference between Query#getResultStream() / Query#getResultList() and JOIN FETCH

tomsontom · January 5, 2023, 9:58pm

If I use JOIN FETCH together with DISTINCT, Hibernate removes the duplicates when I call getResultList(). My (probably wrong) expectation was that the same would hold for getResultStream().

But as my tests show, it does not. So my questions are:

Is it expected that Hibernate does not do the deduplication for getResultStream() as well?
Should I rather use getResultStream().distinct()... or getResultList().stream()...?

tomsontom · January 8, 2023, 9:38am

It was pointed out in the Twitter thread I started that this problem goes away when the result is ordered by kl.id. That also explains why our unit tests did not catch the problem: they set up the data in a way (and I assume most tests do) such that the natural order of the joined table exactly matches the order of the “klient” table.

I think this makes the behavior worse, because some sort of deduplication is going on, but only if the entities are returned in order.
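
To make the symptom concrete, here is a toy model in plain Java. It is not Hibernate's code, and whether Hibernate works this way internally is only an assumption. The hypothetical `dedupAdjacent` helper compares each row's root entity id only with the previous row's id, which would produce exactly the observed behavior: duplicates vanish when rows for the same entity are adjacent and survive when they are interleaved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class AdjacentDedupSketch {

    /** Drops a row only when its id equals the id of the row immediately before it. */
    static List<Long> dedupAdjacent(List<Long> rootIdsPerRow) {
        List<Long> result = new ArrayList<>();
        Long previous = null;
        for (Long id : rootIdsPerRow) {
            if (!Objects.equals(id, previous)) {
                result.add(id);
            }
            previous = id;
        }
        return result;
    }

    public static void main(String[] args) {
        // Root entity id per joined row, grouped by id (as ORDER BY kl.id would deliver them).
        List<Long> ordered = List.of(1L, 1L, 2L, 2L);
        // The same rows, but interleaved.
        List<Long> interleaved = List.of(1L, 2L, 1L, 2L);

        System.out.println(dedupAdjacent(ordered));     // [1, 2]
        System.out.println(dedupAdjacent(interleaved)); // [1, 2, 1, 2]
    }
}
```

A test data set in which the joined rows happen to arrive grouped by entity would pass under such a scheme, which matches what our unit tests did.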

beikov · January 9, 2023, 9:31am

getResultStream() uses the JDBC scrolling API, which is lazy, i.e. it fetches rows on demand. De-duplication usually requires materializing a set of objects to check for duplicates, which defeats the purpose of streaming: you should only stream when chances are high that the full result will not fit in memory.
I would advise against using getResultStream() unless you are sure that memory consumption is an issue. The JDBC scrolling API is usually implemented with database cursors, which come with other possible issues, especially if the cursor is kept open for a long time.
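
As a rough illustration of why full de-duplication and streaming pull in opposite directions, the following plain-Java sketch filters a lazily consumed stream through a set of ids that were already emitted. The class and the `dedupWithSeenSet` helper are hypothetical and unrelated to Hibernate's internals; the point is only that the set grows with the number of distinct results. `Stream.distinct()` is likewise a stateful operation that has to remember the elements it has already passed on.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamingDedupSketch {

    /** Filters a sequential stream, remembering every id that was already emitted. */
    static List<Long> dedupWithSeenSet(Stream<Long> rows, Set<Long> seen) {
        // Set.add returns false for an id that is already in the set.
        return rows.filter(seen::add).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<Long> seen = new HashSet<>();
        List<Long> unique = dedupWithSeenSet(Stream.of(1L, 2L, 1L, 3L, 2L), seen);

        System.out.println(unique);      // [1, 2, 3]
        System.out.println(seen.size()); // 3: one entry retained per distinct result
    }
}
```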

I guess vertragsDaten is a collection? If so, then streaming/scrolling won’t work for you anyway, because that kind of requires that the entity cardinality matches the row cardinality i.e. join fetched collections are disallowed.
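
To illustrate the cardinality mismatch, here is a small plain-Java sketch. The `Klient` and `Row` records are hypothetical stand-ins, not the actual entities from the question, and `joinRows` only mimics what an inner join over a collection produces: one row per collection element rather than one row per entity.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoinCardinalitySketch {

    // Hypothetical stand-ins for the entity and its joined collection.
    record Klient(long id, List<String> vertragsDaten) {}
    record Row(long klientId, String vertragsDatum) {}

    /** Mimics an inner join: one row per (klient, collection element) pair. */
    static List<Row> joinRows(List<Klient> klienten) {
        return klienten.stream()
                .flatMap(k -> k.vertragsDaten().stream().map(v -> new Row(k.id(), v)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Klient> klienten = List.of(
                new Klient(1, List.of("a", "b")),
                new Klient(2, List.of("c")));

        System.out.println(klienten.size());           // 2 entities
        System.out.println(joinRows(klienten).size()); // 3 rows
    }
}
```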

tomsontom · January 9, 2023, 10:57am

Yes, vertragsDaten is a collection. Your conclusion that join fetches are disallowed sounds reasonable, but if that is true I would expect Hibernate to throw an error or at least log a warning.

As I stated in my other reply, the main problem is that Hibernate does remove duplicates under some circumstances, so there is a high likelihood that people get hit by this in production, because their JUnit test setup most likely creates data for which Hibernate does deduplicate.

beikov · January 9, 2023, 12:55pm

Hibernate 6 by default does deduplication for queries that select a single entity alias. Actually, it seems like there is special code for scrollable results when collection fetches are involved, though it doesn’t seem to handle deduplication. Also see FetchingScrollableResultsImpl. In that case, I’d classify this as a bug, so please create a JIRA issue and attach a reproducer test case.
