Optimize left join fetch

Hey,

I wonder if it could be possible to optimize some left join fetch queries
To explain my problem, I will give you a basic example (maybe stupid :wink: ).
We’ve a Person that have a OneToMany relationship to Cat and another OneToMany relationship to Dog

Imagine that I do a SELECT p FROM Person p LEFT JOIN FETCH p.cats
Right, in the first level cache, I’ve my Persons with their Cats

Now I want to complete my Persons with their Dogs, so I execute :

SELECT p FROM Person p LEFT JOIN FETCH p.dogs WHERE p IN :persons

The parameter :persons is set with my previous loaded persons

The problem is that all data of my persons will be loaded again and this is not necessary because I already have them

It would be nice in such case to only get the Person.ids and dogs data, it’s enough for Hibernate to retrieve the Persons from the cache, no ?
If my list of Person is huge and a Person has a lot of data it will drastically improve the request and data transfer from the database, no ?

Am I missing some thing, do i do it the wrong way ? (of course the solution here is not to FETCH both relationships in the same query)

For the evolution of the syntax, maybe that my second query could be something like

SELECT p FROM Person p.id LEFT JOIN FETCH p.dogs WHERE p IN :persons

The Person variable is p but specifying p.id means that only Person id column will appear + Dogs data
Of course if the Persons are not in the cache it raises an exception

You can specify to use subselect or batch fetching for the association. When you initialize one of the dog collections on the person, subselect fetching will trigger a query like select * from dog d where d.owner_id in (select p.id from person p ...) i.e. use the original query as subquery in the in-predicate. With batch fetching, it will do a query like select * from dog d where d.owner_id in (?, ?, ..., ?) and the amount of ? is equal to the configured batch size.
Depending on how complex your original query is, one or the other approach might be more favorable. Also see Hibernate ORM User Guide and Hibernate ORM User Guide

First, thanks for your reply
But it works only if the relationship is bi-directional, no ? And not sure the second query will link dogs and the persons…because there is no fetch clause

If Dog does not know the person ?

There is no need for a bidirectional relationship. The queries I posted are SQL and are only to help you understand what happens behind the scenes. Person and dog are either related through a join table or through a FK column e.g. owner_id as part of the dog table.

You can write:

List<Person> list = em.createQuery("from Person p left join fetch p.cats", Person.class).getResultList();
Hibernate.initialize(list.get(0).getDogs());

When you use subselect fetching, this will trigger at most 2 queries. The query you wrote yourself and an additional one for initializing all dogs for all previously loaded Person objects.
If you use select fetching with a batch size, this will trigger 1 + N/BATCH_SIZE queries, where N is the amount of Person objects…

Maybe I’m not explaining myslef clearly
I want to avoid N+1 SELECT problem

So, Hibernate.initialize, for all persons, is not a solution

What I use to do is : SELECT p FROM Person p LEFT JOIN FETCH p.dogs WHERE p IN :persons
with :person being the persons retrieved from my first SELECT. So 2 SELECT to retrieve 2 relationships for all my persons

But

If I do that (the second query for dogs), Hibernate loads all the columns of my persons again, which is not necessary

You are not understanding correctly :wink:

Re-read my last comment and also try to understand the examples in the documentation. If you configure batching, there is no N+1 loading, but rather N/BATCH_SIZE+1. If you configure subselect fetching, you’ll only see 2 SQL queries being executed.

Oh my god !
I found it so magic that I did not try your solution…now I have and it works
Thanks and sorry…

I test SUBSELECT. It means that Hibernate remembers the request used to load the parent of dogs !?..a little bit magic, no ?

Now I ask myself if it is performant because the first query (the one used to load the Persons) can take time, so using it again can cause performance problems. Would it be more efficient to pass only the ids of the previous loaded persons ? I think so

…and so my proposal cannot be a new feature ?

Thanks again

Hum…I made some tests and it seems that SUBSELECT, in my case is more efficient than JOIN FETCH with a request that send all previous loaded Ids of the persons. The problem of sending all Ids is that if you’ve lot of Ids, your’re obliged to split the Id list into chunks because of DB limitations. So you executes more than one request to load the Dogs

Sl, I don’t know where SUBSELECT is a bad solution, maybe if the first request takes time and return very few objects. In that case, sending the retrieved id can be more efficient…

Thanks again and again Christian !!!

I test SUBSELECT. It means that Hibernate remembers the request used to load the parent of dogs !?..a little bit magic, no ?

No magic involved, it’s just a nice feature :wink:

Now I ask myself if it is performant because the first query (the one used to load the Persons) can take time, so using it again can cause performance problems. Would it be more efficient to pass only the ids of the previous loaded persons ? I think so

Sl, I don’t know where SUBSELECT is a bad solution, maybe if the first request takes time and return very few objects. In that case, sending the retrieved id can be more efficient…

Deciding whether to use SELECT fetching with batching or SUBSELECT fetching is exactly based on the performance of the initial query. Since SUBSELECT will essentially re-execute that initial query, you have to decide this based on what you expect.

Thanks again
For my use case BatchSize could be the solution but the problem is its “random” choice of proxies to initialize. I need to explain
I load my persons then I calculate the persons for whom I will have to load their dogs.
If I use BatchSize, Hibernate can load some dogs that I’m not interested in

This is why I think that a request like this could be good for me:

FROM Person p.id LEFT JOIN FETCH p.dogs WHERE p IN :selectedPersons

p.id means: for Persons just retrieve its id, I’ve already all Person data in the cache

What do you think ?

Can’t you put that “calculation” into the query where predicate to only do a single query? Something like from Person p left join fetch p.dogs where p.age > 123

In some cases yes but others no, we need to do it at “Java level”

and of course the sample with Dogs is to explain my problem. The real problem is more complexe

Hi,

So, not interested by my proposal with something like p.id in HQL ?
This is the lack of such feature that make me use jdbc for some batchs… :frowning:

What you are proposing already exists and is called batch fetching in Hibernate. Read the documentation: Hibernate ORM User Guide

I still don’t really understand your use case, but sometimes using a native query is indeed easier. Don’t fight the tool if you can do it easier differently.

Let me try to explain my usecase more precisely

I’ve Application A, Application A loads some Persons
Then Application A sends the persons with their data to Application B to filter the list. The role of Application B is to filter the Person list using the person data plus other information known by Application B.
Application B returns a list of Person’s Ids
Application A needs to load all Dogs of the corresponding Ids

Now imagine that the loaded persons are persons whose id is : 1,2,3,…30_000
Application B returns the ids : 1, 100, 2_000, 20_000, 25_000 up to 30_000

If I use @BatchSize(size=50) for example (whatever the size is)
If now I get the persons filtered by Application B and I do a person.getDogs() because I only need the Dogs of the filtrede Persons, I do not have any control on the persons for which Hibernate will load the Dogs.
Maybe, if I do a person.getDogs() for person 1, Hibernate will load persons for id = 1 up to 50. The problem is that I’m not interested by Persons 2 up to 50. The same will happen for person id=100,…and so on.

The problem with BatchSize is that the application does not have any control on the choice done by Hibernate

This is why I’ve asked for a syntax like “FROM Person p.id LEFT JOIN FETCH p.dogs” (or any other solution that allows me to retrieve only the id of a previously loaded object)…and the reason is that a Person has lot of data and returning them multiple time when I have to LEFT JOIN FETCH is not efficient

Hope it will clarify my use case

You can do that, but will unnecessarily have to re-load the person data. So what you can do is from Person p left join fetch p.dogs where p.id in :personIds with the list of ids that is returned by Application B. It would indeed be nice if there was a way to query collections directly e.g. from Person.dogs d where d.owner.id in :personIds, but we currently don’t have that. You can follow Collection querying support · hibernate/hibernate-orm · Discussion #7068 · GitHub if you’re interested and report your use case there.

Ok, I will join the discussion