Modelling question: combine collection and inheritance

I have a modelling question: I’m looking for some kind @ManyToAny but where each of the Many always has the same kind of Any. I’ll clarify…

So I’m still working on the model I described in this topic. The relevant part is the relation between Tasks and (abstract) SubTasks, where a concrete SubTask can be an instance of any of a number of subclasses of SubTask. There are many of those subclasses, so I want to avoid single table inheritance. Each Task has a number of SubTasks, but these SubTasks are all of the same subclass per Task. That means there’s no strict need for a link table in the database: I can keep a discriminator column in the Tasks table, and for each of the SubTask tables there’s a foreign key to its Task.

The queries to fetch a Task’s SubTasks are easy enough, but they would preferably only select from the right SubTask table, determined at runtime. Is there perhaps an elegant way to express this kind of relation, hopefully without delving to deep into Hibernate internals? :thinking: The best solution I can think of right now is to manually fetch the SubTasks with criteria queries or something, where the target table is dynamically chosen. In my Java code, a Task has a fixed set of SubTasks so I guess that could do. But then I may be missing out on the expressive power and performance optimization that Hibernate has to offer, so any expert tips would be highly appreciated.

but they would preferably only select from the right SubTask table, determined at runtime

To do that, Hibernate would need to know the Task type for a one-to-many association. I think that maybe a query like select t from Task t join fetch treat(t.subTasks as Type1SubTask) should be able to do the trick, but I have no idea how your mapping looks like.

Maybe share the model here again so that we can talk about the same thing. Also, how does the query look like that you want to execute?

Hi. Thanks for the quick reply again.

The model/mapping is still in flux, I have to get it to work op the PHP side. When I have it working I’ll post the mapping and table structure, that will be easier indeed. The query you mention looks promising, but I’ll need to take a much closer look at HQL to judge that. :smiley:

So more details following soon but you may have already solved my question…

(EDIT: translate type names to English)

It took me some time, but I finally published a test case at https://codeberg.org/stevendh/hibdisctest . At the moment you’d have to run a MySQL/MariaDB server and run eduweb2.sql against it to play with it. Would it be useful for me to look for an in-memory database and port my example to it? If so, how? Also, would you run random Java code (mine) from the internet?

Perhaps some clarification for the Dutch class names in the code:

Dutch [code] English [this text] note
Taak Task
Opgave SubTask abstract class
OpgaveAanwezigheidCreeer SubTaskCreatePresence subclass of SubTask
OpgaveDubbeleInschrijvingRaadpleeg SubTaskConsultDoubleRegistration subclass of SubTask
Uitzondering Exception distinct from java.lang.Exception
Uitgezonderde Excepted interface, implemented by Task and Subtask

So the question I had revolves around Task.subTasks. As it is now, the first time the ‘subTasks’ are called, they’re fetched [code: Opgave.getOpgavenPerTaak] with a criteria query based on the discriminator value (subtask_type) kept in the Task table [db: dsc_taken]. The fetching of the Exceptions en the SubTask’s payload (e.g. PresenceData for SubTaskCreatePresence) is based on the list of Attributes defined in getFetchAttributes in the respective subclasses of SubTask. Of course this would produce an N+1 problem with more than one Task of the same subtask_type, but I don’t think that performance is terribly important here.

The persistence callback @PostLoad seemed interesting here, I tried it in the form of the commented-out method “Task.assignSubTasks” [code: Taak.assignOpgaven]. It seemed to work, but the docs warn against executing queries from a callback.

I was also wondering if it would be possible to implement the attribute exceptions at a higher level like SubTask or Excepted. This harks back to my first question here about an inverse ‘Any’ mapping, here for Exception.excepted. Now I have, in each Java type that has exceptions, a OneToMany mapping with a static @Where clause, which seemed to be the cleanest solution possible. If it were possible to specify a dynamic Where annotation, I could pull the exceptions stuff up to the Excepted type and avoid the current redundancy. Maybe I’d need to make Excepted an abstract class, and demote Timestamped to an Interface. Do you happen to have a nice solution in mind?

Also, I sometimes find strange results when reading Tasks from the database, specifically when there’s a Task that has multiple Exceptions. (One could execute [code:] BeginHier.toevoegingUitzondering a few times, and then look at the output of [code:] printBomen.) The SQL query of course obtains multiple lines for such a Task, but Hibernate should filter out the duplicates if I’m not mistaken. I had a look in a debugger and found that one Task was represented as just a Task, but the one with multiple Exceptions was present as a Task$HibernateProxy and also as an ordinary Task. Could that be a source of confusion for Hibernate? Is there perhaps something wrong with the primary key definition of Task? Observed with Hibernate 6.1.7.Final, but also with 6.2.0.CR3.

Oh, and a kind request: if you spot some obvious n00b mistakes, please let me know…

I’m really sorry, but could you please translate the names of the model to English in that text again? I’m having a hard time understanding what we are talking about, especially because pluralization in Dutch is not obvious to me :sweat_smile:

One way to model this is the following:

@Entity
abstract class Task<T extends SubTask> {
    protected Set<T> subTasks = new HashSet<>();

    public abstract Set<T> getSubTasks();
}
@Entity
class Task1 extends Task<SubTask1> {
    @OneToMany(mappedBy = "task")
    public Set<SubTask1> getSubTasks() {
        return subTasks;
    }

    public void setSubTasks(Set<SubTask1> subtasks) {
        this.subTasks = subtasks;
    }
}
@Entity
class Task2 extends Task<SubTask2> {
    @OneToMany(mappedBy = "task")
    public Set<SubTask2> getSubTasks() {
        return subTasks;
    }

    public void setSubTasks(Set<SubTask2> subtasks) {
        this.subTasks = subtasks;
    }
}

i.e. declare state in the parent class with a type variable, and expose the persistent property only in subclasses. I don’t know if that matches your envisioned model, but I guess that’s as good as it gets.

In queries, you’d then have to specify the type of the task when you want to fetch/join something:

select t from Task t join fetch treat(t as Task1).subTasks

Hm, yes, I was wondering how legible my comment was. And sometimes even native speakers get the distinct plural forms wrong! Edited now, should be easier.

I hadn’t thought of this parallel Task hierarchy next to the SubTask one, I’ll try it out. With the criteria query the treat ... as is unnecessary, and I wanted to avoid pasting together of queries. But instead I got some rather complex type juggling/casting instead, and also new (for me) adventures in type variables. Now I’m not sure which is the least inelegant…

Thanks!

I had a look in a debugger and found that one Task was represented as just a Task, but the one with multiple Exceptions was present as a Task$HibernateProxy and also as an ordinary Task.

When fetching an entity graph, Hibernate processes stuff row by row and constructs entities based on the result. As soon as the entity graph contains a lazy association to e.g. Task, it will create a proxy for that association, unless an object for Task with the same primary key is already part of the persistence context at this point.
Looking at your model, it seems that the order of processing potentially allows for such a scenario. So if you query from Task t join fetch t.XYZ and the type of XYZ contains a lazy association to Task, then this can happen if XYZ refers to a Task that is processed later.

Now I’m not sure which is the least inelegant…

I still don’t know what you are trying to model exactly, but since you mentioned PHP, I guess you are working with an existing database model?

How about making the SubTask/Opgave an abstract entity that uses @Inheritance(TABLE_PER_CLASS) instead? That way, you can just fetch subtasks with from Task t join fetch t.subTasks in a single query, which would roughly look like this:

select *
from task t
join (
  select * from dsc_opg_aanw_creeer
  union all
  select * from dsc_opg_2ins_raadpl
) subTask on t.id = subTask.task_id

Duplicate Tasks & proxy
Indeed, Exception has a lazy reference to its Excepted. But if I remove the lazyness from that relation I get a stack overflow, presumably because the Excepted (in this specific case: Task) also has an eager relation to its Exceptions. If I remove the eagerness there, it mostly works as expected, but I do see why I set the lazyness of Exception.excepted in the first place: after I issue
FROM School AS s LEFT JOIN FETCH s.tasks AS t LEFT JOIN FETCH t.exceptions
Hibernate still performs a new query for each of the School’s two Tasks. I don’t understand why, but it’s probably not important for my use. Something similar happens with one of the two SubTasks (which has an Exception), but not the other (which has none).

So I switched the eagernesses of these Any ⟷ OneToMany relations from their defaults and didn’t get what I expected. Is that because one is not mappedBy the other? Was it a bug, where Hibernate should detect that the HibernateProxy is actually the same as one of the Tasks and remove it? Or did I unwittingly expect something unreasonable from Hibernate?

Model
You’re right, I should have given some context. I’m working on a school information system written in PHP. There’s an old version we like to phase out, and a complete rewrite were we use Laravel. We need to register a lot of the information we process with the government, like registering new pupils, reporting their absence, diplomas obtained, et cetera. The relevant government agency exposes a SOAP interface for that, described by WSDL files. It would be nice if we could call these APIs from PHP, but my colleagues and I have not found a convenient way to do this. So the way we do this in the old version is that every piece of information to be sent to the government is recorded in a database, and from time to time we use a small Java program to read this new information from this database, put it in the appropriate XML format and send it to the SOAP server. We then record the results we get there in the database. The rewritten PHP code requires a rewritten Java program, and that’s what I’m working on. I picked Hibernate instead of the manually written queries we use now, which are error-prone, inflexible and don’t allow for much abstraction. It’s still a challenge, in part because I need to design a new database structure which can be worked with from both Laravel’s ORM and Hibernate, and also for my lack of experience with ORMs.

So the name Task is just a nod to our educational setting. Each Task corresponds with one SOAP call, and a SubTask is one item with an associated status and a data payload (e.g. the registration info of one pupil). The SOAP calls mostly follow a CRUD pattern, so there’s many types of SubTask. Currently we implement 41 but I expect that to increase over time, perhaps significantly so. As the Java program has limited functionality, it doesn’t need to be able to create new Tasks or SubTasks but it must be able to modify them. Perhaps some of these fiddly associations of mine are not even necessary (in which case I apologize for the noise). That’s also the reason there’s no Task.setSubTasks method.

Inheritance modelling
At the moment I have SubTask as an abstract MappedSuperclass. Table-per-class then allows to use SubTask (perhaps even Excepted?) as a query-able Entity, but there’s no differences/downsides compared to MappedSuperclass if I’m not mistaken. That would make the code a lot cleaner, at the cost of a huge UNION, right?

I’ll take a good look at your suggestions (again) and see if I can use something. “As good as it gets” seems to be the theme here, but perhaps I’ll get to something a little prettier than I have now.

So I switched the eagernesses of these Any ⟷ OneToMany relations from their defaults and didn’t get what I expected. Is that because one is not mappedBy the other? Was it a bug, where Hibernate should detect that the HibernateProxy is actually the same as one of the Task s and remove it? Or did I unwittingly expect something unreasonable from Hibernate?

It would be best if you provide code using the test case template (https://github.com/hibernate/hibernate-test-case-templates/blob/master/orm/hibernate-orm-6/src/test/java/org/hibernate/bugs/JPAUnitTestCase.java) that reproduces the issue, so that we can understand what it is that you are trying.

Table-per-class then allows to use SubTask (perhaps even Excepted ?) as a query-able Entity

correct

but there’s no differences/downsides compared to MappedSuperclass if I’m not mistaken. That would make the code a lot cleaner, at the cost of a huge UNION, right?

Correct, but since you are not running this very often, it might be good enough for your needs.

Thanks for the reply. I’ll see if I can make a minimal reproducer.

Quick question: does table_per_class require unique primary keys over the subclass tables? I tried it and the HQL query works (nice!), but if SubTasks of different classes have the same id they can link to the wrong Taak. It works exactly as you sketched out, there’s no check for the discriminator value Taak.subTask_type.

Also, the large UNION query doesn’t seem to fetch the eager associations I expected it to, like SubTaskCreatePresence’s data payload. If I take the slightly uglier route of manually fetching a Task’s SubTasks, I seem to have more control over the process. Or at least I understand more of it. :thinking:

Up next: trying to get my bug to reproduce in the test case template.

Yes, the primary key must always be unique across the whole hierarchy.

Also, the large UNION query doesn’t seem to fetch the eager associations I expected it to, like SubTaskCreatePresence ’s data payload.

It should if you mark the association as EAGER.

Finally, a test case: stevendh/hibernate-bidir-any: Tries to show bug in bidirectional Any-relation: extraneous query results if laziness is switched from default. - hibernate-bidir-any - Codeberg.org. It has nothing to do with the relation between Tasks and SubTasks, it’s related to my question about a bidirectional @Any mapping.

I hope the test case is useful. I tried to emulate other tests I found and to make the model minimal. SubTaskA is not necessary to trigger the bug, but it shows the intention of the polymorphic mapping.

Some other weird things I noticed:

  • Hibernate generates an SQL constraint on the DException table but it shouldn’t. I suppose this comes from the @OneToMany on Task.exceptions, if that’s processed before SubTaskA.exceptions?
  • Executing the HQL SELECT query generates a redundant WHERE-clause (so two times e1_0.excepted_type = 'Task').
  • Earlier I noticed that hibernate-jpamodelgen doesn’t process Exception_#excepted. A quick test seems to confirm it happening on the DException model in the test case as well.

And another thing (I do love tangents, right?): I think I hit a documentation bug. To circumvent the erroneous SQL constraint, I tried to use foreignKey = @ForeignKey(value = ConstraintMode.NO_CONSTRAINT) in the @JoinColumn, but that didn’t work initially. Old advice on the internet suggested I should apply this to both ends of the relation (or all ends of this polymorphic thing…) and that did the trick. But I can’t find this requirement in the documentation, neither with Hibernate nor in the Jakarta Persistence spec.

Well, the one-to-one relation from SubTaskCreatePresence to its payload PresenceData is eager (also by default). But when I run the HQL FROM Task t JOIN FETCH t.subTasks JOIN FETCH t.exceptions, Hibernate executes one large query (a join of the three types with a union-select for subtasks), and then a separate query for each Task and PresenceData. Looks like an N+1 I’d like to avoid, if easily possible.

Anyway, I don’t think there’s a practical way for me to assign unique primary keys to all different kinds of SubTask from the PHP code. At the moment I use UUIDs for that (I need them anyway for the SOAP calls), but apparently that’s a questionable choice for primary key. I’ll change that first on the PHP side and in my test code, and then revisit this question while actually implementing the little program we need.

Hibernate generates an SQL constraint on the DException table but it shouldn’t. I suppose this comes from the @OneToMany on Task.exceptions, if that’s processed before SubTaskA.exceptions?

Using @OneToMany will currently always cause a constraint to be created because there is no support for referring to a many-to-any yet. I’d say this is something that should be addressed via [HHH-15722] - Hibernate JIRA

Executing the HQL SELECT query generates a redundant WHERE-clause (so two times e1_0.excepted_type = 'Task').

Please create a JIRA issue for that.

Earlier I noticed that hibernate-jpamodelgen doesn’t process Exception_#excepted. A quick test seems to confirm it happening on the DException model in the test case as well.

Also, create a separate issue for this as well please.

And another thing (I do love tangents, right?): I think I hit a documentation bug. To circumvent the erroneous SQL constraint, I tried to use foreignKey = @ForeignKey(value = ConstraintMode.NO_CONSTRAINT) in the @JoinColumn , but that didn’t work initially. Old advice on the internet suggested I should apply this to both ends of the relation (or all ends of this polymorphic thing…) and that did the trick. But I can’t find this requirement in the documentation, neither with Hibernate nor in the Jakarta Persistence spec.

Another issue for which we’d appreciate you to create a separate JIRA ticket :slight_smile:

Well, the one-to-one relation from SubTaskCreatePresence to its payload PresenceData is eager (also by default). But when I run the HQL FROM Task t JOIN FETCH t.subTasks JOIN FETCH t.exceptions , Hibernate executes one large query (a join of the three types with a union-select for subtasks), and then a separate query for each Task and PresenceData . Looks like an N+1 I’d like to avoid, if easily possible.

That’s by design. We won’t add joins for eager associations to the SQL if you provide Hibernate a HQL query, as that would be kind of strange. If you want to eagerly load the presenceData, you will have to add a join fetch for that as well, or work with entity graphs.

I was wondering how legible my comment was. And sometimes even native speakers get the distinct plural forms wrong! Edited now, should be easier.

Thanks for the reply. .

.