Best practice for CRUD operations on @OneToMany relationships

Hello,

I would like to know the recommended approach for CRUD operations on @OneToMany relationships.

– TLDR –

I want to efficiently update a parent-child relationship, both in normal and bulk/batch operations (1-50K parents with up to 2x30 children), for changes to the entire parent/child complex as well as individual children. I want to minimize the amount of code/CRUD logic in the entities but also ensure that the parent and child are always in sync. If possible, I prefer not to rely heavily on annotation magic.

Here’s the full story:

In simplified terms, I have two entities:

@Table(name = "parent")
@Entity
public class ParentEntity{

@Id
@JdbcTypeCode(SqlTypes.*CHAR* )
@Column(length = 36)
@GeneratedValue
public UUID id;

@OneToMany(mappedBy = "parentEntity", cascade = CascadeType.*ALL* , orphanRemoval = true)
public List<ChildEntity> children = new ArrayList<>();

}

@Table(name = "child")
@Entity
public class ChildEntity{

@Id
@JdbcTypeCode(SqlTypes.*CHAR* )
@Column(length = 36)
@GeneratedValue
public UUID id;

@ManyToOne(fetch = FetchType.*LAZY* )
@JoinColumn(name = "parent _id")
public ParentEntity parentEntity;

}

So far, I have only dealt with the case where the entire parent entity needed to be updated (CRUD).

For that purpose, I created a ParentRepository. It first calculates (quite complex) which children need to be removed, updated, or added during an update. Then it calls parent.update(), where the changes are applied directly to the children in an operation on the parent. This ensures that the parent and children are always synchronized, and I never need to call persistAndFlush separately (Which I think is a big issue?). However, the downside is that the ParentEntity is bloated with CRUD operations, which I believe shouldn’t be there. Currently, the ParentEntity has around 170 lines, well beyond my “pain threshold” of 60-80 lines.

Now, the additional requirement is that I need to be notified of CRUD changes to the children at another part of the application. One quick solution would be to add an @EntityListeners annotation to the ChildEntity. However, I’m not a big fan of annotations because they can quickly become magical and difficult to understand. Therefore, I’m considering controlling the CRUD operations through a separate ORM service and then directly notifying a service injected there with a clear method call.

However, when I operate directly on the children, it seems that the parent entity doesn’t become aware of the changes, and I have to reload it from the database, resulting in unnecessary SELECT queries. This is particularly problematic for bulk operations involving 50K parent entities, each with up to 2x30 child entities.

Additionally, now there is a new case where I need to update only specific fields of the parent (CRUD). Loading the entire parent entity and processing all the fields appears to be too expensive. Therefore, I’m currently handling this in the ChildRepository. But now I have two separate places where a parent and its children can be updated, which makes it impossible to have a central place for making changes to the parent/child bundle and notifying various services about these changes.

Last but not least, if the bulk update operations become to resource hungry, actually there could even be up to 1 million parents in - EXTREME - cases - I would like to have the possibility to skip informing listeners about the changes, or send a consolidated “Bulk update info” to them only - that would be easy when calling a service from a dedicated, central ORM CRUD Service, but impossible - or I just have no idea how to - with an annotation based @EntityListeners solution.

Sorry, this is so complex that it’s challenging for me to explain concisely but clearly. I did my best. I hope you understand my problem.

Thank you very much for your time and thoughts on this.

I want to efficiently update a parent-child relationship, both in normal and bulk/batch operations (1-50K parents with up to 2x30 children), for changes to the entire parent/child complex as well as individual children. I want to minimize the amount of code/CRUD logic in the entities but also ensure that the parent and child are always in sync. If possible, I prefer not to rely heavily on annotation magic.

It feels to me that your fear of annotations and over-abstraction are very much complicating things, but I have a very hard time reading and understanding your scenario, so maybe I’m misunderstanding.

If you expect lots of children, you will have to manage (persist/remove) the “child” entities instead of working with the collections in the parent entity, given that you want these operations to be efficient in the sense that it doesn’t load the collections into memory.

For that purpose, I created a ParentRepository . It first calculates (quite complex) which children need to be removed, updated, or added during an update.

I can’t imagine what the purpose of that is. I suppose you have a HTTP endpoint that takes a list of child entities in form of a JSON payload. So let that endpoint first run a delete query delete from ChildEntity c where c.parent.id = :parentId and then insert the deserialized child entities. If you want some listener to be called, call that listener for every child. How you abstract that and where you put the logic is up to you.

you’d just delete all children and then recreate all?

if You have 30 children, and one added/updated/deleted, that sounds like a lot of unneeded operations, which is why I really check in detail - which children need to be deleted, which updated, which removed - but yes, that makes the code very complex and might cost some extra cpu cycles.

If you expect lots of children - maximum of 30 children / parent, but there are two ManyToOnes, so up to 2x30 children, usually I expect 0-2x5 maybe.

But Parents - I expect 1000-50k parents.

For the bulk operations, I don’t need to instantly see the final parent with all updated children, but for a single add/update operation, returning the “final end result” of parent and child would be good.

Is persistAndFlush() always needed when working with a ChildService / Repository that the parent delegates to, or is it considered a smell that shows something is wrong (to me it feels like the latter).

You can also run a query like delete from ChildEntity c where c.parent.id = :parentId where c.id not in (:childIds) and then EntityManager.merge the deserialized child entities.

If you expect lots of children - maximum of 30 children / parent, but there are two ManyToOnes, so up to 2x30 children, usually I expect 0-2x5 maybe.

30 children are no big deal, you can work with the collection directly if you want.

But Parents - I expect 1000-50k parents.
For the bulk operations, I don’t need to instantly see the final parent with all updated children, but for a single add/update operation, returning the “final end result” of parent and child would be good.

I don’t understand what sort of bulk operations you are trying to model, but if you have concerns about memory usage, then split the operation into parts. Only process e.g. 1000 elements at once and then do a persist and flush + clear to free memory.

Is persistAndFlush() always needed when working with a ChildService / Repository that the parent delegates to, or is it considered a smell that shows something is wrong (to me it feels like the latter).

I have no idea what this is “a ChildService / Repository that the parent delegates to”, so I can’t tell you what a smell is and what not, but usually you shouldn’t require explicit flushes.

How would you inform other services of changes to the child entities?

By Adding a @EntityListeners to the two child entities, or by delegating all changes to the children to two dedicated ORM services / Repostiroy classes, that have injected a service?

1)

@ApplicationScoped
public class ParentRepository implements PanacheRepositoryBase<ParentEntity, UUID> {

    private final ChildARepository childARepository;
    private final ChildBRepository childBRepository;

    @Inject
    public ParentRepository(ChildARepository childARepository, ChildBRepository childBRepository) {
        this.childARepository = childARepository;
         this.childBRepository = childBRepository;
    }

 public Parent  updateParent(Parent parent){
  childARepository.updateChildren(parent.getChildrendA());
  childBRepository.update children(parent.getChildrendB());
  ParentEntity parentEntity = findParentEntity(parent.id());
  parentEntity.update(parent);
   // now parent knows nothing about the fact that the children were changed

Parent updatedParent = ParentEntityMapper.map(parentEntity);
return updatedParent; // show the client the updated parent
}

@ApplicationScoped
public class ChildARepository implements PanacheRepositoryBase<ChildAEntity, UUID> {
//ChildBRepository in the same way
private final ChangePublisher changePublisher;

    @Inject
    public ChildARepository(ChangePublisher changePublisher) {
        this.changePublisher = changePublisher;
    }

public List<ChildA> updateChildren(List<ChildAEntity> children){
  //update children
   List<UUID> childrenIds;
 changePublisher.informOfChangedEntities(childrenIds);
 return ChildAEntityMapper.fromEntities(children);
}

OR solution 2:

@ApplicationScoped
public class ParentRepository implements PanacheRepositoryBase<ParentEntity, UUID> {
public Parent  update parent(Parent updatedParent){
ParentEntity parentEntity = getParentEntity();
parentEntity.updateParentAndChildren(updatedParent):
Parent updatedParent = ParentEntityMapper.map(parentEntity);
return updatedParent; // show the client the updated parent
}


@Table(name = "parent")
@Entity

public class ParentEntity  {

ParentEntity updateParentAndChildren(){
// update everything in the entity itself
return this;
}

@Table(name = "child_a")
@Entity
@EntityListeners(ChildAEntityListener.class) //Use an EntityListeners annotation to get informed globally of any changes to a child
public class ChildAEntity  {
}

@Table(name = "child_b")
@Entity
@EntityListeners(ChildBEntityListener.class)  //Use an EntityListeners annotation to get informed globally of any changes to a child
public class ChildBEntity  {
}

I don’t know what sort of events you are trying to publish here or what you are trying to achieve with that, so it’s hard for me to say what is better, given that there could be an entirely different solution to your problem.

Generally though, using an entity listener is IMO preferrable as you have no chance of forgetting to call it. On the other hand, the listener isn’t called if you run DML queries, so depending on how you want to design the persistence operations, calling into the listeners explicitly might be required either way. Still, I think I prefer the entity listener solution.

It isn’t run when I run DML queries(insert, update, delete)? Then how will it be usable at all? Or did you want to say DDL?

Ok, some more details. I am working on configurable rulesets, where the customer in the UI can define - WHEN - a child A (or B) was added/updated or deleted - - DO THIS.

In the given case “DO THIS” means in the first place, make an insert to another table.

You could now say then just use database triggers… but given that this rulset logic is a complex system of currently over 4k lines of code, and I want to any time be able to extend it with further rules, I guess keeping the solution in Java/Hibernate is better then using triggers (and I can’t say that the THEN part will definitely always be just another INSERT).

It isn’t run when I run DML queries(insert, update, delete)? Then how will it be usable at all? Or did you want to say DDL?

HQL/JPQL also has the concept of DML queries and since these queries are translated to DML SQL queries, it’s not really possible to call entity listeners as the entity objects might not exist in the persistence context.

Ok, some more details. I am working on configurable rulesets, where the customer in the UI can define - WHEN - a child A (or B) was added/updated or deleted - - DO THIS.

Depending on whether these rule actions should be run as part of the original transaction or not, I would recommend you to look into a tool like Debezium which allows you to decouple this handling from Hibernate APIs and the main transaction.