Bulk Deleting with Hibernate When Deleting Related Entities First

Hello,

We are developing a retention policy service that needs to clean up millions of rows in our database that are no longer needed. We’re using Spring Specifications to build our filters. However, we’ve run into an issue with how Hibernate is generating the delete queries, in our case when dealing with entities that have a tag entity associated with them.

Problem Description:

We have Entity entities that are associated with Tag entities via an entity_tag join table. When Hibernate generates the delete queries, they look something like this:

  1. Delete from entity_tag:
DELETE
FROM entity_tag to_delete_
WHERE to_delete_.entity_id IN (SELECT e1_0.id
                               FROM entity e1_0
                               WHERE e1_0.created <= '2024-02-12 08:37:18.475177+01'
                                 AND e1_0.id IN ((SELECT e2_0.id
                                                  FROM entity e2_0
                                                           LEFT JOIN entity_tag t1_0 ON e2_0.id = t1_0.entity_id
                                                  WHERE t1_0.tag_id IN ('5cf2da52-2433-413f-820f-347544c70d0f'))));
  1. Delete from entity:
DELETE
FROM entity e1_0
WHERE e1_0.created <= '2024-02-12 08:37:18.475177+01'
  AND e1_0.id IN ((SELECT e2_0.id
                   FROM entity e2_0
                            LEFT JOIN entity_tag t1_0 ON e2_0.id = t1_0.entity_id
                   WHERE t1_0.tag_id IN ('5cf2da52-2433-413f-820f-347544c70d0f')))

The issue is that in the first query, the entity_tag entries are deleted first. This means that in the second query, when Hibernate tries to delete from the entity table, the entity_tag relationship no longer exists, and as a result, the Entity itself is not being deleted.

What We Have Tried:

We attempted to mitigate the issue by streaming the Entity IDs in batches and deleting them using JPAs deleteAllByIdInBatch. Specifically, we use Spring Specifications to create predicates that filter the Entity IDs based on certain criteria. These IDs are then queried in batches and streamed for use in the delete queries. Unfortunately, this approach is too slow, achieving only about 40 deletes per second, which is not sufficient for our needs.

Stripped-Down Entity Definitions:

class Entity(
    @ManyToMany
    @JoinTable(
        name = "entity_tag",
        joinColumns = [JoinColumn(name = "entity_id")],
        inverseJoinColumns = [JoinColumn(name = "tag_id")]
    )
    val tags: Set<Tag>
)
CREATE TABLE entity_tag (
    entity_id UUID NOT NULL REFERENCES entity ON DELETE CASCADE,
    tag_id UUID NOT NULL,
    PRIMARY KEY (entity_id, tag_id)
);

Question:

What would be the preferred approach to efficiently delete these Entity entities along with their associated entity_tag entries without running into the issue described above? Additionally, is it possible to prevent Hibernate from deleting entries from the join table so that the database’s ON DELETE CASCADE could handle it instead? This might solve the issue by ensuring the Entity deletion isn’t dependent on the state of the entity_tag relationship.

Thank you for your help!

You could map the entity table again in a different entity without the collection association and delete that instead. This way, Hibernate ORM won’t cleanup the join table.