Hello,
We are developing a retention policy service that needs to clean up millions of rows in our database that are no longer needed. We’re using Spring Specifications to build our filters. However, we’ve run into an issue with how Hibernate is generating the delete queries, in our case when dealing with entities that have a tag entity associated with them.
Problem Description:
We have Entity
entities that are associated with Tag
entities via an entity_tag
join table. When Hibernate generates the delete queries, they look something like this:
- Delete from entity_tag:
DELETE
FROM entity_tag to_delete_
WHERE to_delete_.entity_id IN (SELECT e1_0.id
FROM entity e1_0
WHERE e1_0.created <= '2024-02-12 08:37:18.475177+01'
AND e1_0.id IN ((SELECT e2_0.id
FROM entity e2_0
LEFT JOIN entity_tag t1_0 ON e2_0.id = t1_0.entity_id
WHERE t1_0.tag_id IN ('5cf2da52-2433-413f-820f-347544c70d0f'))));
- Delete from entity:
DELETE
FROM entity e1_0
WHERE e1_0.created <= '2024-02-12 08:37:18.475177+01'
AND e1_0.id IN ((SELECT e2_0.id
FROM entity e2_0
LEFT JOIN entity_tag t1_0 ON e2_0.id = t1_0.entity_id
WHERE t1_0.tag_id IN ('5cf2da52-2433-413f-820f-347544c70d0f')))
The issue is that in the first query, the entity_tag
entries are deleted first. This means that in the second query, when Hibernate tries to delete from the entity
table, the entity_tag
relationship no longer exists, and as a result, the Entity
itself is not being deleted.
What We Have Tried:
We attempted to mitigate the issue by streaming the Entity
IDs in batches and deleting them using JPAs deleteAllByIdInBatch
. Specifically, we use Spring Specifications to create predicates that filter the Entity
IDs based on certain criteria. These IDs are then queried in batches and streamed for use in the delete queries. Unfortunately, this approach is too slow, achieving only about 40 deletes per second, which is not sufficient for our needs.
Stripped-Down Entity Definitions:
class Entity(
@ManyToMany
@JoinTable(
name = "entity_tag",
joinColumns = [JoinColumn(name = "entity_id")],
inverseJoinColumns = [JoinColumn(name = "tag_id")]
)
val tags: Set<Tag>
)
CREATE TABLE entity_tag (
entity_id UUID NOT NULL REFERENCES entity ON DELETE CASCADE,
tag_id UUID NOT NULL,
PRIMARY KEY (entity_id, tag_id)
);
Question:
What would be the preferred approach to efficiently delete these Entity
entities along with their associated entity_tag
entries without running into the issue described above? Additionally, is it possible to prevent Hibernate from deleting entries from the join table so that the database’s ON DELETE CASCADE
could handle it instead? This might solve the issue by ensuring the Entity
deletion isn’t dependent on the state of the entity_tag
relationship.
Thank you for your help!