Slow Hibernate Envers Query to Get Revision for Entity with OneToMany

I have an entity with a OneToMany relationship as a list. I want to get a specific revision of the entity. So that will include a list of other entities. So I’m getting the correct result, however the query is very slow (3s) because it’s doing a very complicated query containing max and <=. My native query that returns what I like takes 0.3s.

I have tried the RevisionRepository.findRevision method as well as trying the Audit Reader by doing things like

AuditReaderFactory.get(entityManager)
        .createQuery()
        .forRevisionsOfEntity(TaskDefinitionEntity.class, true, false)
        .add(AuditEntity.id().eq(taskDefinitionId))
        .add(AuditEntity.revisionNumber().eq(revisionId))

The query that Envers creates is this:

select
many_columns    
from pf.task_definition_seeds_revisions t1_0,pf.seed_revisions se1_0 
where 
t1_0.seed_id=se1_0.id 
and t1_0.task_definition_id='ea26372f-63e2-4310-bcbb-0a119a55e0cf' 
and se1_0.rev=
(
select max(se2_0.rev)
from pf.seed_revisions se2_0 
where 
se2_0.rev<='8228'
and se1_0.id=se2_0.id
) 
and t1_0.rev=
(
select max(t2_0.rev) 
from pf.task_definition_seeds_revisions t2_0 
where 
t2_0.rev<='8228'
and t1_0.task_definition_id=t2_0.task_definition_id 
and t1_0.seed_id=t2_0.seed_id 
and t1_0.order_id=t2_0.order_id
) 
and t1_0.rev_type<>'2'
and se1_0.rev_type<>'2';

I do not understand why it would need to include a select max and a lesser-than-equal revision comparison. When I update a main entity then I see the OneToMany entities all get marked as deleted in their revision tables and added again with the same revision as the new main entity revision. So a simple query should be possible. Is there a way to configure envers to create a simpler query for OneToMany relationships? Or should I write a native query?

There is no secret “make-envers-better” configuration option :wink:

We’re open to improvement suggestions though. Can you maybe share the query that you would want to write? Also, what is the query plan of this query? Maybe you’re missing an index?

I’m not an Envers expert, but maybe the org.hibernate.envers.find_by_revision_exact_match configuration option can help? Also see the other configuration options.

If I were to write a query, I would probably use something along the lines of this:

select many_columns    
from pf.task_definition_seeds_revisions  t1_0 
join lateral (
	select *
	from pf.seed_revisions se2_0 
	where se2_0.rev<='8228'
	and se2_0.rev_type<>'2'
	and t1_0.seed_id=se2_0.id
	order by se2_0.rev desc
	fetch first 1 rows with ties
) se1_0 on true
where t1_0.task_definition_id='ea26372f-63e2-4310-bcbb-0a119a55e0cf'
and t1_0.rev='8228'
and t1_0.rev_type<>'2';

You can even write that in HQL if you want, e.g.:

select t, s
from my.company.TaskDefinitionEntity_AUD t
join lateral (
	select s2.originalId
	from my.company.SeedEntity_AUD s2
	where s2.originalId.rev.id<='8228'
	and s2.originalId.rev.revtype<>DEL
	and t.seed.id=s2.originalId.id
	order by s2.originalId.rev.id desc
	fetch first 1 rows with ties
) s
where t.taskDefinition.id='ea26372f-63e2-4310-bcbb-0a119a55e0cf'
and t.originalId.rev.id='8228'
and t.originalId.rev.revtype<>DEL;

Not sure if your database supports this though and also don’t know the exact performance of this, so maybe you need something “simpler” e.g.

select t, s
from my.company.TaskDefinitionEntity_AUD t
join my.company.SeedEntity_AUD s on t.seed.id=s.originalId.id and s.originalId.rev.id = (
	select s2.originalId.rev.id
	from my.company.SeedEntity_AUD s2
	where s2.originalId.rev.id<='8228'
	and s2.originalId.rev.revtype<>DEL
	and t.seed.id=s2.originalId.id
	order by s2.originalId.rev.id desc
	fetch first 1 rows only
)
where t.taskDefinition.id='ea26372f-63e2-4310-bcbb-0a119a55e0cf'
and t.originalId.rev.id='8228'
and t.originalId.rev.revtype<>DEL;

I have tried the ‘find_by_revision_exact_match’ option, but this does not change the query for the entities from the OneToMany relationship. Looking at the code, I think the ‘isFindByRevisionExactMatch’ causes the exact revision match to be added for the main entity query only.

This is the native query I wrote

select rev_type, rev, task_definition_id, order_id, seed_id, url
from seed_revisions
join task_definition_seeds
on task_definition_seeds.seed_id = seed_revisions.id
where
task_definition_seeds.task_definition_id = 'ea26372f-63e2-4310-bcbb-0a119a55e0cf'
and
seed_revisions.rev_type <> '2'
and
seed_revisions.rev = '8228'
order by task_definition_seeds.order_id asc

It returns the same data as your lateral query and performs the same too.

So I could write that native query but I am afraid it is a lot of work an annoying to maintain if I change something in the entities. I’ll try to do it with an HQL query and come back to here.

This query could be wrong when the collection elements are “older”.

When I update a main entity then I see the OneToMany entities all get marked as deleted in their revision tables and added again with the same revision as the new main entity revision.

Please share the entity mapping that you’re using. I would like to understand if this recreation behavior might be related to your specific mapping, because usually, the collection should be maintained incrementally.

Here are my 2 entities and shows the relationship between them. It has more columns, but nothing special. I always save the task definition entity including seeds with the repository.save(entity) call. There is an @Audited annotation on the SeedEntity class, but I get the same behaviour if that annotation is not there.

@Entity(name = "task_definition")
@Audited
public class TaskDefinitionEntity {

  @Id
  @GeneratedValue
  private UUID id;

  @Column(nullable = false, length = 100)
  private String name;

  @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
  @OrderColumn(name = "order_id", nullable = false)
  @JoinTable(name = "task_definition_seeds", inverseJoinColumns = @JoinColumn(name = "seed_id"))
  private List<SeedEntity> seeds = new ArrayList<>();
  
    public List<SeedEntity> getSeeds() {
    return seeds;
  }

  public void addSeed(SeedEntity urlSeed) {
    seeds.add(urlSeed);
  }
}


@Entity(name = "seed")
@Audited
public class SeedEntity {

  @Id
  @GeneratedValue(strategy = IDENTITY)
  private Long id;

  @Column(nullable = false, length = 2048)
  private String url;
}

Since Envers only knows ADD and DEL revision types, the only way AFAICT that you can have all collection elements be deleted and then re-added in the next revision is if you add an element to the start of the seeds list. If you add an element to the end of the list, only the new element will get the latest revision.

When I update a task definition, the controller receives a task definition DTO that includes the seeds DTOs. This will be converted into a task definition entity with seed entities. I will then have a task definition with an id value, but the seed entities will all have no id value. I think that is why envers sees that all the seeds are new, and all the old ones are deleted.
When I started with envers I expected that all the objects would get a new revision, so I never regarded this as strange. It is only recently that I noticed that the seeds were marked as deleted too between each revision.
I think I understand better now why it generated this complex query.

1 Like

If you’re using the collection this way, then why not use an @Embeddable instead? That way, you’d probably also benefit from slightly better performance due to fewer inserts.

Anyway, you wrote that the lateral query performs better than the original query. We’d be grateful if you want to work on this to improve Hibernate Envers and generate that query.

The reason I do not use @Embeddable, because I got the impression from the documentation, that you use this if you want the data of the embeddable to be part of the owner entity table.
I don’t see how that would work for a OneToMany relationship. I also don’t see any example in the documentation of using @Embeddable in such a way.
I considered mapping it to jsonb, but writing queries for those can be a hassle. I actually like the database schema I have. But perhaps I overlooked some way to tell Hibernate that they
are not considered entities with a separate lifecycle.

I also read about the @ElementCollection which sounded what I wanted, but the documentation tries to steer people away from that option and states
“Most likely, we’ll realize that we need to add a separate entity after all.”

I’m totally willing to try to change the code as long as I don’t need to overhaul the database schema completely.
Should I try @Embeddable and/or @ElementCollection?

I’m willing to work on the improved query. Do you want to me to use the Envers API to create this query?

You’d mark the SeedEntity as @Embeddable and the collection would become an @ElementCollection. The only thing that would change on the DB side, is that you don’t need a @Id anymore, since that is implicitly given by the task definition id + order column.
Cascading and orphan removal is also implicit, since the collection has a compositional dependency through the owning entity.

Should I try @Embeddable and/or @ElementCollection?

It’s up to you. If the current model works fine for you and you’re happy with it, then don’t bother changing it. I was simply mentioning it because it seems to be more along the lines of what you’re trying to model. Also, you’d avoid the unnecessary delete + add revision entries since you wouldn’t have to care about ids anymore.

I considered mapping it to jsonb, but writing queries for those can be a hassle. I actually like the database schema I have. But perhaps I overlooked some way to tell Hibernate that they

In Hibernate ORM 7.0, this will be easy if you already use a @ElementCollection-like mapping. You’d just have to replace @ElementCollection with @JdbcTypeCode(SqlTypes.JSON_ARRAY) and should be able to continue using it like you’re used to, but it will map the data to a single json typed column.

I’m willing to work on the improved query. Do you want to me to use the Envers API to create this query?

Ideally, you’d start with doing some analysis (capture database query execution plans) on your model and extract a simplified model our of that, which still reproduces the performance problem, by verifying the database query execution plans still run into the same issues you were facing on your original model. Also, it would be awesome if you could capture these query execution plans for the lateral queries that I proposed earlier. If you can come up with something that is even better, also capture that so we can see the different possible solutions and can discuss it.

Then comes the implementation within Hibernate Envers, which if you prefer you can also start with to get a feeling for the complexity. I guess you might know the involved classes better than me, but my guess is, that most changes would happen in DefaultAuditStrategy/ValidityAuditStrategy. You can checkout the Hibernate ORM repository, code your changes and run the tests via ./gradlew :hibernate-envers:test or also start tests from within your IDE.

I have changed the model to use an @ElementCollection for an @Embeddable list.

  @ElementCollection
  @CollectionTable(name = "seed", joinColumns = @JoinColumn(name = "task_definition_id"))
  @OrderColumn(name = "seed_index")
  private List<SeedEntity> seeds = new ArrayList<>();

When I request a specific revision of a task, Envers generates this SQL statement.

SELECT s1_0.rev,
       s1_0.task_definition_id,
       s1_0.seed_index,
       s1_0.rev_type,
       s1_0.includes,
       s1_0.url
FROM   pf.seed_revisions s1_0
WHERE  s1_0.task_definition_id=?
AND    s1_0.rev=
       (
              SELECT Max(s2_0.rev)
              FROM   pf.seed_revisions s2_0
              WHERE  s2_0.rev<=?
              AND    s1_0.task_definition_id=s2_0.task_definition_id
              AND    s1_0.includes IS NOT DISTINCT
              FROM   s2_0.includes
              AND    s1_0.url IS NOT DISTINCT
              FROM   s2_0.url
              AND    s1_0.seed_index=s2_0.seed_index)
AND    s1_0.rev_type<>?

I tried to understand why it tries to compare all columns. It’s not what I expected at all.
Furthermore, the includes field is a List mapped to a JSON field. Since it tries to compare JSON fields I receive this error:

org.postgresql.util.PSQLException: ERROR: operator does not exist: json = json

Not sure what to do now.

I guess it does that because Hibernate Envers doesn’t understand that the tuple (task_definition_id, seed_index, s2_0.rev) is unique. From what I understand, this is another missing optimization.

Furthermore, the includes field is a List mapped to a JSON field. Since it tries to compare JSON fields I receive this error

Sounds like a bug that you should report. I guess the only viable way to compare json data on PostgreSQL is by casting to text and compare that though.