Safe Envers queries when the audit history is incomplete


#1

We have added Envers to an existing codebase/database, which meant creating _aud postfixed tables for some entities. However, now we are trying to query them using AuditReaderFactory we are having problems.

The problem seems to be that AuditReaderFactory tries to read the full details of any associated entities. But, because Envers was added after the system was created, there may not be any associated revision for some entities.

E.g. we have two versioned entities: User and Group. If we add a new User entity, associated to a group, then there will only be an audit record for the User, and not for the group, because the group was created before Envers was added to the system. But when we use AuditReaderFactory to query for revisions of a User entity, Hibernate automatically tries to load the associated Group as well, even though the association is configured to be lazy. Because there is no revision for the Group, the whole load fails with an exception.

Therefore, how can we query using AuditReaderFactory in safe way, when there may not be anything in the DB for some associated entities? E.g. either not trying to load the Group, or falling back to the latest version of the Group from the main Group table, or just not erroring and just returning the revision data for User?


#2

Can you show us an example of the Envers queries you are running and the SQL queries that are generated?

As far as I know, you need to explicitly use the traverseRelation to create a JOIN between two AUD tables.

@Naros, you might be interested in this question too.


#3

Unfortunately you cannot just enable Envers in this way.

One example where this can backfire at just normal audit run-time is if you use ValidityAuditStrategy. This strategy maintains several metadata columns on each row in the audit schema. Part of the strategy’s logic is when a user updates an entity, not only does Envers insert a new row into the audit schema’s entity table but it also executes an update on a prior row with very specific predicate conditions. If the audit schema is empty or if the row with the expected predicate conditions isn’t found, Envers will rollback the transaction because its assumed that some data inconsistency has happened.

You’re going to have to manually populate the audit schema with a database script. This script simply needs to do the following:

  1. Create a new row in REVINFO.
    Basically here all you are doing is generating a new revision number and giving it a timestamp. All entries you pre-populate can use the same REV since this is simulating a bulk load.

  2. For each audit table, copy the data from your businsess table into the audit table columns.
    Here you will use the REV column value from step 1 in each of the REV fields in your audit tables. You will also want to make sure that you set the REVTYPE field with the value of 0, which means ADDED or INSERT.

There has been some discussion in HHH-8051 about how to improve upon this and it has been something that I have been building a proof of concept for Hibernate 6. The concern in that discussion is specically around how consumers of the query results intepret the result itself.

For example, lets say that we return to you a User instance but this instance does not have a Group, its null. Does this imply that the user simply doesn’t belong to a Group, has the prior Group been deleted potentially? Does it mean that the Group just does not exist in the audit schema?

If we take that a step farther and consider retreiving the real Group when we cannot find a Group in the audit schema, you can imagine how expensive that could be if you had some code like this:

List<?> results = AuditReaderFactory.get( session ).createQuery().
  .forRevisionsOfEntity( User.class, true, true )
  .addOrder( AuditEntity.property( "userName" ).asc() )
  .addOrder( AuditEnttiy.revisionNumber().asc() )
  .getResultList();

for ( Object object : results ) {
  User user = User.class.cast( object );
  Group userGroup = user.getGroup();
}

In this case, #getGroup() would potentially amount to 2 queries for each User instance returned. If you’re returning a lot of users, that can lead to a bit of overhead.

We realize that its less than ideal to ask users to pre-populate their audit schemas we’re looking at ways to remedy those requirements in future releases, but presently its a must.

I will say I welcome any ideas, suggestions, and pull requests :slight_smile: .


#4

Thanks, I’ll add the script to pre-populate the audit tables.

I’d note though that in our case it’s actually the .getResultList() that fails in our case, not .getGroup(). The code to retrieve the list of Users using AuditReaderFactory seems to eagerly query the Group table, even though Group is supposed to be lazy.


#5

So I take it your mappings look something like this

@Entity
@Audited
public class User {
  @ManyToOne(fetch = FetchType.LAZY)
  private Group group;
  ...
}

@Entity
@Audited
public class Group {
  @OneToMany(mappedBy = "group")
  private List<User> users;
}

Envers does not honor any of the lazy semantics imposed by Hibernate and JPA. You have to understand that what Envers returns is not entities but simply an instance of an entity-class that may be fully or partially populated with state based on your audit requirements. We could consider it, but it’d be a new feature we’d have to ask users to opt-into for backward compatibility if such a feature is really necessary.


#6

Yes, more or less.

I am trying to write the migration to create the fake revision / bulk insert. The problem is, AuditReaderImpl will not accept a revision number of 0, which is what I’m trying to use to ensure it doesn’t conflict with any existing revision that has already been created in environments where this has already been deployed.

I guess I’m going to have to try to fix-up every other entry in the revisions table, and modify every corresponding entry in every audit log table to match the new values.


#7

That’s one option.

I believe the > 0 restriction is only applied to a variety of AuditReaderImpl methods. What you could consider is introducing a new configuration property that allows the 0 revision number and have the AuditReaderImpl methods use a slightly different validation if that property is enabled. By default, that configuration property defaults to false (disabled) for backward compatibility.

I suspect that will be a far quicker and potentially easier solution. You could even send us a PR with the code and we could review and consider including that since that might ease the use case you face for other users.

Update

It also appears the #checkPositive() call is also used inside CrossTypeRevisionChangesReaderImpl and inside AuditQueryCreator. Since both of those classes have access to EnversService, they should be able to also check the state of the configuration property and branch accordingly.


#8

FWIW, I added this improvement JIRA HHH-12595. Let me know whether you want to work on this.


#9

Thanks. I’d like to have a go. Perhaps over this weekend.


#10

Alright, if you run into any issues, let me know.