Retention management for Envers audit table

Hello,

A few years ago go we used envers to have an audit log on some tables for our application.
At the time the question of “how long are we going to keep theses audit record ?” was not asked and since then the GDPR came in to play which forces us to think about it.

My first naive idea was to just drop records belonging to older revisions that have to be removed but this will eventually leave some records with no audit record linked to it.
I remember that, at the time, we had to populate the audit table with an initial snapshot of the table’s content because we had errors regarding missing initial records (even though we are not using the Validity Audit Strategy).

Therefore my question is: how can I implement a retention mechanism for audit table without breaking the application ?

1 Like

For simplicity for the moment, lets table GDPR.

Now since you are not using the ValidityAuditStrategy, would your retention policy be something along the lines of if the revision is older than X days, it should be discarded?

I’m sorry what do you mean by “lets table GDPR” ?

What I would like to achieve is indeed removing any revision (from revinfo table) and their related records (from *_aud tables) older than 2 month.
And I whish to run this removal on a daily basis.

What I meant was lets not consider it and just focus on retention first.

You have a few options and which you use depends how you want the data to look after you do this.

Remove all revisions prior to retention range

In this use case, you would basically examine the REVINFO table and find all entries with a timestamp value older than the intended retention range and from there iterate over all the tables removing any revisions that are less-than that revision.

You will need to iterate the _AUD tables in such a way that should constraints between the _AUD tables exist that you aren’t violating them so that the database operation succeeds.

What I personally find as a downside to this approach is you will eventually loose sight of when a row was actually first created. The history of a given row may be that the first insert is older than the retention range but the updates and/or removals are still relevant. If that’s of no concern for your organization, great.

Remove all update revisions prior to retention range, preserve insert/delete

In this use case, you would do precisely the same behavior as the other approach, but instead you would also examine the REVTYPE column value as well. The only time a row could be removed is if the value of that column represents the RevisionType.MOD value, which numerically is 1, again where the REV column’s revision number value are those with a timestamp less-than the retention range.

From an auditor perspective, its often less important to keep track of all the smaller incremental updates to a row over its lifetime, but having a clear picture when it started and when it ended can be far more valuable depending on the context of data being kept.

Typically most of the audit history maintained for rows in any system are where a row is updated, so by only focusing on those types of revision types, you can reduce the overall size of your audit data pretty substantially but maintaining a decent life-cycle history of the actual row itself.

HTH.
Chris

Very great answer thanks !

I did not think about using the REVTYPE as a filter but this will certainly help keeping the tables at a managable size.

It might not be enough for the GDPR constraint though because at some point I’ll have to erase any trace of some records, but I think of a way for doing it.

Many thanks again !

I’m not exactly well versed on all the permissible ways to work with GDPR or what you absolutely must do in order to remain compliant, specifically when a request is received to erase private data of individuals. That said, I believe the same steps you’d take in your main database hold true for the audit schema.

  1. Identify rows to be removed or made anonymous.
    Maintain this list of row primary keys per table.

  2. Use the primary keys of tables in step 1 to find audit rows for those identifiers.
    In this step you basically look at the associated _AUD tables based on the rows you’ve modified or deleted in step 1 and get all entries regardless of the revision. You would either elect to update columns to make the data anonymous (if that’s permissible, removing the person’s identity) or you would delete the rows entirely.

  3. For all rows deleted in step 2, maintain list of revision numbers to review.
    After all the _AUD_ tables have had their GDPR request rows purged, you’ll want to review the revisions that were touched to see if any valid foreign-key references still exist per revision.

    If a revision in REVINFO no longer has any rows in any corresponding _AUD table, you’ll likely want to consider deleting the row from the REVINFO table as well or perhaps use a custom revision entity for REVINFO where you add a new column that your purge process can mark accordingly so that you can see there was once entries associated to that revision but due to GDPR requests, those have since been purged from record. Knowing how strict certain audit requirements can be, doing the latter is probably the most ideal and optimal choice, but that’s entirely up to your situation.

Hope that sheds some light on it from the GDPR perspective at least.

Chris

Thanks again,

It gives very good directions I definitely will follow.