I am using Hibernate L2 cache in my project and using EHCache as the caching provider. Hibernate version is 4.3.11 and EHCache 2.10.5.
The question I have is:
Hibernate will take care of updating the L2 cache when writes (inserts and updates) are done through Hibernate itself. Please correct me if I am wrong.
However, when the database is updated through some other means, by some other application, thus bypassing Hibernate in my application, Hibernate has no way to know that the data has changed in the backend. In this case, could the other application that updated the backend send a notification to my application, which would then use the Hibernate API to invalidate the cache?
If the DB is changed outside of Hibernate, you can use a CDC (Change Data Capture) approach using a tool like the open-source Debezium project to extract changes and propagate them to Hibernate.
Or, you can run some introspection queries periodically to verify the modification timestamps of the underlying cached entities.
Or, you can just set the cached entries' TTL to a lower value so that cached results are short-lived and are refreshed frequently by re-fetching from the DB.
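For reference, with EHCache 2.x the TTL is typically configured per region in ehcache.xml; a minimal entry for a hypothetical `com.example.Employee` entity region could look like this:

```xml
<!-- ehcache.xml: short-lived region for a hypothetical cached entity -->
<cache name="com.example.Employee"
       maxEntriesLocalHeap="10000"
       eternal="false"
       timeToLiveSeconds="3600"/>
```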
Thanks… yes, currently we are using the 3rd approach, i.e. a TTL of 1 hour.
The 2nd approach will require modification to the application.
As for the 1st approach, I can understand how Debezium would extract database changes; however, regarding the latter part, can you please elaborate more on "propagate them to Hibernate"? How will this happen, since Hibernate is running inside my Java application process and Debezium would, I believe, run in a different process? How will they communicate?
If you are doing changes through Hibernate entities themselves, you don't have to do anything else to ensure the consistency of the L2 cache; Hibernate will take care of it.
If you are doing changes via native queries, then explicitly mention which entities are affected, otherwise Hibernate will invalidate the entire second-level cache.
If you are changing data in the database from another process, then Hibernate is not aware of it, and you will have to define a strategy that best suits your application's requirements (expiration policies, explicit invalidation triggered from outside the application, etc.).
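As a rough sketch of the last two cases, assuming Hibernate 4.3's API and a hypothetical cached `Employee` entity (table and column names are just examples):

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class CacheInvalidationExamples {

    // Native bulk update: declaring the affected entity class limits the
    // invalidation to Employee's region instead of the whole second-level cache.
    void bulkUpdate(Session session, long deptId) {
        session.createSQLQuery(
                "UPDATE employee SET salary = salary * 1.1 WHERE dept_id = :dept")
               .addSynchronizedEntityClass(Employee.class)
               .setParameter("dept", deptId)
               .executeUpdate();
    }

    // Explicit invalidation "from the outside", e.g. when another application
    // notifies us that it changed a given row directly in the database.
    void onExternalChange(SessionFactory sessionFactory, Long changedEmployeeId) {
        sessionFactory.getCache().evictEntity(Employee.class, changedEmployeeId);
    }
}
```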
To elaborate on the suggested Debezium alternative, I think 2nd-level cache invalidation is a great use case for leveraging Debezium's embedded engine instead of running it via Kafka. In this mode of operation, Debezium runs within your application itself and a callback method is invoked whenever a change event arrives. This handler would then invoke the `Cache#evict()` method for the given entity id.
I've filed DBZ-991 in our tracker for creating a blog post on this, as it poses some interesting challenges, e.g. how to materialize the right id type from the change message. Not sure when we'll get to this, but I hope we'll find some time for writing this up soonish.
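To give a rough idea of the shape this could take (a sketch only, assuming Debezium's embedded DebeziumEngine API; the mapping from a change event to an entity class and id is application-specific, so the helpers below are hypothetical):

```java
import java.io.Serializable;
import java.util.Properties;
import java.util.concurrent.Executors;

import org.hibernate.SessionFactory;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class CacheInvalidationEngine {

    // Starts Debezium's embedded engine inside the application process and evicts
    // second-level cache entries whenever a change event arrives from the database.
    void start(SessionFactory sessionFactory) {
        Properties props = new Properties();
        props.setProperty("name", "cache-invalidation");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        props.setProperty("database.hostname", "localhost");
        // ... further connector properties (credentials, server id, offset storage, ...)

        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(event -> {
                    Class<?> entityClass = parseEntityClass(event.value());
                    Serializable id = parseId(event.value());
                    sessionFactory.getCache().evictEntity(entityClass, id);
                })
                .build();

        // The engine is a Runnable that streams change events until closed.
        Executors.newSingleThreadExecutor().execute(engine);
    }

    // Hypothetical helpers: map the change event's table and key to an entity
    // class and primary key value; the actual logic depends on your schema.
    private Class<?> parseEntityClass(String changeEventJson) { return null; }
    private Serializable parseId(String changeEventJson) { return null; }
}
```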
I was wondering if below could be a feature provided by Hibernate:
For the Hibernate Query Cache, Hibernate could integrate with an in-process Debezium and, based on the events received, invalidate the query cache internally.
I think this would be a very useful feature and would extend to architectures in which the database gets updated from other applications.
That could only be done via a separate Hibernate module like hibernate-debezium.
Using the embedded mode, it would be easier to implement, but I'm not sure whether ZooKeeper is still needed to operate it as well. Maybe @gunnar.morling can shed some light on this idea.
For the Hibernate Query Cache, Hibernate could integrate with an in-process Debezium and, based on the events received, invalidate the query cache internally.
Yes, the same approach discussed in the blog post could be used for not only invalidating items in the 2nd-level cache but for invalidating query cache regions, too.
I'm not sure whether ZooKeeper is still needed to operate it as well
No, ZK is not needed when running Debezium in library mode. All you need is the Debezium core JAR and the Debezium connector for your database.
Essentially, it's "only" a matter of finding someone who is willing to spend the time to work on this; any contributions will be very much welcomed. I've added a comment to DBZ-911 which describes the steps needed to create a generic implementation of this approach.
I don't think it's much more difficult. Essentially, for the query results cache we just need to obtain the affected entity type and trigger invalidation via the TimestampsCache. So in fact it's even simpler than invalidating specific 2LC entries.
Would you perhaps be interested in giving it a try? The sample code from the blog post is here. It could be the starting point for a more generic, ready-made implementation.
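As a very rough starting point, the public Cache API already allows dropping cached query results wholesale; this is coarser than the per-entity timestamp invalidation mentioned above, but might do for a first cut:

```java
import org.hibernate.SessionFactory;

public class QueryCacheInvalidation {

    // Drop cached query results when an external change is detected, using only
    // the public Cache API instead of the internal TimestampsCache.
    void evictQueryResults(SessionFactory sessionFactory) {
        sessionFactory.getCache().evictDefaultQueryRegion(); // queries cached without an explicit region
        sessionFactory.getCache().evictQueryRegions();       // named query regions
    }
}
```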
Thanks… however, due to other commitments I won't be able to spend enough time on this. Also, I am part of the Hibernate user community and not a developer. I would recommend that someone who has good knowledge of the Hibernate code take this up.
Unfortunately, there are hundreds of issues to be fixed, new features to be implemented, documentation to be updated, and answers to be given on the forum, so the Hibernate team is busy most of the time doing all these tasks.
This is an open-source project, so the community should also be interested in making it better, right?
Could I evict the second-level cache when updating the database from another process? For example, when I have a relationship:
Employee(1) - (n)DepartmentMemberships(n) - (1)Department
When I remove the Department, the DepartmentMemberships will be removed as well (cascade REMOVE); then I want the second-level cache of Employee to be invalidated (because the DepartmentMemberships are fetched together with the Employee).
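Roughly, what I would like to trigger (from the other process, or when my application is notified of the change) is something like the following; the collection role name "departmentMemberships" is just an example and has to match the mapped property, and Employee/DepartmentMembership are my entity classes:

```java
import org.hibernate.SessionFactory;

public class DepartmentRemovalEviction {

    // After a Department (and, via cascade, its DepartmentMemberships) has been
    // removed outside of this application, evict the memberships collection cached
    // on Employee and the DepartmentMembership entities themselves.
    void onDepartmentRemoved(SessionFactory sessionFactory) {
        sessionFactory.getCache().evictCollectionRegion(
                Employee.class.getName() + ".departmentMemberships"); // example role name
        sessionFactory.getCache().evictEntityRegion(DepartmentMembership.class);
    }
}
```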