Hibernate-Search 6 - Optimistic Locking Support

Hi again,

Another question, this time regarding the availability of a lovely feature called versioning.
Our existing distributed system relies on that feature, so we would need some kind of versioning support within Hibernate Search as well.

Can you tell me whether there is already support for versioning? I was not able to find any mention of ‘optimistic locking’ / ‘locking’ in the documentation, so I hope it’s at least on some kind of roadmap?

Greetings and thanks in advance!

When indexing entities from a database, Elasticsearch’s versioning is unfortunately not enough.

Take for example an entity A with an @IndexedEmbedded to entities B and C.
In one node, the application loads A, B and C in the session, then changes B and thus triggers the reindexing of A.
In the other node, the application loads A, B and C in the session, then changes C and thus triggers the reindexing of A.
Each node will try to index a new version of A, but neither will have the right version of the document:
the first node will lack the update of C, and the second node will lack the update of B.
Using version numbers to favor one version or the other would not solve the problem: you would end up with incorrect data in the index in either case.
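
To make the scenario more concrete, here is a minimal sketch of such a mapping (the entity and field names are made up, purely for illustration):

```java
// Imports shared by the three entities below (each would live in its own file).
import java.util.ArrayList;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;

// A.java -- the indexed root: changes to B or C trigger reindexing of A.
@Entity
@Indexed
public class A {
    @Id
    private Long id;

    @ManyToOne
    @IndexedEmbedded
    private B b;

    @ManyToOne
    @IndexedEmbedded
    private C c;
}

// B.java -- contained entity; the inverse side of the association lets
// Hibernate Search know which A instances to reindex when B changes.
@Entity
public class B {
    @Id
    private Long id;

    @FullTextField
    private String text;

    @OneToMany(mappedBy = "b")
    private List<A> containing = new ArrayList<>();
}

// C.java -- same idea as B.
@Entity
public class C {
    @Id
    private Long id;

    @FullTextField
    private String text;

    @OneToMany(mappedBy = "c")
    private List<A> containing = new ArrayList<>();
}
```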

So for now we didn’t implement the versioning exposed by Elasticsearch: it would take time on our side, and because of the specific constraints of Hibernate Search it would not solve the problem anyway.

The plan is to solve the problem by moving reindexing to a separate session that would re-load data from the database. Essentially, we will capture “entity change” events from all nodes and send them to an event queue (Kafka or similar). Each application node will process the events for a subset of all entities (making sure that each entity instance is always processed by the same node). Processing an event involves loading the entity and reindexing it.
This would solve the scenario mentioned above because, by the time the second event is processed, the changes to both B and C are in the database, and thus loading the entities will show a completely updated view of A. We will get eventual consistency.
(This is, of course, assuming there is no delay between the transaction commit and the visibility of changes. Such a delay could happen with a replicated relational database, but I don’t have a solution for that scenario at the moment.)
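
This is not the final design, just a rough sketch of what processing such an event could look like with the current API; the event class and the way it reaches the processor are hypothetical:

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import org.hibernate.search.mapper.orm.Search;
import org.hibernate.search.mapper.orm.session.SearchSession;

// Hypothetical event: "the entity of this type with this ID changed".
// The real event format and queue (Kafka, ...) are not decided yet.
class EntityChangeEvent {
    final Class<?> entityType;
    final Object id;

    EntityChangeEvent(Class<?> entityType, Object id) {
        this.entityType = entityType;
        this.id = id;
    }
}

public class ChangeEventProcessor {

    private final EntityManagerFactory entityManagerFactory;

    public ChangeEventProcessor(EntityManagerFactory entityManagerFactory) {
        this.entityManagerFactory = entityManagerFactory;
    }

    // Called by whatever consumes the queue. The entity is re-loaded from the
    // database, so the indexed document reflects all committed changes.
    public void process(EntityChangeEvent event) {
        EntityManager entityManager = entityManagerFactory.createEntityManager();
        try {
            entityManager.getTransaction().begin();

            Object entity = entityManager.find(event.entityType, event.id);
            SearchSession searchSession = Search.session(entityManager);
            if (entity != null) {
                // Index the freshly loaded state.
                searchSession.indexingPlan().addOrUpdate(entity);
            } else {
                // The entity was deleted in the meantime: remove it from the index.
                searchSession.indexingPlan().purge(event.entityType, event.id, null);
            }

            entityManager.getTransaction().commit();
        } finally {
            entityManager.close();
        }
    }
}
```

The important part is the re-load from the database: that is what gives us the eventual consistency described above.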

That solution, however, involves quite a lot of work, and thus it’s currently only planned for 6.1.

Thanks for your explanation.

We took a similar approach to address these issues. To give a short overview: whenever an entity needs to be updated in the index, we record that in a distributed key-value store (e.g. Redis).
A scheduled job then picks up this update information and reindexes the affected entities (which involves loading each entity from the database, converting it into its document and indexing it).
As you already mentioned, this also allows us to get a completely updated view of any entity.
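
Very roughly, the job looks something like this (simplified, with a made-up interface standing in for the Redis access, and reusing the A entity from the mapping sketch above):

```java
import java.util.Set;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import org.hibernate.search.mapper.orm.Search;

// Made-up abstraction over the key-value store (Redis in our case): it holds
// the IDs of entities whose index documents are stale.
interface PendingUpdateStore {
    // Returns the pending IDs and removes them from the store.
    Set<Long> drainPendingIds(String entityName);
}

public class ReindexJob {

    private final EntityManagerFactory entityManagerFactory;
    private final PendingUpdateStore pendingUpdates;

    public ReindexJob(EntityManagerFactory entityManagerFactory,
            PendingUpdateStore pendingUpdates) {
        this.entityManagerFactory = entityManagerFactory;
        this.pendingUpdates = pendingUpdates;
    }

    // Invoked by the scheduler: reindex every entity that was marked as
    // updated since the last run, using its current state in the database.
    public void run() {
        Set<Long> ids = pendingUpdates.drainPendingIds("A");
        if (ids.isEmpty()) {
            return;
        }
        EntityManager entityManager = entityManagerFactory.createEntityManager();
        try {
            entityManager.getTransaction().begin();
            for (Long id : ids) {
                A entity = entityManager.find(A.class, id);
                if (entity != null) {
                    Search.session(entityManager).indexingPlan().addOrUpdate(entity);
                }
            }
            entityManager.getTransaction().commit();
        } finally {
            entityManager.close();
        }
    }
}
```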

Hehe… We are currently thinking about publishing our update events to RabbitMQ and consuming them from a different service, which would then be responsible for indexing all updates. So your plans would come in quite handy. I guess if that is built in a very “open” way, we could even consume that queue and process these update events from a separate service (which gives us more options for moving load away from the main application). But okay, let’s see what’s coming :slight_smile:

Thanks for your detailed answer!