When indexing entities from a database, Elasticsearch’s versioning is unfortunately not enough.
Take for example an entity A with an @IndexedEmbedded to entities B and C.
In one node, the application loads A, B and C in the session, then changes B and thus triggers the reindexing of A.
In the other node, the application loads A, B and C in the session, then changes C and thus triggers the reindexing of A.
Each node will try to index a new version of A, but neither will have a correct version of the document:
the first node will lack the update of C, and the second node will lack the update of B.
Using version numbers to favor one version or the other would not solve the problem: you would end up with incorrect data in the index in either case.
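To make the scenario concrete, here is a minimal sketch of it in plain Java. It does not use any Hibernate Search or Elasticsearch API: the maps standing in for the database and the sessions, and the `StaleSnapshotDemo` and `buildDocument` names, are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the scenario; not Hibernate Search code.
class StaleSnapshotDemo {

    // Builds the document for A from the values of B and C visible in one view.
    static String buildDocument(Map<String, String> view) {
        return "A{b=" + view.get("B") + ", c=" + view.get("C") + "}";
    }

    // Returns the document built by node 1, by node 2, and from the database.
    static String[] run() {
        Map<String, String> database = new HashMap<>();
        database.put("B", "b0");
        database.put("C", "c0");

        // Each node loads its own snapshot before the other one commits.
        Map<String, String> node1Session = new HashMap<>(database);
        Map<String, String> node2Session = new HashMap<>(database);

        node1Session.put("B", "b1"); // node 1 changes B
        node2Session.put("C", "c1"); // node 2 changes C

        // Both transactions commit; the database now holds both changes.
        database.put("B", node1Session.get("B"));
        database.put("C", node2Session.get("C"));

        return new String[] {
            buildDocument(node1Session), // lacks the update of C
            buildDocument(node2Session), // lacks the update of B
            buildDocument(database)      // the only complete view
        };
    }
}
```

Whichever of the first two documents "wins" under a version check, the index ends up differing from the third one, which is the only one reflecting both changes.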
So for now we have not implemented versioning as exposed by Elasticsearch: it would cost us development time and would not solve the problem (because of the specific constraints of Hibernate Search).
The plan is to solve the problem by moving reindexing to a separate session that would reload data from the database. Essentially, we will capture “entity change” events from all nodes and send them to an event queue (Kafka or similar). Each application node will process the events for a subset of all entities, making sure that each entity instance is always processed by the same node. Processing an event involves loading the entity and reindexing it.
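The "same entity, same node" rule could be sketched as a deterministic mapping from entity identifier to node, as below. This is only an illustration under stated assumptions: the `EntityChangeRouter` class and its `nodeFor` method are hypothetical names, a fixed node count is assumed, and identifiers are assumed to have a `hashCode()` that is stable across JVMs (as `String` and `Long` do). With Kafka, a comparable effect is commonly obtained by using the entity identifier as the record key, so that records with equal keys land in the same partition.

```java
// Hypothetical sketch; not part of Hibernate Search.
class EntityChangeRouter {

    private final int nodeCount;

    EntityChangeRouter(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        this.nodeCount = nodeCount;
    }

    // Maps an entity instance to a node index in [0, nodeCount).
    // The same (type, id) pair always yields the same node, so all change
    // events for one entity instance are processed in one place.
    int nodeFor(String entityType, Object entityId) {
        int hash = 31 * entityType.hashCode() + entityId.hashCode();
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(hash, nodeCount);
    }
}
```

Because a single node handles all events for a given instance of A, two reindexing operations for that instance cannot race against each other on different nodes.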
This would solve the scenario mentioned above because, by the time the second event is processed, the changes to both B and C are in the database, and thus loading the entities will show a completely updated view of A. We will get eventual consistency.
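As a rough illustration of why reloading leads to eventual consistency, the sketch below replays the scenario with events that carry only the identifier of the entity to reindex. Everything here is hypothetical (the `ReloadingIndexerDemo` class, the maps standing in for the database and the index, the in-memory queue), and it assumes committed changes are immediately visible to the processing session.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch; not Hibernate Search code.
class ReloadingIndexerDemo {

    // Reloads the current state from the "database" and reindexes the entity.
    static void process(Queue<String> events, Map<String, String> database,
            Map<String, String> index) {
        String entityId = events.remove();
        index.put(entityId, "A{b=" + database.get("B") + ", c=" + database.get("C") + "}");
    }

    // Returns the indexed document for A after each processed event.
    static List<String> run() {
        Map<String, String> database = new HashMap<>();
        database.put("B", "b0");
        database.put("C", "c0");
        Map<String, String> index = new HashMap<>();
        Queue<String> events = new ArrayDeque<>();
        List<String> history = new ArrayList<>();

        // Node 1 commits its change to B and publishes "A needs reindexing".
        database.put("B", "b1");
        events.add("A");
        process(events, database, index);
        history.add(index.get("A")); // may still lack the change to C

        // Node 2 commits its change to C and publishes the same kind of event.
        database.put("C", "c1");
        events.add("A");
        process(events, database, index);
        history.add(index.get("A")); // reflects both changes

        return history;
    }
}
```

The document indexed after the first event can be temporarily incomplete, but the one indexed after the last event is built from the committed state and therefore contains both changes.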
(This is, of course, assuming there is no delay between the transaction commit and the visibility of changes. Such a delay could occur with a replicated relational database, but I don’t have a solution for this scenario at the moment.)
That solution, however, involves quite a lot of work, and thus it’s currently only planned for 6.1.