Hi Yoann,
I think that catching indexing errors will not help, because the catching logic itself may stop or fail (e.g. during an application restart/redeployment) and those errors get lost -> cluster out of sync. Anything done after the DB transaction commits may fail and would lead to an out-of-sync situation.
The same applies to storing a queue of entities to index in Kafka - how would you keep that queue in sync with the database? The transaction commits, the push to Kafka fails -> cluster out of sync.
To make it safe, I see only one way: the knowledge of which entities need to be indexed must be stored together with those entities in the same DB, inside the same transaction used to update/insert/delete them (a separate table, aka event sourcing). In that case it is guaranteed that the knowledge of which entities to index cannot be lost. Storing that knowledge in any other system (Kafka or a file system) would require a 2-phase commit, which is quite a pain and usually not an option. When automatic indexing succeeds, it has to mark those entities/events as processed. If automatic indexing fails, those entities/events must be retried by an async process. Or am I missing something?
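
To make the idea concrete, here is a rough sketch with plain JPA and resource-local transactions; all names (IndexingEvent, Book, Indexer, processed, ...) are made up for illustration, not an existing API:

    import jakarta.persistence.*;
    import java.util.List;

    // Hypothetical "events" table: one row per entity change that still needs indexing.
    @Entity
    class IndexingEvent {
        @Id @GeneratedValue Long id;
        String entityType;   // e.g. "Book"
        String entityId;     // primary key of the changed entity
        boolean processed;   // flipped to true once indexing succeeded
    }

    @Entity
    class Book {
        @Id Long id;
        String title;
    }

    interface Indexer {
        void index(String entityType, String entityId); // may throw on failure
    }

    class IndexingEventSketch {

        // Business update: the entity change and the indexing event are written
        // in the same DB transaction, so the knowledge of what to index cannot be lost.
        void updateBook(EntityManager em, Long bookId, String newTitle) {
            em.getTransaction().begin();
            Book book = em.find(Book.class, bookId);
            book.title = newTitle;

            IndexingEvent event = new IndexingEvent();
            event.entityType = "Book";
            event.entityId = bookId.toString();
            event.processed = false;
            em.persist(event); // same DB, same transaction as the entity update
            em.getTransaction().commit();
        }

        // Async retry process: pick up unprocessed events, index them, mark them processed.
        // If indexing throws, the transaction rolls back, the events stay pending,
        // and they are retried on the next run.
        void processPendingEvents(EntityManager em, Indexer indexer) {
            em.getTransaction().begin();
            List<IndexingEvent> pending = em
                .createQuery("select e from IndexingEvent e where e.processed = false",
                             IndexingEvent.class)
                .setMaxResults(100)
                .getResultList();
            for (IndexingEvent event : pending) {
                indexer.index(event.entityType, event.entityId);
                event.processed = true;
            }
            em.getTransaction().commit();
        }
    }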
Best regards,
Sergiy