Wrong order of documents updated in ES and database

Hi,

I sometimes observe strange behavior (an inconsistency) when I make rapid changes (within milliseconds) to the same record in the database.

  1. I create a record in the database, which results in this OpenSearch call:
    curl -iX POST 'https://xxxx.es.amazonaws.com/_bulk' -d '{"index":{"_index":"xxx-write","_id":"T1_1","routing":"T1"}}
    {"data":"test1", "_entity_type":"Xxxx","_tenant_id":"T1","_tenant_doc_id":"1"}

    The response is:
    HTTP/1.1 200 OK
    {"took":4,"errors":false,"items":[{"index":{"_index":"xxx-000001","_id":"T1_1","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":26463,"_primary_term":1,"status":201}}]}

  2. I update the same record in the database, changing the value to “test2”:
    curl -iX POST 'https://xxxx.es.amazonaws.com/_bulk' -d '{"index":{"_index":"xxx-write","_id":"T1_1","routing":"T1"}}
    {"data":"test2", "_entity_type":"Xxxx","_tenant_id":"T1","_tenant_doc_id":"1"}

    The response is:
    HTTP/1.1 200 OK
    {"took":5,"errors":false,"items":[{"index":{"_index":"xxx-000001","_id":"T1_1","_version":3,"result":"updated","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":26515,"_primary_term":1,"status":200}}]}

  3. I update the same record in the database again, changing the value to “test3”:
    curl -iX POST 'https://xxxx.es.amazonaws.com/_bulk' -d '{"index":{"_index":"xxx-write","_id":"T1_1","routing":"T1"}}
    {"data":"test3", "_entity_type":"Xxxx","_tenant_id":"T1","_tenant_doc_id":"1"}

    The response is:
    HTTP/1.1 200 OK
    {"took":6,"errors":false,"items":[{"index":{"_index":"xxx-000001","_id":"T1_1","_version":2,"result":"updated","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":26506,"_primary_term":1,"status":200}}]}

The database holds the correct value, “test3”.
OpenSearch, however, holds “test2” instead of “test3”.

For some reason, OpenSearch processed the update request from step 3 before the one from step 2.
This is visible in the _version and _seq_no values of the OpenSearch responses: step 3 got _version 2 and _seq_no 26506, while step 2 got _version 3 and _seq_no 26515.
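
To make the reordering easier to see, here is a small sketch that sorts the three responses above by _seq_no. The numbers are copied from the responses; the names `responses` and `processing_order` are only illustrative.

```python
# Values copied from the three _bulk responses above (all with _primary_term 1).
responses = [
    {"step": 1, "data": "test1", "_version": 1, "_seq_no": 26463},
    {"step": 2, "data": "test2", "_version": 3, "_seq_no": 26515},
    {"step": 3, "data": "test3", "_version": 2, "_seq_no": 26506},
]


def processing_order(items):
    """Return the items in the order the shard applied them."""
    # Within one primary term, _seq_no grows with every operation on the shard,
    # so sorting by it recovers the order in which the writes were applied.
    return sorted(items, key=lambda r: r["_seq_no"])


ordered = processing_order(responses)
print([r["step"] for r in ordered])  # [1, 3, 2]
print(ordered[-1]["data"])           # test2 (the last write applied wins)
```

So the index applied step 3 before step 2, which matches the stale “test2” value left in the index.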

I am using “hibernate.search.indexing.plan.synchronization.strategy”=“write-sync”.
“hibernate.core.version” = 6.6.1.Final
“hibernate.search” = 7.2.1.Final

This happens occasionally.
Do you know what causes this issue and how to fix it?

Thank you for your help.

Kind Regards.

Hey,

This looks odd. A few questions:

  1. Your snippets mention curl -iX, why is that? Hibernate Search doesn’t use curl.
  2. Is this all happening in a single application instance?
  3. How do you execute these consecutive changes? Single transaction with multiple flushes? Multiple transactions? Can you show the code?
  4. Do the Hibernate Search logs show three separate _bulk requests, or a single one?
  5. Can you confirm you don’t use coordination?
  6. I see this in responses: "total":2,"successful":2. Do you have multiple OpenSearch nodes, or just two shards (e.g. primary/replica) on the same node? What’s your OpenSearch setup exactly?

Hi,

  1. That’s what I see in the log.
  2. It’s a single application that’s running on multiple nodes. So a call to the database can come from any node.
  3. It’s always a single transaction that can come from any node. So one update can come from node1 and the second one from node2, but those are always independent single transactions, resulting in correct values in the database. I have constraints on the table and a trigger that rolls back the transaction if the data is wrong (wrong version).
  4. The logs show 3 separate _bulk requests.
  5. I am not setting “hibernate.search.coordination.strategy” at all. So no coordination.
  6. OpenSearch 2.11 - OpenSearch_2_11_R20241003, 3-AZ without standby, 3 data nodes. Index split into 64 shards.

Thank you.

Ok, well, those are not Hibernate Search logs. No idea where the curl output comes from.

The multiple nodes are your problem right there. If your second and third operations happen on different nodes, it’s entirely possible that the indexing requests are sent to Elasticsearch out of order, for example because of garbage collector pauses or just thread scheduling. Or they are sent in order but, due to variations in network latency, are received by Elasticsearch out of order.

With a single application node it’s not usually a problem, because there are in-JVM mechanisms to preserve operation order (ES requests are executed in the same order as the order of transaction completion). With multiple nodes, there’s no such protection.
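
As an illustration only (this is not Hibernate Search’s actual implementation), ordering inside one process can be preserved by funnelling all indexing work through a single FIFO queue drained by one worker. The names `index_store`, `work` and `worker` are hypothetical.

```python
import queue
import threading

index_store = {}      # stands in for the search index
work = queue.Queue()  # FIFO: keeps the order in which work was submitted


def worker():
    # A single consumer applies operations strictly in queue order.
    while True:
        item = work.get()
        if item is None:  # sentinel: stop the worker
            break
        doc_id, value = item
        index_store[doc_id] = value


t = threading.Thread(target=worker)
t.start()

# Transactions complete in this order inside one process...
for value in ["test1", "test2", "test3"]:
    work.put(("T1_1", value))

work.put(None)
t.join()
print(index_store["T1_1"])  # test3
```

With two application nodes there are two such queues, and nothing orders them relative to each other, which is why the guarantee disappears.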

The only solution right now is to use outbox-polling coordination.

In your particular case optimistic concurrency control could, maybe, have helped, though picking a suitable version number could be challenging. But anyway, nobody cared enough so far to contribute an implementation of that, so you can’t use it right now.
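
For readers unfamiliar with the idea, here is a rough in-memory sketch of what such optimistic concurrency control could look like, modelled on external versioning (`version_type=external` in the Elasticsearch/OpenSearch index API), where a write is accepted only if its version is strictly higher than the stored one. This is not something Hibernate Search can be configured to do today. `FakeIndex` and `VersionConflict` are hypothetical names, and using a database row version as the external version is an assumption.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class FakeIndex:
    """In-memory stand-in for an index that uses external versioning."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version):
        current = self.docs.get(doc_id)
        # External versioning: accept only strictly newer versions.
        if current is not None and version <= current[0]:
            raise VersionConflict(f"{doc_id}: {version} <= {current[0]}")
        self.docs[doc_id] = (version, source)


idx = FakeIndex()
# Arrival order as in this thread: test1, then test3, then the delayed test2.
# The first number is the (assumed) row version assigned by the database.
arrivals = [(1, "test1"), (3, "test3"), (2, "test2")]
rejected = []
for db_version, data in arrivals:
    try:
        idx.index("T1_1", {"data": data}, version=db_version)
    except VersionConflict:
        rejected.append(data)  # the stale write is dropped instead of overwriting

print(idx.docs["T1_1"])  # (3, {'data': 'test3'})
print(rejected)          # ['test2']
```

The hard part mentioned above remains: every write needs a version number that increases in the same order as the database commits.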

@mbekhta This limitation is maybe not as obvious as I thought, so we should probably document it in Hibernate Search 7.2.2.Final: Reference Documentation? This should probably even be the first limitation… And obviously we’d need to reference it from pros/cons of architecture examples, e.g. there: Hibernate Search 7.2.2.Final: Reference Documentation

Thank you for your response.

I am not sure that I can use outbox-polling coordination, because it needs a list of tenants up front in the configuration, and in my solution new tenants are added dynamically. Also, each tenant has its own database.

This should not be a problem, as long as it works in Hibernate ORM.

Adding tenants dynamically will be a problem indeed. Hibernate Search needs to start an agent for each tenant – especially if each has its own database. Without knowing the list of tenants, it can’t start agents and can’t process events.

Maybe there could be a new feature in a future version of Hibernate Search where applications can list tenants for Hibernate Search on startup (e.g. they retrieve them from a DB), and later “notify” Hibernate Search about newly added tenants, so that Hibernate Search starts agents accordingly. But so far nobody requested that feature, nor offered to work on it :)

I see two ways to solve the issue:

  1. Implement something like outbox-polling coordination that can handle new tenants dynamically.
  2. Make sure that all database changes for a specific tenant are made from a single node.

I think I will go for solution 2.
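
A minimal sketch of how solution 2 could pick the node for a tenant, assuming every node knows the same ordered list of nodes. The `owner_node` function and the node names are hypothetical.

```python
import hashlib


def owner_node(tenant_id, nodes):
    """Pick the node responsible for all writes of a given tenant."""
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so different nodes could disagree on the result.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node1", "node2", "node3"]  # hypothetical node names

# Every node computes the same owner for the same tenant.
print(owner_node("T1", nodes) == owner_node("T1", nodes))  # True
```

A plain modulo mapping moves many tenants to a different node whenever the node list changes, so an explicit assignment table or consistent hashing may be a better fit if nodes come and go.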

Thank you very much for helping me.
