MassIndexer delete data based on condition

Hello, I’m trying to run the MassIndexer with purgeAllOnStart = false and with a condition.

Example:

```java
massIndexer.type( WorkflowDefinition.class ).reindexOnly( "type = :type" ).param( "type", "SUB" );
```

When I do that, it reindexes all the entities matching this condition, but it does not delete the entities that no longer exist in the database.

For example, we have 8 records. If I delete 1 record directly in the database and then run the reindex, the deleted entity is still in Elasticsearch.

If I set purgeAllOnStart = true, it deletes everything, but since we have a lot of records, reindexing takes a long time.

Is it possible, with purgeAllOnStart = false, to have the condition first delete the matching documents from the index and then reindex the data?

Hello,

This is the expected behavior; see Hibernate Search 7.0.0.Final: Reference Documentation:

Even if the reindexing is applied on a subset of entities, by default all entities will be purged at the start. The purge can be disabled completely, but when enabled there is no way to filter the entities that will be purged.
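To make the consequence concrete, here is a toy model (plain Java maps, not Hibernate Search code): with the purge disabled, a conditional reindex only loads the rows that still exist in the database and match the condition, then writes them to the index. A document whose row was deleted is never loaded, so nothing ever removes it. The class `StaleDocumentDemo` and its methods are hypothetical illustrations; `staleIds` is the set difference you would need to compute to purge those documents manually.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model (hypothetical, not Hibernate Search code): both the database and
// the index are maps from entity id to the value of the "type" column.
class StaleDocumentDemo {

    // Mimics a conditional reindex with purgeAllOnStart = false: rows that still
    // exist and match the condition are (re)written; nothing is ever removed.
    static void reindexOnly(Map<Long, String> database, Map<Long, String> index, String type) {
        for (Map.Entry<Long, String> row : database.entrySet()) {
            if (row.getValue().equals(type)) {
                index.put(row.getKey(), row.getValue());
            }
        }
    }

    // Ids present in the index but no longer in the database: the documents
    // a manual purge would have to delete.
    static Set<Long> staleIds(Map<Long, String> database, Map<Long, String> index) {
        Set<Long> stale = new TreeSet<>(index.keySet());
        stale.removeAll(database.keySet());
        return stale;
    }

    public static void main(String[] args) {
        Map<Long, String> database = new TreeMap<>();
        for (long id = 1; id <= 8; id++) {
            database.put(id, "SUB");
        }
        Map<Long, String> index = new TreeMap<>(database); // index in sync: 8 documents

        database.remove(3L); // row deleted directly in the database
        reindexOnly(database, index, "SUB");

        System.out.println(index.size());              // 8: the deleted row's document survives
        System.out.println(staleIds(database, index)); // [3]
    }
}
```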

What you’d need is either HSEARCH-3304 or HSEARCH-1032.

Unless you want to contribute one of these, and until someone else does, you can use the following workarounds:

  1. Avoid letting your index get out of sync in the first place; that could be possible if the cause is just some JPA batch job.
  2. Delete entities from the index manually; that’s probably only realistic if there is a small to medium number of entities.
  3. On Elasticsearch/OpenSearch only, retrieve the REST client and send a delete_by_query request to Elasticsearch/OpenSearch.
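For the third workaround, a minimal sketch of the idea: delete only the documents matching the same condition with `_delete_by_query`, then run the conditional reindex with purgeAllOnStart = false. In an application you would normally obtain the client from Hibernate Search (`ElasticsearchBackend#client(RestClient.class)`); to keep this sketch self-contained it sends the request with the JDK’s `java.net.http` client instead. The cluster URL `http://localhost:9200`, the alias `workflowdefinition-write`, and the assumption that `type` is indexed as a keyword field named `type` are all hypothetical; check your actual mapping and index names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch (assumptions: cluster URL, index alias and field name are hypothetical).
class DeleteByQueryWorkaround {

    // Builds {"query":{"term":{<field>:<value>}}}, escaping quotes and backslashes.
    static String termQueryBody(String field, String value) {
        return "{\"query\":{\"term\":{\"" + escape(field) + "\":\"" + escape(value) + "\"}}}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Sends POST /<index>/_delete_by_query and returns the HTTP status code.
    static int deleteByQuery(HttpClient http, URI baseUri, String index, String body)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(baseUri.resolve("/" + index + "/_delete_by_query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String body = termQueryBody("type", "SUB");
        int status = deleteByQuery(HttpClient.newHttpClient(),
                URI.create("http://localhost:9200"), "workflowdefinition-write", body);
        System.out.println("delete_by_query returned HTTP " + status);
        // Then reindex the same subset without purging the whole index, e.g.:
        // MassIndexer indexer = searchSession.massIndexer( WorkflowDefinition.class ).purgeAllOnStart( false );
        // indexer.type( WorkflowDefinition.class ).reindexOnly( "type = :type" ).param( "type", "SUB" );
        // indexer.startAndWait();
    }
}
```

Between the delete and the end of the reindex, searches will not return the affected documents, so this only fits if that gap is acceptable.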

I would like to contribute. Where can I find the steps to do that?

Thanks!

All the necessary information should be in CONTRIBUTING.md in the hibernate/hibernate-search repository on GitHub.

Feel free to reach out to the team if you need help. In particular, it might be a good idea to discuss your idea for any new APIs on Zulip before spending too much time on it. That could spare you some wasted effort.

I updated the description of HSEARCH-3304.

I’d recommend starting with that one, as it will most likely be necessary to implement HSEARCH-1032, which could also turn out to be much more complex.