Hello,
The documentation says that “you will end up with duplicates in the index”, but I don’t understand what that means.
Actually, what I want to do is update only some entities in the index (e.g. entities that have a specific category), without purging the index.
Does “duplicates” mean that the documents re-indexed by the MassIndexer would exist twice in the index (added rather than updated)?
If so, is there any way to mass-update a subset of entities without purging the index?
Yes, exactly.
Though the risk is mostly with the Lucene backend. In the case of an Elasticsearch backend, Elasticsearch will remove older versions of documents automatically, so this problem could only arise if you are using custom routing based on a property other than the ID, and that property changed.
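To make the difference concrete, here is a toy model in plain Java. It is not Hibernate Search code: the class and method names are made up for illustration, and the "index" is just an in-memory collection. It only sketches why re-indexing without a purge can double the documents when writes are pure additions, and why it cannot when documents are keyed by their ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateDemo {

    /** Toy index where each mass-indexing pass only adds documents (hypothetical model). */
    static int documentsAfterAddOnlyReindex(List<Integer> entityIds, int passes) {
        List<Integer> index = new ArrayList<>();
        for (int pass = 0; pass < passes; pass++) {
            index.addAll(entityIds); // add, never update
        }
        return index.size();
    }

    /** Toy index keyed by document ID: a newer version replaces the older one. */
    static int documentsAfterIdKeyedReindex(List<Integer> entityIds, int passes) {
        Map<Integer, String> index = new HashMap<>();
        for (int pass = 0; pass < passes; pass++) {
            for (Integer id : entityIds) {
                index.put(id, "version " + pass); // overwrite by ID
            }
        }
        return index.size();
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3);
        System.out.println(documentsAfterAddOnlyReindex(ids, 2)); // prints 6
        System.out.println(documentsAfterIdKeyedReindex(ids, 2)); // prints 3
    }
}
```

Two passes over three entities leave six documents in the add-only model and three in the ID-keyed one.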
Well, if you’re using the Elasticsearch backend and don’t do custom routing, you can just call purgeAllOnStart(false). You won’t run the risk of creating duplicates if you’re in that situation.
However, be aware that Hibernate Search will not know about entities that have been deleted from the database – unless that deletion was caught by automatic indexing, but since you’re doing mass indexing I assume you made some changes that were not caught by automatic indexing. So for example if you ran an SQL delete manually, and run the mass indexer after that, the mass indexer will not notice an entity was deleted and will not remove the corresponding document from the index.
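Here is a small sketch of that situation, again as a toy in-memory model rather than real Hibernate Search code (the maps standing in for the database and the index, and all names, are assumptions made for illustration). The re-indexing pass only visits rows that currently exist, so the document for the deleted row is never touched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class StaleDocumentDemo {

    /**
     * Toy mass-indexing pass without a purge: writes one document per row
     * currently in the database. Rows that no longer exist are never visited,
     * so their documents stay in the index.
     */
    static Set<Integer> reindexWithoutPurge(Map<Integer, String> database,
                                            Map<Integer, String> index) {
        for (Map.Entry<Integer, String> row : database.entrySet()) {
            index.put(row.getKey(), row.getValue());
        }
        return new TreeSet<>(index.keySet());
    }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>(Map.of(1, "A", 2, "B", 3, "C"));
        Map<Integer, String> index = new HashMap<>(database); // index starts in sync

        database.remove(2); // simulates a manual SQL delete that bypasses indexing

        // Document 2 is still in the index after re-indexing.
        System.out.println(reindexWithoutPurge(database, index)); // prints [1, 2, 3]
    }
}
```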
To be really safe you will have to delete the entities from the index manually before indexing them. Delete-by-query is not implemented yet, unfortunately, so you will have to rely on manual indexing APIs (call purge(<id>) for each entity you want to reindex, basically).
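The purge-then-reindex approach can be sketched with the same kind of toy model. This is plain Java, not the Hibernate Search API: `index.remove(id)` merely stands in for the purge(<id>) call, and the sketch assumes you already know the IDs of the subset you want to refresh, including those of rows that were deleted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PurgeThenReindexDemo {

    /**
     * For each ID in the subset: remove its document first, then re-add it
     * only if the row still exists in the database.
     */
    static Set<Integer> purgeThenReindex(Map<Integer, String> database,
                                         Map<Integer, String> index,
                                         List<Integer> idsToReindex) {
        for (Integer id : idsToReindex) {
            index.remove(id);              // stands in for purge(<id>)
            String row = database.get(id);
            if (row != null) {
                index.put(id, row);        // stands in for re-indexing the entity
            }
        }
        return new TreeSet<>(index.keySet());
    }

    public static void main(String[] args) {
        // Row 2 was deleted and row 3 was edited behind the indexer's back.
        Map<Integer, String> database = new HashMap<>(Map.of(1, "A", 3, "C (edited)"));
        Map<Integer, String> index = new HashMap<>(Map.of(1, "A", 2, "B", 3, "C"));

        System.out.println(purgeThenReindex(database, index, List.of(2, 3))); // prints [1, 3]
    }
}
```

Document 1 is left alone, document 2 disappears, and document 3 is replaced with the current row, all without purging the whole index.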
Incidentally, there are plans to provide a more “direct” solution for what you’re trying to achieve: [HSEARCH-1032] - Hibernate JIRA. That will be in a future version, though.
Thanks for the reply.
The same delete issue can also happen with normal automatic indexing.
So I think the delete issue should be handled in mass indexing the same way as in automatic indexing.
If you mean that altering the content of the database through JPQL delete statements or native SQL queries will lead to deletions being ignored by Hibernate Search’s automatic indexing, then yes, and that’s a documented limitation.
Except automatic indexing reacts to entity change events, while mass indexing does not work with events at all: it just looks at the current status of the database.
I appreciate that you want things to “just work”, but those two mechanisms are different and we can’t apply the same reasoning to both.
Anyway, like I said, it’s planned (see step 3 in that issue’s description), but we need to work on it.