HS6: About massIndexer

Is it okay to run massIndexer with purgeAllOnStart set to false on an index that is being updated in real time?
The default value of purgeAllOnStart is true, so I’m hesitant to use massIndexer.
Should I use massIndexer only on an empty index?
Or can I also use it on an index that is being updated in real time?

The mass indexer is designed to initialize the index. It’s very hard on your resources (CPU, DB connections, …), and generally only suited for one-time indexing, for example the first time you deploy your application, or to reindex all your data after mapping changes.

To index entities automatically as they are modified, you should rely on automatic indexing, which is more lightweight and is enabled by default.

If you still want to run the mass indexer on a live application:

  • It’s generally discouraged to run the mass indexer on a live application, because it takes a lot of resources (threads, DB connections) and the application will not be as responsive as usual.
  • If you want to disable the initial purge, just know that the mass indexer will not remove outdated documents from the index. So if the index contains documents for entities that no longer exist in the database, those documents will still be there after the mass indexer finishes.
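The effect of skipping the initial purge can be illustrated with a toy in-memory model. This is only a sketch, not Hibernate Search code: the class `PurgeSketch`, the method `massIndex` and the sample data are hypothetical, and it assumes that a reindexed document overwrites the existing document with the same id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model; this is NOT Hibernate Search code.
class PurgeSketch {

    // Simulates one mass indexing run: optionally purge, then write one document per database row.
    static Map<Integer, String> massIndex(Map<Integer, String> index,
                                          Map<Integer, String> database,
                                          boolean purgeAllOnStart) {
        Map<Integer, String> result = new LinkedHashMap<>(index);
        if (purgeAllOnStart) {
            result.clear();
        }
        // Assumption: each row overwrites the document that has the same id.
        result.putAll(database);
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> index = new LinkedHashMap<>();
        index.put(1, "old title 1");
        index.put(2, "old title 2"); // entity 2 has since been deleted from the database

        Map<Integer, String> database = new LinkedHashMap<>();
        database.put(1, "new title 1");
        database.put(3, "new title 3");

        // Without the purge, the document for the deleted entity 2 survives.
        System.out.println(massIndex(index, database, false).keySet()); // [1, 2, 3]
        // With the purge, only documents for existing entities remain.
        System.out.println(massIndex(index, database, true).keySet());  // [1, 3]
    }
}
```

In this model the only difference between the two runs is the leftover document for entity 2, which matches the caveat above.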

I’m using Elasticsearch as a search store, not for one-time indexing, so I already rely on automatic indexing.
I also regularly need to reindex the documents of existing entity classes, because the entity classes change (a new indexed field, a new indexing dependency, and so on).

Because massIndexer reindexes based on the database, I think all documents in the index will be replaced with up-to-date documents.

I wonder if massIndexer is suitable in this situation.

The situation for resources is as follows.
One high-performance server is dedicated to reindexing.
I also think there are enough DB connections, based on my experience with the initial indexing into an empty index.

From what I understand, you want hot updates: you update your application, and you want users to still use it while you reindex the data.

There is no built-in feature for that (yet), but you can make it work with a bit of custom code. See the note in this section of the documentation:

In your case, you want to create the new index with an updated mapping. You will have to do that manually. I’d suggest just letting Hibernate Search generate the mapping and create the indexes in a development environment, then reading that mapping from Elasticsearch (GET /myIndex/_mapping), extracting it to a JSON file, and using that file to create the index manually in production (PUT /myIndex/_mapping).


Yes, I’m struggling to support hot updates without inconveniencing users.

I am worried that users won’t be able to search for data created or updated during the third step listed above (3. Reindex, for example using the mass indexer), because users read from myindex-000001 while new documents are written to myindex-000002.

I think the fundamental solution is to index into both indexes (myindex-000001 and myindex-000002) whenever an entity is created or updated.

If massIndexer guarantees that every row of the database table ends up as an up-to-date document, without exception, I plan to reindex with massIndexer until this fundamental problem is solved.

Yes, the old index will effectively be frozen during the “hot update”, remaining in the state it was in before you started the update.
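To make the "frozen" old index concrete, here is a toy in-memory model of the situation described above. It is a sketch under assumptions, not Hibernate Search or Elasticsearch code: the class `HotUpdateSketch` and the fields `readTarget` and `writeTarget` are hypothetical stand-ins for whatever routes searches and writes to an index.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model; this is NOT Hibernate Search or Elasticsearch code.
class HotUpdateSketch {
    final Map<String, Map<Integer, String>> indexes = new HashMap<>();
    String readTarget = "myindex-000001";  // where searches go
    String writeTarget = "myindex-000001"; // where automatic indexing goes

    HotUpdateSketch() {
        indexes.put("myindex-000001", new HashMap<>());
        indexes.put("myindex-000002", new HashMap<>());
    }

    void index(int id, String document) {
        indexes.get(writeTarget).put(id, document);
    }

    String search(int id) {
        return indexes.get(readTarget).get(id);
    }

    public static void main(String[] args) {
        HotUpdateSketch app = new HotUpdateSketch();
        app.index(1, "indexed before the update");

        // Update starts: writes move to the new index, reads stay on the old one.
        app.writeTarget = "myindex-000002";
        app.index(2, "created during reindexing");
        System.out.println(app.search(1)); // still served by the old index
        System.out.println(app.search(2)); // null: not visible to users yet

        // Reindexing (for example with the mass indexer) would fill the new index
        // from the database here; this sketch copies the one missing document instead.
        app.index(1, "indexed before the update");

        // Update ends: reads move to the new index.
        app.readTarget = "myindex-000002";
        System.out.println(app.search(2)); // now visible
    }
}
```

In this model, documents written during the update only become searchable once reads are switched to the new index, which is the gap the question is about.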

That would require Hibernate Search to deal with two mappings at the same time: the previous one (the one before your application update) and the new one (the one after your application update). Currently that’s obviously not possible, since your old mapping is no longer present in the application code.

That might be possible one day, in particular once we address [HSEARCH-3683] - Hibernate JIRA and [HSEARCH-3971] - Hibernate JIRA, and then some other ticket to index to two indexes simultaneously. But it will definitely require that you explicitly define both mappings. That won’t be pretty.

I’m sorry, I’m not following.


I’m just wondering whether, in my situation (wanting hot updates), using massIndexer with purgeAllOnStart set to false is suitable: is massIndexer originally meant for this purpose? Are there any expected exception cases? And so on.
I just wanted some advice because I lack experience with indexing all data. That’s all.

I would say the best way to know for sure is to try.

Is massIndexer originally meant for this purpose?

I’m afraid not.

Are there any expected exception cases?

I just had a look, and I can say it won’t work with the Lucene backend, because of an optimization that assumes the document is not present in the index when we add it. There is no such optimization for the Elasticsearch backend though, so you should be fine.
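The difference can be pictured with a toy model of the two write strategies. This is a sketch only, not the actual Lucene or Hibernate Search code: the class `AddVersusUpdateSketch` and its methods are hypothetical, and it assumes the optimization amounts to a blind add that skips deleting any existing document with the same id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy in-memory model; this is NOT Lucene or Hibernate Search code.
class AddVersusUpdateSketch {

    // Blind append: assumes no document with this id exists yet.
    static void add(List<Map.Entry<Integer, String>> index, int id, String content) {
        index.add(Map.entry(id, content));
    }

    // Delete-then-add: safe when a document with this id may already exist.
    static void update(List<Map.Entry<Integer, String>> index, int id, String content) {
        index.removeIf(doc -> doc.getKey() == id);
        index.add(Map.entry(id, content));
    }

    // Counts how many documents in the index carry the given id.
    static long count(List<Map.Entry<Integer, String>> index, int id) {
        return index.stream().filter(doc -> doc.getKey() == id).count();
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> blind = new ArrayList<>();
        add(blind, 1, "old version");
        add(blind, 1, "new version"); // reindexing without a purge
        System.out.println(count(blind, 1)); // 2: a duplicate document

        List<Map.Entry<Integer, String>> safe = new ArrayList<>();
        update(safe, 1, "old version");
        update(safe, 1, "new version");
        System.out.println(count(safe, 1)); // 1: the old document was replaced
    }
}
```

Under that assumption, reindexing into a non-empty index with blind adds would leave two documents for the same entity, whereas writing by id replaces the old one.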

As for other problems… as I said, the best way to know for sure is to try.
