HS6: About massIndexer

mason · May 4, 2021, 10:49am

Is it okay to run massIndexer with purgeAllOnStart set to false on an index that is also being updated in real time?
The default value of purgeAllOnStart is true, so I’m hesitant to use massIndexer.
Should I use massIndexer only on an empty index,
or can I also use it on an index that is being updated in real time?

yrodiere · May 4, 2021, 11:17am

The mass indexer is designed to initialize the index. It’s very hard on your resources (CPU, DB connections, …), and generally only suited for one-time indexing, for example the first time you deploy your application, or to reindex all your data after mapping changes.

To index entities automatically as they are modified, you should rely on automatic indexing, which is more lightweight and is enabled by default.

If you still want to run the mass indexer on a live application:

- It’s generally discouraged, because the mass indexer takes a lot of resources (threads, DB connections) and the application will not be as responsive as usual.
- If you disable the initial purge, just know that the mass indexer will not remove outdated documents from the index. So if the index contains documents for entities that no longer exist in the database, those documents will still be there after the mass indexer finishes.
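
The stale-document caveat above can be illustrated with a toy model. This is only a sketch: plain Java maps stand in for the database and the index, and `PurgeDemo` and `reindex` are hypothetical names, not Hibernate Search code. The boolean parameter mimics what `purgeAllOnStart` controls.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain maps stand in for the database and the index.
// None of this is Hibernate Search code.
final class PurgeDemo {

    // Writes one index document per database row, optionally emptying the
    // index first (which is what purgeAllOnStart controls in the real thing).
    static Map<Integer, String> reindex(Map<Integer, String> database,
                                        Map<Integer, String> index,
                                        boolean purgeAllOnStart) {
        Map<Integer, String> result = new HashMap<>(index);
        if (purgeAllOnStart) {
            result.clear();
        }
        for (Map.Entry<Integer, String> row : database.entrySet()) {
            // Writing by ID overwrites any existing document with the same ID.
            result.put(row.getKey(), "doc:" + row.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>();
        database.put(1, "alice");
        database.put(2, "bob");

        Map<Integer, String> index = new HashMap<>();
        index.put(1, "doc:alice-old");
        index.put(3, "doc:carol"); // entity 3 no longer exists in the database

        // Without the purge, the document for entity 3 survives the reindex.
        System.out.println(reindex(database, index, false).containsKey(3)); // true
        // With the purge, it is gone.
        System.out.println(reindex(database, index, true).containsKey(3)); // false
    }
}
```

In this model, documents for entities that still exist are refreshed either way; only documents whose entity was deleted are affected by skipping the purge.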

mason · May 4, 2021, 11:54am

I’m using Elasticsearch as a permanent search store, not for one-time indexing, so I already rely on automatic indexing.
However, I regularly need to reindex existing documents because the entity classes change (a new indexed field, a new indexing dependency, and so on).

Because massIndexer reindexes from the database, I think all documents in the index will be replaced with up-to-date documents.

I wonder if massIndexer is suitable in this situation.

My resource situation is as follows.
One high-performance server is dedicated to reindexing.
I also think there are enough DB connections, judging from my experience with the initial indexing of an empty index.

yrodiere · May 4, 2021, 12:07pm

From what I understand, you want hot updates: you update your application, and you want users to still use it while you reindex the data.

There is no built-in feature for that (yet), but you can make it work with a bit of custom code. See the note in this section of the documentation:

Using aliases has a significant advantage over directly targeting the index: it makes full reindexing on a live application possible without downtime, which is useful in particular when automatic indexing is disabled (completely or partially) and you need to fully reindex periodically (for example on a daily basis).

With aliases, you just need to direct the read alias (used by search queries) to an old copy of the index, while the write alias (used by document writes) is redirected to a new copy of the index. Without aliases (in particular with the no-alias layout), this is impossible.

This “zero-downtime” reindexing, which shares some characteristics with “blue/green” deployment, is not currently provided by Hibernate Search itself. However, you can implement it in your application by directly issuing commands to Elasticsearch’s REST APIs. The basic sequence of actions is the following:

1. Create a new index, myindex-000002.
2. Switch the write alias, myindex-write, from myindex-000001 to myindex-000002.
3. Reindex, for example using the mass indexer.
4. Switch the read alias, myindex-read, from myindex-000001 to myindex-000002.
5. Delete myindex-000001.

Note this will only work if the Hibernate Search mapping did not change; a zero-downtime upgrade with a changing schema would be considerably more complex. You will find discussions on this topic in HSEARCH-2861 and HSEARCH-3499.
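
The two alias switches in the sequence above could be expressed as request bodies for Elasticsearch’s `POST /_aliases` endpoint, which applies its `remove` and `add` actions atomically. The following is a minimal sketch that only builds the JSON; `AliasSwitch` and `switchAliasBody` are hypothetical helper names, and sending the request (with whatever HTTP client you use) is left out.

```java
// Builds request bodies for Elasticsearch's POST /_aliases endpoint.
// Hypothetical helper, not part of Hibernate Search.
final class AliasSwitch {

    // One request body that removes `alias` from `oldIndex` and adds it to
    // `newIndex`; Elasticsearch applies both actions atomically.
    static String switchAliasBody(String alias, String oldIndex, String newIndex) {
        String remove = "{\"remove\":{\"index\":\"" + oldIndex + "\",\"alias\":\"" + alias + "\"}}";
        String add = "{\"add\":{\"index\":\"" + newIndex + "\",\"alias\":\"" + alias + "\"}}";
        return "{\"actions\":[" + remove + "," + add + "]}";
    }

    public static void main(String[] args) {
        // Switching the write alias: body to send before reindexing.
        System.out.println(switchAliasBody("myindex-write", "myindex-000001", "myindex-000002"));
        // Switching the read alias: body to send once reindexing has finished.
        System.out.println(switchAliasBody("myindex-read", "myindex-000001", "myindex-000002"));
    }
}
```

Creating the new index beforehand and deleting the old one afterwards are separate requests (`PUT /myindex-000002` and `DELETE /myindex-000001`).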

In your case, you want to create the new index with an updated mapping. You will have to do that manually. I’d suggest just letting Hibernate Search generate the mapping and create the indexes in a development environment, then reading that mapping from Elasticsearch (GET /myIndex/_mapping), extracting it to a JSON file, and using that to create the index manually in production (PUT /myIndex/_mapping).
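
As a side note on the REST calls: in Elasticsearch, `PUT /<index>/_mapping` updates the mapping of an index that already exists, while `PUT /<index>` creates an index and accepts a `mappings` object in its body. A rough sketch of building such a create-index body from an exported mapping follows. `CreateIndexSketch` is a hypothetical helper, and it assumes the mappings object has already been extracted from the `GET /myIndex/_mapping` response (where it sits under `"<index name>" -> "mappings"`). Any custom analyzers referenced by the mapping would also need to be carried over in the index settings, which this sketch does not cover.

```java
// Hypothetical helper: wraps an exported mappings object into a create-index body.
final class CreateIndexSketch {

    // `mappingsJson` is assumed to be the JSON object found under "mappings"
    // in the GET /<index>/_mapping response, already extracted by hand.
    static String body(String mappingsJson) {
        return "{\"mappings\":" + mappingsJson.trim() + "}";
    }

    // Request line for the create-index call.
    static String requestLine(String indexName) {
        return "PUT /" + indexName;
    }

    public static void main(String[] args) {
        // Made-up, minimal mapping for illustration only.
        String exported = "{\"properties\":{\"title\":{\"type\":\"text\"}}}";
        System.out.println(requestLine("myindex-000002"));
        System.out.println(body(exported));
    }
}
```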

mason · May 4, 2021, 12:59pm

Yes, I’m struggling to support hot updates without any inconvenience to users.

I am worried that users won’t be able to search data created or updated during the third step listed above (3. Reindex, for example using the mass indexer), because users read from myindex-000001 while new documents are saved to myindex-000002.

I think the fundamental solution is to index into both indexes (myindex-000001 and myindex-000002) whenever an entity is created or updated.

If massIndexer guarantees that every row in the database table is written to a new document, without exception, I plan to reindex with massIndexer until this fundamental problem is addressed.

yrodiere · May 4, 2021, 1:40pm

Yes, the old index will effectively be frozen during the “hot update”, remaining in the state it was in before you started the update.
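
To make the “frozen” behaviour concrete, here is a small simulation. It is a toy model with in-memory maps, not Elasticsearch or Hibernate Search code: `HotUpdateDemo`, `readTarget` and `writeTarget` are hypothetical names standing in for the cluster and its read and write aliases.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of two index copies behind a read alias and a write alias.
final class HotUpdateDemo {
    final Map<String, Map<Integer, String>> indexes = new HashMap<>();
    String readTarget;  // index the read alias points to
    String writeTarget; // index the write alias points to

    void createIndex(String name) {
        indexes.put(name, new HashMap<>());
    }

    // Document writes always go through the write alias.
    void write(int id, String doc) {
        indexes.get(writeTarget).put(id, doc);
    }

    // Searches always go through the read alias.
    boolean searchable(int id) {
        return indexes.get(readTarget).containsKey(id);
    }

    public static void main(String[] args) {
        HotUpdateDemo es = new HotUpdateDemo();
        es.createIndex("myindex-000001");
        es.readTarget = "myindex-000001";
        es.writeTarget = "myindex-000001";
        es.write(1, "existing entity");

        es.createIndex("myindex-000002");
        es.writeTarget = "myindex-000002";          // write alias switched
        es.write(2, "created during reindexing");   // automatic indexing
        System.out.println(es.searchable(2));       // false: reads still hit the old copy

        es.write(1, "existing entity");             // mass indexer catches up
        es.readTarget = "myindex-000002";           // read alias switched
        System.out.println(es.searchable(2));       // true
    }
}
```

In this model, anything written between the two alias switches only becomes searchable once the read alias moves, which matches the concern raised above.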

That would require Hibernate Search to deal with two mappings at the same time: the previous one (the one before your application update) and the new one (the one after your application update). Currently that’s obviously not possible, since your old mapping is no longer present in the application code.

That might be possible one day, in particular once we address HSEARCH-3683 and HSEARCH-3971, and then some other ticket to index to two indexes simultaneously. But it will definitely require that you explicitly define both mappings. That won’t be pretty.

I’m sorry, I’m not following.

mason · May 4, 2021, 2:31pm

I was just wondering whether, in my situation (wanting hot updates), using massIndexer with purgeAllOnStart set to false is suitable: was massIndexer originally meant for this purpose, are there any cases where it is expected to fail, and so on.
I just wanted some advice because I lack experience with full reindexing. That’s all.

yrodiere · May 4, 2021, 2:59pm

I would say the best way to know for sure is to try.

Was massIndexer originally meant for this purpose?

I’m afraid not

Are there any cases where it is expected to fail?

I just had a look, and I can say it won’t work with the Lucene backend, because of an optimization that assumes the document is not present in the index when we add it. There is no such optimization for the Elasticsearch backend though, so you should be fine.

As for other problems… as I said, the best way to know for sure is to try.
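
For background on the Lucene remark: Lucene’s `IndexWriter` distinguishes `addDocument`, which appends without checking for an existing document, from `updateDocument`, which deletes documents matching a term and then adds the new one. The toy model below contrasts the two behaviours. It does not use Lucene, and the assumption that the symptom on a non-empty index would be duplicate documents is mine, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "add" versus "update" semantics; each entry is {id, content}.
final class AddVsUpdateDemo {

    // Appends blindly, assuming no document with this ID exists yet.
    static void add(List<String[]> index, String id, String content) {
        index.add(new String[] { id, content });
    }

    // Deletes any document with this ID, then appends the new one.
    static void update(List<String[]> index, String id, String content) {
        index.removeIf(doc -> doc[0].equals(id));
        index.add(new String[] { id, content });
    }

    // Number of documents currently stored under this ID.
    static long count(List<String[]> index, String id) {
        return index.stream().filter(doc -> doc[0].equals(id)).count();
    }

    public static void main(String[] args) {
        List<String[]> addOnly = new ArrayList<>();
        add(addOnly, "1", "old version");
        add(addOnly, "1", "new version");
        System.out.println(count(addOnly, "1")); // 2: the old document is still there

        List<String[]> withUpdate = new ArrayList<>();
        add(withUpdate, "1", "old version");
        update(withUpdate, "1", "new version");
        System.out.println(count(withUpdate, "1")); // 1
    }
}
```

Elasticsearch, by contrast, replaces the existing document when one is indexed under the same ID, which is consistent with the answer above that the Elasticsearch backend should be fine.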
