HS 6 index preference

Is there anything that tells HS to index one entity type first, and only then index the other entity types?
I mean, can a specific sequence be maintained during bulk indexing?

Also, is there a way to index the entities in parallel? Indexing took about 8 hours to finish, and in my opinion it is running sequentially.
The current implementation looks like this:

public void reindexData() {
    CompletableFuture.runAsync(() -> {

        synchronized (INDEXING) {
            if (INDEXING.get()) {
                log.warn("Still indexing : ");
                return;
            }
            INDEXING.set(Boolean.TRUE);
        }
        try {
            log.info("Indexing started : {}", new Date());
            SearchSession searchSession = org.hibernate.search.mapper.orm.Search.session(managerFactory.createEntityManager());
            SearchMapping searchMapping = org.hibernate.search.mapper.orm.Search.mapping(managerFactory);
            searchSession.schemaManager().dropAndCreate();
            searchSession.massIndexer().threadsToLoadObjects(6).startAndWait();
        } catch (InterruptedException ie) {
            log.info("Interrupted exception : {}", ie.getMessage());
            Thread.currentThread().interrupt();
        } finally {
            synchronized (INDEXING) {
                log.info("Indexing finished : {}", new Date());
                INDEXING.set(Boolean.FALSE);
            }
        }

    });
}

I don’t understand what you want. I’ll need more context. An example, maybe?

See Threads and JDBC connections

You’re entitled to your opinion. But if you look at the source code, you’ll see that when setting threadsToLoadObjects to 6, you get 6 threads loading and processing entities in parallel. As I mentioned above, see Threads and JDBC connections for advice on how to customize this more finely.

Similarly, indexing in the backend is done in parallel, and is configurable (here for Lucene, here for Elasticsearch).

The only part that is sequential is the loading of entity identifiers. That part will, indeed, load identifiers in sequence. But hopefully your IDs are light and fast to load, and thus loading them shouldn’t be the bottleneck. Also, if you index multiple types in parallel, then their respective IDs will also be loaded in parallel.

I would recommend looking into optimizing the loading of entities, as that’s most likely what is slowing down the process.

  • Ensure your connection pool has enough connections to allow the level of parallelism you want (in your example, at least 7: one for each of the 6 entity-loading threads, plus one for the thread loading identifiers).
  • Experiment with some of the mass indexer parameters, e.g. idFetchSize or batchSizeToLoadObjects. Sometimes a larger batch size is better, sometimes a smaller batch size is better. It depends on your setup.
  • Be sure to set idFetchSize to Integer#MIN_VALUE if you’re using MySQL, otherwise their JDBC driver will pre-load all identifiers in memory, which can lead to disastrous performance. That’s specific to MySQL though; other JDBC drivers behave correctly.
  • Look at the queries executed during mass indexing (when loading entities). Sometimes all it takes is a database index on a foreign key column to speed up query execution significantly.
  • Look at this advice about loading entities. In particular the part about leveraging batch fetching.
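
To make the connection-pool arithmetic from the first point concrete, here is a minimal sketch. It assumes that each type indexed in parallel uses one identifier-loading thread on top of its threadsToLoadObjects entity-loading threads, and that each of those threads holds one JDBC connection. The class and method names are made up for illustration and are not part of Hibernate Search.

```java
// Hypothetical helper, not part of Hibernate Search: estimates the minimum
// number of JDBC connections the mass indexer needs from the pool.
public final class MassIndexerPoolSizing {

    private MassIndexerPoolSizing() {
    }

    // Assumption: per type indexed in parallel, one thread loads identifiers
    // and threadsToLoadObjects threads load entities, one connection each.
    public static int requiredConnections(int typesToIndexInParallel, int threadsToLoadObjects) {
        if (typesToIndexInParallel < 1 || threadsToLoadObjects < 1) {
            throw new IllegalArgumentException("Both values must be at least 1");
        }
        return typesToIndexInParallel * (threadsToLoadObjects + 1);
    }

    public static void main(String[] args) {
        // One type at a time with 6 loading threads, as in the code above.
        System.out.println(requiredConnections(1, 6)); // prints 7
        // Two types in parallel with 6 loading threads each.
        System.out.println(requiredConnections(2, 6)); // prints 14
    }
}
```

If the pool is smaller than this estimate, some of the loading threads will likely sit idle waiting for a connection, so raising threadsToLoadObjects alone won't help.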

Also, if you’re using Elasticsearch, and you notice that Elasticsearch itself is the bottleneck, you might want to consider using multiple Elasticsearch nodes on independent hardware, and setting the number of replicas to 0 while mass indexing.