Some data is not indexed: error HSEARCH400007

When I run mass indexing over a total of about 4 million entities, roughly 2,000 of them fail to be indexed.
(The number of failures varies from run to run.)
This is the simple code I use to index into Elasticsearch:

    FullTextSession fullTextSession = Search.getFullTextSession(session);
    SearchFactory searchFactory = Search.getFullTextSession(session).getSearchFactory();
    fullTextSession
            .createIndexer(.######.class)
            .batchSizeToLoadObjects(1000)
            // .cacheMode(CacheMode.NORMAL)
            .typesToIndexInParallel(1)
            // .threadsToLoadObjects(5)
            // .transactionTimeout(3000)
            // .idFetchSize(200)
            .progressMonitor(new SimpleIndexingProgressMonitor() {
                ....
            })
            .startAndWait();

Those entities are not indexed, and the log below appears.
How do I get a list of the IDs that failed to be indexed? And is it possible to automatically create a failed indexing?


2019-10-29 14:48:12.898 ERROR 23400 --- [Hibernate Search: Elasticsearch transport thread-2] o.h.s.exception.impl.LogErrorHandler     : HSEARCH000058: Exception occurred org.hibernate.search.exception.SearchException: HSEARCH400007: Elasticsearch request failed.
Request: POST /_bulk with parameters {refresh=false}
Response: null
Subsequent failures:
        Entity com....entity.######  Id 2248733  Work Type  org.hibernate.search.backend.AddLuceneWork
org.hibernate.search.exception.SearchException: HSEARCH400007: Elasticsearch request failed.
Request: POST /_bulk with parameters {refresh=false}
Response: null
        at org.hibernate.search.elasticsearch.work.impl.BulkWork.lambda$execute$1(BulkWork.java:77)
        at org.hibernate.search.util.impl.Futures.lambda$handler$1(Futures.java:57)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.hibernate.search.elasticsearch.client.impl.DefaultElasticsearchClient$1.onFailure(DefaultElasticsearchClient.java:123)
        at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:844)
        at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:582)
        at org.elasticsearch.client.RestClient$1.failed(RestClient.java:561)
        at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:137)
        at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.executionFailed(DefaultClientExchangeHandlerImpl.java:101)
        at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:426)
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.endOfInput(HttpAsyncRequestExecutor.java:356)
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:261)
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:121)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.http.ConnectionClosedException: Connection is closed
        ... 12 common frames omitted

I am using Spring Boot 2.1.5.RELEASE with these Hibernate dependencies:

    <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-core</artifactId>
      <version>5.4.1.Final</version>
    </dependency>
    <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-search-orm</artifactId>
      <version>5.11.3.Final</version>
    </dependency>
    <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-search-elasticsearch</artifactId>
      <version>5.11.3.Final</version>
    </dependency>

Why does this error occur?
I look forward to your reply.

Implement a custom ErrorHandler, and set the hibernate.search.error_handler configuration property to the fully qualified class name of your implementation.
The handle method receives an ErrorContext as a parameter; its getFailingOperations and getOperationAtFault methods return LuceneWork objects that expose a getId method. These are the IDs of the documents that couldn't be indexed.
See the Hibernate Search 5.11.12.Final Reference Guide.
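A minimal sketch of such a handler, assuming the Hibernate Search 5.11 `org.hibernate.search.exception.ErrorHandler` SPI (`handle(ErrorContext)` and `handleException(String, Throwable)`). The class name and the static `FAILED_IDS` set and `ERROR_COUNT` counter are hypothetical. They are static because Hibernate Search instantiates the handler itself from the class name, so application code has no other reference to the instance. It logs through `java.util.logging` instead of delegating to the default `LogErrorHandler`, which lives in an internal `impl` package.

```java
import java.io.Serializable;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.hibernate.search.backend.LuceneWork;
import org.hibernate.search.exception.ErrorContext;
import org.hibernate.search.exception.ErrorHandler;

/**
 * Hypothetical error handler: records the IDs of documents that failed to be
 * indexed, counts every reported error, then logs it.
 */
public class FailedIdCollectingErrorHandler implements ErrorHandler {

    private static final Logger LOG =
            Logger.getLogger(FailedIdCollectingErrorHandler.class.getName());

    /** IDs of failed works; concurrent because errors arrive on Elasticsearch transport threads. */
    public static final Set<Serializable> FAILED_IDS = ConcurrentHashMap.newKeySet();

    /** Incremented once per reported error, whether or not an ID is available. */
    public static final AtomicLong ERROR_COUNT = new AtomicLong();

    @Override
    public void handle(ErrorContext context) {
        ERROR_COUNT.incrementAndGet();
        collectId(context.getOperationAtFault());
        if (context.getFailingOperations() != null) {
            for (LuceneWork work : context.getFailingOperations()) {
                collectId(work);
            }
        }
        LOG.log(Level.SEVERE, "Index work failed; distinct failed IDs so far: "
                + FAILED_IDS.size(), context.getThrowable());
    }

    @Override
    public void handleException(String errorMsg, Throwable exception) {
        // Errors reported this way are not tied to a LuceneWork, so there is no ID to record.
        ERROR_COUNT.incrementAndGet();
        LOG.log(Level.SEVERE, errorMsg, exception);
    }

    private static void collectId(LuceneWork work) {
        // Works that are not tied to a single entity have no ID.
        if (work != null && work.getId() != null) {
            FAILED_IDS.add(work.getId());
        }
    }
}
```

Register it with `hibernate.search.error_handler=com.example.FailedIdCollectingErrorHandler` (the package is hypothetical). In Spring Boot, Hibernate properties can be passed with the `spring.jpa.properties.` prefix, i.e. `spring.jpa.properties.hibernate.search.error_handler=...`. Because errors are reported asynchronously, read `FAILED_IDS` only after `startAndWait()` has returned.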

I’m not sure I understand the question:

  • Are you asking how to reproduce the problem in an automated test? If so, I don’t know.
  • Are you asking how to throw an exception when at least one entity failed to index? If so, there’s no such built-in feature in Hibernate Search 5. Hibernate Search 6.0.0.Beta2 does that by default, and that could be an option if the cost of migrating your whole project to the new APIs is not too high (see here for more info).
    In Hibernate Search 5, you could implement the ErrorHandler mentioned above and maintain an error count. If the count increased between the start and the end of mass indexing, you know that an error happened.
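The before/after check can be sketched in plain Java. This assumes, hypothetically, that your ErrorHandler increments a shared `AtomicLong` on every call; `IndexingErrorGuard` and `IndexingRun` are made-up names. It also assumes all errors for the run have been reported by the time `startAndWait()` returns, and note that errors from unrelated indexing running at the same time would be counted too.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper: runs a mass-indexing job and reports how many errors
 * were recorded in a shared counter while it ran.
 */
final class IndexingErrorGuard {

    /** Same shape as MassIndexer.startAndWait(), which throws InterruptedException. */
    @FunctionalInterface
    interface IndexingRun {
        void run() throws InterruptedException;
    }

    private IndexingErrorGuard() {
    }

    /** Returns how much the counter increased while the indexing job ran. */
    static long countErrorsDuring(IndexingRun run, AtomicLong errorCounter)
            throws InterruptedException {
        long before = errorCounter.get();
        run.run();
        return errorCounter.get() - before;
    }

    /** Throws if any error was recorded while the indexing job ran. */
    static void runOrFail(IndexingRun run, AtomicLong errorCounter)
            throws InterruptedException {
        long errors = countErrorsDuring(run, errorCounter);
        if (errors > 0) {
            throw new IllegalStateException(
                    errors + " indexing error(s) were reported during mass indexing");
        }
    }
}
```

Usage would look like `IndexingErrorGuard.runOrFail(() -> fullTextSession.createIndexer(MyEntity.class).startAndWait(), errorCounter);`, where `MyEntity` and `errorCounter` are placeholders for your entity class and the handler's counter.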

The last line of your stack trace (Caused by: org.apache.http.ConnectionClosedException: Connection is closed) shows that the connection to Elasticsearch was closed while indexing. This is probably a network problem. It could also be an Elasticsearch node that was restarted during mass indexing, but last time I checked, the REST client is supposed to send the request to another node when that happens.

You should investigate why the connection gets closed. I know some people have a router that closes long-running connections, so it might be something like that. If that's the case, this might provide you with a solution, although the error there wasn't exactly the same…
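Until the cause of the closed connections is fixed, one option is to re-index only the entities whose IDs an error handler collected, rather than rerunning the whole mass indexing. Below is a minimal sketch using the Hibernate Search 5 manual-indexing pattern (`FullTextSession.index`, then `flushToIndexes` and `clear` every batch). It assumes a `Session` you open and manage yourself (for example from `SessionFactory.openSession()`, not a Spring-managed transactional one) and that all IDs belong to the same entity type. `FailedIdReindexer` and its parameters are hypothetical.

```java
import java.io.Serializable;
import java.util.Collection;

import org.hibernate.Session;
import org.hibernate.Transaction;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;

/** Hypothetical helper: re-indexes a known set of entity IDs of one type. */
final class FailedIdReindexer {

    private FailedIdReindexer() {
    }

    /** Returns the number of entities that were found and re-indexed. */
    static <T> int reindex(Session session, Class<T> entityType,
            Collection<? extends Serializable> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        FullTextSession fullTextSession = Search.getFullTextSession(session);
        Transaction tx = fullTextSession.beginTransaction();
        int indexed = 0;
        try {
            for (Serializable id : ids) {
                T entity = fullTextSession.get(entityType, id);
                if (entity == null) {
                    continue; // deleted since the mass indexing ran
                }
                fullTextSession.index(entity);
                indexed++;
                if (indexed % batchSize == 0) {
                    // Send the queued index works now and free the session's memory.
                    fullTextSession.flushToIndexes();
                    fullTextSession.clear();
                }
            }
            tx.commit(); // remaining queued index works are applied on commit
        } catch (RuntimeException e) {
            if (tx.isActive()) {
                tx.rollback();
            }
            throw e;
        }
        return indexed;
    }
}
```

With the handler sketched earlier, a call could look like `FailedIdReindexer.reindex(session, MyEntity.class, new ArrayList<>(FAILED_IDS), 1000)`, where `MyEntity` is a placeholder for your indexed entity class.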