MassIndexer: a lot of database records are not indexed

Hi searchers,

At work we received a new database dump, about 2 gigabytes in size. After restoring it, I attempted to index just one ORM entity: ActePrive.
Below is the resulting number of documents created on my Elasticsearch backend server:

GET {{elserver}}/acteprive-read,actedocument-read/_count

output :

{
    "count": 51539,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    }
}

GET {{elserver}}/acteprive-read/_count

output :

{
    "count": 3,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    }
}

So I’ve got 51539 documents indexed in all. I immediately checked with an SQL query:

-- Pure ActePrive
select count(ap.id) from acte_prive ap join acte a on ap.id = a.id where ap.id not in (select ad.id from acte_document ad join acte ac on ac.id = ad.id)
-- union all keeps both rows even if the two counts happen to be equal
union all
-- Pure ActeDocument
select count(ap.id) from acte_prive ap join acte a on ap.id = a.id where ap.id in (select ad.id from acte_document ad join acte ac on ac.id = ad.id);

output : 

262575
242293

Here is the JPA model, ordered along the inheritance chain (ActeDocument is the most derived class):

@Entity
@PrimaryKeyJoinColumn(name = "ID")
@Inheritance(strategy = InheritanceType.JOINED)
@Scope(proxyMode = ScopedProxyMode.TARGET_CLASS)
public class ActeDocument extends BasicActeDocument {
}

@MappedSuperclass
public abstract class BasicActeDocument extends ActePrive {
}

@XmlRootElement(name = "ActePrive")
@Entity
@PrimaryKeyJoinColumn(name = "ID")
@Inheritance(strategy = InheritanceType.JOINED)
@Scope(proxyMode = ScopedProxyMode.TARGET_CLASS)
public class ActePrive extends BasicActePrive implements ... {
}

@MappedSuperclass
public abstract class BasicActePrive extends Acte implements ... {
}

@Entity
@Inheritance(strategy = InheritanceType.JOINED)
@XmlSeeAlso({
		ActePrive.class,
		ActePublic.class
})
public class Acte extends BasicActe implements ... {
}

@MappedSuperclass
public abstract class BasicActe extends BusinessObject implements ...{
}

public abstract class BusinessObject implements Serializable {
}

To be precise, only ActePrive is an @Indexed entity. The Acte entity is only embedded for indexing (ActePrive indexes some of Acte's data).
Out of almost 500,000 records, only about a tenth are indexed!
Note: I turned on <logger name="org.hibernate.search" level="trace"/> and no errors seem to be displayed during the indexing process.

Any idea why there is such a difference? Thanks.

Hello,

How do you index these entities? Can you show me the code (with trace enabled for Hibernate Search)?

If you’re using the mass indexer, can you show me the logs?

Do you use a routing bridge, by any chance?

Which version of Hibernate Search / Hibernate ORM are you using?

We recently migrated all the annotations to the Hibernate Search fluent (programmatic) mapping API. In the end, the mass indexer is run as follows (with performance adjustments):

massIndexer
        .idFetchSize(1)
        .batchSizeToLoadObjects(50)
        .threadsToLoadObjects(16)
        .purgeAllOnStart(false)
        .startAndWait();

Before running it, I delete all the indexes, of course:

DELETE {{elserver}}/_all

Below is a snippet of the ActePrive indexing mapping:

@Component("signatureOrmSearchMappingConfigurer")
public class SignatureOrmSearchMappingConfigurer implements HibernateOrmSearchMappingConfigurer {
    private final ActePriveConfigurer actePriveConfigurer;
    private final ActeConfigurer acteConfigurer;
    ....
    @Override
    public void configure(HibernateOrmMappingConfigurationContext context) {
        acteConfigurer.configure(context);
        actePriveConfigurer.configure(context);
    }

}

@Component
public class ActePriveConfigurer implements ... {
    private final DateValueBinder dateValueBinder;
    private final ActeRoutingBinder acteRoutingBinder;

    public ActePriveConfigurer(DateValueBinder dateValueBinder, ActeRoutingBinder acteRoutingBinder) {
        this.dateValueBinder = dateValueBinder;
        this.acteRoutingBinder = acteRoutingBinder;
    }

    @Override
    public void configure(HibernateOrmMappingConfigurationContext context) {
        ProgrammaticMappingConfigurationContext mapping = context.programmaticMapping();
        TypeMappingStep actePriveMapping = mapping.type(ActePrive.class);

        // All index fields are declared here

        // Routing binder disabled for now, to rule it out as a cause of the difference with the DB
        actePriveMapping.indexed(); // .routingBinder(acteRoutingBinder);
    }
}

@Component
public class ActeConfigurer implements ... {
    // Field added so the constructor assignment below compiles
    private final DateValueBinder dateValueBinder;

    public ActeConfigurer(DateValueBinder dateValueBinder) {
        this.dateValueBinder = dateValueBinder;
    }

    @Override
    public void configure(HibernateOrmMappingConfigurationContext context) {
        ProgrammaticMappingConfigurationContext mapping = context.programmaticMapping();
        TypeMappingStep acteMapping = mapping.type(Acte.class);

        // Only index fields are declared here; acteMapping.indexed() is never called
    }
}

Hibernate Search / Hibernate ORM => 6.1.1.Final / 5.5.9.Final

Here is the log file with trace mode enabled: Upload Files | Free File Upload and Transfer Up To 10 GB

Important note: the dumps we usually use for tests don’t show this discrepancy (ActePrive is indexed correctly).

An idFetchSize of 1 is really low and could result in poor performance. That probably doesn’t cause your problem, though, so let’s move on.

Do you re-create the indexes, though? If you don’t, Elasticsearch may create the indexes automatically and will try to guess the mapping, which it generally guesses wrong.

You may want to use .dropAndCreateSchemaOnStart(true) on the mass indexer, instead of deleting indexes manually. This will re-create the indexes as necessary.

Once again, that probably isn’t related to your problem, though.

I see this in your logs:

2022-04-08 11:15:07,099  c.f.signature.services.business.IndexBS  [E] Exception during reindex crpcen 
java.lang.NoSuchMethodError: org.hibernate.search.mapper.orm.common.impl.EntityReferenceImpl.<init>(Lorg/hibernate/search/mapper/pojo/model/spi/PojoRawTypeIdentifier;Ljava/lang/String;Ljava/lang/Object;)V
        at org.hibernate.search.mapper.orm.mapping.impl.HibernateOrmMapping.createEntityReference(HibernateOrmMapping.java:214)
        at org.hibernate.search.mapper.orm.mapping.impl.HibernateOrmMapping.createEntityReference(HibernateOrmMapping.java:77)
        at org.hibernate.search.engine.backend.common.spi.EntityReferenceFactory.safeCreateEntityReference(EntityReferenceFactory.java:34)
        at org.hibernate.search.mapper.pojo.work.impl.PojoDocumentContributor.contribute(PojoDocumentContributor.java:54)
        at org.hibernate.search.backend.elasticsearch.index.impl.ElasticsearchIndexManagerImpl.createDocument(ElasticsearchIndexManagerImpl.java:164)
        at org.hibernate.search.backend.elasticsearch.work.execution.impl.ElasticsearchIndexIndexer.index(ElasticsearchIndexIndexer.java:79)
        at org.hibernate.search.backend.elasticsearch.work.execution.impl.ElasticsearchIndexIndexer.add(ElasticsearchIndexIndexer.java:44)
        at org.hibernate.search.mapper.pojo.work.impl.PojoTypeIndexer.add(PojoTypeIndexer.java:66)
        at org.hibernate.search.mapper.pojo.work.impl.PojoIndexerImpl.add(PojoIndexerImpl.java:47)
        at org.hibernate.search.mapper.pojo.massindexing.impl.PojoMassIndexingEntityLoadingRunnable$IndexingBatch.startIndexing(PojoMassIndexingEntityLoadingRunnable.java:208)
        at org.hibernate.search.mapper.pojo.massindexing.impl.PojoMassIndexingEntityLoadingRunnable$IndexingBatch.startIndexingList(PojoMassIndexingEntityLoadingRunnable.java:155)
        at org.hibernate.search.mapper.pojo.massindexing.impl.PojoMassIndexingEntityLoadingRunnable$LoadingContext$1.accept(PojoMassIndexingEntityLoadingRunnable.java:125)
        at org.hibernate.search.mapper.orm.loading.impl.HibernateOrmMassEntityLoader.load(HibernateOrmMassEntityLoader.java:49)
        at org.hibernate.search.mapper.pojo.massindexing.impl.PojoMassIndexingEntityLoadingRunnable.runWithFailureHandler(PojoMassIndexingEntityLoadingRunnable.java:60)
        at org.hibernate.search.mapper.pojo.massindexing.impl.PojoMassIndexingFailureHandledRunnable.run(PojoMassIndexingFailureHandledRunnable.java:32)

Most likely you’re not using the same version of Hibernate Search for all your Hibernate Search dependencies. Make sure your dependencies are consistent.
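One way to check for inconsistent dependencies is to print where each Hibernate Search class is actually loaded from. The following is a minimal sketch using only the JDK. The class names in `main` are copied from the stack trace above; the class `HibernateSearchJarCheck` and the method `describe` are hypothetical names chosen for illustration, and the version is only reported if the jar's manifest declares an `Implementation-Version`.

```java
import java.security.CodeSource;

public class HibernateSearchJarCheck {

    // Returns "<class> -> <location> (version <v>)", or a "not found" line.
    public static String describe(String className) {
        try {
            // initialize=false: only metadata is needed, static initializers are not run.
            Class<?> type = Class.forName(className, false,
                    HibernateSearchJarCheck.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            String location = (source == null || source.getLocation() == null)
                    ? "<unknown location>"
                    : source.getLocation().toString();
            Package pkg = type.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> " + location + " (version " + version + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not found on the classpath";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace; one per Hibernate Search module involved.
        String[] classNames = {
            "org.hibernate.search.mapper.orm.mapping.impl.HibernateOrmMapping",
            "org.hibernate.search.mapper.orm.common.impl.EntityReferenceImpl",
            "org.hibernate.search.mapper.pojo.work.impl.PojoDocumentContributor",
            "org.hibernate.search.backend.elasticsearch.index.impl.ElasticsearchIndexManagerImpl"
        };
        for (String name : classNames) {
            System.out.println(describe(name));
        }
    }
}
```

Run it with the same classpath as the application. Locations that point at jars of different versions, or at the application's own classes rather than a Hibernate Search jar, would suggest the kind of mismatch that produces a NoSuchMethodError.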

Maybe, but my DB is MariaDB and I followed the tip about MySQL databases from the Hibernate Search reference documentation:

A note to MySQL users: the MassIndexer uses forward only scrollable results to iterate on the primary keys to be loaded, but MySQL’s JDBC driver will preload all values in memory.

To avoid this “optimization” set the idFetchSize parameter to Integer.MIN_VALUE

Thanks for pointing out the error in the log: it was an old trace of a monkey patch (they love monkey patches at my work). After fixing it, I managed to index about 246000 ActeDocument but still no ActePrive. I then looked at the logs again, and indexing exceptions still appeared, such as:

2022-04-11 15:37:24,326  o.h.e.internal.DefaultLoadEventListener allegoria [I] HHH000327: Error performing load command
org.hibernate.InstantiationException: Cannot instantiate abstract class or interface:  : com.allegoria.notariat.business.MentionPublication
	at org.hibernate.tuple.PojoInstantiator.instantiate(PojoInstantiator.java:79)
	at org.hibernate.tuple.PojoInstantiator.instantiate(PojoInstantiator.java:105)
	at org.hibernate.tuple.entity.AbstractEntityTuplizer.instantiate(AbstractEntityTuplizer.java:705)
	at org.hibernate.persister.entity.AbstractEntityPersister.instantiate(AbstractEntityPersister.java:5285)
	at org.hibernate.internal.SessionImpl.instantiate(SessionImpl.java:1627)
	at org.hibernate.internal.SessionImpl.instantiate(SessionImpl.java:1611)

So far, I have managed to index about 1000 ActePrive, and then indexing finishes. I understand the errors, but normally the process should not stop just because indexing one record failed, should it? I have a class for that:

public class HibernateSearchMassIndexingFailureHandler implements MassIndexingFailureHandler {

    public void handle(MassIndexingEntityFailureContext context) {
        try {
            context.entityReferences().stream()
                    .map(object -> (EntityReference) object)
                    // filter out the records already marked for re-synchronization
                    .forEach(entityReference -> {
                        routingService.setOutOfWebContextCrpcen(entityReference.tenant());
                        indexErrorBS.storeIndexError(entityReference);
                    });
            logger.warn("Erreur entité dans l'indexation de masse " + context.failingOperation().toString(), context.throwable());
        } finally {
            // Clear the crpcen set for scheduling
            routingService.setOutOfWebContextCrpcen(null);
        }
    }
}

The failure handler only logs a warning message.

For indexing it’s true, but in this case it’s not indexing that failed, it’s loading the entity from the database. You have a serious problem in your Hibernate ORM mapping and should solve this.
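To illustrate why such a load can fail before indexing even starts, here is a minimal sketch using plain JDK reflection. It is only an analogy: the classes `AbstractMention` and `ConcreteMention` are hypothetical stand-ins, and the exception shown is `java.lang.InstantiationException`, not Hibernate's own `org.hibernate.InstantiationException` from the log. The assumption is that a row resolved to an abstract type cannot be turned into an object, whatever happens afterwards.

```java
public class AbstractInstantiationDemo {

    // Hypothetical stand-ins for an abstract mapped type and a concrete subclass.
    static abstract class AbstractMention { }
    static class ConcreteMention extends AbstractMention { }

    // Tries to create an instance through the no-arg constructor and reports the outcome.
    public static String tryInstantiate(Class<?> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return "ok";
        } catch (InstantiationException e) {
            // Thrown when the constructor belongs to an abstract class.
            return "InstantiationException";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static String tryAbstract() {
        return tryInstantiate(AbstractMention.class);
    }

    public static String tryConcrete() {
        return tryInstantiate(ConcreteMention.class);
    }

    public static void main(String[] args) {
        System.out.println("abstract: " + tryAbstract());
        System.out.println("concrete: " + tryConcrete());
    }
}
```

In the thread's case, which rows resolve to the abstract `MentionPublication` and why is not known from the logs shown; that has to be investigated in the ORM mapping and the data.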

I suppose we could also call the mass indexing failure handler for exceptions thrown while loading entities, and continue indexing, because there might also be exceptions caused by temporary failure to communicate with the database. Though that would probably require creating a new Session… Maybe you could open a ticket on JIRA?
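The behavior suggested here (report a loading failure to a handler, then continue with the next batch) can be sketched in plain Java. This is not Hibernate Search code: `ResilientBatchLoop`, `run`, and the loader, indexer and failure callbacks are all hypothetical names, and the sketch ignores the fact that a real Hibernate Session would probably have to be replaced after a failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;

public class ResilientBatchLoop {

    // Processes ids batch by batch. A batch whose loading throws is reported
    // to onLoadFailure and skipped; the loop then moves on to the next batch.
    // Returns the number of entities passed to the indexer.
    public static <I, E> int run(List<I> ids, int batchSize,
                                 Function<List<I>, List<E>> loader,
                                 Consumer<E> indexer,
                                 BiConsumer<List<I>, RuntimeException> onLoadFailure) {
        int indexed = 0;
        for (int from = 0; from < ids.size(); from += batchSize) {
            List<I> batch = ids.subList(from, Math.min(from + batchSize, ids.size()));
            List<E> entities;
            try {
                entities = loader.apply(batch);
            } catch (RuntimeException e) {
                onLoadFailure.accept(batch, e);
                continue; // do not abort the whole run
            }
            for (E entity : entities) {
                indexer.accept(entity);
                indexed++;
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L);
        List<String> failures = new ArrayList<>();
        int indexed = run(ids, 2,
                batch -> {
                    // Simulated loader: id 3 stands for a row that cannot be loaded.
                    if (batch.contains(3L)) {
                        throw new IllegalStateException("cannot load id 3");
                    }
                    List<String> loaded = new ArrayList<>();
                    for (Long id : batch) {
                        loaded.add("entity-" + id);
                    }
                    return loaded;
                },
                entity -> { /* indexing would happen here */ },
                (batch, e) -> failures.add(batch + ": " + e.getMessage()));
        System.out.println(indexed + " indexed, failures: " + failures);
    }
}
```

With a batch size of 2, the failing id takes its whole batch down with it, so the trade-off is the same as with any batch loader: smaller batches lose fewer healthy records per failure but cost more round trips.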