Indexing post-treatment

Hello everyone,

@Entity
@Indexed
public class MyEntity implements IsIndexable {
  private Date updatedAt;
  private Date indexedAt;

  public Date getUpdatedAt() { return updatedAt; }
  public Date getIndexedAt() { return indexedAt; }
}

public interface IsIndexable {
  public Date getUpdatedAt();
  public Date getIndexedAt();
}

I would like to update the indexedAt field as a post-treatment step every time my entity is indexed. Do you have a generic way to do that?

What I expect to do:

  1. Update my entity with Hibernate (touch updatedAt)
  2. Automatic indexing (thanks to Hibernate Search)
  3. Second update after indexing completes (touch indexedAt)

The idea is to implement a specific interface in my indexable entities with two methods, getUpdatedAt and getIndexedAt, in order to track the entities that are not indexed at any given time (maybe with a scheduler), so the rule could be “if indexedAt < updatedAt then reindex”.
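As a minimal sketch of that rule, assuming the IsIndexable interface above and treating a never-indexed entity (indexedAt == null) as stale (an assumption, not something stated in the question), the check could look like this:

```java
import java.util.Date;

// Mirrors the IsIndexable interface from the question.
interface IsIndexable {
    Date getUpdatedAt();

    Date getIndexedAt();
}

final class ReindexRule {
    private ReindexRule() {
    }

    // "if indexedAt < updatedAt then reindex"; a null indexedAt (never
    // indexed) is assumed to mean the entity is stale as well.
    static boolean needsReindex(Date updatedAt, Date indexedAt) {
        if (indexedAt == null) {
            return true;
        }
        return updatedAt != null && indexedAt.before(updatedAt);
    }

    static boolean needsReindex(IsIndexable entity) {
        return needsReindex(entity.getUpdatedAt(), entity.getIndexedAt());
    }
}
```

Equal timestamps count as up to date here; whether that is safe depends on the timestamp precision stored in the database.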

Maybe something like Hibernate Search interceptors?

Do you have any ideas? Maybe it could be a new feature?

thx :slight_smile:

Hello,

Hibernate Search itself treats all entities as read-only, always: it never calls a setter anywhere. So no, there’s no built-in mechanism for this exact solution. I’m not sure I would want to introduce this in Hibernate Search since it could have unexpected ramifications.

That being said, what you’re trying to achieve essentially amounts to building a persisted queue of “reindexing events”, to be processed asynchronously. This is something we’ve wanted to address for a long time (https://hibernate.atlassian.net/browse/HSEARCH-2364), and which will probably get solved when we start working on support for clustered applications (https://hibernate.atlassian.net/browse/HSEARCH-3281). It will happen for sure, but there are more pressing matters at the moment.

Implementing this on the user side will be challenging, mainly because of @IndexedEmbedded: an entity A index-embedding entity B may need to be reindexed when B is modified and A is not, so the condition may be much more complex than just “if indexedAt < updatedAt then reindex”.
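To make the @IndexedEmbedded point concrete, here is a sketch assuming the caller can collect the updatedAt of every entity that A index-embeds: A counts as stale if it was indexed before the latest change to itself or to any embedded B. Even this is incomplete: removing B from A's association, or deleting B, leaves no timestamp in that list to compare against.

```java
import java.time.Instant;
import java.util.List;

final class EmbeddedStaleness {
    private EmbeddedStaleness() {
    }

    // Latest modification time across the root entity A and every entity B
    // it index-embeds (the caller is assumed to gather these timestamps).
    static Instant effectiveUpdatedAt(Instant rootUpdatedAt, List<Instant> embeddedUpdatedAt) {
        Instant latest = rootUpdatedAt;
        for (Instant t : embeddedUpdatedAt) {
            if (t != null && (latest == null || t.isAfter(latest))) {
                latest = t;
            }
        }
        return latest;
    }

    // A is stale if it was indexed before the latest change to A or to any embedded B.
    static boolean needsReindex(Instant indexedAt, Instant rootUpdatedAt, List<Instant> embeddedUpdatedAt) {
        if (indexedAt == null) {
            return true;
        }
        Instant latest = effectiveUpdatedAt(rootUpdatedAt, embeddedUpdatedAt);
        return latest != null && indexedAt.isBefore(latest);
    }
}
```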

On very simple mappings (without @IndexedEmbedded and without any complex, custom bridge), I suppose you could implement this in a relatively simple way. You will need to:

  1. Make sure updatedAt is correctly updated every time the entity is modified; you can take some inspiration from org.hibernate.search.mapper.orm.event.impl.HibernateSearchEventListener to do that: it’s a listener to Hibernate ORM change events.
  2. Periodically reindex using a SearchIndexingPlan as described here and here, updating “indexedAt” on your entities as you reindex them.
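A rough sketch of step 2, under assumptions: IndexingPort is a hypothetical stand-in for the two SearchIndexingPlan methods used in this thread (addOrUpdate and execute), TrackedEntity is a hypothetical entity shape, and loading candidates and persisting indexedAt are left out. indexedAt is only stamped after a batch executed without throwing.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SearchIndexingPlan methods used here.
interface IndexingPort {
    void addOrUpdate(Object entity);

    void execute();
}

// Hypothetical entity shape with mutable timestamps.
final class TrackedEntity {
    Instant updatedAt;
    Instant indexedAt;

    TrackedEntity(Instant updatedAt, Instant indexedAt) {
        this.updatedAt = updatedAt;
        this.indexedAt = indexedAt;
    }

    boolean isStale() {
        return indexedAt == null || (updatedAt != null && indexedAt.isBefore(updatedAt));
    }
}

final class PeriodicReindexer {
    private PeriodicReindexer() {
    }

    // Reindexes stale entities in batches and returns how many were reindexed.
    static int reindexStale(List<TrackedEntity> candidates, IndexingPort plan, int batchSize, Instant now) {
        List<TrackedEntity> batch = new ArrayList<>();
        int count = 0;
        for (TrackedEntity entity : candidates) {
            if (!entity.isStale()) {
                continue;
            }
            batch.add(entity);
            if (batch.size() == batchSize) {
                count += flush(batch, plan, now);
            }
        }
        count += flush(batch, plan, now);
        return count;
    }

    private static int flush(List<TrackedEntity> batch, IndexingPort plan, Instant now) {
        if (batch.isEmpty()) {
            return 0;
        }
        for (TrackedEntity entity : batch) {
            plan.addOrUpdate(entity);
        }
        plan.execute(); // if this throws, indexedAt stays untouched
        for (TrackedEntity entity : batch) {
            entity.indexedAt = now;
        }
        int flushed = batch.size();
        batch.clear();
        return flushed;
    }
}
```

In a real application, now would be captured before loading the candidates, and the new indexedAt values written back to the database in the same unit of work.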

You will, however, need to rely on soft deletes in your database in order to detect deleted entities and delete the corresponding documents from the index.
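With soft deletes, each row inspected during the periodic pass maps to one of three outcomes. A sketch, assuming a hypothetical deletedAt column that is non-null once a row is logically deleted; DELETE would translate to indexingPlan.delete and ADD_OR_UPDATE to indexingPlan.addOrUpdate:

```java
import java.time.Instant;

// Hypothetical outcome of inspecting one row during the periodic pass.
enum IndexAction { ADD_OR_UPDATE, DELETE, SKIP }

final class SoftDeleteSync {
    private SoftDeleteSync() {
    }

    // deletedAt is a hypothetical soft-delete column.
    static IndexAction decide(Instant updatedAt, Instant indexedAt, Instant deletedAt) {
        if (deletedAt != null) {
            // Already handled if a pass ran at or after the deletion.
            boolean alreadyHandled = indexedAt != null && !indexedAt.isBefore(deletedAt);
            return alreadyHandled ? IndexAction.SKIP : IndexAction.DELETE;
        }
        if (indexedAt == null || (updatedAt != null && indexedAt.isBefore(updatedAt))) {
            return IndexAction.ADD_OR_UPDATE;
        }
        return IndexAction.SKIP;
    }
}
```

After a DELETE, the caller would stamp indexedAt as for any other processed row, so the row is skipped on later passes.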

On more complex mappings (with @IndexedEmbedded), you will have to use the same mechanism on every single entity that can possibly be indexed-embedded, even those not annotated with @Indexed. Then you will process them the same way as above, taking care to call indexingPlan.addOrUpdate/indexingPlan.delete even on non-@Indexed entities, at least on those that can possibly be indexed-embedded.

Note that a more efficient approach, if you don’t expect any transient failure when indexing, would be to store “indexedAt” globally instead of per-entity, e.g. in a dedicated utility table. You’ll update it once you’re done reindexing everything. This means you won’t have to write every entity to the database just to reindex them.
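A sketch of the global variant, assuming a single stored timestamp (here a field standing in for the utility table) and a hypothetical Row type. One design detail matters: the watermark is set to the time the pass started, not finished, so rows modified while the pass was running are picked up again next time.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical row: only the ID and the per-entity updatedAt are needed.
final class Row {
    final long id;
    final Instant updatedAt;

    Row(long id, Instant updatedAt) {
        this.id = id;
        this.updatedAt = updatedAt;
    }
}

// A single, global "indexedAt" watermark (e.g. one row in a utility table).
final class GlobalWatermark {
    private Instant lastIndexedAt; // null = never indexed

    Instant lastIndexedAt() {
        return lastIndexedAt;
    }

    // passStart must be captured BEFORE querying the rows.
    List<Long> runPass(List<Row> rows, Instant passStart, Consumer<Long> reindex) {
        List<Long> reindexed = new ArrayList<>();
        for (Row row : rows) {
            if (lastIndexedAt == null || !row.updatedAt.isBefore(lastIndexedAt)) {
                reindex.accept(row.id);
                reindexed.add(row.id);
            }
        }
        // Reached only if no reindex call threw: advance the watermark once.
        lastIndexedAt = passStart;
        return reindexed;
    }
}
```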

Going through your question a second time, you seem to want to enable automatic indexing.

In that case, I do not understand why you would need to detect all entities that are not indexed; supposedly they will all be indexed at all times?

In fact, in my case the automatic indexing mechanism could fail, so I would like a simple way to identify entities that are not indexed at any given time.

Ah well, that’s much easier.

There is a thing in Hibernate Search 6 called “AutomaticIndexingSynchronizationStrategy”. You can set your own in a given session by calling searchSession.setAutomaticIndexingSynchronizationStrategy(). See https://docs.jboss.org/hibernate/search/6.0/reference/en-US/html_single/#mapper-orm-indexing-automatic-synchronization for more information.

The interesting bit is that the synchronization strategy is able to set a callback to execute after indexing. That callback will be passed a list of references to all entities that could not be indexed properly.
Maybe you can take advantage of that?

Here is an example implementation:

public final class MyAutomaticIndexingSynchronizationStrategy
		implements AutomaticIndexingSynchronizationStrategy {

	public static final MyAutomaticIndexingSynchronizationStrategy INSTANCE = new MyAutomaticIndexingSynchronizationStrategy();

	private MyAutomaticIndexingSynchronizationStrategy() {
	}

	@Override
	public void apply(AutomaticIndexingSynchronizationConfigurationContext context) {
		// Request indexing to force a commit, but not necessarily a refresh.
		context.documentCommitStrategy( DocumentCommitStrategy.FORCE );
		context.documentRefreshStrategy( DocumentRefreshStrategy.NONE );
		context.indexingFutureHandler( future -> {
			// Wait for the result of indexing, so that we're sure changes were committed.
			SearchIndexingPlanExecutionReport report = future.join();
			for ( EntityReference failingEntity : report.getFailingEntities() ) {
				Class<?> entityClass = failingEntity.getType();
				Object entityId = failingEntity.getId();
				// TODO: do something with this class/ID. Add it to a queue somewhere for later reindexing?
			}
		} );
	}
}

You can’t set it as the “default” strategy at the moment, but you can set it explicitly on a given entity manager or session this way:

Search.session( entityManager ).setAutomaticIndexingSynchronizationStrategy( MyAutomaticIndexingSynchronizationStrategy.INSTANCE );

Note this must be done before any entity is modified, otherwise this won’t have any effect.
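The TODO in the callback above leaves open where the failing references should go. One option, sketched here with hypothetical FailedRef and FailedEntityQueue types, is an in-memory, de-duplicating queue that a scheduled job drains and reindexes. It is synchronized because several sessions may report failures concurrently; a production version would more likely persist the entries in a database table so they survive restarts.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical (type, id) pair, mirroring failingEntity.getType() / getId().
final class FailedRef {
    final Class<?> type;
    final Object id;

    FailedRef(Class<?> type, Object id) {
        this.type = type;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FailedRef)) {
            return false;
        }
        FailedRef other = (FailedRef) o;
        return type.equals(other.type) && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, id);
    }
}

final class FailedEntityQueue {
    // LinkedHashSet drops duplicates and keeps insertion order.
    private final Set<FailedRef> pending = new LinkedHashSet<>();

    synchronized void record(Class<?> type, Object id) {
        pending.add(new FailedRef(type, id));
    }

    // Returns everything recorded so far and empties the queue.
    synchronized List<FailedRef> drain() {
        List<FailedRef> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}
```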

Thank you @yrodiere, I will look into that; it seems to be a good way to handle my problem.

I was working on this part of the code and got another idea: you can also simply use the “queued” synchronization strategy and define a custom failure handler. Since you don’t seem to care that indexing actually happens immediately, this may reduce latency in your application, on top of solving your problem.

Whenever automatic indexing fails for a given entity, the handle(EntityIndexingFailureContext context) method will get called. Since you’re using the ORM mapper, you can safely cast the entity references (provided as objects) to org.hibernate.search.mapper.orm.common.EntityReference.

Yeah, that’s a very good idea. I will implement this today and report back here. Thank you again @yrodiere

@yrodiere

When I execute the code below with the Elasticsearch cluster offline, I get an org.hibernate.search.util.common.SearchException: HSEARCH400588 error, which is great.

SearchSession searchSession = Search.session((EntityManager) sessionFactory.getCurrentSession());
SearchIndexingPlan searchWritePlan = searchSession.indexingPlan();
for (Map.Entry<String, List<Long>> entry : batch.entrySet()) {
	for (Long id : entry.getValue()) {
		searchWritePlan.addOrUpdate(rootRepository.get(getClassByName(entry.getKey()), id));
		logger.info("indexing object {}:{}", entry.getKey(), id);
	}
}
searchWritePlan.execute();

Now, if I enable async mode AND a custom FailureHandler:

hibernateProperties.put("hibernate.search.automatic_indexing.synchronization.strategy", "async");
hibernateProperties.put("hibernate.search.background_failure_handler", "com.xxx.[...].HibernateSearchFailureHandler");

Then… I get an empty FailureContext :sob:

So in async mode, the sequence seems to be:

  • the searchSession error is hidden
  • Spring closes the transaction along with the HTTP response
  • when the transaction closes, the async FailureHandler is triggered, but there is no “dirty” object left to index at that moment…

Maybe it only happens in my use case because I use two µServices. The first one is not connected to Elasticsearch because it’s too old (Hibernate 3), and it has a Hibernate interceptor that triggers a call to the up-to-date backend (Hibernate 5), which triggers the final indexing.
Both µServices share the same business model.
In some business cases, the new backend only reads and indexes.
In others, it updates entities and triggers automatic indexing.

Here is a quick drawing to help you understand.

I would like to share my failure handler between both cases: automatic indexing and manual updates from my green µServices.

On another note, do you know a way to inject a Spring service into the FailureHandler?
I would like to queue my failed entities in a database.

Thank you :wink:

Okay… there’s a lot to unpack here.

This looks like a bug. You shouldn’t end up in this situation unless Hibernate Search failed to queue the indexing works, or failed to describe the failure, … in short, an internal error.

When this happens, what is the exception? Can you call context.getThrowable().printStackTrace() and copy the stack trace here?

The description of the failure (the list of entity references) is populated based on the feedback from the backend (which documents failed to be indexed), not based on the “dirty” objects in the Hibernate ORM session. So your use case should work fine regardless.

Just define your failure handler as a bean, give it a name, and use that bean name in your configuration properties. Hibernate Search should get the bean from Spring, so you’ll be able to use @Autowired.

Here’s the culprit:

Hibernate Search failed to convert the document ID to an entity ID, so it fell back to returning just the exception, without the entity ID.

I’ll change the behavior so that we just skip the document IDs that we cannot convert (and report a failure somewhere else): https://hibernate.atlassian.net/browse/HSEARCH-3851
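The behavior change described for HSEARCH-3851 (skip document IDs that cannot be converted and report them separately, instead of falling back to an exception with no entity references at all) could look roughly like this. It is a sketch, not Hibernate Search's actual code, and it assumes entity IDs are longs, as in this thread:

```java
import java.util.ArrayList;
import java.util.List;

// Converted entity IDs, plus the raw document IDs that could not be converted
// (to be reported through some other channel).
final class ConversionResult {
    final List<Long> entityIds = new ArrayList<>();
    final List<String> unconvertible = new ArrayList<>();
}

final class DocumentIdConverter {
    private DocumentIdConverter() {
    }

    static ConversionResult convert(List<String> documentIds) {
        ConversionResult result = new ConversionResult();
        for (String documentId : documentIds) {
            try {
                result.entityIds.add(Long.parseLong(documentId));
            } catch (NumberFormatException e) {
                // Skip this ID instead of discarding the whole list.
                result.unconvertible.add(documentId);
            }
        }
        return result;
    }
}
```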

Regarding the failure to convert the ID, I’m a bit confused… did you insert documents into Elasticsearch manually, with IDs that are not longs as they should be? Or maybe you use multi-tenancy and you indexed some documents without a tenant ID?

EDIT: Ok, got it. Your tenant ID is the empty string, and there is a bug: https://hibernate.atlassian.net/browse/HSEARCH-3852


My hits look like this in Elasticsearch; the underscore-prefixed “_id” seems to be a technical field.
The “__HSEARCH_id” looks good:

"hits" : [
      {
        "_index" : "com.xxxx.business.clientphysique",
        "_type" : "_doc",
        "_id" : "_163",
        "_score" : null,
        "_source" : {
          "uuid" : "3585eed9-09b3-496d-ba6a-63f2d6a3055b",
          "acheve" : false,
          "modifieLe" : "2009-09-21T17:56:05.000000000Z",
          "reprise" : true,
          "supprimeLe" : null,
          "searchReferences" : "CLP/ACC",
          "search_reference_sort" : "CLP/ACC",
          "userIds" : [
            2,
            5
          ],
          "civilite" : "MONSIEUR",
          "dateNe" : "1959-07-20T23:00:00.000000000Z",
          "nom" : "xxxxx",
          "nomUsuel" : "xxxxxxx",
          "prenom" : "Gilles, Jacques, Marcel",
          "prenomUsuel" : "Gilles",
          "searchAddress" : "xxxxx",
          "search_address_sort" : "xxxxxx",
          "searchExtendedLabel" : "xxxx",
          "search_extended_label_sort" : "xxx",
          "searchLabel" : "xxxxx",
          "search_label_sort" : "xxxxx",
          "__HSEARCH_id" : "163",
          "__HSEARCH_tenantId" : ""
        },
        "sort" : [
          "xxxxxxxxx",
          "xxxxxx"
        ]
      }
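Judging from this hit alone (_id is "_163", __HSEARCH_id is "163", __HSEARCH_tenantId is empty), the Elasticsearch _id appears to be the tenant ID, an underscore, then the entity ID. Assuming that layout (inferred here, not documented in this thread), recovering the entity ID means stripping the tenant prefix even when the tenant ID is the empty string. TenantDocumentId is a hypothetical helper:

```java
final class TenantDocumentId {
    private TenantDocumentId() {
    }

    // Assumed layout, inferred from the hit above: <tenantId> + "_" + <entityId>.
    // tenantId == null means multi-tenancy is off and _id is the bare entity ID;
    // an empty tenantId is still a tenant, so the leading "_" must be stripped.
    static long toEntityId(String documentId, String tenantId) {
        String raw = documentId;
        if (tenantId != null) {
            String prefix = tenantId + "_";
            if (!documentId.startsWith(prefix)) {
                throw new IllegalArgumentException(
                        "Document ID " + documentId + " does not belong to tenant '" + tenantId + "'");
            }
            raw = documentId.substring(prefix.length());
        }
        return Long.parseLong(raw);
    }
}
```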

You don’t need your own references.

Just pass the name of the bean:

hibernateProperties.put("hibernate.search.background_failure_handler", "myReferenceName");

FWIW, there’s a detailed documentation here: https://docs.jboss.org/hibernate/search/6.0/reference/en-US/html_single/#configuration-bean

Yeah, I tried that too :stuck_out_tongue:
Here is what I got before I switched to the BeanReference:

Caused by: org.hibernate.search.engine.environment.classpath.spi.ClassLoadingException: HSEARCH000530: Unable to load class [myReferenceName]

And if I specify the fully qualified name of the class, Hibernate Search instantiates the class itself.

This means either the bean wasn’t found by Spring, or the Spring integration in Hibernate ORM is not active.

How do you start Hibernate ORM? Access to Spring from Hibernate ORM (and thus from Hibernate Search) requires specific bits that are included in Spring by default, but if you customized how you start ORM, well…

If you do instantiate Hibernate ORM in a custom way and need a pointer, here is the class that enables Hibernate ORM to talk to Spring. Not sure where it’s used exactly.

I use a LocalSessionFactoryBean in a @Configuration-annotated class:

    @Bean
    @Primary
    public LocalSessionFactoryBean sessionFactory() {
        LocalSessionFactoryBean sessionFactory = new LocalSessionFactoryBean();
        sessionFactory.setPackagesToScan("xxxxx", "xxxx.business");
        sessionFactory.setHibernateProperties(hibernateProperties());
        sessionFactory.setPhysicalNamingStrategy(new ImprovedNamingStrategy());
        sessionFactory.setAnnotatedPackages("xxxxx.business");
        sessionFactory.setMultiTenantConnectionProvider(multiTenantConnectionProvider());
        CurrentTenantIdentifierResolver currentTenantIdentifierResolver = new CurrentTenantIdentifierResolverImpl(routingService);
        sessionFactory.setCurrentTenantIdentifierResolver(currentTenantIdentifierResolver);
        return sessionFactory;
    }

OK! I figured out the problem with some breakpoints inside SpringBeanContainer ^^

Here is the explanation from my perspective:

When you end up here, at the creation of the Hibernate Search failure handler, my bean is correctly referenced inside the beanFactory.

But if you evaluate just the part of the line that gets the bean, you get the real UnsatisfiedDependencyException.

In fact, at that point the RootRepository I was trying to inject into my FailureHandler wasn’t ready yet, but the exception was hidden by the built-in fallback producer, which by default seems to try to interpret the string declared in the background_failure_handler configuration as a class name:

HSEARCH000529: Unable to find org.hibernate.search.engine.reporting.FailureHandler implementation class: toto
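The pitfall described above, where a fallback swallows the real cause, can be illustrated with a small resolver. This is a hypothetical sketch, not Hibernate Search's implementation: it tries a bean lookup first, falls back to loading the string as a class name, and when both fail it keeps the bean-lookup error as a suppressed exception, so it shows up in the stack trace even without debug logging.

```java
import java.util.function.Function;

final class BeanOrClassResolver {
    private BeanOrClassResolver() {
    }

    static Object resolve(String reference, Function<String, Object> beanLookup) {
        RuntimeException beanFailure;
        try {
            // First attempt: ask the bean container (e.g. Spring) by name.
            return beanLookup.apply(reference);
        } catch (RuntimeException e) {
            beanFailure = e;
        }
        try {
            // Fallback: treat the reference as a class name and instantiate it.
            return Class.forName(reference).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            IllegalStateException failure = new IllegalStateException(
                    "Unable to resolve '" + reference + "' as a bean or as a class", e);
            // Keep the original cause visible instead of dropping it.
            failure.addSuppressed(beanFailure);
            throw failure;
        }
    }
}
```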

My mistake was not running at debug log level, but this kind of exception is very well hidden ^^

Anyway, I solved it by adding @Lazy:

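@Component("toto")
public class HibernateSearchFailureHandler implements FailureHandler {
	// @Lazy injects a proxy: the repository is only resolved on first use,
	// after Spring has finished initializing it.
	@Autowired
	@Lazy
	private IRootRepository rootRepository;

	// ... handle(FailureContext) and handle(EntityIndexingFailureContext) implementations
}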
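@Lazy on an injection point makes Spring inject a proxy that resolves the actual bean on first use, which is why the repository no longer needs to be ready when the failure handler is created. The plain-Java sketch below only mimics that idea with a memoizing supplier (hypothetical LazyRef type); it is not how Spring implements it.

```java
import java.util.function.Supplier;

// Plain-Java stand-in for a lazy injection point: the target is looked up
// on first use, not when the owning object is constructed.
final class LazyRef<T> implements Supplier<T> {
    private final Supplier<T> lookup;
    private T value;
    private boolean resolved;

    LazyRef(Supplier<T> lookup) {
        this.lookup = lookup;
    }

    @Override
    public synchronized T get() {
        if (!resolved) {
            value = lookup.get();
            resolved = true;
        }
        return value;
    }
}
```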

Thanks for the help :wink:

That’s right… unfortunately we can’t do much about it, given the current integration of Spring/CDI into ORM.

One solution would be to disable the fallback completely, and always delegate to Spring/CDI when it’s available. Not sure that would be practical, though.

Another solution would be to require String references to classes to be formatted differently from String references to beans. For example “classpath:…” vs. “bean:…”. Then we could fail properly in your case, instead of trying to fall back to reflection. But that wouldn’t solve the problem when a class is passed as a property value (are we expected to use reflection, or Spring/CDI?).
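To illustrate the prefix idea, here is a hypothetical parser for the “classpath:…” / “bean:…” syntax proposed above (a proposal in this thread, not an existing setting). RefKind and PrefixedReference are illustrative names only.

```java
// Hypothetical parser for the proposed prefixed reference syntax.
enum RefKind { CLASS, BEAN, AMBIGUOUS }

final class PrefixedReference {
    final RefKind kind;
    final String name;

    private PrefixedReference(RefKind kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    static PrefixedReference parse(String value) {
        if (value.startsWith("classpath:")) {
            return new PrefixedReference(RefKind.CLASS, value.substring("classpath:".length()));
        }
        if (value.startsWith("bean:")) {
            return new PrefixedReference(RefKind.BEAN, value.substring("bean:".length()));
        }
        // No prefix: today's ambiguous behavior (bean container, then reflection).
        return new PrefixedReference(RefKind.AMBIGUOUS, value);
    }
}
```

With an explicit “bean:” prefix, a failed bean lookup could be reported as-is instead of triggering the reflection fallback; unprefixed values would keep the current behavior.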

I think this would still be better than what we have now :stuck_out_tongue: Anyway, I will switch to debug next time :rofl:

@Alexis_Cucumel I just merged a fix for https://hibernate.atlassian.net/browse/HSEARCH-3851 and https://hibernate.atlassian.net/browse/HSEARCH-3852, the issues that prevented you from getting the correct list of entities in the failure handler. A new snapshot version including these fixes will be published automatically in about 30 minutes. Please let me know if something is still amiss!