How to define a common index for multiple entities inheriting from one abstract class?

Hello everyone,

I am facing a challenge regarding the definition of a common index for multiple entities inheriting from a single abstract class, using OpenSearch, Spring Boot version 3.2, and Hibernate Search 7.1.0 Final with programmatic mapping. When configuring the mapping using the configure(HibernateOrmMappingConfigurationContext hibernateOrmMappingConfigurationContext) method, creation of the entityManagerFactory bean fails. The error indicates that multiple entity types are mapped to one index, which is not allowed: each indexed type must be mapped to its own, dedicated index.

An example of the error I receive is:

HSEARCH000588: Multiple entity types mapped to index 'example-index-v1': 'EntityA', 'EntityB'. Each indexed type must be mapped to its own, dedicated index.

Is there a way to configure Hibernate Search to allow different entities inheriting from a common abstract class to share one index? How can I address this issue in the context of Spring Boot, Hibernate ORM, and Hibernate Search 7.1.0 Final?

Thank you in advance for your help.

Hey, thanks for reaching out. I don’t think that it is currently possible to do so.
Maybe you could describe your use case a bit more, i.e. why there’s a constraint of a single index, etc.

The only thing that comes to mind right now is to use a standalone mapper and keep the index up to date manually (i.e. not relying on listener-triggered indexing … ).

Hey @mbekhta,

Thank you for your response. We’re considering consolidating our data into a single index to enhance search efficiency in OpenSearch. Our database model is structured around an abstract entity utilizing the JOINED inheritance strategy. As you might guess, each subclass derived from our base class incorporates unique custom fields. Additionally, we have entities in one-to-one relationships with these subclasses, which also extend from base entities. We believe that a single index approach will significantly expedite our search operations. We have already designed a schema for this index and feel that this configuration would be most effective for our needs.

The database model is centered around a base class named Order. This class is abstract and defines an abstract method getResult() that returns an object of type Result. The Result class acts as a base for 7 different subclasses, each extending Result with specific attributes and behaviors. These subclasses are denoted as ResultSub1 through ResultSub7.

Inheriting from the Order class are 11 distinct subclasses. Each of these subclasses, referred to as SubOrder1 through SubOrder11, overrides the getResult() method to return an instance of one of the Result subclasses. Thus, each SubOrder class is associated with a specific type of Result, although different SubOrder classes may be associated with the same Result subclass.

Regarding your suggestion to use a standalone mapper and manually keep the index up to date, we appreciate the insight. However, it’s crucial for us that the indexing process occurs automatically rather than manually.

Hey, @mbekhta, with the additional information that I’ve provided, is there a way to achieve my goals?

Hey @Radol,

Yeah, I’ve thought about this over the weekend and couldn’t come up with a good solution. If multiple entities target the same index, then when a user does something like dropping and recreating the schema for a single entity (or any similar action), that single index will be dropped and we’d lose the data for the other entities, which wouldn’t be obvious from the user action itself.
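To make the concern above concrete, here is a minimal in-memory sketch in plain Java. It is not Hibernate Search or OpenSearch code; the class SharedIndexDropDemo and all of its methods are hypothetical names invented for illustration. It only models "entity type maps to index name" and shows what a drop-and-create for one entity does to another entity's documents when both share an index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; NOT Hibernate Search or OpenSearch code.
final class SharedIndexDropDemo {

    // index name -> documents, each stored as "<entityType>#<id>"
    private final Map<String, List<String>> indexes = new HashMap<>();
    // entity type -> name of the index it is mapped to
    private final Map<String, String> indexByEntity = new HashMap<>();

    void map(String entityType, String indexName) {
        indexByEntity.put(entityType, indexName);
        indexes.computeIfAbsent(indexName, k -> new ArrayList<>());
    }

    void index(String entityType, String documentId) {
        indexes.get(indexByEntity.get(entityType)).add(entityType + "#" + documentId);
    }

    // Drops and recreates the index behind ONE entity type, as a schema reset would.
    void dropAndCreateSchema(String entityType) {
        indexes.put(indexByEntity.get(entityType), new ArrayList<>());
    }

    // Counts the documents of one entity type in the index it is mapped to.
    long count(String entityType) {
        return indexes.get(indexByEntity.get(entityType)).stream()
                .filter(doc -> doc.startsWith(entityType + "#"))
                .count();
    }

    public static void main(String[] args) {
        SharedIndexDropDemo shared = new SharedIndexDropDemo();
        shared.map("EntityA", "example-index-v1");
        shared.map("EntityB", "example-index-v1"); // same index as EntityA
        shared.index("EntityB", "1");
        shared.dropAndCreateSchema("EntityA");     // the user only touched EntityA
        System.out.println("shared index, EntityB docs left: " + shared.count("EntityB"));

        SharedIndexDropDemo dedicated = new SharedIndexDropDemo();
        dedicated.map("EntityA", "entity-a-index");
        dedicated.map("EntityB", "entity-b-index");
        dedicated.index("EntityB", "1");
        dedicated.dropAndCreateSchema("EntityA");
        System.out.println("dedicated indexes, EntityB docs left: " + dedicated.count("EntityB"));
    }
}
```

In the shared case the EntityB document disappears even though only EntityA was reset; with dedicated indexes it survives.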

If multiple entities/indexes need to be searched, then a SearchScope against those entities can be created. In that case, Hibernate Search relies on the backend's ability to search several indexes in one request; see "Search multiple data streams and indices" in the Elasticsearch Guide [8.13] (I couldn’t find the link for the same info on the OpenSearch doc pages). Have you tried such an approach?

Maybe @yrodiere would have an idea on this…

Hi,

This capability was present in older versions of Hibernate Search (5) and was removed because:

  1. It was work to maintain.
  2. It made configuration more complex: no more 1-1 mapping between entity type and index => some combinations of configuration became invalid.
  3. It made implementation more complex, e.g. index-wide operations such as purge became tricky since we had to make sure we wouldn’t erase data from other entity types in the same index.
  4. It did not, in fact, bring much performance-wise.

About that last item in particular, you may already know it, but even when you have a single index, Lucene may (and in most cases, will) split it into multiple segments, each of which is basically an index that contains a partition of your documents. Lucene actually has optimizations in place that actively take advantage of this structure to deliver better performance than if it was using a single segment.
For very large datasets, the case for a single index is even weaker, since one solution to scale search on very large datasets is actually to use sharding, which basically backs a single index with multiple “shards”, which are themselves Lucene indexes.

I don’t know what causes performance problems in your case exactly, but I’d try to pinpoint the exact problem before trying to shove everything in one index, because this seems unlikely to help. And, considering the work involved, that’s unlikely to be supported in Hibernate Search anytime soon.
Is search really the problem, or is it indexing? If it’s search, are you sure it’s the OpenSearch query taking time to execute, not the loading of entities from the database? If it is OpenSearch queries, does profiling point at specific queries being much slower than others? Etc.

If you just want a consistent schema across indexes (which, depending on what you’re doing currently, could indeed help), know that you can get that without using a single index: OpenSearch (and Hibernate Search) allows searching across multiple indexes in a single query, as long as fields with the same name in each index are compatible (i.e. their type and relevant characteristics are identical in all targeted indexes).
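As a rough illustration of that compatibility rule, the following plain-Java sketch checks a set of per-index schemas for fields that share a name but differ in definition. It does not use any Hibernate Search or OpenSearch API. The class FieldCompatibilityCheck, the FieldDef record, and the choice of "type plus sortable flag" as the relevant characteristics are all assumptions made for the example; the real rules on what counts as compatible are decided by Hibernate Search and the backend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of the "same name => same definition" rule; NOT a real library API.
final class FieldCompatibilityCheck {

    // Simplified field definition: a type plus one characteristic (assumed: sortable).
    record FieldDef(String type, boolean sortable) {}

    // Returns, in alphabetical order, the names of fields that are declared
    // in more than one index with definitions that are not identical.
    static List<String> incompatibleFields(Map<String, Map<String, FieldDef>> schemasByIndex) {
        Map<String, FieldDef> firstSeen = new TreeMap<>();
        Set<String> conflicts = new TreeSet<>();
        for (Map<String, FieldDef> schema : schemasByIndex.values()) {
            for (Map.Entry<String, FieldDef> field : schema.entrySet()) {
                FieldDef previous = firstSeen.putIfAbsent(field.getKey(), field.getValue());
                if (previous != null && !previous.equals(field.getValue())) {
                    conflicts.add(field.getKey());
                }
            }
        }
        return new ArrayList<>(conflicts);
    }

    public static void main(String[] args) {
        // Index and field names below are made up for the example.
        Map<String, Map<String, FieldDef>> schemas = new TreeMap<>();
        schemas.put("suborder1-index", Map.of(
                "customerName", new FieldDef("text", false),
                "createdAt", new FieldDef("date", true)));
        schemas.put("suborder2-index", Map.of(
                "customerName", new FieldDef("text", false),
                "createdAt", new FieldDef("keyword", true), // differs from suborder1-index
                "priority", new FieldDef("integer", true))); // only here: no conflict
        System.out.println("Incompatible fields: " + incompatibleFields(schemas));
    }
}
```

A field that exists in only one index is not reported, since the rule quoted above only constrains fields that share a name across the targeted indexes.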