Query across multiple Indexes (Sub Classes)

With Hibernate Search 5 I had no problem searching for an entity, because all subclasses of the queried entity were stored in the same index. Now, with Hibernate Search 6, I face the problem that a query like title:mytitle AND description:mydescription no longer works if they are not stored in the same index.
(It’s not a problem if I use OR)
Is there a way to join all Indexes, so I can still query across multiple Indexes with AND?

You can just pass multiple types to the search() method to target multiple types. Having multiple indexes does not change much for you (and, to be honest, it doesn’t change much for Lucene either):

SearchResult<Person> result = searchSession.search( Arrays.asList( 
                Entity1.class, Entity2.class
        ) )
        .where( f -> f.bool()
                .must( f.match().field( "title" ).matching( "mytitle" ) )
                .must( f.match().field( "description" ).matching( "mydescription" ) ) )
        .fetch( 20 ); 

See this section of the documentation.

Judging from how you talk about OR behaving differently from AND, I suspect you’re building your predicate differently, so the problem may be elsewhere? How do you build your predicate?

Ty for the fast answer!
I want to query all indexed fields of an indexed “parent entity”, so that a user can enter a keyword or a Lucene query.
I am using the Lucene MultiFieldQueryParser (the user should be able to enter Lucene queries) to search across all fields, and it works fine if I search for a single keyword or use a query like: mytitle OR myDescription.

In case it helps, here is the code:

		org.apache.lucene.queryparser.classic.MultiFieldQueryParser parser =
				new MultiFieldQueryParser(new String[]{"string","givenName",
						"sureName","country","zip","addressLine","legalName",
						"id","identifier","mimeType","checkSum","algorithm",
						"size","language"}, analyzer.get());
		parser.setDefaultOperator(QueryParser.OR_OPERATOR);
		SearchResult<MyUntypedData> searchResult = null;
		try {
			org.apache.lucene.search.Query luceneQuery = parser.parse(keyword);

			searchResult = ftSession.search(MyUntypedData.class)
					.extension( LuceneExtension.get() )
					.where( f -> f.fromLuceneQuery( luceneQuery ) )
					.fetch(200);
		}
		catch (org.apache.lucene.queryparser.classic.ParseException e) {
			// The user's input is not a valid Lucene query string.
			throw new IllegalArgumentException("Invalid query: " + keyword, e);
		}

What’s analyzer.get() here? Where does the analyzer come from? Because that’s the crux of the problem.

Preferably, I’d suggest moving to the simple-query-string predicate, which does what you want but handles analyzers correctly (and without you needing to do anything):

		String[] fields = new String[]{"string","givenName",
				"sureName","country","zip","addressLine","legalName",
				"id","identifier","mimeType","checkSum","algorithm",
				"size","language"};
		// No try block needed here: there is no Lucene parser call that throws a checked exception.
		SearchResult<MyUntypedData> searchResult = ftSession.search(MyUntypedData.class)
				.where( f -> f.simpleQueryString().fields(fields).matching(keyword)
						.defaultOperator( BooleanOperator.OR ) ) // Alternatively, BooleanOperator.AND
				.fetch(200);

See this section of the documentation.

I retrieve the Analyzer like this:

Optional<? extends Analyzer> analyzer = luceneBackend.analyzer( "default" );

Every field is annotated with analyzer = “default”.

I will try your Suggestion. :slight_smile:

Ok, if every field uses the default analyzer, this should work indeed… Maybe the problem was simply that you were targeting a single class (MyUntypedData) instead of both classes?

I.e. replace this:

				searchResult = ftSession.search(MyUntypedData.class)

with this:

				searchResult = ftSession.search(Arrays.asList(MyUntypedData.class, MyOtherType.class))

Note that the simpleQueryString syntax may not match what you’re used to exactly. Feel free to pre-process the query string as necessary (replace AND with +, …).

Okay, with your suggestion and an unprocessed Lucene query string x AND y I get results, but they look like the result set of an x OR y Lucene query. If I pre-process the query into +x +y, I get no results.

The other solution was to target multiple Classes but that didn’t help either. Do you have any other idea what the problem could be?

  1. Yes, you need to pre-process the query to replace AND/OR with the correct symbols (+/|). Note that + is not a prefix operator here, but an infix operator. So you’d have x + y, not +x +y. Which should help with pre-processing the query, by the way :slight_smile:
  2. My example was explicitly defining the default operator as OR with .defaultOperator(BooleanOperator.OR). Did you change that to .defaultOperator(BooleanOperator.AND)?
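
For what it's worth, the AND/OR replacement mentioned in point 1 could be sketched roughly like this. It is plain Java with no Hibernate Search dependency; the class and method names are made up for illustration. It assumes users write the operators in upper case, as in the classic Lucene syntax, and it does not try to protect AND/OR inside quoted phrases, so treat it as a starting point only:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hibernate Search.
final class QueryStringPreprocessor {
    // Whole-word, upper-case AND / OR only, so "ANDROID" or "and" are left alone.
    private static final Pattern AND = Pattern.compile("\\bAND\\b");
    private static final Pattern OR = Pattern.compile("\\bOR\\b");

    // Rewrites classic Lucene boolean keywords into the infix operators
    // understood by the simple-query-string predicate (+ and |).
    static String toSimpleQueryString(String userInput) {
        String withAnd = AND.matcher(userInput).replaceAll("+");
        return OR.matcher(withAnd).replaceAll("|");
    }

    public static void main(String[] args) {
        System.out.println(toSimpleQueryString("\"Hibernate Search\" AND John"));
        System.out.println(toSimpleQueryString("x OR y"));
    }
}
```

The result would then be passed to .matching(...) in place of the raw user input.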

I don’t have enough information to help further. I’d need a dataset in particular. Maybe if you created a reproducer? You can fork our test case templates and work on this template.

With correct pre-processing it is unfortunately still not working. With BooleanOperator.AND it’s the same.

I never created a reproducer but I will look into it.

Thx for your help and fast replies. :slight_smile:

Greetings, I created a reproducer and made a Pull Request. I hope it gives you enough details for better understanding.

Hello Eric,

I had a look, and as far as I can tell, you’re getting the expected results. There’s nothing wrong here.

When you write this:

				List<UntypedDataEntity> hits = searchSession.search( UntypedDataEntity.class )
						.where(f -> f.bool()
				        		.must(f.match().fields( "subject" ).matching( "Hibernate"))
				        		.must(f.match().fields( "name" ).matching( "John")))
				        .fetchHits(20);

… you’re asking Hibernate Search to find an entity whose subject field is equal to Hibernate and whose name field is equal to John.

There is no such entity in your dataset: you have one entity whose subject field is equal to Hibernate, and another whose name field is equal to John, but they are distinct entities. There is no single entity whose subject field is equal to Hibernate and whose name field is equal to John.
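
To illustrate the point without any Hibernate Search involved, here is a toy model in plain Java. Each map stands in for one indexed document, and the two methods mimic "all must clauses match" versus "at least one should clause matches". The class and method names are invented for this sketch, and it uses exact equality where a real index would use analyzed text:

```java
import java.util.List;
import java.util.Map;

// Toy model only: one map per indexed document (field name -> value).
final class MustVersusShould {
    static final List<Map<String, String>> DOCUMENTS = List.of(
            Map.of("subject", "Hibernate"), // one entity has only a subject
            Map.of("name", "John"));        // another entity has only a name

    // "must" semantics: a document is a hit only if every criterion matches it.
    static long countAllOf(Map<String, String> criteria) {
        return DOCUMENTS.stream()
                .filter(doc -> criteria.entrySet().stream()
                        .allMatch(c -> c.getValue().equals(doc.get(c.getKey()))))
                .count();
    }

    // "should" semantics: a document is a hit if at least one criterion matches it.
    static long countAnyOf(Map<String, String> criteria) {
        return DOCUMENTS.stream()
                .filter(doc -> criteria.entrySet().stream()
                        .anyMatch(c -> c.getValue().equals(doc.get(c.getKey()))))
                .count();
    }

    public static void main(String[] args) {
        Map<String, String> criteria = Map.of("subject", "Hibernate", "name", "John");
        System.out.println(countAllOf(criteria)); // 0: no single document has both values
        System.out.println(countAnyOf(criteria)); // 2: each document matches one criterion
    }
}
```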

To get a match, you need an “OR” operator, which can be achieved with should clauses:

				List<UntypedDataEntity> hits = searchSession.search( UntypedDataEntity.class )
						.where(f -> f.bool()
				        		.should(f.match().fields( "subject" ).matching( "Hibernate"))
				        		.should(f.match().fields( "name" ).matching( "John")))
				        .fetchHits(20);

I really can’t imagine how you could have had a different result with Hibernate Search 5 with this setup. Maybe you missed something when migrating the code?

Okay, I was afraid that this was the expected result. I don’t think I missed something when migrating, but the “test case” I used with Hibernate Search 5 was probably wrong. My understanding of the search was wrong.

I will briefly explain why I want to be able to search for distinct entities.

The user searches for MetaData objects, and it’s important that they can search for
fields that are related (an implicit relation, because they are simply stored in the same MetaData object, but this object itself is not indexed).
So if I search for x AND y, I need to find the UntypedData entities that belong to the MetaData object that originally held the distinct entities with x and y.
In the application I then use the IDs to map the UntypedData object to its MetaData object.

Is there some way to achieve this, or do I have to store some kind of collection object that holds all searchable fields?

I suppose you want to know which “Data Object” matched exactly, and that’s why you’re not just indexing the “MetaData Object”? Because if you queried the “Metadata Object”, you wouldn’t know which of the “Data Object” matched?

And I suppose that, when matching a “Data Object”, you want to ignore fields that are not available on this “Data Object”? The default behavior of Hibernate Search (and Lucene, and Elasticsearch for that matter) is to consider that fields that don’t exist in a particular index will never match. You want them to always match, instead?

And I suppose you have more than just one field per entity type, so using an “OR” operator like I suggested above wouldn’t work?

If I understand correctly, then I think the only way to do what you want is to build one query per type of “Data Object”:

Add a “type” field to your Data Object classes:

public class UntypedDataEntity {

     // ... existing code ...

    @Transient
    @KeywordField
    @IndexingDependency(reindexOnUpdate = ReindexOnUpdate.NO) // The value of this field cannot change
    public String getType() {
        return Hibernate.getClass( this ).getSimpleName();
    }

}

Then build one predicate per entity type:

				List<UntypedDataEntity> hits = searchSession.search( UntypedDataEntity.class )
						.where(f -> f.bool()
							.should(f.bool()
									// SubjectEntity
									.must(f.match().field("type").matching("SubjectEntity"))
									.must(f.match().field("subject").matching("Hibernate"))
							)
							.should(f.bool()
									// PersonEntity
									.must(f.match().field("type").matching("PersonEntity"))
									.must(f.match().field("name").matching("John"))
							)
						)
						.fetchHits(20);

If you need the user to be able to provide its own query string, then I’m afraid you’ll have to implement a parser in order to build a different predicate for each entity type, ignoring the fields that don’t exist for each entity type.
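
As a rough idea of the "ignore fields that don't exist for each entity type" step, here is a sketch in plain Java. It takes already-parsed criteria (field name to value) and splits them per entity type. The schema map, the givenName field placement, and the class and method names are assumptions for illustration; parsing the user's query string into criteria is left out entirely:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Hibernate Search.
final class PerTypeCriteria {
    // Assumed schema: the index fields that exist on each entity type.
    static final Map<String, Set<String>> FIELDS_BY_TYPE = Map.of(
            "SubjectEntity", Set.of("subject"),
            "PersonEntity", Set.of("name", "givenName"));

    // For each entity type, keeps only the criteria whose field exists on that type.
    // Types with no applicable criterion are dropped, so they do not match everything.
    static Map<String, Map<String, String>> split(Map<String, String> criteria) {
        Map<String, Map<String, String>> perType = new TreeMap<>();
        FIELDS_BY_TYPE.forEach((type, fields) -> {
            Map<String, String> applicable = new TreeMap<>();
            criteria.forEach((field, value) -> {
                if (fields.contains(field)) {
                    applicable.put(field, value);
                }
            });
            if (!applicable.isEmpty()) {
                perType.put(type, applicable);
            }
        });
        return perType;
    }

    public static void main(String[] args) {
        // prints {PersonEntity={name=John}, SubjectEntity={subject=Hibernate}}
        System.out.println(split(Map.of("subject", "Hibernate", "name", "John")));
    }
}
```

Each entry of the result would then become one should clause: a nested bool with a must on the type field plus one must per remaining criterion.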

Thx for your reply.

The splitting of the MetaData into multiple entities is mainly for OOP reasons.
And it’s only important to know which entity matched because I need its ID to find the correct MetaData object.

Yes, I need to ignore every field that didn’t match, and yes, non-existing fields should always match. Can I configure this behaviour?

Some Entity Types only have one field + id (LanguageEntity for example, that only holds a Locale Object) and some have multiple Fields like a PersonEntity.
The OR Operator does work, but the User should also be able to search for exact matches. :frowning:

Have you considered adding @Indexed to MetaData, and adding @IndexedEmbedded to Metadata.data? Then you should be able to get a list of matching MetaData directly.

No, this is not configurable.

Well in that case, what doesn’t work? If they want approximate matches, they use “OR” and everything works, and if they want exact matches, they use “AND”, and then obviously entities that do not match one of their criteria won’t be returned.

Yes, I considered it. The problem there is that the entities are not fields of the MetaData object. Instead, it holds a HashMap that stores the different entities, and I don’t know how I should index this map, or whether that’s possible at all.

The problem here is what you already mentioned. The user should be able to search across distinct entities with exactly matching fields, so that they retrieve a matching MetaData object in the end. But the user shouldn’t need to consider every criteria.

It is possible: https://docs.jboss.org/hibernate/search/6.0/reference/en-US/html_single/#mapper-orm-indexedembedded

The only problem I can see here is that you have a map of UntypedDataEntity, which does not expose all the index fields you need. You will probably need to declare abstract getters in UntypedDataEntity for all your indexed properties.

I don’t understand what you mean by “the user shouldn’t need to consider every criteria”. What forces them to “consider every criteria” in the current solution?

Maybe you could explain what’s the user input? There’s clearly a missing piece of information here.

As far as I understand one of your other replies:

This behaviour prevents the user from finding a MetaData object, because not every field exists in every index? By the way, the user input is a simple keyword, a phrase or a Lucene query. So for example: “Hibernate Search” AND John

It really depends how you build your predicates…

  • If you ask Hibernate Search to find all entities where the subject field matches Hibernate AND the name field matches John:
      			List<UntypedDataEntity> hits = searchSession.search( UntypedDataEntity.class )
      					.where(f -> f.bool()
      			        		.must(f.match().fields( "subject" ).matching( "Hibernate"))
      			        		.must(f.match().fields( "name" ).matching( "John")))
      			        .fetchHits(20);
    
    
    … then yes, entities where the subject does not exist will not match. But that’s what you asked for…
  • If you ask Hibernate Search to find all entities where either the subject field or the name field matches Hibernate AND John:
      			List<UntypedDataEntity> hits = searchSession.search( UntypedDataEntity.class )
      					.where(f -> f.simpleQueryString()
      			        		.fields("subject", "name")
      			        		.matching( "Hibernate + John"))
      			        .fetchHits(20);
    
    
    … then yes, entities where the subject does not exist may match, if their name field contains both Hibernate and John. Which, again is what you asked for.
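
A simplified model of that second case, in plain Java: every term has to be found in at least one of the targeted fields, and a field that is missing from a document simply never matches. This is only an approximation written for this thread (whitespace tokenization, lower-casing, invented names), not what Hibernate Search or Lucene actually execute:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified model only: one map per document, field name -> text.
final class TermCentricAnd {
    // True if the field exists on the document and contains the term as a whole word.
    static boolean fieldContains(Map<String, String> doc, String field, String term) {
        String value = doc.get(field);
        if (value == null) {
            return false; // a field that does not exist never matches
        }
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+"))
                .contains(term.toLowerCase(Locale.ROOT));
    }

    // "term1 + term2" over several fields: each term must match in at least one field.
    static boolean matchesAllTerms(Map<String, String> doc, List<String> fields, List<String> terms) {
        return terms.stream()
                .allMatch(term -> fields.stream()
                        .anyMatch(field -> fieldContains(doc, field, term)));
    }

    public static void main(String[] args) {
        List<String> fields = List.of("subject", "name");
        List<String> terms = List.of("Hibernate", "John");
        // No subject field, but the name contains both terms: matches.
        System.out.println(matchesAllTerms(Map.of("name", "John Hibernate"), fields, terms)); // true
        // Only one of the two terms can be found anywhere: no match.
        System.out.println(matchesAllTerms(Map.of("subject", "Hibernate"), fields, terms)); // false
    }
}
```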

But your pull request showed two predicates with two separate values.
Can you show me how you are really building your query?

In my opinion, and based on what you said so far, I think you should be doing something like this:

				List<UntypedDataEntity> hits = searchSession.search( UntypedDataEntity.class )
						.where(f -> f.simpleQueryString()
				        		.fields("subject", "name", ... /* add all other fields you want users to be able to filter on */)
				        		.matching(userInput))
				        .fetchHits(20);

This should do what you want, assuming the user writes + to represent a “AND” operator and | to represent an “OR” operator. If they want to write AND/OR, just do a search and replace before passing the user input to Hibernate Search.