QueryBuilder minimum number of matching should clauses

Is there a way to set the minimum number of matching should clauses on org.hibernate.search.query.dsl.QueryBuilder?

...
SearchFactory searchFactory = fullTextEntityManager.getSearchFactory();
QueryBuilder queryBuilder = searchFactory.buildQueryBuilder().forEntity(SomeEntity.class).get();

// Is there something like this?
// queryBuilder.setMinimumNumberShouldMatch(1);

org.apache.lucene.search.Query searchQuery = queryBuilder.bool().should(...).should(...).must(...)....createQuery();
FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery(searchQuery, SomeEntity.class);
...

In short: no, the QueryBuilder does not allow that yet. I just created a ticket so that we add this feature in the next version of Hibernate Search.

In the meantime, you can fall back to native queries to achieve what you want:

  • If you are running Hibernate Search in embedded Lucene mode (the default), you can use new org.apache.lucene.search.BooleanQuery.Builder().add(..., BooleanClause.Occur.SHOULD).add(..., BooleanClause.Occur.SHOULD).add(..., BooleanClause.Occur.MUST).setMinimumNumberShouldMatch(...).build() to build the boolean query. Note that the sub-queries you pass to add(...) can still be built using the Hibernate Search QueryBuilder.
  • If you are using the experimental Elasticsearch integration, you can use org.hibernate.search.elasticsearch.ElasticsearchQueries.fromJson( ... ). You will have to write the whole query as JSON, though, and will not be able to take advantage of the Hibernate Search QueryBuilder at all.

Thanks, @yrodiere. I’m indeed using Elasticsearch integration, so looking forward to the next Hibernate Search version.

@sant0s We are about to backport the feature to Search 5. Could you confirm that the following syntaxes would cover all of your use cases, please? Thanks.

// Example 1: at least 3 "should" clauses have to match
booleanContext1.minimumShouldMatchNumber( 3 );
// Example 2: at most 2 "should" clauses may not match
booleanContext2.minimumShouldMatchNumber( -2 );
// Example 3: at least 75% of "should" clauses have to match (rounded down)
booleanContext3.minimumShouldMatchPercent( 75 );
// Example 4: at most 25% of "should" clauses may not match (rounded down)
booleanContext4.minimumShouldMatchPercent( -25 );
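
To make the four examples concrete, here is a minimal sketch of how such constraints could resolve to a required number of matching "should" clauses, following the semantics stated in the comments above. MinimumShouldMatchSketch and its methods are hypothetical names used for illustration only, not Hibernate Search API, and clamping the result to the range [0, clause count] is an assumption.

```java
/**
 * Illustrative helper, NOT part of Hibernate Search: resolves a
 * "minimum should match" constraint to the number of "should" clauses
 * that have to match.
 */
final class MinimumShouldMatchSketch {

    /** Positive value: at least that many must match. Negative: at most |value| may not match. */
    static int fromNumber(int constraint, int shouldClauseCount) {
        int required = constraint >= 0 ? constraint : shouldClauseCount + constraint;
        return clamp(required, shouldClauseCount);
    }

    /** Same idea with percentages; integer division rounds the percentage down. */
    static int fromPercent(int percent, int shouldClauseCount) {
        if (percent >= 0) {
            return clamp(shouldClauseCount * percent / 100, shouldClauseCount);
        }
        int mayNotMatch = shouldClauseCount * (-percent) / 100;
        return clamp(shouldClauseCount - mayNotMatch, shouldClauseCount);
    }

    // Assumption: the result is kept between 0 and the number of clauses.
    private static int clamp(int required, int shouldClauseCount) {
        return Math.max(0, Math.min(required, shouldClauseCount));
    }

    public static void main(String[] args) {
        int clauses = 5;
        System.out.println(fromNumber(3, clauses));    // 3
        System.out.println(fromNumber(-2, clauses));   // 3
        System.out.println(fromPercent(75, clauses));  // 3 (3.75 rounded down)
        System.out.println(fromPercent(-25, clauses)); // 4 (1.25 rounded down to 1 allowed miss)
    }
}
```

With 5 "should" clauses, 75% and -25% resolve to different counts (3 and 4) because the rounding is applied to different quantities, which is why both forms are worth having.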

Yes, it seems to cover all the use cases. Thanks, @yrodiere.

By the way, I’ve been using the below (with Elasticsearch) which doesn’t require writing the query in JSON:

QueryBuilder queryBuilder = searchFactory.buildQueryBuilder().forEntity(...).get();
BooleanQuery.Builder booleanQueryBuilder = new BooleanQuery.Builder().setMinimumNumberShouldMatch(...)
                .add(queryBuilder...createQuery(), BooleanClause.Occur.SHOULD)
                .add(queryBuilder...createQuery(), BooleanClause.Occur.SHOULD)
                ...;
FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery(booleanQueryBuilder.build(), ...);

Thanks, we’ll release it in 5.10.2.Final then.

Be careful: the minimumShouldMatch setting in your code will be ignored by our Elasticsearch integration. Lucene Queries are converted to Elasticsearch JSON, and that conversion logic currently doesn’t take minimumShouldMatch into account. So, unfortunately, your code will have exactly the same effect as using the DSL without providing a minimumShouldMatch at all.

I see.

One can still require more than one should clause to match by combining must and should. This:

BooleanQuery.Builder booleanQueryBuilder = new BooleanQuery.Builder().setMinimumNumberShouldMatch(2)
	.add(queryBuilder....createQuery(), BooleanClause.Occur.SHOULD)
	.add(queryBuilder....createQuery(), BooleanClause.Occur.SHOULD)
	.add(queryBuilder....createQuery(), BooleanClause.Occur.SHOULD);

won’t work with the Elasticsearch integration, but

org.apache.lucene.search.Query query1 = queryBuilder....createQuery();
org.apache.lucene.search.Query query2 = queryBuilder....createQuery();
org.apache.lucene.search.Query query3 = queryBuilder....createQuery();

BooleanQuery.Builder queryBuilder12 = new BooleanQuery.Builder()
	.add(query1, BooleanClause.Occur.SHOULD).add(query2, BooleanClause.Occur.SHOULD);
BooleanQuery.Builder queryBuilder13 = new BooleanQuery.Builder()
	.add(query1, BooleanClause.Occur.SHOULD).add(query3, BooleanClause.Occur.SHOULD);
BooleanQuery.Builder queryBuilder23 = new BooleanQuery.Builder()
	.add(query2, BooleanClause.Occur.SHOULD).add(query3, BooleanClause.Occur.SHOULD);

BooleanQuery.Builder combinedQueryBuilder = new BooleanQuery.Builder()
	.add(queryBuilder12.build(), BooleanClause.Occur.MUST)
	.add(queryBuilder13.build(), BooleanClause.Occur.MUST)
	.add(queryBuilder23.build(), BooleanClause.Occur.MUST);

will (or at least it seems to). Is this approach safe to use?
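
As a sanity check of the filtering logic only (not of scoring), here is a small sketch that models each of the three sub-queries as a boolean and compares "at least 2 of 3 match" with the pairwise must/should combination over every possible input. TwoOfThreeCheck is a hypothetical class written for this check; it does not touch Lucene or Hibernate Search.

```java
/** Illustrative check, independent of Lucene: each clause is modelled as a boolean. */
final class TwoOfThreeCheck {

    /** The intended constraint: at least two of the three clauses match. */
    static boolean atLeastTwoMatch(boolean q1, boolean q2, boolean q3) {
        int matches = (q1 ? 1 : 0) + (q2 ? 1 : 0) + (q3 ? 1 : 0);
        return matches >= 2;
    }

    /** The workaround: three SHOULD pairs, each pair being required (MUST). */
    static boolean pairwiseWorkaround(boolean q1, boolean q2, boolean q3) {
        return (q1 || q2) && (q1 || q3) && (q2 || q3);
    }

    /** Compares both formulations on all 8 combinations of match / no match. */
    static boolean equivalentOnAllInputs() {
        for (int mask = 0; mask < 8; mask++) {
            boolean q1 = (mask & 1) != 0;
            boolean q2 = (mask & 2) != 0;
            boolean q3 = (mask & 4) != 0;
            if (atLeastTwoMatch(q1, q2, q3) != pairwiseWorkaround(q1, q2, q3)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equivalentOnAllInputs()); // prints "true"
    }
}
```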

I think it should work, though obviously it should only be used as a last resort. With more than 3 conditions, it will get out of control really fast.
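
To put a rough number on "out of control", here is a sketch that counts the required groups if the same pattern is generalized. The generalization is an assumption that goes beyond the example above: requiring at least k of n "should" clauses to match is the same as requiring every group of n - k + 1 clauses to contain at least one match, which gives C(n, n - k + 1) MUST groups. WorkaroundSize and its methods are hypothetical names.

```java
/** Illustrative only: counts the MUST groups the generalized workaround would need. */
final class WorkaroundSize {

    /** Binomial coefficient C(n, r); each intermediate division is exact. */
    static long binomial(int n, int r) {
        if (r < 0 || r > n) {
            return 0;
        }
        r = Math.min(r, n - r);
        long result = 1;
        for (int i = 0; i < r; i++) {
            result = result * (n - i) / (i + 1);
        }
        return result;
    }

    /** Number of groups of (n - k + 1) clauses needed for "at least k of n". */
    static long requiredGroups(int shouldClauses, int minimumShouldMatch) {
        int groupSize = shouldClauses - minimumShouldMatch + 1;
        return binomial(shouldClauses, groupSize);
    }

    public static void main(String[] args) {
        System.out.println(requiredGroups(3, 2));   // 3, as in the example above
        System.out.println(requiredGroups(10, 5));  // 210
        System.out.println(requiredGroups(20, 10)); // 167960
    }
}
```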

I’m not sure how that would affect scoring, but I guess some unwanted boost is a small price to pay if the query at least filters the results correctly.

I’ll try to release 5.10.2 in the next few days so that you can avoid that, though. I really don’t want to force anyone to do such a thing.

5.10.2.Final is there, with minimumShouldMatch constraints: http://in.relation.to/2018/06/22/hibernate-search-5-10-2-Final/

HTH

Good news.

There seems to be a mismatch in the query variable name in the code snippet at http://in.relation.to/2018/06/22/hibernate-search-5-10-2-Final/

Fixed, the post should refresh soon. Thanks!