HS6 sort by score different than Lucene for the same query

We are experiencing undesirable search results in Hibernate Search 6.0.6, and we are troubleshooting the issue. Hibernate Search 5.11 was working very well for us.

I plugged the Lucene query string generated by the Hibernate Search query into Luke, with the same index files open, and noticed that it returns the search results we expect.

I tried using the same default Analyzer and Similarity on both sides of the test. The documents I expect to come up first are seen at the top of the query results in Luke, and the same records are far down the list on the Hibernate Search side.

Our goal is to see the same search results in Hibernate Search that Luke provides. Could Hibernate Search be inadvertently modifying the order of the records returned from Lucene, and thereby corrupting the sort, such that lower-scoring documents appear first?

The users are expecting similar search results to those seen in Hibernate Search 5.11. Our migration from 5.11 to 6.0.6 is the only change we have intentionally made. The migration was a great deal of effort, so we have tried to eliminate any changes caused by our migration by testing with the same query string in Luke.

Which default ones? BM25Similarity is the new default in Search 6, which I suppose could explain the difference you’re seeing, if Luke uses ClassicSimilarity by default.

See this section of the docs if you need to use another Similarity.
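For reference, a minimal sketch of how a different Similarity can be set on the Lucene backend in Hibernate Search 6, assuming the analysis configurer mechanism described in the reference documentation. The class name `ClassicSimilarityConfigurer` is hypothetical, and the exact property key used to register it should be checked against the documentation for your version (in 6.0 it is expected to be `hibernate.search.backend.analysis.configurer`).

```java
import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

// Hypothetical configurer class: switches the backend from the default
// BM25Similarity to ClassicSimilarity (TF-IDF), e.g. to compare scoring
// with an older setup.
public class ClassicSimilarityConfigurer implements LuceneAnalysisConfigurer {
    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        // Applies to both indexing and querying for this backend.
        context.similarity(new ClassicSimilarity());
    }
}
```

Since the Similarity affects norms written at indexing time, reindexing after changing it is probably the safest course.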

That’s unlikely… But I would have an easier time giving a definite answer with more information, ideally a reproducer. Do you think you could craft one, for example with this test case template?

BM25Similarity is the default in Luke as well. Do you use Luke? It now ships with Lucene. I run a search in our application, and see poor results. Just before the query runs, I print the string that represents the query for Lucene. I copy and paste the query string into Luke, and get much better results returned with Luke than I do in our application.

I have a hunch that boost is not being applied in HS6: I have tried boosting things to fix the results, and boosting terms and phrases seems to do nothing, even though the query string does include the boost syntax. Either that, or maybe BM25Similarity is not actually being used in HS6, even though it is meant to be?

Since this test is highly dependent on our complex model and the data which populates it, I thought it might be easier for me to fix the bug myself than to write a test case that reproduces my environment. If we had a way to share a test database that was populated with enough variety of data, and the model had enough complexity, it might be achievable.

Initially I thought I would talk through it with you to see if we could come up with a plan based on our observations. I assume you have a test environment all ready to go that allows you to see if boost is being applied or not?

I dumped the Lucene scores of the top items in the results coming back from HS6 using the explain functions, and the scores at the top are very low, such as 3, whereas in Luke the top score, sorted descending, is 56. This is a huge discrepancy. I expected to see identical scores between HS6 and Luke when searching the same index with the same query string.

Remember that I’m using the same index files for both HS6 and Luke, so we know that the indexing process cannot be implicated.

When I need to, yes. Not often, though.

Be careful about that: Luke will not necessarily execute the same query.

For example the string representation of queries is not unique. A TermRangeQuery may have the same representation as a PointRangeQuery (numeric range). The query parser used by Luke will assume all your range queries are TermRangeQuery, IIRC, even if the query was originally a PointRangeQuery. This may lead to different results, and hide other problems (like a field encoded as a string while you’re executing a numeric query).

Also, analysis in Lucene (and thus Hibernate Search) happens before the Query object is created: a string is parsed, then broken down into Terms, then the Query objects are created.
So the string that you’re copying to Luke is the representation of a query that has already been parsed/analyzed, and Luke will parse it and analyze it all over again. Depending on the parser and analyzer used, that might very well result in a completely different query.
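To make the double-analysis problem concrete, here is a toy sketch in plain Java. It does not use Lucene: the two tokenizer methods are simplified stand-ins (assumptions for illustration only) for a whitespace-style analyzer and for an analyzer that lowercases and splits on punctuation. It shows how the printed form of an already-analyzed query can turn into different terms when it is analyzed a second time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ReanalysisDemo {

    // Toy stand-in for a whitespace analyzer: split on whitespace only.
    public static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Toy stand-in for an analyzer that lowercases and splits on
    // anything that is not a letter or a digit.
    public static List<String> punctuationSplittingTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // First pass: what the application does before building the query.
        List<String> firstPass = whitespaceTokens("Wi-Fi router");

        // The printed query is just the terms joined back into a string.
        String printedQuery = String.join(" ", firstPass);

        // Second pass: what a tool does when it re-parses that string
        // with a different analyzer.
        List<String> secondPass = punctuationSplittingTokens(printedQuery);

        System.out.println("first pass:  " + firstPass);
        System.out.println("second pass: " + secondPass);
    }
}
```

The first pass keeps `Wi-Fi` as one term, while the second pass yields `wi` and `fi`, so the two sides would no longer be searching for the same terms.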

Sure, you can’t just copy/paste your application, it’s too complex. The point was to simplify the problem to the point it can be shared. Surely a hypothesis such as “boosting doesn’t work at all” should be rather easy to test with a much simpler reproducer?

In any case, here are my two cents; the things I would have a look at.

  • What’s the query you’re copy-pasting to Luke exactly?
    • Could it be that Luke executes something different than Hibernate Search, as explained above? That would be very helpful to pinpoint what doesn’t work exactly.
  • How you build your query (DSL code). Which predicates are you using exactly; simpleQueryString? match + bool? Native Lucene queries?
    • Be aware that the match predicate, just like keyword in Search 5, does not understand Lucene Query syntax (^, …). Only the simpleQueryString predicate does.
    • If you’re building queries using a native Lucene parser, and its configuration is incomplete (especially the analyzer), that might be part of the problem.
    • Some non-string fields types are encoded in a different way in Hibernate Search 6 than in Hibernate Search 5, so native queries that used to work in 5 might not work in 6 anymore.
  • Which fields you’re targeting, and how they are declared.
    • Query analysis depends on your mapping configuration, too, so it’s possible that something wrong in the mapping would lead to correct indexing (so Luke wouldn’t be affected) but incorrect query analysis (so Hibernate Search would be affected).

That’s certainly very suspicious. But as explained above, this might just be that your query string is analyzed a second time by Luke, resulting in a different Query object being executed.

So, most likely, some things are not parsed/analyzed as you expect. We need to find out what exactly: which predicate, with which input, and with which target field.

If Luke can display the query that it executed, it might be interesting to compare it to the query string passed as input.

There are the test case templates, like I mentioned before, but those are generic; you need to set up the mapping/data yourself. I still think this would be the easiest way for you to isolate the problem, but

Of course, we have integration tests. But because there are lots of them (thousands of test method executions), they are organized in a very specific way to make sure they execute quickly. As a result, they are not exactly easy to work with, but if you want to give it a try:

Thank you very much for your reply.

Yes, I’ve overridden the analyzer in Luke to be the WhitespaceAnalyzer to make sure this doesn’t happen, and I’m using only fields which are whitespace analyzed in the index on both sides.

Yes, that is excellent to point out, and I’ve been careful not to get caught by this.

I finally found the cause of this issue, and it was my own fault. We had several custom scoring classes in HS5, which were designed to be a secondary sort with less weight than the relevancy score. I converted these to straight sorting in HS6, but forgot about one annotation that was added at the top of this entity class 6 years ago.

This annotation was off the radar, and caused our own data framework to impose a sort at the end of this search, which I totally missed; it used to be a slight score modification in HS5. So this was my fault. The sort wasn’t printed in the Lucene query, and I missed that the sort clause wasn’t null, so it was being passed to the query at the end via a pathway where I had no breakpoint to catch it.
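For anyone who lands here with the same symptom, a toy sketch in plain Java of what was going on. It does not use Hibernate Search or Lucene; the `Hit` class, the `priority` field and the sample values are hypothetical, with the scores 56 and 3 borrowed from the numbers mentioned earlier in the thread. It shows how an explicit field sort replaces relevance ordering entirely, and how sorting by score first keeps the field as a tie-breaker only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOverridesScoreDemo {

    // Hypothetical search hit: document id, relevance score, secondary sort key.
    public static final class Hit {
        final String id;
        final float score;
        final int priority;

        public Hit(String id, float score, int priority) {
            this.id = id;
            this.score = score;
            this.priority = priority;
        }
    }

    // Default behavior when no sort is given: highest score first.
    public static List<String> orderByScore(List<Hit> hits) {
        return ids(hits, Comparator.comparingDouble((Hit h) -> h.score).reversed());
    }

    // What an unnoticed field sort does: the score no longer matters.
    public static List<String> orderByField(List<Hit> hits) {
        return ids(hits, Comparator.comparingInt((Hit h) -> h.priority));
    }

    // Score first, field only as a tie-breaker.
    public static List<String> orderByScoreThenField(List<Hit> hits) {
        return ids(hits, Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.priority));
    }

    private static List<String> ids(List<Hit> hits, Comparator<Hit> order) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        List<String> result = new ArrayList<>();
        for (Hit hit : sorted) {
            result.add(hit.id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 56f, 9),
                new Hit("b", 3f, 1),
                new Hit("c", 20f, 5));

        System.out.println("by score:            " + orderByScore(hits));
        System.out.println("by field only:       " + orderByField(hits));
        System.out.println("by score then field: " + orderByScoreThenField(hits));
    }
}
```

With the field-only sort, the document scoring 3 comes out on top, which matches what I was seeing. In the Hibernate Search 6 sort DSL, the score-first variant would look something like `.sort(f -> f.score().then().field("priority"))`, with `priority` standing in for whatever sortable field applies.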

Sorry for the false alarm! The annotation was removed, and the search results look perfect now.
