Debugging my code with the Luke index toolbox, I saw that I can get the expected result when toggling the default query operator of ‘org.apache.lucene.analysis.standard.StandardAnalyzer’ from AND to OR…
Can anybody explain to me how I can get the expected (OR-like) result set?
The queries generated by Luke are simply not the same queries as what you’re using here.
First, consider whether you really need the wildcards (*). Believe me, they will make your life harder, for the simple reason that wildcard terms are not analyzed: they are not broken down into words, they are not lowercased, …
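As a minimal sketch of the pitfall, assuming the Hibernate Search 5 query DSL and a hypothetical title field (qb is a QueryBuilder, obtained as in the next snippet):

```java
// Wildcard terms bypass analysis: "Hiber*" is neither tokenized nor
// lowercased, so it will NOT match the indexed (lowercased) token "hibernate".
Query wildcardQuery = qb.keyword()
        .wildcard()
        .onField("title")
        .matching("Hiber*")
        .createQuery();
```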
Have a look at simpleQueryString, in particular .withAndAsDefaultOperator().
This query does a bit more than you need, since it accepts a simple query syntax with various operators, but it is the only one that allows making AND the default operator.
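A minimal sketch, assuming the Hibernate Search 5 API, a hypothetical Book entity, and hypothetical title/summary fields:

```java
import java.util.List;

import org.apache.lucene.search.Query;
import org.hibernate.Session;
import org.hibernate.search.FullTextQuery;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;
import org.hibernate.search.query.dsl.QueryBuilder;

public List<?> searchBooks(Session session, String terms) {
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    QueryBuilder qb = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity(Book.class).get();

    // The input is analyzed like any other DSL query; with AND as the
    // default operator, "war peace" behaves like "war AND peace".
    Query luceneQuery = qb.simpleQueryString()
            .onFields("title", "summary")
            .withAndAsDefaultOperator()
            .matching(terms)
            .createQuery();

    FullTextQuery fullTextQuery =
            fullTextSession.createFullTextQuery(luceneQuery, Book.class);
    return fullTextQuery.list();
}
```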
If the order of results is not satisfactory, here are a few things you can do to change it:
Assign a boost to some fields: .onField("title").boostedTo(5f). A boosted field has a greater influence on the score when it matches; for example, you might want to boost a “title” field because a document whose title matches is more likely to be relevant.
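For instance, reusing the qb builder from the sketch above (field names are again hypothetical):

```java
// A match on "title" now weighs five times more than a match on "summary".
Query boostedQuery = qb.keyword()
        .onField("title").boostedTo(5f)
        .andField("summary")
        .matching("cat")
        .createQuery();
```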
If you have custom analyzers, tune them. For example, if an analyzer removes diacritics, turning “résumé” into “resume”, you might want to preserve the original token so that when someone is really looking for a “résumé”, their query gives a better score to documents containing “résumé” (meaning “list of previous jobs”) than to those containing only “resume” (meaning “continue”). Token filters generally offer options to preserve the original tokens.
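As a sketch of the “preserve original” idea, assuming Hibernate Search 5 analyzer definitions and Lucene’s ASCII-folding filter (the entity and analyzer names are illustrative):

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDef(name = "foldingAnalyzer",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                // preserveOriginal indexes "résumé" alongside the folded
                // "resume", so exact accented matches can score higher.
                @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class,
                        params = @Parameter(name = "preserveOriginal", value = "true"))
        })
public class Book {
    // ... fields using @Analyzer(definition = "foldingAnalyzer") ...
}
```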
I don’t recommend displaying the score to your users, unless they are technical users familiar with Lucene, because they will likely not like what they see:
The score is not affected solely by the content of a document. It is affected by the query (obviously), but also by the content of other documents: if you add more documents containing the term “cat”, then “cat” becomes less significant overall, and matching the term “cat” will have a lower impact on the score, even for pre-existing documents. This behavior often surprises end users.
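For a rough illustration, assuming Lucene’s classic TF-IDF similarity (the default BM25 similarity behaves similarly in spirit), the inverse document frequency of a term t across N documents is:

```latex
\mathrm{idf}(t) = 1 + \ln\left(\frac{N}{\mathrm{docFreq}(t) + 1}\right)
```

With N = 1000 and docFreq(“cat”) = 10, idf ≈ 5.5; after adding 90 more documents containing “cat” (N = 1090, docFreq = 100), idf drops to ≈ 3.4, so every “cat” match contributes less to the score, including in documents indexed long before.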
If you want to see the score for debugging purposes, though, you can still use the score projection.
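A minimal sketch of the score projection, reusing the fullTextQuery from the earlier snippet and the hypothetical Book entity:

```java
// Each result row becomes an Object[]: the score at index 0, the entity at index 1.
fullTextQuery.setProjection(FullTextQuery.SCORE, FullTextQuery.THIS);
@SuppressWarnings("unchecked")
List<Object[]> rows = fullTextQuery.list();
for (Object[] row : rows) {
    float score = (Float) row[0];
    Book book = (Book) row[1];
    System.out.println(score + " -> " + book);
}
```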
If you want more details about scoring, still for debugging purposes, you can also ask for an explanation of the score computation; see here.
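Similarly, still assuming Hibernate Search 5, the explanation can be fetched through a projection; note that building explanations is expensive and should be reserved for debugging:

```java
import org.apache.lucene.search.Explanation;

fullTextQuery.setProjection(FullTextQuery.EXPLANATION, FullTextQuery.THIS);
@SuppressWarnings("unchecked")
List<Object[]> explained = fullTextQuery.list();
for (Object[] row : explained) {
    // Prints the full score breakdown (term weights, norms, boosts, ...).
    Explanation explanation = (Explanation) row[0];
    System.out.println(explanation);
}
```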
I understand that it does not make sense to display scoring results to end users.
For debugging purposes, though, I would like to take a quick look at the scores of my results. You pointed me to the score projection page, but sorry, I don’t understand how to apply the projection to my code: