.fromLuceneQuery(…) not matching @GenericField fields

I am currently upgrading our server application from Hibernate Search 5.9.2.Final to 6.1.4.Final and I am stuck.
Constraints are:

  • HS-6.1.4-Final with Lucene-backend
  • we need to make all fields of our entities available for searches
  • mostly for historic reasons searches come as Lucene-query strings (e.g. “firstName:John”)

I made a small POC and was able to search successfully on all entity fields annotated with @KeywordField and @FullTextField.
But so far no luck searching fields annotated with @GenericField (e.g. int types). I tested my Lucene queries using Luke and had no problem searching over numeric or text fields.

My entity looks like:

public class Customer extends AbstractEntity {
    @GenericField(sortable = Sortable.YES)
    int age;
    Gender gender;
    @KeywordField(name = "firstName_sort", sortable = Sortable.YES)
    private String firstName;
    @KeywordField(name = "lastName_sort", sortable = Sortable.YES)
    private String lastName;
    @KeywordField(name = "identification_sort", sortable = Sortable.YES)
    @Column(unique = true)
    private String identification;
    @KeywordField(name = "email_sort", sortable = Sortable.YES)
    private String email;
    @OneToMany(mappedBy = "customer", cascade = CascadeType.ALL)
    private List<Book> books;

Searches I can run successfully are:

            "firstName:Johnny,1",                           // exact field-match
            "firstName:Johnny OR firstName:Ma*,15",         // exact field-match
            "lastName:\"van Es\",2",                        // exact field-match with space
            "firstName:Mar*,6",                             // wildcard search
            "firstName:N?ra,2",                             // wildcard search
            "firstName:Mar* AND lastName:Spencer,1",        // wildcard search
            "email:martin.spencer@fa.hu,1",                 // email
            "identification:\"df5521d7-b689-4ef5-9902-40b66b21eded\",1",     // KeywordField
            "identification:df5521d7*,1",                   // full wildcard on FieldName, partial wildcard on KeywordField
            "gender:MALE,70"                                // ENUM
    void testFieldSearch(String query, long expectedHits) throws ParseException {
        Pageable pageable = PageRequest.of(0, 20);
        SearchRequest<Customer> searchRequest = customerSearchRequestFactory.getSearchRequest(query, pageable, Customer.class);
        Page<Customer> customers = customerSearchService.search(searchRequest);
        assertThat(searchRequest.toString(), customers, is(notNullValue()));
        assertThat(searchRequest.toString(), customers.getTotalElements(), is(expectedHits));

If I try to add search strings for the “age” field (e.g. “age:6” or “age:[4 TO 72]”), the test fails because I get no hits.
At the heart of the SearchService i use:

    private Page<T> runLuceneQuery(SearchRequest<T> searchRequest) throws ParseException {
        log.debug("run LuceneQuery for: {}", searchRequest);
        AtomicReference<SearchSession> searchSession = new AtomicReference<>(Search.session(entityManager));
        QueryParser parser = new MultiFieldQueryParser(searchRequest.getFieldNames().toArray(new String[0]), new StandardAnalyzer());
        Query query = parser.parse(searchRequest.getQuery());
        SearchResult<T> result = searchSession.get().search(searchRequest.getEntityClass())
                .where(f -> f.fromLuceneQuery(query))
                .fetch((int) searchRequest.getPageable().getOffset(), searchRequest.getPageable().getPageSize());
        log.debug("results found: {}", result);
        return getPage(searchRequest, result);

I am probably missing something very fundamental here, but what?
Any help is appreciated greatly.
Best regards


Lucene has different indexes for different data types. In particular, text data and numeric data are indexed completely differently. And querying those completely different indexes requires completely different queries.

QueryParser creates queries designed to target text fields only. Those queries simply won’t work on numeric fields, whatever you try. When you write age:[4 TO 72], QueryParser interprets that as: the value of the age field, taken as a string, must be between the string "4" and the string "72". An indexed number, any number, cannot match that query, because a number is not a string.

By the way, even string values will match weirdly, because text range queries use lexicographical order: the string “488” would match that query, because it’s after “4” and before “72”.
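To make the lexicographical-order point concrete, here is a small plain-Java sketch. It does not use Lucene at all; the class and method names are made up for illustration, and String.compareTo only stands in for the ordering a text range query uses.

```java
final class LexicographicRangeDemo {

    // Text range semantics: compare the values as strings, character by character.
    static boolean inTextRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // Numeric range semantics: compare the values as numbers.
    static boolean inNumericRange(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

    public static void main(String[] args) {
        System.out.println(inTextRange("488", "4", "72"));  // true: "488" sorts after "4" and before "72"
        System.out.println(inNumericRange(488, 4, 72));     // false: 488 is greater than 72
        System.out.println(inTextRange("8", "4", "72"));    // false: "8" sorts after "72"
    }
}
```

So a text range can both include values you don't want (488) and miss values you do want (8).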

So, your only solutions:

  1. Index your numeric fields as strings, and accept the limitations (such as range [4 TO 72] matching the string 488). You’ll need to apply a custom value bridge to your numeric fields for that, one that turns numbers into strings. Be aware you have the ability to set the default value bridge for a given type.
  2. Stop using the query parser and switch to the Hibernate Search predicate DSL, which will create the right Lucene queries under the hood, based on the type of your fields. You will lose the ability to pass query strings, though, unless you use the simpleQueryString predicate, but it can only target text fields and has more limited syntax than QueryParser (no ranges, in particular).
  3. Extend QueryParser to correctly deal with numeric fields. For example, you could override org.apache.lucene.queryparser.classic.QueryParserBase#getRangeQuery to build the range query with the Hibernate Search Predicate DSL, which will build a numeric range query for a numeric field. EDIT: You may need to use the Hibernate Search metamodel to determine the expected type of arguments for a given field, and pick the appropriate way to parse the string. See org.hibernate.search.engine.backend.metamodel.IndexValueFieldTypeDescriptor#dslArgumentClass in particular.
  4. Implement your own query parser that correctly deals with numeric fields. Depending on the syntax you need, and if you use the appropriate tools, this might not be too hard, and might result in more satisfying experience for your end users. You will probably want to use the Hibernate Search metamodel to determine the expected type of arguments for a given field and the Predicate DSL to build search predicates/queries.
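As a rough illustration of the type-aware parsing that solutions 3 and 4 rely on, here is a minimal plain-Java sketch. Everything in it is hypothetical: the TypedClauseParser class, its Clause type, and the field-type map, which only stands in for a lookup in the Hibernate Search metamodel. It handles just two clause shapes (field:value and field:[lower TO upper]) and only distinguishes Integer fields from everything else.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TypedClauseParser {

    /** One parsed clause; lower and upper are the same object for an exact match. */
    static final class Clause {
        final String field;
        final Object lower;
        final Object upper;

        Clause(String field, Object lower, Object upper) {
            this.field = field;
            this.lower = lower;
            this.upper = upper;
        }
    }

    // Matches e.g. age:[4 TO 72]
    private static final Pattern RANGE = Pattern.compile("(\\w+):\\[(\\S+) TO (\\S+)\\]");
    // Matches e.g. age:6 or firstName:John
    private static final Pattern TERM = Pattern.compile("(\\w+):(\\S+)");

    // Hypothetical stand-in for the metamodel: field name -> expected argument type.
    private final Map<String, Class<?>> fieldTypes;

    TypedClauseParser(Map<String, Class<?>> fieldTypes) {
        this.fieldTypes = new HashMap<>(fieldTypes);
    }

    Clause parse(String text) {
        Matcher range = RANGE.matcher(text);
        if (range.matches()) {
            String field = range.group(1);
            return new Clause(field, convert(field, range.group(2)), convert(field, range.group(3)));
        }
        Matcher term = TERM.matcher(text);
        if (term.matches()) {
            String field = term.group(1);
            Object value = convert(field, term.group(2));
            return new Clause(field, value, value);
        }
        throw new IllegalArgumentException("Unsupported clause: " + text);
    }

    // Turns the raw token into the type the field expects, so that a numeric
    // field ends up with a number instead of a string.
    private Object convert(String field, String raw) {
        Class<?> type = fieldTypes.get(field);
        if (type == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        if (type == Integer.class) {
            return Integer.valueOf(raw);
        }
        return raw; // text fields keep the raw string
    }

    public static void main(String[] args) {
        Map<String, Class<?>> types = new HashMap<>();
        types.put("age", Integer.class);
        types.put("firstName", String.class);
        Clause clause = new TypedClauseParser(types).parse("age:[4 TO 72]");
        System.out.println(clause.field + ": " + clause.lower + " to " + clause.upper);
    }
}
```

In a real application, the typed values would then be handed to the predicate DSL, for example f.range().field(clause.field).between(clause.lower, clause.upper) for a range and f.match().field(clause.field).matching(clause.lower) for an exact match, so that Hibernate Search picks the right kind of Lucene query for the field.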

If, in Hibernate Search 5, your numeric fields were annotated with @FieldBridge(impl = IntegerBridge.class), you were effectively implementing solution 1, but with a built-in bridge instead of a custom one. Indeed, the javadoc of IntegerBridge states: “Bridge an {@link Integer} to a {@link String}”. That built-in bridge’s name was deceptive, which is why it was removed in Hibernate Search 6.

Thanks a lot for this quick clarification!!
Of all the options you listed, #3 seems the most promising to me.

Ok. Just something that I forgot to mention: you may need to use the Hibernate Search metamodel to determine the expected type of arguments for a given field, and pick the appropriate way to parse the string. See org.hibernate.search.engine.backend.metamodel.IndexValueFieldTypeDescriptor#dslArgumentClass in particular.
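A minimal sketch of what “pick the appropriate way to parse the string” could look like once you know the expected argument class. This is plain Java, not Hibernate Search API: ArgumentConverter is a hypothetical helper, the nested Gender enum only mirrors the one in the question, and the list of supported types is an assumption to be adapted to your own fields.

```java
import java.time.LocalDate;

final class ArgumentConverter {

    // Hypothetical enum, mirroring the Gender type from the question.
    enum Gender { MALE, FEMALE }

    // Converts a raw query-string token to the given argument class.
    // In a real application the class would come from the metamodel;
    // here the caller passes it in directly.
    static Object convert(Class<?> type, String raw) {
        if (type == String.class) {
            return raw;
        }
        if (type == Integer.class) {
            return Integer.valueOf(raw);
        }
        if (type == Long.class) {
            return Long.valueOf(raw);
        }
        if (type == Double.class) {
            return Double.valueOf(raw);
        }
        if (type == Boolean.class) {
            return Boolean.valueOf(raw);
        }
        if (type == LocalDate.class) {
            return LocalDate.parse(raw); // expects ISO-8601, e.g. 2022-05-31
        }
        if (type.isEnum()) {
            // Look the constant up by name without resorting to raw generic types.
            for (Object constant : type.getEnumConstants()) {
                if (((Enum<?>) constant).name().equals(raw)) {
                    return constant;
                }
            }
            throw new IllegalArgumentException("No constant " + raw + " in " + type.getName());
        }
        throw new IllegalArgumentException("No string conversion for " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(convert(Integer.class, "72"));
        System.out.println(convert(Gender.class, "MALE"));
    }
}
```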

Also… the only way to convert a Hibernate Search SearchPredicate to a Lucene Query at the moment is to call org.hibernate.search.backend.lucene.search.spi.LuceneMigrationUtils#toLuceneQuery.

That’s a SPI though, so it may disappear in a future version. If you want to use this in your application, I’d suggest requesting an actual API, so that it’s better tested and guaranteed to remain available in future versions.

Relatedly, I opened the following tickets to address these problems one day:

HSEARCH-4558 simpleQueryString for numeric/date fields

HSEARCH-4563 queryString predicate for advanced, Lucene-syntax query strings

Don’t hold your breath though, I already have a lot on my plate :) Unless someone else contributes the features, it will likely take time until I can give this a try.
