.fromLuceneQuery(…) not matching @GenericField fields

bitpunk · May 3, 2022, 5:06pm

I am currently upgrading our server application from HS-5.9.2-Final to HS-6.1.4-Final and I am stuck.
Constraints are:

HS-6.1.4-Final with Lucene-backend
we need to make all fields of our entities available for searches
mostly for historic reasons searches come as Lucene-query strings (e.g. “firstName:John”)

I made a small POC and was able to search successfully on all entity fields annotated with @KeywordField and @FullTextField.
But so far no luck searching fields annotated with @GenericField (e.g. int types). I tested my Lucene queries using Luke and had no problem searching over numeric or text fields.

My entity looks like:

...
@Entity
@Indexed
@Getter
@Setter
@ToString
@RequiredArgsConstructor
public class Customer extends AbstractEntity {
    @GenericField(sortable = Sortable.YES)
    int age;
    @FullTextField
    @Enumerated(EnumType.STRING)
    Gender gender;
    @FullTextField
    @KeywordField(name = "firstName_sort", sortable = Sortable.YES)
    private String firstName;
    @FullTextField
    @KeywordField(name = "lastName_sort", sortable = Sortable.YES)
    private String lastName;
    @FullTextField
    @KeywordField(name = "identification_sort", sortable = Sortable.YES)
    @Column(unique = true)
    private String identification;
    @FullTextField
    @KeywordField(name = "email_sort", sortable = Sortable.YES)
    private String email;
    @OneToMany(mappedBy = "customer", cascade = CascadeType.ALL)
    @JsonBackReference
    @ToString.Exclude
    private List<Book> books;
    ...
}

Searches I can run successfully are:

    @ParameterizedTest
    @CsvSource({
            "firstName:Johnny,1",                           // exact field-match
            "firstName:Johnny OR firstName:Ma*,15",         // exact match OR wildcard
            "lastName:\"van Es\",2",                        // exact field-match with space
            "firstName:Mar*,6",                             // wildcard search
            "firstName:N?ra,2",                             // wildcard search
            "firstName:Mar* AND lastName:Spencer,1",        // wildcard search
            "email:martin.spencer@fa.hu,1",                 // email
            "identification:\"df5521d7-b689-4ef5-9902-40b66b21eded\",1",     // KeywordField
            "identification:df5521d7*,1",                   // full wildcard on FieldName, partial wildcard on KeywordField
            "gender:MALE,70"                                // ENUM
    })
    void testFieldSearch(String query, long expectedHits) throws ParseException {
        Pageable pageable = PageRequest.of(0, 20);
        SearchRequest<Customer> searchRequest = customerSearchRequestFactory.getSearchRequest(query, pageable, Customer.class);
        Page<Customer> customers = customerSearchService.search(searchRequest);
        assertThat(searchRequest.toString(), customers, is(notNullValue()));
        assertThat(searchRequest.toString(), customers.getTotalElements(), is(expectedHits));
    }

If I try to add search strings for the “age” field (e.g. “age:6” or “age:[4 TO 72]”), the test fails because I get no hits.
At the heart of the SearchService I use:

    ...
    private Page<T> runLuceneQuery(SearchRequest<T> searchRequest) throws ParseException {
        log.debug("run LuceneQuery for: {}", searchRequest);
        AtomicReference<SearchSession> searchSession = new AtomicReference<>(Search.session(entityManager));
        QueryParser parser = new MultiFieldQueryParser(searchRequest.getFieldNames().toArray(new String[0]), new StandardAnalyzer());
        Query query = parser.parse(searchRequest.getQuery());
        SearchResult<T> result = searchSession.get().search(searchRequest.getEntityClass())
                .extension(LuceneExtension.get())
                .where(f -> f.fromLuceneQuery(query))
                .fetch((int) searchRequest.getPageable().getOffset(), searchRequest.getPageable().getPageSize());
        log.debug("results found: {}", result);
        return getPage(searchRequest, result);
    }
    ...

I am probably missing something very fundamental here, but what?
Any help is appreciated greatly.
Best regards
Jochen

yrodiere · May 4, 2022, 6:59am

Hi,

Lucene has different indexes for different data types. In particular, text data and numeric data are indexed completely differently. And querying those completely different indexes requires completely different queries.

QueryParser creates queries designed to target text fields only. Those queries simply won’t work on numeric fields, whatever you try. When you write age:[4 TO 72], QueryParser interprets that as the age string must be between the string "4" and the string "72". An indexed number, any number, cannot match that query, because a number is not a string.

By the way, even string values will match weirdly, because text range queries use lexicographical order: the string “488” would match that query, because it’s after “4” and before “72”.
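To illustrate the lexicographic ordering described above, here is a minimal sketch in plain Java. It does not use Lucene at all: the class and method names are hypothetical, and `String.compareTo` is only assumed to be a fair stand-in for how a text range query orders ASCII digit strings.

```java
import java.util.List;

// Illustrative only: these names are hypothetical, not Lucene or Hibernate Search APIs.
final class LexicographicRangeDemo {

    private LexicographicRangeDemo() {
    }

    // Mimics a text range: value and bounds are compared as strings (inclusive).
    static boolean inTextRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // Mimics a numeric range: value and bounds are compared as ints (inclusive).
    static boolean inNumericRange(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

    public static void main(String[] args) {
        // Compare both interpretations of the range [4 TO 72] for a few sample values.
        for (String v : List.of("6", "8", "50", "488")) {
            boolean asText = inTextRange(v, "4", "72");
            boolean asNumber = inNumericRange(Integer.parseInt(v), 4, 72);
            System.out.println(v + " -> text range: " + asText + ", numeric range: " + asNumber);
        }
    }
}
```

With these inputs the two interpretations disagree in both directions: "488" falls inside the text range but outside the numeric one, while "8" falls inside the numeric range but outside the text one (because "8" sorts after "72").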

So, your only solutions:

1. Index your numeric fields as strings, and accept the limitations (such as range [4 TO 72] matching the string 488). You’ll need to apply a custom value bridge to your numeric fields for that, one that turns numbers into strings. Be aware you have the ability to set the default value bridge for a given type.
2. Stop using the query parser and switch to the Hibernate Search predicate DSL, which will create the right Lucene queries under the hood, based on the type of your fields. You will lose the ability to pass query strings, though, unless you use the simpleQueryString predicate, but it can only target text fields and has more limited syntax than QueryParser (no ranges, in particular).
3. Extend QueryParser to correctly deal with numeric fields. For example, you could override org.apache.lucene.queryparser.classic.QueryParserBase#getRangeQuery to build the range query with the Hibernate Search Predicate DSL, which will build a numeric range query for a numeric field. EDIT: You may need to use the Hibernate Search metamodel to determine the expected type of arguments for a given field, and pick the appropriate way to parse the string. See org.hibernate.search.engine.backend.metamodel.IndexValueFieldTypeDescriptor#dslArgumentClass in particular.
4. Implement your own query parser that correctly deals with numeric fields. Depending on the syntax you need, and if you use the appropriate tools, this might not be too hard, and might result in a more satisfying experience for your end users. You will probably want to use the Hibernate Search metamodel to determine the expected type of arguments for a given field and the Predicate DSL to build search predicates/queries.

If, in Hibernate Search 5, your numeric fields were annotated with @FieldBridge(impl = IntegerBridge.class), you were effectively implementing solution 1, but with a built-in bridge instead of a custom one. Indeed, the javadoc of IntegerBridge states: Bridge an {@link Integer} to a {@link String}. That built-in bridge’s name was deceptive, which is why it was removed in Hibernate Search 6.

bitpunk · May 4, 2022, 8:54am

Thanks a lot for this quick clarification!!
From all the options you listed #3 seems most promising to me.

yrodiere · May 4, 2022, 9:22am

Ok. Just something that I forgot to mention: you may need to use the Hibernate Search metamodel to determine the expected type of arguments for a given field, and pick the appropriate way to parse the string. See org.hibernate.search.engine.backend.metamodel.IndexValueFieldTypeDescriptor#dslArgumentClass in particular.
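As a rough illustration of "picking the appropriate way to parse the string", the sketch below converts a raw query-string token into a typed value once the expected class is known. It is plain Java with no Hibernate Search dependency: `ArgumentParsers` and `parse` are hypothetical names, the `expectedType` argument is assumed to be whatever `dslArgumentClass` reports for the field, and the list of supported types is an arbitrary sample rather than what Hibernate Search supports.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of Hibernate Search or Lucene.
final class ArgumentParsers {

    private ArgumentParsers() {
    }

    // expectedType is assumed to come from the field's dslArgumentClass;
    // raw is a token taken from the query string, e.g. "4" in age:[4 TO 72].
    static Object parse(Class<?> expectedType, String raw) {
        if (expectedType == String.class) {
            return raw;
        }
        String trimmed = raw.trim();
        if (expectedType == Integer.class) {
            return Integer.valueOf(trimmed);
        }
        if (expectedType == Long.class) {
            return Long.valueOf(trimmed);
        }
        if (expectedType == Double.class) {
            return Double.valueOf(trimmed);
        }
        if (expectedType == Boolean.class) {
            return Boolean.valueOf(trimmed);
        }
        if (expectedType == LocalDate.class) {
            return LocalDate.parse(trimmed); // expects ISO-8601, e.g. 2022-05-03
        }
        throw new IllegalArgumentException("No parser for type " + expectedType.getName());
    }

    public static void main(String[] args) {
        // Bounds of age:[4 TO 72], parsed as Integer instead of being kept as strings.
        Object lower = parse(Integer.class, "4");
        Object upper = parse(Integer.class, "72");
        System.out.println(lower.getClass().getSimpleName() + " bounds: " + lower + " TO " + upper);
    }
}
```

The typed bounds could then be handed to a range predicate or a numeric range query instead of the raw strings. Failing loudly on an unknown type is a design choice here, so that unsupported fields are noticed rather than silently treated as text.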

yrodiere · May 4, 2022, 9:30am

Also… the only way to convert a Hibernate Search SearchPredicate to a Lucene Query at the moment is to call org.hibernate.search.backend.lucene.search.spi.LuceneMigrationUtils#toLuceneQuery.

That’s an SPI though, so it may disappear in a future version. If you want to use this in your application, I’d suggest requesting an actual API, so that it’s better tested and guaranteed to remain available in future versions.

yrodiere · May 4, 2022, 10:13am

Relatedly, I opened the following tickets to address these problems one day:

HSEARCH-4558 simpleQueryString for numeric/date fields

HSEARCH-4563 queryString predicate for advanced, Lucene-syntax query strings

Don’t hold your breath though, I already have a lot on my plate. Unless someone else contributes the features, it will likely take time until I can give this a try.
