Hibernate Search 6 TokenFilterDef: could I use `token_chars` of NGramFilterFactory to define an ngram?

Two things:

  1. There is no direct equivalent to LIKE in Lucene/Elasticsearch, or at least not one that performs well enough to be considered. Lucene/Elasticsearch are about full-text search, so they take a different, more sensible approach: you don’t match “substrings”, you match “tokens” (words). There’s a decent introduction to full-text search in Hibernate Search 6’s documentation: Hibernate Search 6.0.11.Final: Reference Documentation. Hibernate Search 6 is still in beta and its APIs are vastly different from Hibernate Search 5’s, but the concepts are the same.
  2. A single filter will not address all your problems. The idea is to find the right tokenizer/filter for each problem, and to combine them.

So, about the filters…

  • To make the query “nike” match the indexed text “nike 1234”, use a tokenizer. The whitespace tokenizer tokenizes on spaces only, while the standard tokenizer has more advanced behavior and also tokenizes on punctuation, which may be what you’re looking for.
  • To make the query “Nike” match the indexed text “nike” (i.e. to get case insensitivity), use the lowercase filter.
  • To make the query “1234” match the indexed text “12345” (i.e. match the beginning of words), use an edge-ngram filter. Not ngram: edge-ngram. Check out the javadoc; they are different. (There’s a sketch combining these tokenizers and filters just after this list.)
  • To make the query “1234” match “1243” or “4123”, use an ngram. It’s really about matching parts of tokens instead of the whole token, but it’s not exactly the same as LIKE in SQL. The query “nike 1234” with a min gram size of 2 and a max gram size of 3 will match “ni 123”, for example. You will have to rely on scoring (sort hits by relevance), which is enabled by default, to get the best matches first.
  • To handle Japanese/Chinese, I suppose you’ll have to rely on the ICU tokenizer and CJK filter, but frankly I don’t have a clue how these work (I’ve never had to work with Japanese/Chinese).
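To give an idea of how to combine these, here is a minimal sketch of an analyzer definition for Hibernate Search 6’s Lucene backend. The analyzer names (`name_prefix`, `name_prefix_query`, `name_substring`) and the gram sizes are just examples, and the exact configurer API may differ slightly depending on which beta/final version you’re on:

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.apache.lucene.analysis.ngram.NGramFilterFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

public class MyAnalysisConfigurer implements LuceneAnalysisConfigurer {
    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        // Index-time analyzer: split on whitespace, lowercase,
        // then index the prefixes (edge-ngrams) of each token.
        context.analyzer( "name_prefix" ).custom()
                .tokenizer( WhitespaceTokenizerFactory.class )
                .tokenFilter( LowerCaseFilterFactory.class )
                .tokenFilter( EdgeNGramFilterFactory.class )
                        .param( "minGramSize", "1" )
                        .param( "maxGramSize", "10" );

        // Query-time analyzer: same tokenizer and lowercasing, but no ngrams,
        // so the query "1234" is matched as-is against the indexed prefixes.
        context.analyzer( "name_prefix_query" ).custom()
                .tokenizer( WhitespaceTokenizerFactory.class )
                .tokenFilter( LowerCaseFilterFactory.class );

        // "Substring-ish" analyzer: ngrams of size 2 to 3 taken anywhere in each token.
        context.analyzer( "name_substring" ).custom()
                .tokenizer( WhitespaceTokenizerFactory.class )
                .tokenFilter( LowerCaseFilterFactory.class )
                .tokenFilter( NGramFilterFactory.class )
                        .param( "minGramSize", "2" )
                        .param( "maxGramSize", "3" );
    }
}
```

You then point Hibernate Search at the configurer through the backend’s analysis configurer property (`hibernate.search.backend.analysis.configurer` in 6.0.x; the property name differs in some betas).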
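And a sketch of how those analyzers could be used; `Product` and its `name` field are made up for the example, and `searchAnalyzer`, `where(...)` and `fetchHits(...)` are the 6.0.x forms, so older betas may use slightly different names. Analyzing queries with `name_prefix_query` instead of `name_prefix` keeps the query itself from being chopped into grams, so “nike” is matched against the indexed prefixes of each word:

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;

@Entity
@Indexed
public class Product {

    @Id
    private Long id;

    // Indexed with edge-ngrams, but queries are analyzed without them,
    // so the query tokens are matched against the indexed word prefixes.
    @FullTextField(analyzer = "name_prefix", searchAnalyzer = "name_prefix_query")
    private String name;

    // getters/setters omitted
}
```

Searching is then a plain match predicate; relevance sorting (the default) puts the best matches first:

```java
import java.util.List;

import javax.persistence.EntityManager;

import org.hibernate.search.mapper.orm.Search;
import org.hibernate.search.mapper.orm.session.SearchSession;

public class ProductSearch {

    // "nike 1234" is tokenized and lowercased by the search analyzer,
    // then each token is matched against the indexed grams.
    public List<Product> byName(EntityManager entityManager, String terms) {
        SearchSession searchSession = Search.session( entityManager );
        return searchSession.search( Product.class )
                .where( f -> f.match().field( "name" ).matching( terms ) )
                .fetchHits( 20 );
    }
}
```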