Which analyzer(s) / tokenizer(s) for specific ID?

pablo_jaska · June 21, 2021, 9:23am

Hello,
I have a problem (I’m sure there is a solution).
I would like to be able to search for a specific ID.
How do I have to analyze the field so that I can search for the ID?
Example:
ID number 1:
2021-006
I want to perform the following search:
2021-*
2021*
2021-006*

ID number 2:
TEST #: 2021-006011-26
I would like to perform the following search:
test*
test #*
test #:*
test #: 2021*
test #: 2021-*
TEST #: 2021-006011-26*

Any help would be greatly appreciated.
Greetings!

yrodiere · June 21, 2021, 12:25pm

Hello,

So essentially you simply want case-insensitive prefix search?

What was wrong with my answer to your previous, similar question?

It should work similarly for an ID field. You will just have to declare new analyzers where you remove unnecessary analysis components, such as the tokenizer (replace it with a KeywordTokenizer) and the SnowballPorterFilterFactory (remove it, you don’t need it for IDs).

Alternatively, you can use a @KeywordField with a simple “lowercasing” normalizer (similar to the one named lowercase in this example), and rely on a wildcard predicate. It’s less flexible, but should work for simple use cases.
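To illustrate the second option, here is a minimal plain-Java sketch of the matching behavior, not Hibernate Search code. It assumes that both the indexed value and the wildcard pattern are lowercased in the same way, and it emulates the wildcard with a regular expression. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class WildcardKeywordSketch {

    // Stand-in for a "lowercase" normalizer: the whole value stays one token.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Emulates a wildcard match against the normalized value:
    // '*' matches any sequence of characters, '?' matches exactly one.
    static boolean wildcardMatches(String indexedValue, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : normalize(pattern).toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so '#', ':' and '-' are taken literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL)
                .matcher(normalize(indexedValue))
                .matches();
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("2021-006", "2021-*"));
        System.out.println(wildcardMatches("TEST #: 2021-006011-26", "test #: 2021*"));
    }
}
```

Because the value is never split into tokens, a pattern such as 2021* only matches IDs that start with 2021. It would not match TEST #: 2021-006011-26 unless the pattern also starts with a wildcard.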

pablo_jaska · June 21, 2021, 12:33pm

Hello, as always thanks for the quick reply.
I would like to analyze fields containing “-”, “#” and “:” in such a way that I can search with or without these special characters.
My problem is, unfortunately I don’t know which analyzer/tokenizer to use at this point.
It should be indexed like this.
EXAMPLE: TEST #: 2021-006011-26
=> “TEST”, “#”, “TEST #:”, “2021”, “2021-” and so on.

yrodiere · June 21, 2021, 12:48pm

Your examples in the initial post don’t seem to require this indexing.

From what I can see in these examples, you simply need to not tokenize at all, i.e. TEST #: 2021-006011-26 would be indexed as:

“t”
“te”
“tes”
“test”
“test ” (with a trailing space)
“test #”
“test #:”
“test #: ” (with a trailing space)
“test #: 2”
etc.
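As a rough illustration of that indexing scheme, the following plain-Java sketch emulates a keyword tokenizer (the whole value is one token), followed by lowercasing and edge n-grams. It does not use Lucene or Hibernate Search. The class name, the method names and the gram sizes of 1 to 50 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class EdgeNGramSketch {

    // Index side: treat the value as a single token, lowercase it,
    // then emit every prefix whose length is between minGram and maxGram.
    static List<String> edgeNGrams(String value, int minGram, int maxGram) {
        String token = value.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Search side: the query is lowercased but not n-grammed, so it
    // matches when it is equal to one of the indexed prefixes.
    static boolean matches(String indexedValue, String query) {
        return edgeNGrams(indexedValue, 1, 50)
                .contains(query.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String id = "TEST #: 2021-006011-26";
        String[] queries = {
            "test", "test #", "test #:", "test #: 2021",
            "test #: 2021-", "TEST #: 2021-006011-26"
        };
        for (String query : queries) {
            System.out.println("\"" + query + "\" matches: " + matches(id, query));
        }
    }
}
```

Every search string from the initial post is a prefix of the lowercased ID, so no tokenization on “-”, “#” or “:” is needed for those searches. A query such as 2021 on its own would not match this ID under this scheme, since it is not a prefix of the value.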

What is the exact code you’re using to search, what is your mapping, and what are the exact search strings that do not match as expected?
