Lucene With Special Characters

Tony · September 13, 2023, 9:43am

Hello,
I have a class:

public class ObjectA {

    @Column(name = "routesortlist", length = 100)
    @FullTextField
    @GenericField(name = "routesortlist_sort", sortable = Sortable.YES)
    private String routesortlist;
}

In my PostgreSQL database there are many ObjectA records with codes such as "001-Γ".
When I search for 001-Γ with

searchPredicate = pf.wildcard().field("routesortlist").matching("*" + "001-Γ" + "*").toPredicate();

the result is always empty.
It works only if I tokenize the string above, but that is not what I want.

Any help?
Thank you !

yrodiere · September 13, 2023, 12:47pm

Hey,

The wildcard predicate doesn't analyze the string you pass. Indexing does, so the tokens in the index are not the same as the string you're passing.

See https://docs.jboss.org/hibernate/search/6.2/reference/en-US/html_single/#search-dsl-predicate-wildcard
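To make the mismatch concrete, here is a toy model in plain Java. It is not Lucene's analyzer: the `analyze` and `wildcardContains` methods are hypothetical stand-ins, assuming an analyzer that lowercases and splits on anything that is not a letter or digit. The input is the code from the question, with the Greek capital gamma written as `\u0393`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: NOT Lucene's analyzer, just an illustration of the mismatch.
class WildcardMismatchDemo {

    // Hypothetical stand-in for index-time analysis: lowercase, then split on
    // anything that is not a letter or digit (so "-" separates tokens).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stand-in for a "*term*" wildcard: the raw term is compared against each
    // indexed token as-is, without being analyzed first.
    static boolean wildcardContains(List<String> indexedTokens, String rawTerm) {
        for (String token : indexedTokens) {
            if (token.contains(rawTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The index holds two separate tokens, not the original string.
        List<String> tokens = analyze("001-\u0393");
        System.out.println(tokens.size());
        // The unanalyzed term spans two tokens, so no single token contains it.
        System.out.println(wildcardContains(tokens, "001-\u0393"));
        // A term that fits inside one token does match.
        System.out.println(wildcardContains(tokens, "001"));
    }
}
```

In this model the hyphenated term can never match, because no single token in the index contains the hyphen.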

You should either use a different predicate, or configure your field differently.

In this case I'd say that if you don't want tokenization, you shouldn't use a full-text field, whose main purpose is tokenization. Use @KeywordField instead; see Hibernate Search 6.2.1.Final: Reference Documentation

If you want tokenization I'd recommend using a @FullTextField (not a @KeywordField) with:

- an analyzer that does some normalization, then uses an n-gram filter to generate many tokens that are actually just substrings
- a searchAnalyzer that does the normalization part, but does not use the n-gram filter.

It’s somewhat similar to what’s explained in the last few paragraphs here.
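The n-gram idea can be sketched in plain Java. This is a toy model, not the Lucene n-gram filter or a Hibernate Search configuration: `normalize`, `indexTokens`, and `matches` are hypothetical names, normalization is assumed to be lowercasing only, and the gram sizes are arbitrary. Substrings are generated at indexing time, so the search side only normalizes the query and looks for an exact token.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: mimics "n-grams at index time, plain normalization at search time".
class NGramSubstringDemo {

    // Shared normalization step (assumed here to be lowercasing only).
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Index-time analysis: normalize, then emit every substring whose length
    // is between minGram and maxGram, roughly what an n-gram filter produces.
    static Set<String> indexTokens(String text, int minGram, int maxGram) {
        String s = normalize(text);
        Set<String> grams = new HashSet<>();
        for (int start = 0; start < s.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= s.length(); len++) {
                grams.add(s.substring(start, start + len));
            }
        }
        return grams;
    }

    // Search-time analysis: normalize only, then require an exact token match.
    static boolean matches(Set<String> indexed, String query) {
        return indexed.contains(normalize(query));
    }

    public static void main(String[] args) {
        Set<String> indexed = indexTokens("001-\u0393", 1, 5);
        System.out.println(matches(indexed, "01-\u0393")); // substring of the indexed value
        System.out.println(matches(indexed, "002"));       // not a substring
    }
}
```

One consequence of this design is that a query longer than the maximum gram size cannot match any token, so the gram sizes have to cover the longest search term you expect.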

Tony · September 13, 2023, 1:36pm

I replaced @FullTextField with @KeywordField and created a class MyLuceneAnalysisConfigurer, and now it seems to work fine.

@Column(name = "routesortlist", length = 30)
@KeywordField(normalizer = Constants.NORMALIZER_LOWERCASE)
@GenericField(name = "routesortlist_sort", sortable = Sortable.YES)
private String routesortlist;

public class MyLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {
	
	@Override
	public void configure(final LuceneAnalysisConfigurationContext context) {
		context.normalizer(Constants.NORMALIZER_LOWERCASE).custom()
				.tokenFilter(LowerCaseFilterFactory.class)
				.tokenFilter(ASCIIFoldingFilterFactory.class);
	}
}

Thank you again for your help !
