Lucene With Special Characters

I have a class:

public class ObjectA {

    @Column(name = "routesortlist", length = 100)
    @GenericField(name = "routesortlist_sort", sortable = Sortable.YES)
    private String routesortlist;

In my PostgreSQL database there are many ObjectA records with codes such as “001-Γ”.
When I search for 001-Γ with:

searchPredicate = pf.wildcard().field("routesortlist").matching("*" + "001-Γ" + "*").toPredicate();

the result is always empty.
It works only if I tokenize the string above, but that is not what I want.

Any help?
Thank you!


The wildcard predicate doesn’t analyze the string you pass, but indexing does analyze the field value. As a result, the tokens stored in the index are not the same as the pattern you’re passing.
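To make the mismatch concrete, here is a small plain-Java simulation. It does not use Lucene: `analyze` is a crude stand-in for a tokenizing, lowercasing analyzer, and `wildcardMatches` mimics testing an unanalyzed pattern against each indexed token. The class and method names are hypothetical, chosen only for this sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class WildcardMismatchDemo {

    // Crude stand-in for an analyzer (NOT Lucene's real behavior in every detail):
    // split on anything that is not a letter or digit, then lowercase each piece.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Turn a wildcard pattern ('*' and '?') into a regex and test it, as-is,
    // against every indexed token. The pattern itself is never analyzed.
    static boolean wildcardMatches(List<String> indexedTokens, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern compiled = Pattern.compile(regex.toString());
        return indexedTokens.stream().anyMatch(t -> compiled.matcher(t).matches());
    }

    public static void main(String[] args) {
        String code = "001-\u0393"; // U+0393 is the Greek capital gamma from the question
        String pattern = "*" + code + "*";

        // Tokenized field: the hyphen splits the value and the letter is lowercased.
        List<String> tokenized = analyze(code);
        System.out.println(wildcardMatches(tokenized, pattern)); // false

        // Keyword-style field: the whole value is kept as one token.
        List<String> keyword = List.of(code);
        System.out.println(wildcardMatches(keyword, pattern)); // true
    }
}
```

With tokenization, the simulated index holds the tokens 001 and γ, neither of which contains 001-Γ, so the pattern finds nothing. When the whole value is kept as a single token, the same pattern matches.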


You should either use a different predicate, or configure your field differently.

In this case, if you don’t want tokenization, don’t use a full-text field, whose main purpose is precisely tokenization. Use @KeywordField instead; see the Hibernate Search 6.2.1.Final reference documentation.

If you do want tokenization, I’d recommend using a @FullTextField (not a @KeywordField) with:

  • an analyzer that does some normalization then uses an n-gram filter to generate many tokens that are actually just substrings
  • a searchAnalyzer that does the normalization part, but does not use the n-gram filter.

It’s somewhat similar to what’s explained in the last few paragraphs here.
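A minimal sketch of the n-gram setup described above, assuming the Lucene backend and the name-based analysis definitions of Hibernate Search 6.2. The analyzer names ngram_index and ngram_search, the class name, the whitespace tokenizer, and the gram sizes are illustrative assumptions, not something taken from this thread:

```java
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

// Hypothetical configurer; names and sizes are assumptions chosen for illustration.
public class NGramAnalysisConfigurer implements LuceneAnalysisConfigurer {
    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        // Index time: normalize, then emit the substrings (1 to 10 characters)
        // of each whitespace-separated token as additional tokens.
        context.analyzer("ngram_index").custom()
                .tokenizer("whitespace")
                .tokenFilter("lowercase")
                .tokenFilter("nGram")
                        .param("minGramSize", "1")
                        .param("maxGramSize", "10");

        // Search time: same normalization, but no n-gram filter,
        // so the search term is compared as a whole against the indexed substrings.
        context.analyzer("ngram_search").custom()
                .tokenizer("whitespace")
                .tokenFilter("lowercase");
    }
}
```

The field would then be mapped with @FullTextField(analyzer = "ngram_index", searchAnalyzer = "ngram_search") and queried with a plain match predicate, for example pf.match().field("routesortlist").matching("001-Γ"), rather than with wildcards. Note that search terms longer than maxGramSize will not match, and that n-grams make the index noticeably larger.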


I replaced @FullTextField with @KeywordField and created a class MyLuceneAnalysisConfigurer, and now it seems to work fine.

@Column(name = "routesortlist", length = 30)
@KeywordField(normalizer = Constants.NORMALIZER_LOWERCASE)
@GenericField(name = "routesortlist_sort", sortable = Sortable.YES)
private String routesortlist;

public class MyLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {
	public void configure(final LuceneAnalysisConfigurationContext context) {

Thank you again for your help! 🙂