Lucene With Special Characters

Hello,
I have a class:

public class ObjectA {

    @Column(name = "routesortlist", length = 100)
    @FullTextField
    @GenericField(name = "routesortlist_sort", sortable = Sortable.YES)
    private String routesortlist;
}

In my PostgreSQL database there are many ObjectA records with codes such as “001-Γ”.
Searching for 001-Γ with

searchPredicate = pf.wildcard().field("routesortlist").matching("*" + "001-Γ" + "*").toPredicate();

always returns an empty result.
It only works if I tokenize the string above, but that is not what I want.

Any help?
Thank you!

Hey,

The wildcard predicate doesn’t analyze the string you pass, but indexing does, so the tokens in the index are not the same as the string you’re passing.
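To make the mismatch concrete, here is a minimal plain-Java simulation (no Hibernate Search or Lucene involved). It assumes the full-text analyzer splits on punctuation such as the hyphen and lowercases each token, which is roughly what a default analyzer does; `tokenize`, `wildcardMatches` and the class name are hypothetical helpers written only for this illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class WildcardVsTokens {

    // Crude stand-in for a full-text analyzer (an assumption, not Lucene's real one):
    // split on anything that is not a letter or digit, then lowercase each token.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Wildcard matching against ONE whole term: '*' = any sequence, '?' = any single character.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    // A wildcard query hits a document if at least one indexed term matches the pattern.
    static boolean anyTermMatches(String pattern, List<String> terms) {
        return terms.stream().anyMatch(term -> wildcardMatches(pattern, term));
    }

    public static void main(String[] args) {
        String value = "001-\u0393";        // "001-" followed by Greek capital gamma
        String pattern = "*" + value + "*"; // the pattern is used as-is, never tokenized

        List<String> fullTextTerms = tokenize(value); // two separate terms: "001" and lowercase gamma
        List<String> keywordTerms = List.of(value);   // one term: the whole value

        System.out.println(anyTermMatches(pattern, fullTextTerms)); // false
        System.out.println(anyTermMatches(pattern, keywordTerms));  // true
    }
}
```

No single indexed token contains the hyphen, so a pattern that contains it cannot match any of them, whereas a field indexed as one whole term matches.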

See https://docs.jboss.org/hibernate/search/6.2/reference/en-US/html_single/#search-dsl-predicate-wildcard

You should either use a different predicate, or configure your field differently.

In this case I’d say that if you don’t want tokenization, don’t use a full-text field, whose main purpose is precisely tokenization. Use @KeywordField instead; see the Hibernate Search 6.2.1.Final reference documentation.

If you do want tokenization, I’d recommend using a @FullTextField (not a @KeywordField) with:

  • an analyzer that does some normalization then uses an n-gram filter to generate many tokens that are actually just substrings
  • a searchAnalyzer that does the normalization part, but does not use the n-gram filter.

It’s somewhat similar to what’s explained in the last few paragraphs here.
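As a rough illustration of why the n-gram setup gives substring search, here is a plain-Java simulation (again without Hibernate Search or Lucene). It assumes the whole value is treated as a single token before the n-gram step and that normalization is just lowercasing; `indexTerms`, `matches`, the class name and the gram sizes are hypothetical choices for this sketch, not the library's API.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class NGramSketch {

    // Normalization shared by the index-time and search-time "analyzers" (assumption: lowercase only).
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Index-time analysis: normalize, then emit every substring whose length is in [minGram, maxGram].
    static Set<String> indexTerms(String value, int minGram, int maxGram) {
        String normalized = normalize(value);
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < normalized.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= normalized.length(); len++) {
                grams.add(normalized.substring(start, start + len));
            }
        }
        return grams;
    }

    // Search-time analysis: normalize only (no n-grams), then look for an exact term match.
    static boolean matches(String query, Set<String> indexedTerms) {
        return indexedTerms.contains(normalize(query));
    }

    public static void main(String[] args) {
        // "001-" followed by Greek capital gamma, indexed with grams of 1 to 5 characters.
        Set<String> terms = indexTerms("001-\u0393", 1, 5);

        System.out.println(matches("1-\u0393", terms)); // true: a substring of the indexed value
        System.out.println(matches("002", terms));      // false: not a substring
    }
}
```

With this approach no wildcard is needed at all: a plain match on the search string behaves like a "contains" query. The trade-off is a larger index, and a search string longer than the maximum gram size will not match.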


I replaced @FullTextField with @KeywordField and created a class MyLuceneAnalysisConfigurer, and now it seems to work fine.

@Column(name = "routesortlist", length = 30)
@KeywordField(normalizer = Constants.NORMALIZER_LOWERCASE)
@GenericField(name = "routesortlist_sort", sortable = Sortable.YES)
private String routesortlist;

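import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

public class MyLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {

	@Override
	public void configure(final LuceneAnalysisConfigurationContext context) {
		// Normalizer applied to the whole value (no tokenization): lowercase, then ASCII folding.
		context.normalizer(Constants.NORMALIZER_LOWERCASE).custom()
				.tokenFilter(LowerCaseFilterFactory.class)
				.tokenFilter(ASCIIFoldingFilterFactory.class);
	}
}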

Thank you again for your help!