High performance autocomplete optimization

Ohh OK, so for the Lucene backend you’d want to implement LuceneAnalysisConfigurer; see this example to get started.

As for the filters, in the Lucene case you just have to look through the Lucene packages. These are the ones you were looking for:

  • org.apache.lucene.analysis.ngram.NGramFilterFactory : nGram
    • parameters: minGramSize / maxGramSize
  • org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory : edgeNGram
    • parameters: minGramSize / maxGramSize
  • org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory : limitTokenCount
    • parameters: maxTokenCount / consumeAllTokens
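To make minGramSize / maxGramSize a bit more concrete, here is a rough plain-Java sketch of the tokens the two n-gram filters would produce for a single word. It does not call Lucene; the class and method names (EdgeNGramSketch, edgeNGrams, nGrams) are made up for illustration, and the real filters have extra options that are ignored here.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramSketch {

    // edgeNGram: prefixes of the token with lengths minGram..maxGram (capped at the token length).
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // nGram: substrings with lengths minGram..maxGram, taken at every start position.
    static List<String> nGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            int longest = Math.min(maxGram, token.length() - start);
            for (int len = minGram; len <= longest; len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("search", 2, 15)); // [se, sea, sear, searc, search]
        System.out.println(nGrams("auto", 2, 3));        // [au, aut, ut, uto, to]
    }
}
```

For auto-complete you usually only care about prefixes, which is why edgeNGram tends to be the better fit: it generates far fewer tokens than nGram for the same word.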

The configurer will look something like the example below. Create it, and don’t forget to pass it to Search through the hibernate.search.backend.analysis.configurer property (see the link above):

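// Package names below are the ones used by Lucene 8.x / Hibernate Search 6.x; adjust them to your versions.
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

public class YourAnalysisConfigurer implements LuceneAnalysisConfigurer {
	@Override
	public void configure(LuceneAnalysisConfigurationContext context) {
		// index-time analyzer: produces the prefixes that auto-complete matches against
		context.analyzer( "someAnalyzerName" ).custom()
				.tokenizer( WhitespaceTokenizerFactory.class )
				// add some filters to clean up your text;
				// lowercase first, because the stop filter is case-sensitive by default:
				.tokenFilter( LowerCaseFilterFactory.class )
				.tokenFilter( StopFilterFactory.class )
				// after these ^ filters you should have the words you want to auto-complete on;
				// now add the limit filter to only keep the first two words
				.tokenFilter( LimitTokenCountFilterFactory.class )
				.param( "maxTokenCount", "2" )
				// add the ngram or edgengram filter to generate the tokens
				.tokenFilter( EdgeNGramFilterFactory.class )
				.param( "minGramSize", "2" )
				.param( "maxGramSize", "15" );

		// same config as above but without the ngram filter; use this one as the search analyzer
		context.analyzer( "someAnalyzerNameSearch" ).custom()
				.tokenizer( WhitespaceTokenizerFactory.class )
				// same clean-up filters, in the same order as at index time:
				.tokenFilter( LowerCaseFilterFactory.class )
				.tokenFilter( StopFilterFactory.class );
	}
}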
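For the "pass it to Search" part, the setting would look roughly like this. The com.example package is a placeholder for wherever your class lives, and depending on your framework the key may need a prefix (with Spring Boot it would typically be spring.jpa.properties.):

```properties
hibernate.search.backend.analysis.configurer = class:com.example.YourAnalysisConfigurer
```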

Then apply these analyzers to your entity:

@Entity
@Indexed
public class MyEntity {

	// ... other things

	@FullTextField(analyzer = "someAnalyzerName", searchAnalyzer = "someAnalyzerNameSearch")
	private String myText;

}
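If it’s unclear why the search analyzer leaves out the ngram filter, here is a small plain-Java simulation of the two chains. It is only a sketch and does not use Lucene or Hibernate Search: the stop-word list, the class name and the "any token matches" rule are assumptions made for the demo, and the sketch lowercases before removing stop words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AutocompleteAnalysisSketch {

    // Hypothetical stop-word list for this sketch; Lucene's stop filter ships its own defaults.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of");

    // Search-side chain: split on whitespace, lowercase, drop stop words.
    static List<String> searchTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String word = raw.toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // Index-side chain: same clean-up, keep the first 2 words, expand each into prefixes of length 2..15.
    static List<String> indexTokens(String text) {
        List<String> words = searchTokens(text);
        List<String> tokens = new ArrayList<>();
        for (String word : words.subList(0, Math.min(2, words.size()))) {
            for (int len = 2; len <= Math.min(15, word.length()); len++) {
                tokens.add(word.substring(0, len));
            }
        }
        return tokens;
    }

    // Simplified matching rule: a hit if any query token equals an indexed token.
    static boolean matches(List<String> indexed, List<String> queryTokens) {
        return queryTokens.stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        List<String> indexed = indexTokens("The Quick brown fox");
        System.out.println(indexed); // [qu, qui, quic, quick, br, bro, brow, brown]

        // Query analyzed without n-grams: only real prefixes of indexed words match.
        System.out.println(matches(indexed, searchTokens("qui")));   // true
        System.out.println(matches(indexed, searchTokens("quiet"))); // false
        System.out.println(matches(indexed, searchTokens("fox")));   // false, "fox" was cut by the limit filter

        // Query analyzed WITH n-grams: "quiet" becomes [qu, qui, quie, quiet] and "qu" hits "quick".
        System.out.println(matches(indexed, indexTokens("quiet")));  // true, a false positive
    }
}
```

So the ngram filter belongs on the indexing side only: the index stores every prefix, and the query is matched as typed.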