High performance autocomplete optimization

I have a database with roughly 20k entries. I want to add an autocomplete for one field only.

Here’s the rough version of the autocomplete right now:

entity.keyword().fuzzy().withEditDistanceUpTo(1).onField("title").matching(userInput)

Is there a better way to do this?

I want to optimize the autocomplete so that only the first two words in the field are returned. I know I could trim off any remaining words manually, but I’m hoping there’s a better built-in solution.

To elaborate, I want the autocomplete to match up to two words.

For autocomplete scenarios, edge n-gram or n-gram filters are usually applied at index time.
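To make "edge n-gram" concrete, here is a minimal plain-Java sketch of what such a filter emits for a single word. It is only an illustration of the idea, not the Lucene filter itself; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is not the Lucene filter itself.
final class EdgeNGramSketch {

    // Returns the prefixes of `word` whose length is between minGram and maxGram.
    static List<String> edgeNGrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, word.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(word.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [hi, hig, high]
        System.out.println(edgeNGrams("high", 2, 15));
    }
}
```

Because every prefix is stored in the index, a partially typed word such as "hig" can be found with a plain term lookup at query time.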

As to the part about returning only the first two words:

I’m not sure I understand what you are trying to achieve. If you want the search to be performed only on the first two words of the title (e.g., for the title "High performance autocomplete optimization", autocomplete only on "High performance"), you can add a limit token count token filter. Just make sure that the limit filter is added before the ngram filter.
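The reason the order matters can be sketched in plain Java. This is a simplified model of the two filters (hypothetical helper names, not Lucene code): limiting first keeps two whole words and then expands them, while limiting last keeps only the first two grams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of a token-count limit and an edge n-gram step; not Lucene code.
final class LimitThenNGramSketch {

    // Keeps at most maxTokenCount tokens from the start of the stream.
    static List<String> limit(List<String> tokens, int maxTokenCount) {
        return new ArrayList<>(tokens.subList(0, Math.min(maxTokenCount, tokens.size())));
    }

    // Expands every token into its prefixes of length minGram..maxGram.
    static List<String> edgeNGrams(List<String> tokens, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String token : tokens) {
            int longest = Math.min(maxGram, token.length());
            for (int len = minGram; len <= longest; len++) {
                grams.add(token.substring(0, len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("high performance autocomplete optimization".split("\\s+"));

        // Limit first: grams are generated for "high" and "performance" only.
        System.out.println(edgeNGrams(limit(words, 2), 2, 15));

        // Limit last: only the first two grams survive.
        System.out.println(limit(edgeNGrams(words, 2, 15), 2)); // prints [hi, hig]
    }
}
```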

And just in case, here’s how to configure the analyzers with Hibernate Search:

That looks good! You know what I’m looking for :slight_smile:

Do you have the relevant documentation for ngram and the token limit in Lucene? The closest thing I could find is @KeywordField.

Oh, OK. For Lucene you’d want to implement LuceneAnalysisConfigurer; see this example to get started.

As for the filters, in the Lucene case you just have to look through the Lucene packages. These are the ones you were looking for:

  • org.apache.lucene.analysis.ngram.NGramFilterFactory : nGram
    • parameters: minGramSize / maxGramSize
  • org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory : edgeNGram
    • parameters: minGramSize / maxGramSize
  • org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory : limitTokenCount
    • parameters: maxTokenCount / consumeAllTokens

It’ll look something like the example below (the configurer is registered through the hibernate.search.backend.analysis.configurer property):

Create your configurer (don’t forget to pass it to Search, see the link above):

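import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
import org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

public class YourAnalysisConfigurer implements LuceneAnalysisConfigurer {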
	@Override
	public void configure(LuceneAnalysisConfigurationContext context) {
		context.analyzer( "someAnalyzerName" ).custom()
				.tokenizer( WhitespaceTokenizerFactory.class )
				// add some filters to clean up your text:
				.tokenFilter( StopFilterFactory.class )
				.tokenFilter( LowerCaseFilterFactory.class )
				// after these ^ filters you are expecting to have the words you want to auto-complete on
				// now add the limit filter to only keep two words
				.tokenFilter( LimitTokenCountFilterFactory.class )
				.param( "maxTokenCount", "2" )
				// add the ngram / or edgengram filter to generate the tokens
				.tokenFilter( EdgeNGramFilterFactory.class )
				.param( "minGramSize", "2" )
				.param( "maxGramSize", "15" );

		// same config as above but without the limit and ngram filters; use this one as the search analyzer
		context.analyzer( "someAnalyzerNameSearch" ).custom()
				.tokenizer( WhitespaceTokenizerFactory.class )
				// add some filters to clean up your text:
				.tokenFilter( StopFilterFactory.class )
				.tokenFilter( LowerCaseFilterFactory.class );
	}
}

and apply these analyzers to your entity:

@Entity
@Indexed
public class MyEntity {

	// ... other things

	@FullTextField(analyzer = "someAnalyzerName", searchAnalyzer = "someAnalyzerNameSearch")
	private String myText;

}
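To show why the search analyzer leaves out the ngram filter, here is a rough plain-Java model of the two analyzers above. It is a sketch under assumptions, not Hibernate Search or Lucene code: the stop-word filter is omitted, the names are invented, and treating several search terms as an OR is a simplification.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough model of "someAnalyzerName" (index side) and "someAnalyzerNameSearch" (search side).
final class IndexVsSearchAnalyzerSketch {

    // Index side: lowercase, keep the first two words, emit edge n-grams of length 2..15.
    static Set<String> indexTerms(String title) {
        Set<String> terms = new HashSet<>();
        String[] words = title.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < Math.min(2, words.length); i++) {
            int longest = Math.min(15, words[i].length());
            for (int len = 2; len <= longest; len++) {
                terms.add(words[i].substring(0, len));
            }
        }
        return terms;
    }

    // Search side: lowercase and split only; the user input is NOT cut into n-grams.
    static List<String> searchTerms(String userInput) {
        return List.of(userInput.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Assumption for this sketch: a title matches when any search term equals an indexed term.
    static boolean matches(String title, String userInput) {
        Set<String> indexed = indexTerms(title);
        return searchTerms(userInput).stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        String title = "High performance autocomplete optimization";
        System.out.println(matches(title, "Perf")); // prints true
        System.out.println(matches(title, "auto")); // prints false, the third word is not indexed
    }
}
```

If the search side also produced n-grams, an input like "pexyz" would be cut down to "pe" and match "performance", which is usually not what you want from autocomplete.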

This looks perfect! How would I perform the search? Is it as simple as entity.match().field("myText").matching(userInput)? (Sorry this is all very new to me :stuck_out_tongue: )

OK, I got the autocomplete to work thanks to you, ty! I have another question though: is it possible to conditionally apply the analyzer on certain queries?

For instance, I want the autocomplete to use the analyzer, but I want the actual search to use the default FullTextField analyzer with fuzzy search. Should I create two columns with the same value, and apply the analyzer to one of them?

It is common to have multiple index fields with different analysis configurations derived from the same entity field. In other words, you can just put multiple @FullTextField annotations on it, like this:

@Entity
@Indexed
public class MyEntity {

	// ... other things

	// Use this one for autocomplete query 
	// (make sure it has a name specified so that each field has a unique one).
	@FullTextField(name = "myTextAutocomplete", analyzer = "someAnalyzerName", searchAnalyzer = "someAnalyzerNameSearch")
	// Use this field for search queries where autocomplete is not needed.
	@FullTextField(analyzer = "someOtherAnalyzerNameOrNoneIfDefaultIsGood")
	private String myText;

}

What about autocomplete on multiple fields (product name, manufacturer name, provider name, provider city, provider country)?

Right now I have a String property “autocomplete” inside the product entity, and I populate it when the entity is persisted.
Later I use a MySQL stored procedure that uses the LIKE operator.

So if you have a product “Almond Milk” from manufacturer “Alpro” sold in a store “Aldi” in Killorglin, Ireland, you can find this product using the first letters of any of these values (“alp”, “ire” and so on).
When you type another word, the stored procedure adds an AND operator and another LIKE.

Is there a way to get the same behaviour with Lucene?

Hey @horvoje

You could annotate each of the fields you are combining with @FullTextField and use an ngram filter in their analyzer, e.g. something like:

@FullTextField(name = "productAutocomplete", analyzer = "someAnalyzerName", searchAnalyzer = "someAnalyzerNameSearch")
String product;
@FullTextField(name = "manufacturerAutocomplete", analyzer = "someAnalyzerName", searchAnalyzer = "someAnalyzerNameSearch")
String manufacturer;
// ... other fields

And then use a simple query string predicate targeting multiple fields (Hibernate Search 7.1.1.Final: Reference Documentation):

List<Product> hits = searchSession.search( Product.class )
        .where( f -> f.simpleQueryString()
                // use the index field names declared via name = "..." on @FullTextField
                .field( "productAutocomplete" ).field( "manufacturerAutocomplete" )
                .matching( "alp" ) )
        .fetchHits( 20 );

for this part:

stored procedure adds AND operator and another LIKE

you’d just use the AND operator option on a simple query string predicate (Hibernate Search 7.2.0.Alpha2: Reference Documentation):

List<Product> hits = searchSession.search( Product.class )
        .where( f -> f.simpleQueryString()
                // use the index field names declared via name = "..." on @FullTextField
                .field( "productAutocomplete" ).field( "manufacturerAutocomplete" )
                .matching( "alp" )
                .defaultOperator( BooleanOperator.AND ) )
        .fetchHits( 20 );
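To illustrate what the AND operator changes, here is a small plain-Java model of the LIKE-style behaviour described in the question: every typed word has to prefix-match a word in at least one of the fields. It is only a sketch with invented names, not Hibernate Search code, and it ignores details such as the minimum gram size.

```java
import java.util.List;
import java.util.Locale;

// Plain-Java model of "each typed word must match some field"; not Hibernate Search code.
final class AndOperatorSketch {

    // True when some word in any field value starts with the given term.
    static boolean anyFieldHasPrefix(List<String> fieldValues, String term) {
        for (String value : fieldValues) {
            for (String word : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(term)) {
                    return true;
                }
            }
        }
        return false;
    }

    // AND semantics: every typed term must match at least one field.
    static boolean matchesAll(List<String> fieldValues, String userInput) {
        for (String term : userInput.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!anyFieldHasPrefix(fieldValues, term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // product name, manufacturer, store, city, country
        List<String> product = List.of("Almond Milk", "Alpro", "Aldi", "Killorglin", "Ireland");
        System.out.println(matchesAll(product, "alp ire")); // prints true
        System.out.println(matchesAll(product, "alp fra")); // prints false
    }
}
```

With an OR default operator, the second input would still match because "alp" alone is enough; AND narrows the results as more words are typed.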

Hello,
Thanks for sharing, this is very helpful for me.