Autocomplete with Hibernate Search 6

pablo_jaska · February 2, 2021, 8:38am

Hello, with the previous Hibernate Search version we implemented the autocomplete method as follows:

public List<String> getAllSuggestions(final String searchString) {
        IndexReader reader = null;
        try {
            final FullTextEntityManager fullTextEntityManager = Search
                .getFullTextEntityManager(emf.createEntityManager());
            reader = fullTextEntityManager.getSearchFactory().getIndexReaderAccessor().open(Entity.class);

            final Terms firstTerms= SlowCompositeReaderWrapper.wrap(reader)
                .terms("firstEntityField");
            final Terms secondTerms= SlowCompositeReaderWrapper.wrap(reader)
                .terms("secondEntityField");
            final Terms thirdTerms= SlowCompositeReaderWrapper.wrap(reader)
                .terms("thirdEntityField");

            // All words are collected in a set so that there are no duplicate entries (helper method below)
            final Set<String> allTerms = new HashSet<>();
            addTermsToSet(firstTerms, allTerms);
            addTermsToSet(secondTerms, allTerms);
            addTermsToSet(thirdTerms, allTerms);

            final List<String> suggestions = allTerms.stream().sorted(Comparator.naturalOrder())
                .collect(Collectors.toList());
    
            return suggestions.stream().filter(s -> s.startsWith(searchString) && !s.equalsIgnoreCase(searchString))
                .limit(SUGGESTION_LIMIT)
                .collect(Collectors.toList());
        } catch (final Exception e) {
            LOG.warn("Terms for autocomplete function couldn't be loaded.");
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (final Exception readerException) {
                "..."
                }
            }
        }
        return new ArrayList<>();
    }

// Helper Method for Set
    private void addTermsToSet(final Terms terms, final Set<String> set) throws IOException {
        final BytesRefIterator iterator;
        iterator = terms.iterator();
        BytesRef byteRef = null;
        while ((byteRef = iterator.next()) != null) {
            set.add(byteRef.utf8ToString());
        }
    }

Unfortunately, I couldn’t find any further information on how to do something like this with Hibernate Search 6.

Do you have any idea?
Thank you!

yrodiere · February 2, 2021, 8:56am

Hello,

As far as I can tell, SlowCompositeReaderWrapper no longer exists in Lucene 8.

I’ve never used something like this to implement autocomplete. I generally run a query against user-provided terms to retrieve “approximate” matches and return the top matches to the user. Here is an example for Hibernate Search 5; it would be rather similar in Hibernate Search 6, except that you can take advantage of the searchAnalyzer.

We can consider exposing readers again in Hibernate Search 6.1 (to be released in alpha in the next few months), but in order to do that right, can you tell us what exactly you are using the Terms for?

pablo_jaska · February 2, 2021, 11:09am

Hello and thanks for the fast reply. I edited the method above.
What we want is access to individual words from the index of several fields combined.

yrodiere · February 2, 2021, 11:33am

Okay then… this may work for small indices, but it will perform rather badly on large ones.
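
To illustrate the cost: the method above reads every term of three fields, sorts them, and scans the whole list on every request. Below is a minimal sketch of the lookup part only, assuming the terms have already been loaded into memory once. `PrefixSuggester` and its method names are hypothetical and are not part of Hibernate Search or Lucene.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical helper: prefix lookup over terms that were loaded once. */
final class PrefixSuggester {

    private final NavigableSet<String> sortedTerms;

    PrefixSuggester(Collection<String> terms) {
        // Sorted and de-duplicated once, instead of on every request.
        this.sortedTerms = new TreeSet<>(terms);
    }

    /**
     * Returns up to 'limit' terms that start with 'prefix', skipping a term
     * that is equal to the prefix (ignoring case), like the original filter.
     */
    List<String> suggest(String prefix, int limit) {
        List<String> result = new ArrayList<>();
        // In natural String order, all terms starting with 'prefix' are
        // contiguous and begin at the first term >= prefix.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || result.size() >= limit) {
                break; // left the prefix range, or collected enough
            }
            if (!term.equalsIgnoreCase(prefix)) {
                result.add(term);
            }
        }
        return result;
    }
}
```

For example, `new PrefixSuggester(List.of("john", "joan", "jo", "jack")).suggest("jo", 10)` only walks the terms that start with "jo". This does not remove the cost of reading all terms from the index in the first place, so it only helps on indices small enough to hold their terms in memory.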

In general, I’d recommend a different approach involving a query. The main difference is that it doesn’t suggest terms, but entities. So if you type “jo” you won’t get “John” as a suggestion, but the entity representing the person “John Smith”. Then you can decide to display that entity as you see fit.

If you’re interested in this approach, read on.

If you really want to use Terms to collect indexed terms with Hibernate Search 6, you will have to wait until HSEARCH-4065 gets fixed (or come and discuss on Zulip how to contribute a patch to fix it).

Alternatively, you can also switch to Elasticsearch, where you can use suggesters, which I covered briefly here.

Now, to implement autocomplete with queries:

Declare an appropriate analyzer in your analysis configurer:

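import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.apache.lucene.analysis.snowball.SnowballPorterFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

public class MyAnalysisConfigurer implements LuceneAnalysisConfigurer {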
	@Override
	public void configure(LuceneAnalysisConfigurationContext context) {
		context.analyzer( "autocomplete-indexing" ).custom()
				.tokenizer( StandardTokenizerFactory.class )
				.tokenFilter( LowerCaseFilterFactory.class )
				.tokenFilter( SnowballPorterFilterFactory.class )
						.param( "language", "English" )
				.tokenFilter( ASCIIFoldingFilterFactory.class )
				.tokenFilter( EdgeNGramFilterFactory.class )
						.param( "minGramSize", "3" )
						.param( "maxGramSize", "7" );

		// Same as "autocomplete-indexing", but without the edge-ngram filter
		context.analyzer( "autocomplete-query" ).custom()
				.tokenizer( StandardTokenizerFactory.class )
				.tokenFilter( LowerCaseFilterFactory.class )
				.tokenFilter( SnowballPorterFilterFactory.class )
						.param( "language", "English" )
				.tokenFilter( ASCIIFoldingFilterFactory.class );
	}
}

Declare full-text fields with the appropriate analyzers:

@Indexed
public class MyEntity {

    @FullTextField(name = "firstEntityField_autocomplete",
            analyzer = "autocomplete-indexing", searchAnalyzer = "autocomplete-query")
    private String firstEntityField;

    @FullTextField(name = "secondEntityField_autocomplete",
            analyzer = "autocomplete-indexing", searchAnalyzer = "autocomplete-query")
    private String secondEntityField;

    @FullTextField(name = "thirdEntityField_autocomplete",
            analyzer = "autocomplete-indexing", searchAnalyzer = "autocomplete-query")
    private String thirdEntityField;

    // ... getters, setters ...
}

Reindex your data.
Query like this to retrieve only entities that match the given terms:

String terms = ...; // User input
List<MyEntity> hits = Search.session( entityManager )
		.search( MyEntity.class )
		.where( f -> f.simpleQueryString()
				.fields( "firstEntityField_autocomplete",
						"secondEntityField_autocomplete",
						"thirdEntityField_autocomplete" )
				.matching( terms )
				.defaultOperator( BooleanOperator.AND ) )
		.fetchHits( 20 );
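
To make the interplay between the two analyzers more concrete, here is a rough plain-Java sketch of the idea, with no Hibernate Search or Lucene involved. It assumes lowercasing and whitespace splitting only, so it ignores the stemming and ASCII-folding steps, and `EdgeNGramSketch` and its methods are hypothetical names. At indexing time each token is expanded into its edge n-grams (3 to 7 characters, as configured above). At query time the tokens are used as-is, and every one of them must be found among the indexed grams, which roughly mirrors the AND default operator.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of edge n-gram indexing; not the Lucene implementation. */
final class EdgeNGramSketch {

    static final int MIN_GRAM = 3; // mirrors minGramSize above
    static final int MAX_GRAM = 7; // mirrors maxGramSize above

    /** Prefixes of the token with MIN_GRAM to MAX_GRAM characters. */
    static List<String> edgeNGrams(String token) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(MAX_GRAM, token.length());
        for (int size = MIN_GRAM; size <= longest; size++) {
            grams.add(token.substring(0, size));
        }
        return grams;
    }

    /** "Index time": tokenize every field value and keep only the grams. */
    static Set<String> indexedTerms(List<String> fieldValues) {
        Set<String> terms = new HashSet<>();
        for (String value : fieldValues) {
            for (String token : tokenize(value)) {
                terms.addAll(edgeNGrams(token));
            }
        }
        return terms;
    }

    /** "Query time": no n-grams; every query token must match an indexed gram. */
    static boolean matches(List<String> fieldValues, String userInput) {
        List<String> queryTokens = tokenize(userInput);
        return !queryTokens.isEmpty()
                && indexedTerms(fieldValues).containsAll(queryTokens);
    }

    /** Simplified analysis: lowercase and split on whitespace. */
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

In this sketch, a query token shorter than three or longer than seven characters finds no gram at all. The real analyzers stem and fold tokens first, so those lengths apply to the analyzed token rather than to the raw input. It is still worth checking against the real analyzer whether very short inputs such as "jo" need a lower `minGramSize` to return hits.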

pablo_jaska · February 2, 2021, 11:53am

I appreciate your answer! I will try this. Thanks again!!!

yrodiere · September 23, 2021, 7:37am

FYI, the index reader can now be accessed from Hibernate Search, starting with version 6.1:

yrodiere · September 8, 2022, 6:25am

A post was split to a new topic: Native Lucene search - no fields in TopDocs
