Unique entities based on field value

I’m using an n-gram filter to add a suggestions/autocomplete feature. I’m running into an issue: because field values can be duplicated, the same value is suggested multiple times.

For instance, if my db contained aB, aB, aB, aB, aB, aC, aD and I retrieved the first 5 from the n-gram search, I would get aB 5 times.

I want to retrieve the first five unique values (aB, aC, aD). Is this possible?

The current search code is very simple; just retrieving the first five:

SearchResult<Entity> result = searchSession.search(Entity.class)
		.where(entity -> entity.match().field("autocomplete_field").matching(fieldValue))
		.fetch(5);
List<Entity> results = result.hits();
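One possible client-side workaround, sketched here in plain Java: fetch more than five hits, map them to the field value, and keep the first five distinct values in ranking order. The helper name `firstDistinct` and the idea of over-fetching are assumptions for illustration, not something Hibernate Search provides.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DistinctSuggestions {

    // Hypothetical helper: returns the first `limit` distinct values,
    // preserving the order in which the hits were ranked.
    public static List<String> firstDistinct(List<String> rankedValues, int limit) {
        Set<String> seen = new LinkedHashSet<>();
        for (String value : rankedValues) {
            if (seen.size() == limit) {
                break; // enough distinct suggestions collected
            }
            seen.add(value); // duplicates are ignored by the set
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // The values from the example above, as if read from over-fetched hits.
        List<String> hits = List.of("aB", "aB", "aB", "aB", "aB", "aC", "aD");
        System.out.println(firstDistinct(hits, 5)); // prints [aB, aC, aD]
    }
}
```

The obvious limitation is that if the over-fetched page contains only duplicates, fewer than five suggestions come back, so this is a stopgap rather than a real "distinct" query.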

I don’t think full-text search has an equivalent of select distinct field from ... .
If you just need the unique values of a single field, you could try the terms aggregation (see the Hibernate Search 7.1.1.Final reference documentation):

.aggregation( countsByGenreKey, f -> f.terms()
                .field( "field", String.class )
                .maxTermCount( 5 ) )

But then, I don’t think that would be a good solution… maybe @yrodiere would have an idea…
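For reference, a terms aggregation reports one entry per distinct field value together with the number of matching documents. The plain-Java sketch below only imitates that result shape on an in-memory list; `countsByTerm` is a made-up helper, not Hibernate Search code, and ordering by descending count is an assumption about the default behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TermsAggregationSketch {

    // Hypothetical helper: counts occurrences per distinct value and keeps
    // the `maxTermCount` most frequent ones (ties keep first-seen order).
    public static Map<String, Long> countsByTerm(List<String> values, int maxTermCount) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(maxTermCount)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> values = List.of("aB", "aB", "aB", "aB", "aB", "aC", "aD");
        System.out.println(countsByTerm(values, 5)); // prints {aB=5, aC=1, aD=1}
    }
}
```

Note that in this sketch the keys are ranked by frequency, not by how well they match the query, which may not be what an autocomplete feature wants.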

This is probably not the right way to do it. Maybe look into that instead:

There is no such feature in Hibernate Search at the moment, but there could be in the future: see HSEARCH-868 in the Hibernate JIRA.
If you use Elasticsearch, you can probably leverage the relevant feature through the Elasticsearch extension in Hibernate Search, but that involves dealing with JSON.

It’s a good solution if:

Somehow I doubt it.


This is exactly what I was looking for!

However, when I implement it verbatim into my codebase, I get this error:

Cannot infer type argument(s) for <T> aggregation(AggregationKey<T>, Function<? super AF,? extends AggregationFinalStep<T>>)

Here’s the relevant code.

AggregationKey<Map<Range<String>, Long>> countsByTitleKey = AggregationKey.of("countsByTitle");
SearchResult<String> result = searchSession.search(Entity.class)
		.where(entity -> entity.match().field("title").matching(title))
		.aggregation(countsByTitleKey, f -> f.terms().field("title", String.class).maxTermCount(5))
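A likely cause, offered as a guess: the key is declared as `AggregationKey<Map<Range<String>, Long>>`, which is the result type of a range aggregation, while a terms aggregation on a `String` field should produce a `Map<String, Long>`. The compiler then cannot find a single `T` that fits both arguments. The sketch below reproduces the same generic shape with hypothetical stand-in types (`Key`, `aggregation`, `termsOnStringField`); none of them are Hibernate Search classes.

```java
import java.util.Map;
import java.util.function.Supplier;

class AggregationKeyTypeSketch {

    // Hypothetical stand-in for a typed aggregation key.
    static final class Key<T> {
        final String name;

        private Key(String name) {
            this.name = name;
        }

        static <T> Key<T> of(String name) {
            return new Key<>(name);
        }
    }

    // Hypothetical stand-in for aggregation(key, definition):
    // the key and the definition must agree on the same T.
    public static <T> T aggregation(Key<T> key, Supplier<? extends T> definition) {
        return definition.get();
    }

    // Hypothetical stand-in for a terms aggregation on a String field.
    public static Map<String, Long> termsOnStringField() {
        return Map.of("aB", 5L, "aC", 1L, "aD", 1L);
    }

    public static Map<String, Long> demo() {
        // The key's type parameter matches what the definition produces, so T is inferred.
        Key<Map<String, Long>> countsByTitleKey = Key.of("countsByTitle");
        // Declaring the key with a different map key type (such as a range type)
        // would make this call fail to compile with an inference error.
        return aggregation(countsByTitleKey, AggregationKeyTypeSketch::termsOnStringField);
    }
}
```

If that diagnosis is right, changing the key to `AggregationKey<Map<String, Long>>` should let the call compile; the search also returns `Entity` hits, so `SearchResult<Entity>` would be the matching result type rather than `SearchResult<String>`.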