FullTextField + Analyzer & Aggregation

Hello!

I would like to use an edge-ngram analyzer to aggregate my results and get completion suggestions.

Here is my configurer:

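import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;
import org.springframework.stereotype.Component;

@Component("myAnalysisConfigurer")
public class MyAnalysisConfigurer implements ElasticsearchAnalysisConfigurer {

	@Override
	public void configure(ElasticsearchAnalysisConfigurationContext context) {
		// The whole value is kept as a single token, lowercased,
		// then expanded into its prefixes of 1 to 15 characters.
		context.analyzer("edge_ngram").custom()
				.tokenizer("keyword")
				.tokenFilters("lowercase", "edge_ngram_filter");
		context.tokenFilter("edge_ngram_filter")
				.type("edgeNGram")
				.param("side", "front")
				.param("max_gram", 15)
				.param("min_gram", 1);
	}
}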
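To see what this analyzer actually puts in the index, here is a plain-Java approximation of the chain configured above (keyword tokenizer, then `lowercase`, then front edge n-grams with `min_gram` 1 and `max_gram` 15). It is only a sketch under simplifying assumptions: the class and method names are hypothetical, lowercasing uses `Locale.ROOT`, and supplementary characters are ignored. It is not Elasticsearch code, and the real filter may differ in edge cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper approximating the "edge_ngram" analyzer defined above:
// keyword tokenizer -> lowercase -> edge n-grams anchored at the front.
final class EdgeNgramSketch {

    private EdgeNgramSketch() {
    }

    static List<String> analyze(String value, int minGram, int maxGram) {
        // The keyword tokenizer emits the whole input as a single token,
        // which the lowercase filter then lowercases.
        String token = value.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        // "side": "front" means every gram is a prefix of the token;
        // grams longer than maxGram (or than the token itself) are not emitted.
        int upper = Math.min(maxGram, token.length());
        for (int length = minGram; length <= upper; length++) {
            grams.add(token.substring(0, length));
        }
        return grams;
    }
}
```

For example, `EdgeNgramSketch.analyze("Paris", 1, 15)` returns `[p, pa, par, pari, paris]`. Note that a value longer than 15 characters only gets prefixes up to 15 characters, so the full value itself is never indexed as a term.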

Here is the annotation on my String field:

@FullTextField(name = "field_suggest", analyzer = "edge_ngram", searchable = Searchable.YES, aggregable = Aggregable.YES, projectable = Projectable.YES)

Here is the mapping of my field, as seen through Kibana:

"field_suggest": {
  "type": "text",
  "store": true,
  "analyzer": "edge_ngram"
}

And the generated analysis settings:

"analysis": {
        "filter": {
          "edge_ngram_filter": {
            "min_gram": "1",
            "side": "front",
            "type": "edgeNGram",
            "max_gram": "15"
          }
        },
        "analyzer": {
          "edge_ngram": {
            "filter": [
              "lowercase",
              "edge_ngram_filter"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      }

Here is the request:

GET /acme-000001/_search
{
  "size": 0,
  "from": 0,
  "track_total_hits": true,
  "aggregations": {
    "suggestions": {
      "terms": {
        "field": "field_suggest",
        "size": 20
      }
    }
  },
  "query": {
    "match": {
      "field_suggest": "xxx"
    }
  }
}

But when I execute the search, I get this response:

"root_cause" : [
  {
    "type" : "illegal_argument_exception",
    "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [field_suggest] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
  }
]

Do you have any ideas? Analyzers are not available on the @KeywordField annotation.

It looks like we expose aggregable() on @FullTextField, but it doesn’t do anything. I opened HSEARCH-3913. Thanks for bringing this to our attention!

I suppose you could define a native field with fielddata enabled. That being said, as the error message rightly reminds you, fielddata consumes a lot of memory, so you should probably avoid it.
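Even with fielddata enabled on `field_suggest`, a `terms` aggregation would bucket the indexed tokens rather than the original values, and with the `edge_ngram` analyzer those tokens are prefixes. The plain-Java sketch below illustrates this under simplifying assumptions: the class and method names are hypothetical, and the analyzer is a naive approximation of keyword tokenizer + `lowercase` + front edge n-grams (`min_gram` 1, `max_gram` 15), not Elasticsearch code. It counts, for each token, how many values contain it, which mirrors a bucket's `doc_count` when each document holds a single value.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration: what a terms aggregation would see if it ran on
// the tokens produced by an edge n-gram analyzer (fielddata enabled).
final class NgramBucketsSketch {

    private NgramBucketsSketch() {
    }

    // Naive stand-in for keyword tokenizer + lowercase + front edge n-grams.
    static List<String> edgeNgrams(String value, int minGram, int maxGram) {
        String token = value.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int length = minGram; length <= Math.min(maxGram, token.length()); length++) {
            grams.add(token.substring(0, length));
        }
        return grams;
    }

    // For each token, counts how many values (documents) contain it.
    static Map<String, Long> buckets(List<String> fieldValues) {
        Map<String, Long> counts = new TreeMap<>();
        for (String value : fieldValues) {
            // A document is counted at most once per token.
            Set<String> distinctTokens = new LinkedHashSet<>(edgeNgrams(value, 1, 15));
            for (String token : distinctTokens) {
                counts.merge(token, 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

For example, `NgramBucketsSketch.buckets(List.of("Apple", "Apricot"))` yields ten buckets, among them `a` and `ap` with a count of 2 and `apple` with a count of 1: prefixes, not usable suggestions.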

I generally implement auto-completion by returning the entities directly and displaying the best matches as suggestions. Something like this:

import org.hibernate.search.mapper.orm.Search;

List<Acme> suggestions = Search.session( entityManager ).search( Acme.class )
    .where( f -> f.match().field( "field_suggest" ).matching( "xxx" ) )
    .fetchHits( 20 );

And if you don’t want to load the entities, you can do this to get the value of “field_suggest” directly from Elasticsearch:

List<String> suggestions = Search.session( entityManager ).search( Acme.class )
    .select( f -> f.field( "field_suggest", String.class ) )
    .where( f -> f.match().field( "field_suggest" ).matching( "xxx" ) )
    .fetchHits( 20 );

That would suggest the best-matching hits, however, and not the most frequent terms. If what you’re trying to do is to suggest additional terms for a query, then I suppose my initial suggestion of relying on a native field is the best you can do.

Thanks for your reply! 🙂
I had already considered using a projection to retrieve only my field, but how can I avoid duplicate results without an aggregation?
In my use case, I want to use suggestions to show users what they could find (for example, the values of an enumeration), so I need to suggest these values without returning duplicates.

The thing to keep in mind with String fields is that there are two ways to map them:

  • As text (@FullTextField). Then they can be tokenized (using an analyzer), which is best for search. They cannot be used for sorts or aggregations, however (at least not efficiently).
  • As keyword (@KeywordField). Then they cannot be tokenized, but can be normalized (using a normalizer), which is not great for search (you basically only get case-insensitive exact search). However they can be used for sorts and aggregations.
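A terms aggregation buckets indexed terms, which is why this distinction matters. Here is a heavily simplified plain-Java illustration; the names are hypothetical, splitting on whitespace stands in for a real analyzer, and lowercasing stands in for a `lowercase` normalizer. The text mapping turns one value into several terms, while the keyword mapping keeps exactly one term per value, so keyword buckets correspond to whole values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical, heavily simplified contrast between the two mappings:
// - text: the value is split into tokens (here: on whitespace) and lowercased;
// - keyword: the value stays a single term, only normalized (here: lowercased).
final class TextVsKeywordSketch {

    private TextVsKeywordSketch() {
    }

    static List<String> textTerms(String value) {
        return Arrays.stream(value.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    static List<String> keywordTerms(String value) {
        return List.of(value.toLowerCase(Locale.ROOT));
    }
}
```

For instance, `TextVsKeywordSketch.textTerms("Dark Blue")` returns `[dark, blue]`, whereas `TextVsKeywordSketch.keywordTerms("Dark Blue")` returns `[dark blue]`.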

If you need the features from both… just declare two fields: one for search, the other for aggregation.

@FullTextField(name = "field_suggest", analyzer = "edge_ngram")
@KeywordField(name = "field_keyword", searchable = Searchable.NO, aggregable = Aggregable.YES)

Then do this:

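import java.util.List;
import java.util.Map;
import org.hibernate.search.engine.search.aggregation.AggregationKey;
import org.hibernate.search.engine.search.query.SearchResult;
import org.hibernate.search.mapper.orm.Search;

AggregationKey<Map<String, Long>> suggestAgg = AggregationKey.of( "suggestions" );
SearchResult<Acme> result = Search.session( entityManager ).search( Acme.class )
    .where( f -> f.match().field( "field_suggest" ).matching( "xxx" ) )
    // Terms aggregations need the expected value type of the field
    .aggregation( suggestAgg, f -> f.terms().field( "field_keyword", String.class ).maxTermCount( 20 ) )
    .fetch( 20 ); // Use 0 if you don't need the top hits

List<Acme> topHits = result.getHits();
Map<String, Long> suggestionsAndCount = result.getAggregation( suggestAgg );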
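To turn the aggregation result into a plain list of suggestion strings, a small post-processing step is enough. The sketch below uses hypothetical names and only assumes a `Map<String, Long>` like `suggestionsAndCount` (value to document count). It sorts explicitly by descending count, then alphabetically, so it does not rely on the iteration order of the map returned by Hibernate Search.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical post-processing of a terms aggregation result
// (value -> document count) into a ranked list of suggestion strings.
final class SuggestionRanking {

    private SuggestionRanking() {
    }

    static List<String> topSuggestions(Map<String, Long> valueToCount, int limit) {
        return valueToCount.entrySet().stream()
                // Most frequent values first; ties broken alphabetically.
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

For instance, with counts `red=5`, `blue=3` and `green=3`, `topSuggestions(counts, 20)` returns `[red, blue, green]`.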

Very nice!
Thank you!