Hibernate Search 6 custom index settings

I would like to configure a custom tokenizer for use in my custom analyzer in Hibernate Search (6.0.8) with Spring Boot 2.5.x. According to the documentation (Hibernate Search 6.1.0.Final: Reference Documentation), I should use custom index settings like this:

spring:
  jpa:
    properties:
      hibernate:
        search:
          enabled: true
          backend:
            indexes:
              Lemma:
                analysis:
                  configurer: class:**.**.CustomAnalysisConfigurer
                schema_management:
                  settings_file: custom/index-settings.json

My custom/index-settings.json looks like this:

{
  "analysis": {
    "tokenizer": {
      "custom_ngram_tokenizer": {
        "type": "ngram",
        "min_gram": "2",
        "max_gram": "3"
      }
    }
  }
}

And the CustomAnalysisConfigurer looks like this:

package ***.elasticsearch

import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer

class CustomAnalysisConfigurer : ElasticsearchAnalysisConfigurer {
    override fun configure(context: ElasticsearchAnalysisConfigurationContext) {
        context.analyzer("customAnalyzer").custom().tokenizer("custom_ngram_tokenizer")
    }
}

And I would like to reference it in my entity like this:
@FullTextField(analyzer = "customAnalyzer")
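
For context, here is a minimal sketch of how the entity side might look. The package, the `text` property, and the id mapping are hypothetical; only the `Lemma` name (taken from the `indexes.Lemma` key in the configuration) and the `customAnalyzer` name come from this post. By default the index name is the entity name, which is why the per-index key would be `Lemma`.

```kotlin
package com.example.search // hypothetical package

import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.Id
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed

// The index name defaults to the entity name ("Lemma"), matching the
// `indexes.Lemma` key in the Spring configuration.
@Entity
@Indexed
class Lemma {
    @Id
    @GeneratedValue
    var id: Long? = null

    // Hypothetical property; the analyzer name must match the one
    // declared in CustomAnalysisConfigurer.
    @FullTextField(analyzer = "customAnalyzer")
    var text: String? = null
}
```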

When I use a built-in tokenizer (like ngram) in CustomAnalysisConfigurer, everything works fine. But I expected Hibernate Search to create the index for me with the settings from custom/index-settings.json, and it looks like the file isn’t picked up at all. I also tried:

spring:
  jpa:
    properties:
      hibernate:
        search:
          enabled: true
          backend:
            analysis:
              configurer: class:**.**.CustomAnalysisConfigurer
            schema_management:
              settings_file: custom/index-settings.json

to make the settings valid for all indexes, but this did not give the desired result either.

PS: **.** is just for masking the real package name ;)

I found out that I was working with a 6.0.x version, while the documentation I was reading is for 6.1. The settings_file property described there was not working for me on 6.0.x. After upgrading to 6.1.x, the settings work the way I expect.
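
As a rough illustration, the version bump might look like this with Gradle's Kotlin DSL. The build tool is an assumption (the thread does not say which one is used), and 6.1.0.Final is simply the version named in the documentation title above.

```kotlin
// build.gradle.kts (assumed build tool; not stated in this thread)
dependencies {
    // settings_file was introduced in Hibernate Search 6.1, so a 6.0.x
    // version here would silently ignore that property.
    implementation("org.hibernate.search:hibernate-search-mapper-orm:6.1.0.Final")
    implementation("org.hibernate.search:hibernate-search-backend-elasticsearch:6.1.0.Final")
}
```

It is worth checking that the chosen Hibernate Search version is compatible with the Hibernate ORM version your Spring Boot release manages.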

That’s expected, it’s a new feature of Hibernate Search 6.1 :).

Yes, as @gsmet said, settings_file is a new feature in 6.1, that’s why it was ignored in 6.0.

Just in case, I will add that if you only use the settings file to define a tokenizer, then maybe you don’t need it: you can perfectly well define tokenizers in your CustomAnalysisConfigurer. The example in the reference documentation only shows how to define token filters and char filters, but tokenizers work exactly the same way:

package ***.elasticsearch

import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer

class CustomAnalysisConfigurer : ElasticsearchAnalysisConfigurer {
    override fun configure(context: ElasticsearchAnalysisConfigurationContext) {
        context.analyzer("customAnalyzer").custom().tokenizer("custom_ngram_tokenizer")
        context.tokenizer("custom_ngram_tokenizer").type( "ngram" )
                .param( "min_gram", 2 )
                .param( "max_gram", 3 )
    }
}
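
To make the effect of min_gram = 2 and max_gram = 3 concrete, here is a small plain-Kotlin sketch that mimics what an ngram tokenizer with these parameters emits for one word. It is an illustration only, not Elasticsearch's implementation (it ignores token_chars and offsets), and the function name `ngrams` is made up for this example.

```kotlin
// Emits every substring of length minGram..maxGram, ordered by start
// position and then by length, similar to an ngram tokenizer.
fun ngrams(text: String, minGram: Int = 2, maxGram: Int = 3): List<String> {
    val result = mutableListOf<String>()
    for (start in text.indices) {
        for (length in minGram..maxGram) {
            val end = start + length
            if (end > text.length) break // no room for a gram of this length
            result.add(text.substring(start, end))
        }
    }
    return result
}

fun main() {
    println(ngrams("lemma")) // prints [le, lem, em, emm, mm, mma, ma]
}
```

With this analyzer, a query term only needs to share one of these short grams with the indexed text to match, which is what makes ngram-based fields tolerant of partial input.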

Wow, thanks, that is even easier!