Hibernate Search 6, Are types from Japanese(kuromoji) Analysis plugin supported?

nemo · October 1, 2019, 7:51am

We have Japanese customers, and for them we are implementing search with the LIKE query provided by the DBMS.
The LIKE query has become slow because of the large amount of data, so we are evaluating Hibernate Search + Elasticsearch.

We are looking for search behavior similar to the following LIKE patterns:

like ‘%日本プラ%’ // Not found with basic ngram / edgeNgram. I guess it fails because it contains katakana.
like ‘%暁商%’ // Found with basic ngram / edgeNgram.
like ‘%フィロ%’ // Not found with basic ngram / edgeNgram. I guess it fails because it is katakana.
like ‘%Ｉｓｌａｎｄｓ’ // Not found with basic ngram / edgeNgram.

I may be wrong, but the Kuromoji plugin seems to contain several nice features for katakana and hiragana.
Is there already some kind of Japanese (Kuromoji) analysis support in Hibernate Search?

ES Japanese (Kuromoji) Analysis Plugin
Lucene-Analyzers-Kuromoji
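
One possible contributor to the `Ｉｓｌａｎｄｓ` case is character width: full-width Latin letters are different code points from ASCII letters, so a query typed as `Islands` will not match them unless both sides are normalized. The thread does not confirm that this is the cause here, so treat the following as a sketch under that assumption. It uses the JDK's NFKC normalization, which includes width folding among other compatibility mappings; on the Elasticsearch side the built-in `cjk_width` token filter plays a similar role. The class and method names are illustrative only.

```java
import java.text.Normalizer;

public class WidthFolding {

    // Applies NFKC compatibility normalization. Among other mappings, this folds
    // full-width Latin letters to ASCII and half-width katakana to full-width katakana.
    static String foldWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(foldWidth("Ｉｓｌａｎｄｓ")); // Islands
        System.out.println(foldWidth("ﾌｨﾛ"));            // フィロ
    }
}
```

If the indexed text and the query are both folded this way (in the application or in the analyzer), width differences alone no longer prevent a match.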

yrodiere · October 1, 2019, 8:39am

If you install your plugin on your Elasticsearch cluster, then yes, you can definitely use the analyzers/tokenizers/filters provided by this plugin.

You will have to declare analyzers differently, though. From what I understand you’ve been using Lucene factories to declare analyzers. Hibernate Search automatically translates these to their Elasticsearch equivalent, but that prevents you from using anything that isn’t part of the default Lucene analyzers.

To go beyond that, you need to declare analyzers in a more “native” way. You have two options:

The cleanest solution is probably to use an analysis definition provider.
If you really want to use the @AnalyzerDef annotation, you can use Elasticsearch-specific factories

nemo · October 1, 2019, 8:50am

Thank you for your advice.

Additional comment:
Could you verify that the compatibility information is correct?
When I add the Apache Lucene dependencies to pom.xml and build, the following error occurs.

    <!-- hibernate -->
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-data-jpa</artifactId>
      <!--<version>2.1.8.RELEASE</version>-->
    </dependency>
    <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-core</artifactId>
      <version>5.4.1.Final</version>
    </dependency>
    <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-search-orm</artifactId>
      <version>5.11.3.Final</version>
    </dependency>
    <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-search-elasticsearch</artifactId>
      <version>5.11.3.Final</version>
    </dependency>

    <!-- Apache lucene -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>5.5.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-common</artifactId>
      <version>5.5.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-kuromoji</artifactId>
      <version>5.5.0</version>
    </dependency>

2019-10-01 17:38:34.418  WARN 72891 --- [  restartedMain] ConfigServletWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactory' defined in class path resource [org/springframework/boot/autoconfigure/orm/jpa/HibernateJpaConfiguration.class]: Invocation of init method failed; nested exception is javax.persistence.PersistenceException: [PersistenceUnit: default] Unable to build Hibernate SessionFactory; nested exception is org.hibernate.search.exception.SearchException: HSEARCH400059: The tokenizer factory 'org.apache.lucene.analysis.ja.JapaneseTokenizerFactory' is not supported with Elasticsearch. Please only use builtin Lucene factories that have a builtin equivalent in Elasticsearch.
2019-10-01 17:38:34.461 ERROR 72891 --- [  restartedMain] o.s.boot.SpringApplication               : Application run failed

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityManagerFactory' defined in class path resource [org/springframework/boot/autoconfigure/orm/jpa/HibernateJpaConfiguration.class]: Invocation of init method failed; nested exception is javax.persistence.PersistenceException: [PersistenceUnit: default] Unable to build Hibernate SessionFactory; nested exception is org.hibernate.search.exception.SearchException: HSEARCH400059: The tokenizer factory 'org.apache.lucene.analysis.ja.JapaneseTokenizerFactory' is not supported with Elasticsearch. Please only use builtin Lucene factories that have a builtin equivalent in Elasticsearch.
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1778)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:593)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515)
	at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320)

yrodiere · October 1, 2019, 8:54am

Compatibility is just fine. This error is caused by the fact that Hibernate Search doesn’t know how to translate this analyzer into its Elasticsearch equivalent.

You will have to declare analyzers differently, as explained above.

nemo · October 17, 2019, 9:49am

My understanding is not clear; I think ElasticsearchAnalysisDefinitionProvider is only declared inside Hibernate Search.

So I have an additional question about the analysis definition provider.
Is it possible to use, from Hibernate Search, a custom analyzer that is declared on Elasticsearch?
Please let me know how a custom analyzer should be defined in Elasticsearch and then used in Hibernate Search.

We want to find a way to use the Elasticsearch analyzer below in Hibernate Search.

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 30,
          "token_chars": [
            "letter",
            "digit",
            "whitespace",
            "punctuation",
            "symbol"
          ]
        }
      }
    }
  }
}



POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "メディカルとは - コトバンク"
}

Best Regards
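
To make it concrete what the `my_tokenizer` definition above produces, here is a plain-Java sketch of n-gram generation with `min_gram` and `max_gram`. It is not Lucene's or Elasticsearch's implementation; it assumes the whole input is one run of accepted characters (which is roughly what listing all five `token_chars` classes leads to) and simply emits every substring of the allowed lengths. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Returns every substring of `text` whose length, counted in code points,
    // lies between minGram and maxGram inclusive, ordered by start position.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        int[] codePoints = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < codePoints.length; start++) {
            for (int len = minGram; len <= maxGram && start + len <= codePoints.length; len++) {
                grams.add(new String(codePoints, start, len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("フィロ", 1, 30)); // [フ, フィ, フィロ, ィ, ィロ, ロ]
    }
}
```

With this scheme, a `LIKE '%フィロ%'` style term appears among the indexed tokens as long as it is no longer than `max_gram`. If the query text goes through the same n-gram analyzer, though, it is also split into short grams and can match far more than intended.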

yrodiere · October 17, 2019, 10:38am

I don’t understand what you mean, but ElasticsearchAnalysisDefinitionProvider is definitely a pluggable component that you are expected to implement.

When you declare analyzers in the ElasticsearchAnalysisDefinitionProvider, Hibernate Search will generate JSON definitions and push them to Elasticsearch when it creates/updates the Elasticsearch mappings. By default this will only happen if the index doesn’t exist yet in Elasticsearch, but the behavior can be tweaked: see Hibernate Search 5.11.12.Final: Reference Guide

If you want Hibernate Search to push the analyzer definitions to Elasticsearch automatically upon index creation, use the analysis definition provider as explained above and in the documentation.
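
For reference, a provider mirroring the `my_analyzer` JSON from the question might look roughly like the sketch below. It is written from memory of the Hibernate Search 5.x builder API, so check the method names against the reference guide for your version; the class name `MyAnalysisDefinitionProvider` is hypothetical. To my knowledge the provider is registered through the `hibernate.search.elasticsearch.analysis_definition_provider` property, set to the fully-qualified class name. It needs the hibernate-search-elasticsearch dependency on the classpath.

```java
import org.hibernate.search.elasticsearch.analyzer.definition.ElasticsearchAnalysisDefinitionProvider;
import org.hibernate.search.elasticsearch.analyzer.definition.ElasticsearchAnalysisDefinitionRegistryBuilder;

// Hypothetical class name; the definitions mirror the JSON posted in the question.
public class MyAnalysisDefinitionProvider implements ElasticsearchAnalysisDefinitionProvider {

    @Override
    public void register(ElasticsearchAnalysisDefinitionRegistryBuilder builder) {
        // Analyzer that Hibernate Search mappings can reference by name.
        builder.analyzer("my_analyzer")
                .withTokenizer("my_tokenizer");

        // Custom tokenizer: "type" is the Elasticsearch type name, params map to the JSON settings.
        builder.tokenizer("my_tokenizer")
                .type("ngram")
                .param("min_gram", 1)
                .param("max_gram", 30)
                .param("token_chars", "letter", "digit", "whitespace", "punctuation", "symbol");
    }
}
```

A tokenizer or filter coming from a plugin (for example `kuromoji_tokenizer`, once the Kuromoji plugin is installed on the cluster) should be declarable the same way, by passing its Elasticsearch type name to `type(...)`.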

If, on the other hand, you already defined the indexes with their analyzers, or if you’re using templates to define the analyzers, then your analyzers will already be defined in Elasticsearch when Hibernate Search tries to index/search. So in this specific case, you don’t need to define analyzers in Hibernate Search. Just reference the analyzers by their name in your mapping:

@Field(analyzer = @Analyzer(definition = "my_analyzer"))
private String myField;

Hibernate Search will detect that the analyzers are unknown, but will just assume they are defined somehow.
