SPI / JMS: questions about migrating to Hibernate Search 6.x

Hi,

I’m migrating our project from Hibernate Search 5.11.10 to Hibernate Search 6, following the migration guide you provided, but I still have a few questions.

We were using org.hibernate.search.orm.spi.SearchIntegratorHelper to get a SearchIntegrator from our EntityManagerFactory. How can we do this in 6.x? The class doesn’t seem to exist anymore.

We also had two dependencies, hibernate-search-backend-jms and hibernate-search-serialization-avro, which were on the same version as our previous hibernate-search-orm dependency.
Right now I don’t see any 6.x version of those, and I didn’t find anything about them in the guide.
Is it right to keep them on 5.11.10 while we use 6.x for the mapper-orm / backend-lucene?

Sincerely,

Hello,

Indeed, it doesn’t. That was an SPI, generally not intended for use in applications. If you tell me what you needed the SearchIntegrator for, maybe I can help you do something similar in Hibernate Search 6.

You will find answers in this section of the migration guide.

You will have to move from JMS + Lucene in Hibernate Search 5 to the Elasticsearch backend in Hibernate Search 6.0 or (even better) outbox-polling coordination + the Elasticsearch backend in Hibernate Search 6.1 (currently 6.1.0.CR1, 6.1.0.Final coming next week, most likely).
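For illustration, here is a minimal sketch of the configuration the second option implies, assuming Hibernate Search 6.1 with the outbox-polling coordination module on the classpath, and assuming properties are handed to the EntityManagerFactory as a plain map (through persistence.xml or similar). The property keys are Hibernate Search 6 settings; the host value is a placeholder and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: Hibernate Search 6.1 properties replacing a 5.x
// "JMS + Lucene" setup with "outbox-polling + Elasticsearch".
final class SearchBackendConfig {

    private SearchBackendConfig() {
    }

    static Map<String, String> elasticsearchWithOutboxPolling(String hosts) {
        Map<String, String> props = new HashMap<>();
        // Use the Elasticsearch backend instead of the Lucene backend.
        props.put("hibernate.search.backend.type", "elasticsearch");
        // host:port of the cluster (placeholder supplied by the caller).
        props.put("hibernate.search.backend.hosts", hosts);
        // Entity changes are recorded in an outbox table in the database and
        // processed by the application nodes themselves: no JMS queue, no master node.
        props.put("hibernate.search.coordination.strategy", "outbox-polling");
        return Collections.unmodifiableMap(props);
    }

    public static void main(String[] args) {
        elasticsearchWithOutboxPolling("localhost:9200")
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```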

Thanks for the quick answer!

We have a three-node architecture, documented as having one master in charge of indexing and two slaves that transfer entity updates to the master through a JMS queue.

The SearchIntegrator was passed to an AbstractJMSHibernateSearchController on the master node, which listens to the JMS queue and updates the indexes.

In reality, I just found out that all three nodes are masters pointing to the same master/masterCopy index directories, and all of them use filesystem-master as the directory provider. So the JMS queue wasn’t used at all.

I’m not a Hibernate Search expert, but from what I read this configuration wasn’t supposed to work, since only one node is allowed to write to the indexes and we have three.

I’m looking at the Elasticsearch backend links, since it looks like a good fit for a multi-node architecture and I will probably have to change ours anyway.
But we have always used the Lucene backend, and I’m dreading having to rewrite all our analyzer and tokenizer implementations, since every class comes from the org.apache.lucene.analysis package.

So any advice would be welcome.

Ok, then you shouldn’t need the SearchIntegrator in Hibernate Search 6 anymore.

You’re right. Either some changes were not indexed, or it crashed from time to time, or at the very least (if you had some extra configuration) it was much slower than it could have been.

It depends.

If you are simply using built-in analyzers, or custom analyzers built from built-in tokenizers/filters, then Elasticsearch should have you covered: they allow for configuration of custom analyzers, and from my experience almost everything Lucene provides is also available in Elasticsearch. It’s just configured differently. And by the way, Hibernate Search provides a way to configure Elasticsearch analyzers, though you can also use Elasticsearch’s built-in analyzers directly without any configuration (if you don’t need to pass arguments to the tokenizer/filter/etc.).
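To make "configured differently" concrete, here is a sketch of a migration checklist that maps a few Lucene analysis classes to the names of their Elasticsearch built-in counterparts, which are the names you would reference when declaring a custom analyzer on the Elasticsearch side (for example in an ElasticsearchAnalysisConfigurer). The table is deliberately small and is an assumption to verify against the Elasticsearch or OpenSearch analysis reference for your version; the class and method names are hypothetical. A hand-written TokenFilter has no entry by design; Elasticsearch does offer a phonetic filter through its analysis-phonetic plugin, which may or may not match a custom implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical checklist: Lucene analysis class -> Elasticsearch built-in name.
// Only a few common components are listed; check the docs for your version.
final class AnalysisComponentMapping {

    private static final Map<String, String> LUCENE_TO_ELASTICSEARCH = Map.of(
            "StandardTokenizer", "standard",
            "WhitespaceTokenizer", "whitespace",
            "LowerCaseFilter", "lowercase",
            "ASCIIFoldingFilter", "asciifolding",
            "ElisionFilter", "elision",
            "StopFilter", "stop",
            "SynonymFilter", "synonym",
            "SynonymGraphFilter", "synonym_graph");

    private AnalysisComponentMapping() {
    }

    /** Built-in Elasticsearch name for a Lucene class, or null if not in the table. */
    static String elasticsearchName(String luceneClass) {
        return LUCENE_TO_ELASTICSEARCH.get(luceneClass);
    }

    /** Lucene classes of an analyzer chain that need a custom solution. */
    static List<String> unmapped(List<String> luceneClasses) {
        List<String> missing = new ArrayList<>();
        for (String cls : luceneClasses) {
            if (!LUCENE_TO_ELASTICSEARCH.containsKey(cls)) {
                missing.add(cls);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "MyPhoneticFilter" stands for a hand-written TokenFilter subclass.
        List<String> chain = List.of("StandardTokenizer", "ElisionFilter", "LowerCaseFilter", "MyPhoneticFilter");
        System.out.println("Needs a custom solution: " + unmapped(chain));
    }
}
```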

If you actually implemented custom analyzers, i.e. you wrote a class that extends org.apache.lucene.analysis.Analyzer, then it’s more complicated. You really would be better off finding a way to build a custom analyzer from built-in tokenizers/filters. I understand Elasticsearch allows for plugging in custom JARs containing custom classes, so I suppose you might be able to plug in custom org.apache.lucene.analysis.Analyzer, but that probably won’t be easy, and I can’t help since I’ve never done it.

What will be harder to migrate is anything that uses Lucene classes directly. Stick to the Hibernate Search APIs (in particular the Search DSL) as much as you can, because those will work for either backend: once you’ve written the code for one backend, there’s no need to rewrite it when you switch to the other backend.

Beyond that, I don’t think there’s much more to mention than what’s already in the migration guide. Good luck :)

Thank you for all those great answers!

As a first step, I migrated what we had to HS6, and I will soon try to switch to the Elasticsearch backend, using OpenSearch.

Yes, we have a custom TokenFilter used for phonetic matching, combined with an ElisionFilter, a SynonymFilter and other standard classes to build our custom analyzer.

Hi again,

I have finally been able to migrate to the Elasticsearch backend and to deploy an OpenSearch cluster (with custom security!).
However, I am wondering how I should manage the indices for each of my application environments (mainly testing and production).

With the Lucene backend, I simply had a different index root directory for each of my profiles.

But now, should I set up a single cluster responsible for multiple profiles, or should I deploy one cluster per profile?
I don’t really know how to make it work in either case.

  • For the first case, I saw that I could define users and their roles in order to restrict access to some of the indices, and that indices are stored with a number, like ‘entity-00001’, but I don’t know how to create an ‘entity-00002’.

  • For the second case, I have a requirement to host all my clusters on the same VM, so I’m wondering how I could differentiate the clusters. I bet I will have to define new ports for each cluster?

I was also wondering about unit and integration tests: the Lucene backend had a convenient hibernate.search.backend.directory.type = local-heap setting that I don’t have anymore. Will I need to launch a dockerized cluster to run my tests?

Best regards,

I think you will find your answer in this other post.

Yes, if you really must run multiple Elasticsearch clusters on the same VM, then you don’t have a choice, you’ll have to expose them through different ports. I don’t know how you’ll prevent them from joining the same cluster, but I suppose there are ways. Best ask on the Elasticsearch forums (well, OpenSearch in your case, I suppose).
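On the Hibernate Search side, each environment then only needs to point at its own cluster. A minimal sketch, assuming hypothetical profile names, ports (9200 for production, 9201 for testing), user names, and HTTPS because of the custom security setup; the property keys are Hibernate Search 6 Elasticsearch backend settings. On the cluster side, giving each cluster its own cluster.name, http.port and transport.port is the usual way to keep them apart, since nodes only join a cluster whose name matches, but check the OpenSearch documentation for your setup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-profile settings: one cluster per environment on the same VM,
// distinguished by port. Profile names, ports and user names are placeholders.
final class ProfileSearchConfig {

    enum Profile { TESTING, PRODUCTION }

    private ProfileSearchConfig() {
    }

    static Map<String, String> forProfile(Profile profile, String password) {
        Map<String, String> props = new HashMap<>();
        props.put("hibernate.search.backend.type", "elasticsearch");
        props.put("hibernate.search.backend.protocol", "https");
        switch (profile) {
            case TESTING:
                props.put("hibernate.search.backend.hosts", "localhost:9201");
                props.put("hibernate.search.backend.username", "app-testing");
                break;
            case PRODUCTION:
                props.put("hibernate.search.backend.hosts", "localhost:9200");
                props.put("hibernate.search.backend.username", "app-production");
                break;
            default:
                throw new IllegalArgumentException("Unknown profile: " + profile);
        }
        // The password comes from the caller (e.g. a secret store), never hard-coded.
        props.put("hibernate.search.backend.password", password);
        return props;
    }
}
```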

If you are talking about launching an Elasticsearch cluster, then yes, though a single-node cluster will be enough. There are easy ways to do that nowadays, such as TestContainers, docker-maven-plugin, or Quarkus’ dev-services (though the one for Elasticsearch is still in development).
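Whichever tool starts the container, the tests then only need to point Hibernate Search at it. A minimal sketch, assuming the test harness exposes the container's address as a host:port string; the helper name is hypothetical. The drop-and-create-and-drop value of hibernate.search.schema_management.strategy is a Hibernate Search 6 setting that recreates the indexes on startup and drops them on shutdown, so each test run starts clean.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical test helper: points Hibernate Search at a throwaway single-node
// cluster started by the test harness (e.g. a container).
final class TestSearchConfig {

    private TestSearchConfig() {
    }

    static Map<String, String> forContainer(String hostAndPort) {
        if (hostAndPort == null || !hostAndPort.contains(":")) {
            throw new IllegalArgumentException("Expected host:port, got " + hostAndPort);
        }
        Map<String, String> props = new HashMap<>();
        props.put("hibernate.search.backend.type", "elasticsearch");
        props.put("hibernate.search.backend.hosts", hostAndPort);
        // Fresh indexes on startup, dropped again on shutdown.
        props.put("hibernate.search.schema_management.strategy", "drop-and-create-and-drop");
        return props;
    }
}
```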

Alternatively, you could potentially use a Lucene backend just for tests, but that will only work if you don’t have any Elasticsearch-specific code, and anyway I wouldn’t recommend it, as your tests should run in an environment as close to production as is practical.
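If you do go that way despite the caveat, the switch is mostly configuration. A minimal sketch, assuming the production properties are already available as a Map<String, String>; the helper name is hypothetical. It swaps the backend type, restores the Lucene backend's local-heap directory so indexes live in memory, and drops Elasticsearch connection settings that the Lucene backend does not use. Anything Elasticsearch-specific, such as an Elasticsearch analysis configurer, would still need a Lucene counterpart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: derives an in-memory Lucene configuration for tests
// from an existing Elasticsearch configuration.
final class LuceneTestOverride {

    // Elasticsearch connection settings that the Lucene backend does not use.
    private static final Set<String> ELASTICSEARCH_ONLY_KEYS = Set.of(
            "hibernate.search.backend.hosts",
            "hibernate.search.backend.protocol",
            "hibernate.search.backend.username",
            "hibernate.search.backend.password");

    private LuceneTestOverride() {
    }

    static Map<String, String> inMemoryLucene(Map<String, String> productionProps) {
        // Copy so the production map is left untouched.
        Map<String, String> props = new HashMap<>(productionProps);
        props.keySet().removeAll(ELASTICSEARCH_ONLY_KEYS);
        props.put("hibernate.search.backend.type", "lucene");
        // Keep the indexes in the JVM heap: nothing to clean up between test runs.
        props.put("hibernate.search.backend.directory.type", "local-heap");
        return props;
    }
}
```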