I’m using Hibernate 5.2.12.Final, Hibernate Search 5.8.2.Final, Lucene 5.5.5.
I built a small app that uses the FullTextSession API with a scrollable result set to scan the relevant tables and build the initial indexes - all is well.
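For reference, the build loop looks roughly like this. It is a minimal sketch following the manual-indexing pattern from the Hibernate Search 5 docs; `Book` and the batch size of 100 stand in for my real entities and tuning:

```java
import org.hibernate.CacheMode;
import org.hibernate.FlushMode;
import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;
import org.hibernate.Transaction;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;

public class InitialIndexer {

    static void buildInitialIndex(Session session) {
        FullTextSession fullTextSession = Search.getFullTextSession(session);
        fullTextSession.setFlushMode(FlushMode.MANUAL);
        fullTextSession.setCacheMode(CacheMode.IGNORE);
        Transaction tx = fullTextSession.beginTransaction();

        // Scrollable results avoid loading the whole table into memory
        ScrollableResults results = fullTextSession.createQuery("from Book")
                .setFetchSize(100)
                .scroll(ScrollMode.FORWARD_ONLY);
        int index = 0;
        while (results.next()) {
            index++;
            fullTextSession.index(results.get(0)); // queue the entity for indexing
            if (index % 100 == 0) {
                fullTextSession.flushToIndexes(); // apply pending index changes
                fullTextSession.clear();          // free memory held by the session
            }
        }
        results.close();
        tx.commit(); // remaining index work is flushed on commit
    }
}
```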
But based on the diagram in the Search documentation, “Section 2.2.1. Lucene”, it appears that in order to have the indexes “shared” between JVMs, I’ll have to have the same volume mounted on each node of my clustered servers. Assuming this is possible, is this correct, and what is meant by the same folder being “shared” between JVMs?
The strategy you are referring to only works if you use a Lucene DirectoryProvider that manages sharing and locking internally. The default DirectoryProvider (FileSystem) does not.
From what I understand, even with the Infinispan DirectoryProvider you still need to coordinate your writes in some way to get reasonable write performance, and this is generally done with the JMS or JGroups backend in Hibernate Search.
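For reference, the relevant settings look roughly like this. This is only a sketch: the Infinispan directory requires the hibernate-search-infinispan module, and the exact keys should be checked against the 5.8 documentation:

```properties
# Store the index in an Infinispan data grid instead of the local filesystem
hibernate.search.default.directory_provider = infinispan

# Route index writes through JGroups so a single master applies them
# ("jgroups" auto-elects a master; jgroupsMaster/jgroupsSlave pin the roles)
hibernate.search.default.worker.backend = jgroups
```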
I will let Sanne expand on this if he thinks it necessary, because he knows much more about the clustering part of Hibernate Search than I do…
The two links do indeed point to the same document, but Yoann is pointing to two different sections using anchors.
In other words, see “21.4.4. Architecture considerations” and “21.3.8. Architectural limitations”.
In short, your interpretation is partially correct. You will need a shared filesystem across the JVMs, but you then need to configure Hibernate Search according to those chapters so that it uses the shared mount point not for the “live” index, but only to copy the index to/from it at periodic intervals. Sharing the index directly on a shared mount is not possible because of locking issues.
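Concretely, those chapters describe the filesystem-master and filesystem-slave directory providers, which implement exactly that periodic copying. A rough sketch of the two sides, where the paths and the refresh interval are examples to adapt:

```properties
# Master node: works on a local index, pushes a copy to the shared mount
hibernate.search.default.directory_provider = filesystem-master
hibernate.search.default.indexBase  = /var/lucene/indexes   # local "live" index
hibernate.search.default.sourceBase = /mnt/shared/indexes   # shared mount point
hibernate.search.default.refresh    = 1800                  # copy interval in seconds

# Slave nodes: pull a fresh copy from the shared mount at the same interval
hibernate.search.default.directory_provider = filesystem-slave
hibernate.search.default.indexBase  = /var/lucene/indexes
hibernate.search.default.sourceBase = /mnt/shared/indexes
hibernate.search.default.refresh    = 1800
```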
OK, based on what I’ve read, to get this working with the least latency and fewest problems, we’ll need WildFly hosting a JMS instance backed by a database and accepting updates from each node, but only the master (running on that same box) will update the index. Replication will be handled by each client per its configuration (Example 14. JMS Slave configuration), as sketched below.
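For the queue side of Example 14, the slave-side worker settings look roughly like this; the connection factory and queue JNDI names are placeholders for whatever the WildFly instance exposes:

```properties
# Send index work to the JMS queue instead of applying it locally
hibernate.search.default.worker.backend = jms
hibernate.search.default.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.default.worker.jms.queue = queue/hibernatesearch
```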
The only concern I have is that the nodes are non-EE (Tomcat). But they would just have to implement code similar to that shown in the MDBSearchController example. As long as a similar class is available within the non-EE application, Hibernate Search will use it (per the configuration) to write the updates to the queue.
On the other end, do I assume correctly that a local app will be necessary to read the queue (per the configuration) so that HS can process the updates? This should be very small/simple; is there an example?
> On the other end, do I assume correctly that a local app will be necessary to read the queue (per the configuration) so that HS can process the updates?
If by “on the other end” you mean “on the master node”, then yes, you are correct.
> This should be very small/simple
The master node doesn’t need to run your webapp, if that’s what you mean. Only Hibernate ORM + Hibernate Search + the “MDBSearchController” are needed.
> is there an example?
For the full setup, I don’t think so, at least not beyond the documentation. We do have integration tests, but they are more likely to confuse than to help, since they are full of code that is only necessary for testing.
For the JMS controller, you can have a look at the abstract base class in Hibernate Search. But the easiest solution is to extend this class and plug it into your JMS runtime. How to do that should be explained in the documentation of your JMS runtime, I guess.
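To sketch what that could look like on a non-EE master: this assumes Hibernate Search 5.x, where the base class lives in org.hibernate.search.backend.impl.jms; the class name, the SessionFactory wiring, and the JNDI names below are placeholders, so check them against your version and setup:

```java
import javax.jms.MessageListener;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.search.backend.impl.jms.AbstractJMSHibernateSearchController;

// Plain JMS listener for the master node: receives serialized index work
// from the queue and lets Hibernate Search apply it to the master index.
public class QueueSearchController extends AbstractJMSHibernateSearchController
        implements MessageListener {

    private final SessionFactory sessionFactory;

    public QueueSearchController(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @Override
    protected Session getSession() {
        // Hibernate Search uses this session to apply the queued index work
        return sessionFactory.openSession();
    }

    @Override
    protected void cleanSessionIfNeeded(Session session) {
        session.close();
    }
}
```

Registering it with a plain JMS consumer would then look something like this, where connectionFactory and sessionFactory come from your own wiring:

```java
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Queue;

Connection connection = connectionFactory.createConnection();
javax.jms.Session jmsSession =
        connection.createSession(false, javax.jms.Session.AUTO_ACKNOWLEDGE);
Queue queue = jmsSession.createQueue("queue/hibernatesearch");
MessageConsumer consumer = jmsSession.createConsumer(queue);
consumer.setMessageListener(new QueueSearchController(sessionFactory));
connection.start();
```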
Thanks for the guidance! It doesn’t look too hard, and we shouldn’t need to shard anything at the outset. I did a test index build on my workstation against a copy of production, and the index was only 8.5 GB, which is very reasonable.