I need to add search to an existing application that has several modules. The older one was recently refactored from CMP to JPA using the orm/persistence files, while one of the new modules is Spring-based and configures JPA in code. They operate independently, though through the same server pool.
Thus far I am testing in the older module without issue, but there are duplicate entity classes between the modules.
If I were to mark the duplicate Spring classes as indexed with the same field annotations, then as I understand things, Hibernate would spin up TWO instances of the Lucene lib, which means duplicate readers/writers, correct?
If so, what can I do short of refactoring code to not have duplicate entities?
Correct. And very likely to fail; the Lucene backend can only work with a single writer, and even 1 writer / N readers requires jumping through hoops (not officially supported/documented, but can be made to work).
You could use the Elasticsearch backend, which removes the “single instance” limitation.
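For a module that configures JPA in code, switching to the Elasticsearch backend could look roughly like the sketch below. The property keys are regular Hibernate Search (6+) settings, but the class name, the method name and the host value are made up for illustration, and the sketch assumes the hibernate-search-backend-elasticsearch dependency replaces the Lucene one on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class ElasticsearchBackendSettings {

    // Hypothetical helper: returns Hibernate Search properties to merge into
    // the JPA property map of each module. The host is an assumption.
    public static Map<String, Object> forHost(String hostAndPort) {
        Map<String, Object> props = new HashMap<>();
        props.put("hibernate.search.backend.type", "elasticsearch");
        props.put("hibernate.search.backend.hosts", hostAndPort);
        props.put("hibernate.search.backend.protocol", "http");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with the module's config.
        forHost("localhost:9200").forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Both modules would point at the same cluster, so neither of them holds a local index writer.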
Alternatively, you could do all of the following:
Set the location of your Lucene indexes to a network share.
Configure all but one of your app instances to be read-only. See here for hints on how to do that.
Use outbox-polling coordination so that entity change events are written to an “outbox table” in the database, and processed in the background.
Disable entity change event processing in your read-only app instances (because obviously, that wouldn’t work: these instances are not supposed to write to indexes).
Then all your app instances will be able to read indexes, and all your app instances will trigger reindexing events, but the actual reindexing (index writes) will only happen in a single app instance, sidestepping the “single writer” limitation of the Lucene backend.
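As a rough sketch of how the settings for the two kinds of instances might differ, assuming Hibernate Search 6.1+ property names: the class and method names are hypothetical, the share path is a placeholder, and the read-only configuration itself (the part covered by the linked hints) is deliberately not shown.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedLuceneIndexSettings {

    // Settings shared by every instance: index location on the network share
    // and outbox-polling coordination.
    public static Map<String, Object> common(String sharedIndexRoot) {
        Map<String, Object> props = new HashMap<>();
        props.put("hibernate.search.backend.directory.root", sharedIndexRoot);
        props.put("hibernate.search.coordination.strategy", "outbox-polling");
        return props;
    }

    // The single instance that processes outbox events and writes to the indexes.
    public static Map<String, Object> writer(String sharedIndexRoot) {
        return common(sharedIndexRoot);
    }

    // Instances that only search: they still push events to the outbox table,
    // but never process them. Making the Lucene backend itself read-only is
    // NOT covered here.
    public static Map<String, Object> readOnly(String sharedIndexRoot) {
        Map<String, Object> props = common(sharedIndexRoot);
        props.put("hibernate.search.coordination.event_processor.enabled", "false");
        return props;
    }

    public static void main(String[] args) {
        // "/mnt/search-indexes" is a placeholder path.
        System.out.println(writer("/mnt/search-indexes"));
        System.out.println(readOnly("/mnt/search-indexes"));
    }
}
```

Each module would merge the relevant map into its own JPA properties, with exactly one instance in the pool using the writer variant.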
We’re hoping to make this easier in the future with HSEARCH-5261, but this still needs some work.
Note that you may also need to set hibernate.search.backend.io.strategy to debug in read-only app instances, so that these instances actually see index updates; without that, they’d be stuck with the initial view of the index. The strategy is called debug because it’s very naive (it refreshes on every search query) and will perform rather badly, so we didn’t intend it for production use, but with such a setup it’s currently your only solution.
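A minimal sketch of applying that setting on top of whatever properties a read-only instance already uses. The helper class and method are hypothetical; only the property key and the debug value come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadOnlyIoSettings {

    // Returns a copy of the given properties with the I/O strategy switched
    // to "debug", so that searches see index changes made by the writer.
    public static Map<String, Object> withDebugIo(Map<String, Object> readOnlyProps) {
        Map<String, Object> copy = new HashMap<>(readOnlyProps);
        copy.put("hibernate.search.backend.io.strategy", "debug");
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> base = new HashMap<>();
        base.put("hibernate.search.coordination.event_processor.enabled", "false");
        System.out.println(withDebugIo(base));
    }
}
```

The writer instance would keep the default I/O strategy, since the performance cost of debug only needs to be paid where stale reads would otherwise occur.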