Hibernate Search 6.0.6 does not index an entity with an @IndexedEmbedded @OneToMany collection using mappedBy when an element is added to or deleted from the collection.
For clarity, we are using persist and remove, not a merge/cascade after modifying the collection. We have not tested this with cascade because we only use cascade in a very small number of our associations.
In Hibernate Search 5, the entity being added or deleted would have had the @ContainedIn annotation, but this annotation was removed in Hibernate Search 6. (The migration guide indicates that @ContainedIn is no longer necessary in Hibernate Search 6, and to remove it.)
We do see the index being refreshed as expected after an update on the inverse (@ManyToOne) side.
We have examples of this problem both with and without includePaths on the @IndexedEmbedded annotation, and neither works as expected.
We would like to know if we are missing something, or if this is a bug.
This is tricky. I believe you’re hitting something similar to what we intend to solve with HSEARCH-3567.
In short, Hibernate Search 5 had (very) partial support for reindexing on deleted entities: whenever an entity was deleted, it would crawl through its @ContainedIn-annotated associations to retrieve the entities to reindex.
The problem is, it would only work when the association was already loaded. When it wasn’t, ORM would try to load the association for the deleted entity, and would fail (well, yeah, it’s deleted after all). We would sometimes catch the failure and simply do nothing (so you wouldn’t get any reindexing), and sometimes not (so you would get a LazyInitializationException). In short, it would seem like it works… until it doesn’t.
The correct way to make sure those changes trigger reindexing is to always update associations symmetrically, as explained here. In your case, to force reindexing when you delete your entity, you should also get the associated entity and update its @OneToMany collection explicitly, removing the deleted entity from it. Hibernate Search will detect that change and reindex.
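A minimal sketch of that symmetric update, assuming hypothetical `Parent`/`Child` entities (where `Child` owns a @ManyToOne back to `Parent`) and an `EntityManager` variable `em` — none of these names come from your model:

```java
// Delete a Child while keeping both sides of the association in sync,
// so that Hibernate Search observes the change on the indexed Parent.
Child child = em.find(Child.class, childId);
Parent parent = child.getParent();
parent.getChildren().remove(child); // update the inverse (@OneToMany) side
child.setParent(null);              // update the owning (@ManyToOne) side
em.remove(child);
// On flush, Hibernate Search sees the modification of parent.getChildren()
// and reindexes the parent.
```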
The good news is, this kind of limitation should be gone when we address HSEARCH-3567. The bad news is, in the meantime, you should be careful to update associations when you delete an entity.
@ContainedIn is no longer necessary in Hibernate Search 6.
Hibernate Search 6 infers indexing dependencies from the mapping, and raises errors at bootstrap when the equivalent of @ContainedIn cannot be applied automatically (for example an @IndexedEmbedded association with no inverse side).
Thus, the recommended approach when migrating is to simply remove all @ContainedIn annotations, then deal with the bootstrap errors, if any.
This means that automatic indexing is “safe” by default: whenever you apply @IndexedEmbedded to an association, Hibernate Search will automatically resolve the inverse side of that association and will make sure that any change in “indexed-embedded” entities leads to reindexing of affected indexed entities.
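When Hibernate Search cannot resolve the inverse side on its own (the bootstrap-error case mentioned above), you can declare it explicitly with @AssociationInverseSide. A sketch with hypothetical `Book`/`Author` entities — the entity and property names are assumptions, only the annotation is Hibernate Search 6 API:

```java
// Hypothetical mapping: Book indexes-embeds Author, but ORM metadata does not
// expose an inverse side, so we declare it by hand for Hibernate Search.
@Entity
@Indexed
class Book {
    @IndexedEmbedded
    @ManyToOne
    @AssociationInverseSide(
            inversePath = @ObjectPath(@PropertyValue(propertyName = "books")))
    Author author;
}

@Entity
class Author {
    @FullTextField
    String name;

    @OneToMany // no mappedBy: the inverse side cannot be inferred
    List<Book> books;
}
```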
I wasn’t following HSEARCH-3567. Sounds like everything above is disabled for insert and delete? If I had noticed this issue, I probably would have held off our migration until this issue was implemented.
HS5 was doing an adequate job for us, and in our opinion, the release of HS6 should have had parity with HS5 in this regard. We have never had to do anything with the “other side” of a ManyToOne association before, and there would be no reason to hard code this behavior. The annotations are there to determine what happens when a ManyToOne is inserted, updated or deleted, and this should be part of the framework.
In fact, the includePaths property is supposed to filter which associations need to be advised of the change. Why would we want to maintain “the other side” when all that does is trigger a load of a possibly very large collection of entities that may not even be included in includePaths? Wouldn’t that be a waste of a query, and of the resources to load a large collection, just to add or delete an element when, after the load, the index won’t even change? Isn’t optimization of loading / indexing the whole point of HS6, where specifying a limited number of includePaths allows the framework to know how to selectively update the index?
If we follow the “application must maintain the other side” logic further, what about creating and deleting an entity with a @OneToMany @IndexedEmbedded association on the other side, which is then @IndexedEmbedded on the other side of those entities? What about “the other side of the other side”? Does HS6 currently pick up on the traversal of nested @IndexedEmbedded from there, processing associations “deeply”, in order to trigger the rest of the indexes?
In particular, a common (dodgy) practice when creating/updating entities with Hibernate ORM is to only update the owning side of associations, ignoring the non-owning side.
This is not dodgy at all; it is practical, efficient, and performs well. This area is firmly in the domain of the framework to take care of and figure out. Our policy, for the good of performance, is to never touch the *ToMany side of associations during insert, update or delete, unless we intend to do something with all of that data. The only exceptions are a few cases where the entity is self-referencing and the hierarchy is maintained by the end user in the UI; in those cases, the collections are very small and have cascade turned on.
Otherwise, there would be far too many unnecessary loads of collections for no productive reason, and frankly, too much boilerplate to write for a model with 640 entities and 1,308 foreign keys when the frameworks have all the annotations they need to do these things automatically.
The *ToMany side is mostly useful during server side batch processing, and as an aid to assemble the data needed to populate the hierarchical tree for the client side view, when the UI needs collections of certain database objects.
I’m glad that HSEARCH-3567 is planned, but the migration guide seems to be misleading about how automatic HS6 is, when in reality, only update seems to be supported today.
Let’s be clear: the only new limitation here is when you insert/delete an entity without updating the corresponding association.
When I say the practice is dodgy, it’s in the context of Hibernate Search. I.e. in the context of a tool that must go through associations in order to reindex.
Let’s say you have this model:
@Entity
@Indexed
class A {
    @IndexedEmbedded
    @OneToMany(mappedBy = "a")
    List<B> b;
}

@Entity
class B {
    @FullTextField
    String text;

    @ManyToOne
    A a;
}
Assuming you load a B, then delete it, Hibernate Search 5 will behave like you want, but Hibernate Search 6 will not. This is the new limitation.
But it’s worse. Assuming you load a B, then set B.a to null… what will happen then? Obviously we’ve lost the information “B was contained in A”. Hibernate Search knows that it should reindex something, but it does not know what.
Even if Hibernate Search was somehow able to resolve that “the previous value of B.a was this instance of A”, what will happen when it reindexes this instance of A? If you’re lucky and A.b wasn’t loaded yet, all will be fine. If you’re unlucky and A.b was already loaded, and you didn’t update it, well, then… Hibernate Search will reindex the instance of A as if the instance of B you just deleted still existed, because it’s still referenced in A.b.
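To make the lucky/unlucky distinction concrete, here is a sketch reusing the A/B model above; the `EntityManager` variable `em` and the identifiers are assumptions:

```java
// The "unlucky" case: A.b gets initialized before the delete and is
// never updated, so it keeps referencing the deleted B.
A a = em.find(A.class, aId);
a.b.size();       // touching the collection initializes A.b
B b = a.b.get(0);
b.a = null;       // owning side updated...
em.remove(b);
// ...but a.b still contains b: reindexing A from the in-memory
// collection would index the deleted B as if it still existed.
```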
So in that case, Hibernate Search won’t reindex anything, even in Hibernate Search 5. That’s a pre-existing limitation. One that is documented only now, but has always been there, and that may already affect you, regardless of whether you migrate or not.
As far as I know, the only situations where Hibernate Search needs you to update both sides of the association consistently are situations where both sides would end up being loaded anyway, for indexing. In the example above, Hibernate Search requires you to update A.b, because A.b is used in indexing. A.b would be loaded during indexing no matter what.
Hibernate Search does not care about associations that are not used in indexing.
So for example, with this model:
@Entity
@Indexed
class C {
    @IndexedEmbedded
    @ManyToOne
    D d;
}

@Entity
class D {
    @FullTextField
    String text;

    @OneToMany(mappedBy = "d")
    List<C> c;
}
You can perfectly well update only C.d and expect everything to work fine.
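For instance, with the C/D model above, this is safe even though D.c is never touched in memory (the `EntityManager` variable `em` and the identifiers are assumptions):

```java
// Update only the owning side; D.c is not used for indexing C,
// so Hibernate Search never needs the in-memory collection to be current.
C c = em.find(C.class, cId);
c.d = em.find(D.class, newDId);
// On flush, Hibernate Search reindexes C; c.d (and its @FullTextField)
// would be loaded for indexing anyway.
```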
So if you want to skip updating some inverse sides of associations, you can. But you need to be careful about it. As I said, when using Hibernate Search, this practice is dodgy.
If that’s what you’re suggesting, no, I was not trying to trick you, or anyone. This guide is massive, and it took me weeks; I was bound to forget something.
With that said, it seems the limitation on insert/delete is more problematic than I anticipated. To me it was just part of a larger limitation that’s always been there, but it seems worse.
No, I wasn’t suggesting that at all. Of course not, and I did not anticipate this interpretation. My apologies and please forgive me for leaving that possibility open if it sounded that way. The migration guide is awesome, and so is the reference guide! You did a great job.
I just needed to share with you somehow that the migration guide is out of sync with the code, and needs to be updated to reflect what is actually supported. It did sound like everything would be automatic and taken care of, and I’ve learned to blindly trust the awesome docs. I don’t think I’ve ever found a mistake in any of them!
I had many users of the application report serious inability to find anything newly created which had a parent/child relationship on the day of rollout, which caused some business critical issues to surface. It was my fault that I didn’t manually test every preexisting feature of HS indexing.
The migration from HS5 to HS6 was massive also, and I was bound to forget something, too.
Thank you for adding HSEARCH-4303. I’ll watch it.
If so, Hibernate Search is just incomplete, and being a work in progress is okay as long as I know about it; but it’s a regression from HS5, and the requirement to touch both sides should be removed entirely at some point in the future. It shouldn’t be assumed or expected that the developer is going to do it.
The developer would have to read and cache all of the metadata that the framework has already read and processed to enforce the existence of “both sides” and duplicate checking, in order to figure out for himself which of the dozen relationships in a given entity are indexed, which ones have this or that property in the includePaths, and so on, just to learn that an association will be read anyway.
I am unable to think 7+ layers deep in the hierarchical tree from every perspective, about where a reference to an entity or property is going to turn up. That is a task for the framework / computer to do, with the annotations it already has. If it needs to plug in tighter to the ORM, so be it, and it may require that, but it’s still the job of the framework, not the application developer IMHO.
I don’t understand why new limitations that didn’t exist in a prior release would be introduced in a later release. Maybe I’m missing something, but it seems like a regression.
Why does the information have to be lost, and why would that be obvious, unless you assume there’s no way around the issue? There’s byte code enhancement, and the proxy techniques used by ORM to work what seems to be magic; or HS could keep a cache of its own, etc. In fact, there could be a stream of before-and-after images being sent from ORM to an event handler somewhere under the covers. We don’t know.
I understand the challenge, but I don’t see it as insurmountable. I’ve seen change tracking at the block level in a SAN, and a replica being maintained across the WAN, so nothing like keeping the index in sync seems impossible just because a variable is set to null.
I do believe that it is a fundamental requirement for a search framework that the index stays in sync with the database, by whatever means necessary to achieve it. We had that in HS5 to the extent that I didn’t even have to think about it. It was beautiful, aside from the performance of @Transient, which hopefully is fixed by the new dependency annotations; and the asynchronous / event-driven future of HS6 will be completely awesome, but maybe not until HS7.
Ok, sorry. I tend to get touchy about Hibernate Search 6; it’s my baby, after all.
Sorry about that
As I was trying to explain, one of the limitations (the lack of reindexing when deleting/inserting without updating the association) can be considered a regression; that’s why I opened HSEARCH-4303.
But as I explained there are other limitations affecting both Hibernate Search 5 and 6. Yes they should be solved. And yes they will be, ultimately. But the fact remains they are not currently solved, so currently, be it with Hibernate Search 5 or 6, not updating the inverse side of associations is dodgy.
Yes, we could. My point was not “we cannot”; it was “we don’t”. So currently it’s still a valid problem.
By the way, the next paragraph is there precisely to explain that another problem remains even if we solve this one:
So we’d need yet another level of resolution here, which we don’t currently have: the ability to detect that A.b is out-of-date, and to infer its correct values.
Again, there are solutions. But again, those take time.
I totally understand and agree. Thank you very much!
I think the closer one can be to indexing the source of the data, and the further away from indexing derivatives of the source, the easier these problems become (or they disappear altogether).
Indexing the stale, “past tense” but still in memory versions of the source data, which tend towards getting out of sync, seems to spawn all kinds of strange timing issues and side effects like the ones you mention about being lucky or unlucky. Those lead to building workarounds perpetually until the root cause is addressed.
Reading the database journal would be an interesting way to implement an indexing engine. Getting a feed established, and using checkpoints to figure out where you left off and from where to resume later would be ideal. Then you could do a warm start, or a cold start, even if SQL were executed outside of the ORM.
This is essentially how an IBM product I worked with years ago would do replication.
That’s true, but the opposite is also true: when you are dealing with low-level data (e.g. table rows), it becomes harder to implement complex bridges and transformations of structured data.
Yes, that’s exactly the approach we will take with the Debezium integration. Debezium will handle all the low-level stuff, but we will have to declare all our dependencies at the database schema level (as opposed to declaring dependencies at the entity level, like we currently do), and we will have to interpret database events as entity change events. A whole lot to do, but interesting stuff
I would imagine that in the majority of cases, the database schema and the entity classes are always in sync, and the entity classes are what generated the database schema in the first place. Would this allow all declarations about the database schema and dependencies to remain where they are? Ideally there would be a single centralized source of the configuration.
Yes they would be in sync. The work would be about using Hibernate ORM metadata to translate “entity” dependencies into “table” dependencies. So yes, that shouldn’t require extra configuration.
I have run all of the test cases I wrote for HSEARCH-4323 using 6.1.0.Alpha1 and none of the results changed. This could be because the other issues are presently in the way.
So I wrote three new reproducers to illustrate:
the logic which was compatible with Hibernate Search 5
how the same logic breaks with Hibernate Search 6
what changes need to be made to cause the desired output with Hibernate Search 6
After you see them, I’ll be interested to hear your interpretation, such as whether you want things to stay this way, or whether you want to make more changes, or even give the ORM folks some feedback.
My own opinion is that Hibernate Search 6 needs much better detection of when to invalidate what’s in the buffer. It’s very easy to make changes that need to affect the index but are currently invisible to it. This is because there are changes which do not require a flush or a database hit, yet still affect what the index should look like at the end of a transaction. This is counterintuitive and basically dangerous, as it requires the developer to manage the HS6 buffer from the outside looking in.
There are more complex test cases I should probably write, such as when @IndexedEmbedded properties are nested several layers deep.
Ok, I had a look at the tests, and it seems Hibernate Search does perform automatic indexing on add/delete now, but it doesn’t find any entity to reindex because of the state your entities are in (detached, not in sync with the database).
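For the detached case, a sketch of the reattachment the tests would need before the change is made (the variable names and the `EntityManager` `em` are assumptions, not code from the reproducers):

```java
// A detached instance must be brought back into the persistence context
// first, so the change happens on a managed entity and Hibernate Search
// can observe it through ORM events.
B managed = em.merge(detachedB); // reattach; merge returns a managed copy
managed.a = null;                // update the association on the managed copy
em.remove(managed);              // delete the managed instance
```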