HS6 corrupting index after merge of detached entity with @OneToMany association

busitech · October 12, 2021, 7:36am

We just found how to instantiate an object before the session is created, and apply interception later the way Hibernate does:

Constructor<Entity> constructor = ReflectHelper.getDefaultConstructor(Item.class);
Entity entity = constructor.newInstance( (Object[]) null );
// deserialize
entity = applyInterception(entity);
// persist or merge

protected <T> T applyInterception(T entity) {
	SharedSessionContractImplementor s = getDelegate().unwrap(SharedSessionContractImplementor.class);
	MetamodelImplementor entityMetamodelImpl = s.getFactory().getMetamodel();
	final EntityPersister persister = entityMetamodelImpl.entityPersister(Item.class.getName());
	EntityMetamodel entityMetamodel =  persister.getEntityMetamodel();

	PersistentAttributeInterceptor interceptor = new LazyAttributeLoadingInterceptor(
		entityMetamodel.getName(),
		null,
		entityMetamodel.getBytecodeEnhancementMetadata()
				.getLazyAttributesMetadata()
				.getLazyAttributeNames(),
		s
	);
	( (PersistentAttributeInterceptable) entity ).$$_hibernate_setInterceptor(interceptor);
	return entity;
}

Our test cases work with this method equally well.

yrodiere · October 12, 2021, 8:52am

Glad that you found a solution!

If I understand correctly, you basically only need to merge entities in the session and have Hibernate ORM consider them fully initialized, without Hibernate ORM performing any select. If you’re using optimistic locking and bytecode-enhanced entities that track their changes, that seems reasonable. Judging by how org.hibernate.event.internal.DefaultMergeEventListener#onMerge(org.hibernate.event.spi.MergeEvent, java.util.Map) is implemented, that might be an already supported use case, actually…

I’m a bit worried by what I’m seeing though, so I’ll give a few fair warnings. Feel free to ignore them if you think I’m being too risk-averse; people usually do

You’re digging deep into internals, and this code might not compile in a future version of Hibernate ORM.
You’re essentially fooling Hibernate ORM into believing it instantiated these objects itself, so that it skips the initial database roundtrip. I hope your have an extensive test suite to check the resulting behavior matches what you want, because:
1. There’s no telling if this has deeper impacts than just skipping the initial database roundtrip; for all I know, some internal metadata might not be initialized correctly, causing Hibernate ORM to skip some updates.
2. This is probably not tested in Hibernate ORM, at least not this exact code. The behavior now might be different from the behavior in the next version of Hibernate ORM.
This will only work correctly if you eventually merge all deserialized entities into the session. But I assume you’re doing just that, otherwise I don’t think Hibernate ORM would persist your changes to entities.
I think this will result in Hibernate ORM considering all properties of all deserialized entities having been updated. Which is good for consistency, and I don’t think you can do much better anyway. But that may mean a few more UPDATE statements than you want.
Because of 4, I hope you’re using optimistic locking with @Version on all entities involved; otherwise you run the risk of some updates getting erased just because we considered all properties were updated, even if they weren’t.

busitech · October 12, 2021, 9:30am

No, they won’t be initialized at all, only the primitive properties will be, and every @ManyToOne and @OneToMany will trigger lazy loading, just like it did after a refresh. Every one of these objects will be either sent to persist or merge, and the associations will be armed like a land mine, waiting for Hibernate Search to step on them and load data from the database. ORM now passes Hibernate Search 6 disarmed associations after flush, and unless we refresh and request indexing a second time, the index becomes corrupt.

Yes, you preferred that the ORM would fix it, and I agreed, but they’re too busy, so I am left with few other options. Hibernate Search could do similar things, but I offered to work it out instead. At the rate I’m going, I might be able to provide a patch to ORM unless one of you get to it first. Now you can spend your time on Debezium

Remember our long term plan? The Hibernate team is going to fix this as soon as possible, so we can remove all of the workarounds…

Yes, this is what we have now, only with entities that have no proxies installed until after a refresh, which causes an extra database call to obtain. If this new code has no ill effects, we will be eliminating that extra fetch. Not the one before merge, but the one after persist or merge that followed the referesh call we were doing to prime the pump for Hibernate Search (and proxy init for selectively populating the client view after transaction is basically complete, right before commit / session close). This will slightly increase performance. I’m not sure if our current setup causes an initial fetch before merge or not. I’ll have to trace the SQL and see for sure if it does or does not.

Yes, when the client sends objects to the server, they are either for persist, merge or delete. They could also be just a parameter to a function call, in which case the business logic will start out by reading those objects in order to start processing.

Yes indeed, we have that today.

I am cautiously optimistic, but yes, the approach will have to be fully vetted and tested carefully. Have a look at the new test cases I added to the repo I shared with you. You can quickly experiment with them and see the code in action.

busitech · October 12, 2021, 11:59pm

Not at all, you’re 100% correct, and I completely share all of your concerns. I just hope the Hibernate team will have a chance to review the JIRA issue HHH-14839 before this workaround breaks, and that after their review, they will see the merit in making an improvement. For now, I’ll focus on getting it working, tested and in production.

My concern about the ORM change is that since the framework has been this way for so long, there probably are not going to be an avalanche of users clammoring for it.

The three-tier architecture we use is highly optimal for the type of on-prem enterprise business systems we develop, which target powerful and demanding end users. Unfortunately, the infaturation with the browser caused these client server RMI architectures to take a back seat, even though the experience of using a browser UI is still struggling to catch up after almost 30 years.

Thank you for all your advise and support!

https://hibernate.atlassian.net/browse/HHH-14839

busitech · October 18, 2021, 8:28am

Hello @yrodiere, our workaround has become somewhat solidified over the last week. We found some areas to improve, such as the persist and merge code is now divergent.

Our test case file BITTestCase10 contains quite a few scenarios which are giving us the desired results without a refresh/reindex after merge/persist.

Barring any unforeseen bugs we find in the next day or two, we are considering moving what you see there into production, using your 6.0.7-SNAPSHOT build. We have been testing our large system with good results also.

If you are able, we would appreciate any additional critique you may have to prevent the unforeseen, such as bugs in the solution we have missed by not testing enough scenarios. Obviously we don’t utilize every single ORM feature or mapping style, so we are mostly concerned about covering our use cases.

We are probably going to try to build a modified ORM version which includes our workaround, to see how many ORM test cases might fail.

Thank you!

yrodiere · October 18, 2021, 8:46am

This is excellent news!

I’m currently releasing 6.0.7.Final, you should have it on Maven Central today.

I’ll have a closer look just in case, but running the ORM test suite would indeed give you a more complete answer.

Good idea. Ideally you would then send a pull request for HHH-14839. Note that depending on the extent of your changes, it’s possible that we will need your additional code to be enabled through a configuration options for performance reasons; though more correct, I’m guessing it’s not completely free to go through every single association on an entity during merging. We can discuss that on the pull request, though.

busitech · October 18, 2021, 9:01am

One more change we’re going to make is to exit early from our before merge routine if an interceptor is already present on an object, which means the work has already been done by ORM (object wasn’t detached; object was freshly loaded from the database already prior to initiating the changes to be merged).

There are no changes before or after a call to EntityManager.remove(object) because ORM doesn’t support removing a detached entity.

Something I forgot to mention is that we’re only getting what we want with this byte code enhancement configuration, and we’re not pursing any of the other combinations at this time: (lazyLoading = true, biDirectionalAssociationManagement = true, inlineDirtyChecking = true).

Our changes were not implemented this way, as you will notice once you take a look. We’re only making a slight change to a single object after it is persisted, or before it is merged.

yrodiere · October 21, 2021, 9:42am

I’ve had a look. Here are my comments.

Fist, bi-directional association management through bytecode enhancement is great if you need it, but be aware of shortcomings, especially when dealing with collections that accept duplicate elements (e.g. List). There was a great explanation there: Disable automatic association management in Hibernate ORM entities by Sanne · Pull Request #6569 · quarkusio/quarkus · GitHub

Regarding the workaround itself, from what I understand it boils down to doing this when deserializing an entity:

		PersistentAttributeInterceptable interceptable = ((PersistentAttributeInterceptable) entity );
		// ...
		persistenceContext.addEnhancedProxy(entityKey, interceptable);
		persistenceContext.addEntry(entity, org.hibernate.engine.spi.Status.MANAGED, null, null, entity.getId(), entity.getVersion(), LockMode.NONE, true, persister, false);
		persister.getBytecodeEnhancementMetadata().injectEnhancedEntityAsProxyInterceptor(entity, entityKey, s);

Or this after persisting an entity:

		PersistentAttributeInterceptable interceptable = ((PersistentAttributeInterceptable) entity );
		// ...
		persistenceContext.addEnhancedProxy(entityKey, interceptable);
		persister.getBytecodeEnhancementMetadata().injectEnhancedEntityAsProxyInterceptor(entity, entityKey, s);

Or this before merging an entity:

		PersistentAttributeInterceptable interceptable = ( (PersistentAttributeInterceptable) entity );
		// ...
		PersistentAttributeInterceptor interceptor = new LazyAttributeLoadingInterceptor(
				entityMetamodel.getName(),
				entity.getId(),
				entityMetamodel.getBytecodeEnhancementMetadata()
						.getLazyAttributesMetadata()
						.getLazyAttributeNames(),
				s
		);
		interceptable.$$_hibernate_setInterceptor(interceptor);

The concept itself of adding lazy-loading to a newly deserialized/persisted/merged entity does make sense to me. Whether it’s safe… I honestly can’t tell, but if your tests and Hibernate ORM tests pass, it’s probably a good sign.

You’re aware of that, but I’ll restate this as a conclusion: the most important task now is to get rid of this workaround ASAP, both to be sure your application relies on tested code (Hibernate ORM’s), and to ensure you won’t get stuck because the SPI and internal code you’re using have no equivalent in a later version of Hibernate ORM.
Please ping me if you send a pull request for HHH-14839, even if it doesn’t pass all tests. As long as you can tell me that your application works with Hibernate ORM patched this way, and as long as you add relevant tests in your PR, we can start discussing the problem with the Hibernate ORM team, and the sooner the better.

busitech · October 28, 2021, 8:35am

I have incorporated what we need into a fork of ORM. It passes all but one test. I added a new class to our hibernate search test repo, which also passes using our build of ORM. I have yet to write any ORM tests which are apart from hibernate search. Even so, I wanted to get this much to you as soon as I could.

busitech · October 28, 2021, 8:56am

Our latest hibernate search tests pass without bi-directional association management, when combined with our patched ORM build.

busitech · November 4, 2021, 6:59am

Has anything been said to the ORM team yet?

A pull request has been created and added to the JIRA issue. I also implemented the other form of proxy when entities themselves are not the proxy (not PersistentAttributeInterceptable).

yrodiere · November 4, 2021, 8:46am

No, since I just learned of that PR. Now that you’ve pinged me, I will have a look

busitech · November 5, 2021, 12:30am

I thought there was a chance you may have mentioned our fork to the team once the code had been published, even though the PR was not set up yet, because the effect of this issue on Hibernate Search users could grow to become significant as more users upgrade to version 6, and discover the problems.

Thank you very much for the initial PR review , and I have addressed all of your feedback with either answers or code changes.

The PR where I found your review was actually a Closed PR which I could not delete. You will find the latest changes we made today in the branch refered to by the other PR having the Open status.

Topic		Replies	Views
HS6 not indexing add or delete, only update with @OneToMany @IndexedEmbedded Hibernate Search	14	1427	September 18, 2021
Do you find or refresh after persist or merge, just to use lazy loading? Hibernate ORM	0	754	November 22, 2021
HS 6 index update problem Hibernate Search	4	423	September 16, 2020
Issue with the Hibernate Search index Hibernate Search	2	421	September 2, 2021
HSEARCH700088: Invalid indexing request: the add and update operations require a non-null entity Hibernate Search	7	960	November 12, 2022

HS6 corrupting index after merge of detached entity with @OneToMany association

Related topics