Best way to cascade thousands of objects?

bjansen · January 13, 2022, 9:10am

Hi,

I’m facing a serious performance problem while trying to persist a root object that contains a large collection (100k objects). My application is ‘stuck’ for 20-30min between the call to myRepo.saveAndFlush(root) and the actual first SQL query in the logs.

Profiling the application shows that all the time is spent in StatefulPersistenceContext.getOwnerId, which allocates a lot of temporary objects and triggers 1-2 GC/s. It looks like this method is called because of my bidirectional mapping, and the algorithm is not particularly well suited for ‘large’ collections. It might be the same problem as [HHH-1612] Serious performance lost within IdentityMap... - Hibernate JIRA.

I’m wondering what would be the best way to persist such a hierarchy: object A which contains a List<B> of 100k elements, and each B has a List<C> of < 10 elements.

My current mapping is:

<class name="A" table="a">
...
	<bag name="bList" inverse="true">
		<key column="a_uuid" not-null="true"/>
		<one-to-many class="B"/>
	</bag>
</class>

<class name="B" table="b">
	<bag name="cList" table="assoc_table" cascade="all-delete-orphan">
		<key column="b_uuid" not-null="true"/>
		<element column="el">
			<type name="..."/>
		</element>
	</bag>

	<many-to-one name="parentA" column="a_uuid"/>
</class>

I have configured batching, but the performance problem happens before SQL queries are run, so I don’t think it really matters.

Do I need to remove the cascade and persist the List<B> separately?

beikov · January 18, 2022, 8:21am

From what I can see, A and B are connected through a FK on B, so why do you need cascading from A to B? You can simply persist B objects that point to A through the FK.
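A minimal sketch of that suggestion, assuming the B instances already have parentA set and are persisted directly in batches. The PersistenceOps interface is hypothetical: it only stands in for the persist, flush and clear calls of a JPA EntityManager so that the sketch runs without Hibernate on the classpath. Whether clearing the persistence context between batches is appropriate depends on what else is managed in it.

```java
import java.util.ArrayList;
import java.util.List;

class BatchPersistSketch {

    /** Hypothetical stand-in for EntityManager.persist/flush/clear. */
    public interface PersistenceOps {
        void persist(Object entity);
        void flush();
        void clear();
    }

    /**
     * Persists each child directly and flushes/clears after every
     * batchSize entities, so the persistence context stays small.
     * Returns the number of flushes that were issued.
     */
    public static int persistInBatches(PersistenceOps ops, List<?> children, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        int pending = 0;
        for (Object child : children) {
            ops.persist(child);
            pending++;
            if (pending == batchSize) {
                ops.flush();
                ops.clear();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush the last, partially filled batch.
            ops.flush();
            ops.clear();
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        // Recording fake, used here instead of a real EntityManager.
        List<String> calls = new ArrayList<>();
        PersistenceOps fake = new PersistenceOps() {
            public void persist(Object entity) { calls.add("persist"); }
            public void flush() { calls.add("flush"); }
            public void clear() { calls.add("clear"); }
        };
        int flushes = persistInBatches(fake, List.of("b1", "b2", "b3", "b4", "b5"), 2);
        System.out.println("flushes=" + flushes + ", calls=" + calls.size());
    }
}
```

In a real application the three operations would delegate to the EntityManager (or Hibernate Session), and the batch size would typically match the configured JDBC batch size.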

Having said that, if you have an idea how to improve the performance for your use case, we happily accept PRs for that.

bjansen · January 31, 2022, 3:42pm

I don’t really need cascading from A to B, it just seems easier to persist a whole graph of entities.

I just tried changing to cascade="none" while keeping the <bag>. I’d like bList to be fetched automatically instead of making a manual select + a.setBList().

Now I’m getting a whole other class of problems. After persisting a, Hibernate modifies a.bList and replaces every element with an “empty” instance of B.

When I try to “restore” the original bList like this, I get a ConcurrentModificationException:

void saveA(A a) {
    var copyOfBList = List.copyOf(a.getBList());
    myRepo.save(a);

    a.setBList(copyOfBList);
    myRepo.save(a.getBList());
}

if you have an idea how to improve the performance for your use case

Well, I don’t have any knowledge of how Hibernate works internally; that’s why I was asking whether I was missing something.

beikov · January 31, 2022, 6:14pm

I don’t really need cascading from A to B, it just seems easier to persist a whole graph of entities.

This “easiness” comes at a price, as you can see. Although I am sure there are ways to improve the situation, if you need an improvement here, your best way to get it would be to provide a PR for it.

Now I’m getting a whole other class of problems. After persisting a, Hibernate modifies a.bList and replaces every element with an “empty” instance of B.

I guess that by “empty” you might mean that it is replaced with a managed proxy? Don’t worry, this shouldn’t affect your app.

When I try to “restore” the original bList like this, I get a ConcurrentModificationException

Yeah, better not do that.

bjansen · January 31, 2022, 7:26pm

Well, it turns out the actual problem had nothing to do with what I thought it was. It felt like I was going the wrong way, so I started from scratch and noticed the Javadoc of StatefulPersistenceContext.getOwnerId states this:

This is performed in the scenario of a uni-directional, non-inverse one-to-many collection (which means that the collection elements do not maintain a direct reference to the owner)

As I explained in my initial post, my relation is bi-directional, so this method should not be called in my case. I set up a “logging breakpoint” in IntelliJ to log the parameters of this function, and quickly found out that the problem is caused by another not-directly-related mapping. I fixed it to be bi-directional, and now I can save 40k objects in 18s.
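For readers unsure what “bi-directional” means on the Java side, here is a small sketch. The Owner and Item classes are hypothetical (the mapping that was actually fixed is not shown in this thread) and mapping metadata is omitted. The point is only that each element keeps a direct reference to its owner, and that helper methods keep both sides of the association in sync.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BidirectionalSketch {

    /** Hypothetical "one" side; its collection would be the inverse side of the mapping. */
    public static class Owner {
        private final List<Item> items = new ArrayList<>();

        public List<Item> getItems() {
            return Collections.unmodifiableList(items);
        }

        /** Adds the item and sets its back-reference so both sides agree. */
        public void addItem(Item item) {
            if (item.owner != null) {
                // Detach from the previous owner first.
                item.owner.items.remove(item);
            }
            items.add(item);
            item.owner = this;
        }

        /** Removes the item and clears its back-reference. */
        public void removeItem(Item item) {
            if (items.remove(item)) {
                item.owner = null;
            }
        }
    }

    /** Hypothetical "many" side; the owner field would be mapped as many-to-one. */
    public static class Item {
        private Owner owner;

        public Owner getOwner() {
            return owner;
        }
    }

    public static void main(String[] args) {
        Owner owner = new Owner();
        Item item = new Item();
        owner.addItem(item);
        System.out.println("back-reference set: " + (item.getOwner() == owner));
    }
}
```

With the back-reference in place, the owner of an element is available directly from the element itself, which is the situation the Javadoc quoted above contrasts with the uni-directional case.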

Thanks for your help. As I kinda expected, there was nothing wrong with Hibernate, only with my mapping.
