Hibernate Batch Fetch style recommendations


#9

Hi, thanks for this, very useful!

Could you confirm which exact preview version of Hibernate ORM 5.3 you’re using?

We switched the default bytecode provider from Javassist to ByteBuddy. Could you also confirm which one you’re using, and ideally compare them?

Use either:

hibernate.bytecode.provider=javassist

or:

hibernate.bytecode.provider=bytebuddy
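
Either line goes in hibernate.properties. As a sketch, you can also set it as a system property, assuming it is set before any Hibernate class is loaded (the setting is read once, when org.hibernate.cfg.Environment is initialized):

    // Sketch: choosing the bytecode provider via a system property.
    // Assumption: this must run before any Hibernate class is touched, as
    // the setting is read once when org.hibernate.cfg.Environment is loaded.
    public class Bootstrap {
        public static void main(String[] args) {
            System.setProperty("hibernate.bytecode.provider", "bytebuddy"); // or "javassist"
            // ... bootstrap the SessionFactory as usual
        }
    }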

Thanks!


#10

Hello,

I have this problem with both bytecode providers. I first tried with version 5.3.0.CR1, which, as far as I know, was using Javassist as the default bytecode provider. Now I’ve tried with 5.3.0.CR2, which uses ByteBuddy as the default, and I see exactly the same behavior…

Given such heap differences between fetch styles, I would like to know what your recommendations are in this regard.

  • How do the three different styles (legacy, padded, and dynamic) impact things at the database and Java levels?
  • Do you have any benchmark that could help me choose the most suitable style for my model? (We currently switch between the styles as sketched below.)
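
For reference, a minimal sketch of our bootstrap; we select the style with the standard hibernate.batch_fetch_style and hibernate.default_batch_fetch_size settings:

    // Minimal sketch of our bootstrap: selecting the batch fetch style.
    // hibernate.batch_fetch_style accepts LEGACY, PADDED or DYNAMIC.
    import org.hibernate.SessionFactory;
    import org.hibernate.boot.MetadataSources;
    import org.hibernate.boot.registry.StandardServiceRegistry;
    import org.hibernate.boot.registry.StandardServiceRegistryBuilder;

    public class FactoryBuilder {
        public static SessionFactory build(String style, int batchSize) {
            StandardServiceRegistry registry = new StandardServiceRegistryBuilder()
                    .applySetting("hibernate.batch_fetch_style", style)
                    .applySetting("hibernate.default_batch_fetch_size", batchSize)
                    .build();
            return new MetadataSources(registry).buildMetadata().buildSessionFactory();
        }
    }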

#11

Hi @caristu ,

Any chance you could share your memory dumps privately with us if they don’t contain any sensitive information?

In any case, it would be helpful if you could put together a test case reproducing your issue using our test case template: https://github.com/hibernate/hibernate-test-case-templates/tree/master/orm/hibernate-orm-5 .

It definitely looks like something we should fix, so the sooner we have more information about the issue, the better.

Thanks!


#12

Hello,

Yes, I can share the dumps without any trouble. How can I send them to you?

Besides, I would like to share the following changeset [1] for your consideration. It helped us reduce heap usage with Hibernate 3.6, but it no longer seems to be enough after the upgrade to Hibernate 5.3.

[1] https://github.com/alostale/hibernate-orm/commit/612675c92c97bacd61d0c07a0802344eca42f5c0


#13

I’ve shared a Google Drive folder with you. Hope it works :).

Thanks!


#14

It worked :). I’ve uploaded the dumps into the folder:

  • hb36.hprof: the memory dump taken with Hibernate 3.6 (patched with the changeset mentioned in my previous post)
  • hb53cr2.hprof: the memory dump taken with Hibernate 5.3.0.CR2

Thanks!


#15

I studied the dump yesterday.

For now, your best bet is probably to use the dynamic strategy, even if it slows things down a bit.

We are investigating possible ways of improving the situation and will keep you in the loop.

If you have some time, any chance you could use our test case template (https://github.com/hibernate/hibernate-test-case-templates/tree/master/orm/hibernate-orm-5) to provide a reproducible test case?

The idea would be to import into the test case all of the model related to the entity FinancialMgmtAccountingCombination (i.e. this particular entity and all its relations). I’m not sure how much work that would be, but it would be very helpful for us to play with your case and see how we can improve things with a real use case in mind.
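
To give an idea of the shape, here is a minimal skeleton in the spirit of that template (the test class name is a placeholder; the real reproducer would list your annotated classes):

    // Skeleton of a reproducer based on the hibernate-orm-5 test case template.
    import org.hibernate.Session;
    import org.hibernate.testing.junit4.BaseCoreFunctionalTestCase;
    import org.junit.Test;

    public class BatchFetchHeapTestCase extends BaseCoreFunctionalTestCase {

        @Override
        protected Class<?>[] getAnnotatedClasses() {
            // FinancialMgmtAccountingCombination and all of its relations go here
            return new Class<?>[] { /* ... */ };
        }

        @Test
        public void reproducer() {
            Session s = openSession();
            s.beginTransaction();
            // trigger the loading pattern that shows the excessive heap usage
            s.getTransaction().commit();
            s.close();
        }
    }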

Thanks!


#16

Hello,

ok, I would appreciate it if you could keep me posted on your progress with this.

Regarding your suggestion about using the dynamic strategy, have you measured/compared the differences when switching between the strategies?

It would be great if you could share any kind of benchmark (in case you have one) that could help us determine how this slowdown would impact us. Do you also know how this change could affect us in terms of garbage collection?

I’ll try to extract that part of the model to be able to send you a test case, but it may take me some time.

Finally, have you been able to check the changeset I provided in my previous response? Do you think it could make sense to include it in Hibernate?

Thank you!


#17

> Regarding your suggestion about using the dynamic strategy, have you measured/compared the differences when switching between the strategies?

Not really. What you can also do is reduce the batch size to something like 5. That should cut the number of EntityLoaders per LegacyBatchingEntityLoader by roughly a third, and it should help.
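
As a sketch, either reduce the global default in your configuration (hibernate.default_batch_fetch_size=5) or override it per entity with @BatchSize; the entity below is just a placeholder:

    // Placeholder entity showing the per-entity override; the global
    // equivalent is hibernate.default_batch_fetch_size=5.
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.BatchSize;

    @Entity
    @BatchSize(size = 5) // batch-load proxies of this entity 5 at a time
    public class SomeEntity {
        @Id
        private Long id;
    }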

For now, I have identified the EntityLoaders as the culprit.


#18

Hello,

I’ve tried with 5 as DEFAULT_BATCH_FETCH_SIZE, and the retained heap is indeed reduced, to 392 MB. In our case that is still 100 MB higher than before.

I’ve uploaded the memory dump (batchreduced.tar.gz) into the Google Drive folder, just in case you want to take a look.

Thanks!


#19

Yeah sure, it will be higher than before, as you still have EntityLoaders created, but far fewer than before.

I think it might be a good trade-off for you for the time being.


#20

I insist a bit on the test case because it would really help us improve the situation. I saw you work at Openbravo, so I thought maybe I could get the sources and do the work myself, but the sources I found seem to be very old and I couldn’t find the files by browsing the Mercurial UI (I haven’t checked out the repo, though).


#21

Hi,

I’m one of @caristu’s coworkers.

I’ve extracted Openbravo’s Hibernate model and created a test case that measures the retained heap of the SessionFactory with the legacy and dynamic batch fetch styles at different batch sizes (10 and 50).

The numbers I got are:

Style         Batch Size      Heap
------------------------------------
Legacy                10    788 MB
Legacy                50   1000 MB
Dynamic               10    149 MB
Dynamic               50    150 MB
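
In case anyone wants to reproduce the measurement: a dump can be triggered programmatically right after building each SessionFactory (HotSpot-only API) and then inspected with a heap analyzer. This is a sketch, not the exact code of the test case:

    // Triggers a .hprof dump programmatically (HotSpot JVMs only).
    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class HeapDumper {
        public static void dump(String path) throws Exception {
            HotSpotDiagnosticMXBean mxBean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            mxBean.dumpHeap(path, true); // true = dump only live objects
        }
    }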

Here you can find the repo with the test cases (fetch-style branch).

If you want to check the complete Openbravo sources:

Note that Openbravo generates its Hibernate mapping classes from a meta-model defined in the DB.

I hope this helps.


#22

Hi @alostale ,

Very nice, thanks. It’s helping a lot.


#23

Follow-ups:

Discussions on the mailing list:


#24

Thanks a lot, that’s really great!

Btw, I guess here you meant Openbravo, not OpenConcerto :wink:

Thanks again!


#25

I just tested the same again with 5.3.0.Final, and the difference is awesome:

Style         Batch Size      Heap
------------------------------------
Legacy                10    228 MB
Legacy                50    269 MB
Dynamic               10    119 MB
Dynamic               50    120 MB

@gsmet and the rest of the team, thanks for your great work!


#26

Glad to see a happy user :).

Thanks for taking the time to prepare the test case; it helped a lot in validating that we were on the right path. And fixing this will probably help a lot of our users.


#27

For the record, there are 2 patches:

  • the first one was not risky at all, so we also included it in the next 5.2 release: it reduces the memory used by the LoadPlan-based entity loaders (you have 13 of them with a batch size of 50) by sharing what can be shared, bringing the memory used from 1 GB down to ~470 MB in your case;
  • the second one was only applied to 5.3: it lazily initializes the per-lock-mode loaders (the two most commonly used ones are loaded eagerly, the others lazily). This accounts for the rest of the gain, but obviously, if you use some exotic lock modes, more memory will be used once they are exercised by your application. It should help anyway, as there is very little chance that you use all 11 lock modes involved for all your entities.
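
Schematically, the idea behind the second patch looks like this; it is an illustration of the lazy-initialization pattern with placeholder names, not the actual Hibernate code:

    // Illustration only (placeholder names, not Hibernate’s real code):
    // eagerly build loaders for the two common lock modes, and build the
    // rest on first use instead of up front.
    import java.util.EnumMap;
    import java.util.function.Function;
    import org.hibernate.LockMode;

    class PerLockModeLoaders<L> {
        private final EnumMap<LockMode, L> loaders = new EnumMap<>(LockMode.class);
        private final Function<LockMode, L> factory;

        PerLockModeLoaders(Function<LockMode, L> factory) {
            this.factory = factory;
            loaders.put(LockMode.NONE, factory.apply(LockMode.NONE)); // common, eager
            loaders.put(LockMode.READ, factory.apply(LockMode.READ)); // common, eager
        }

        L get(LockMode lockMode) {
            return loaders.computeIfAbsent(lockMode, factory); // exotic modes: lazy
        }
    }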

#29

@caristu The best way to control the amount of data you fetch is at query time, because the right fetch strategy depends on the requirements of the current business use case.

The default batch fetching strategy and mapping-time constructs like @Subselect, @BatchSize, or @LazyCollection are like applying a band-aid to a broken foot. They don’t really fix the actual problem; they only bring some relief in the short run.
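
For example, a query-time fetch plan with JOIN FETCH; the Order/lineItems model here is purely illustrative:

    // Query-time fetching: this use case needs the order lines, so it asks
    // for them explicitly instead of relying on mapping-level defaults.
    import java.util.List;
    import javax.persistence.EntityManager;

    public class OrderQueries {
        // Order, lineItems and customer are illustrative names.
        public static List<Order> findWithLineItems(EntityManager em, Long customerId) {
            return em.createQuery(
                    "select o from Order o join fetch o.lineItems where o.customer.id = :id",
                    Order.class)
                .setParameter("id", customerId)
                .getResultList();
        }
    }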

If performance is important to you, you want to:

  • switch to LAZY fetching for all association types,
  • use entities only if you want to later modify them and benefit from optimistic locking,
  • use DTOs for read-only projections (e.g. trees, tables, reports; a sketch follows at the end of this post),
  • avoid anti-patterns like Open Session in View or enable_lazy_load_no_trans.

If you do that, you will see that there is no need for stuff like @Subselect or @BatchSize, and not only will you get better memory utilization on the JVM side, but you will also avoid a lot of processing on the DB side (CPU, memory, IO) as well as on the networking layer.
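
And for the read-only projection point, a sketch of a DTO projection via a JPQL constructor expression (OrderSummary and the queried fields are illustrative):

    // Read-only projection: select only the columns the view needs into a
    // DTO instead of hydrating full entities. Names are illustrative.
    package com.example;

    import java.util.List;
    import javax.persistence.EntityManager;

    public class OrderSummary {
        private final Long id;
        private final String customerName;

        public OrderSummary(Long id, String customerName) {
            this.id = id;
            this.customerName = customerName;
        }
    }

    class ReportQueries {
        static List<OrderSummary> orderSummaries(EntityManager em) {
            return em.createQuery(
                    "select new com.example.OrderSummary(o.id, o.customer.name) from Order o",
                    OrderSummary.class) // the DTO’s fully-qualified name goes in the query
                .getResultList();
        }
    }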