Processing millions of records with query.getResultStream()

I am doing some tests to process millions of records with Hibernate using streams to avoid memory exhaustion.
In the test case below I am trying to process 20 million records.

		var sql = "select\r\n" //
				+ "  generate_series as id,\r\n" //
				+ "  'abcdefghijklmnopqrstuvwxyz 0123456789' as code,\r\n" //
				+ "  'abcdefghijklmnopqrstuvwxyz 0123456789' as description,\r\n" //
				+ "  'abcdefghijklmnopqrstuvwxyz 0123456789' as tags,\r\n" //
				+ "  generate_series as version\r\n" //
				+ "from\r\n" //
				+ "  pg_catalog.generate_series(1, 20000000)";

		var em = emf.createEntityManager();
		try {
			em.getTransaction().begin();

			var query = em //
					.createNativeQuery(sql, GroupEntity.class) //
					.setHint(QueryHints.HINT_FETCH_SIZE, "1000");
			query.getResultStream().forEach(o -> {
				// do something
			});

			em.getTransaction().commit();
		} finally {
			em.close();
		}
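
For reference, GroupEntity is a plain mapped entity along these lines (a hypothetical sketch matching the columns selected above; the actual class is not part of the post):

import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class GroupEntity {

	@Id
	private Long id;
	private String code;
	private String description;
	private String tags;
	private Long version;

	// getters and setters omitted
}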

With this setup, it consumes all available memory and throws an OutOfMemoryError (I have 4GB of RAM available).

I presume it is because Hibernate is caching all entities in the first level cache.
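
One way to check that presumption is to unwrap the Hibernate Session and watch the persistence context grow while the stream is consumed (a rough sketch; the counter array and the reporting interval are just for illustration):

var session = em.unwrap(org.hibernate.Session.class);
int[] seen = new int[] { 0 };
query.getResultStream().forEach(o -> {
	// report the size of the persistence context every 100000 rows
	if (++seen[0] % 100000 == 0) {
		System.out.println(session.getStatistics().getEntityCount()
				+ " entities held in the persistence context");
	}
});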

If I create the query without assigning the result to an entity, it works as expected and the memory consumption is almost constant (100MB).

			var query = em //
					.createNativeQuery(sql) //
					.setHint(QueryHints.HINT_FETCH_SIZE, "1000");

I presume this is because, since each result is not mapped to an entity, Hibernate doesn’t cache anything in the persistence context.

But I want to get the entity instance instead of an Object[].

So I used a workaround that solves the problem:
I manually detach each instance from the persistence context.
This way, the memory consumption is almost constant as well (100MB).

			query.getResultStream().forEach(o -> {
				// do something
				em.detach(o);
			});

Is there some recommended (standard) way to accomplish what I am trying to do (read millions of entities without using too much memory)?

It took me some time to find this solution, and it looks like a hack to me.
I presume many people will face the same problem when working with getResultStream().

If there is no way to disable first-level caching when working with streams, it seems almost pointless to use streams at all, since all entities will be kept in memory anyway.

Best regards,

Fabiano

If you select scalar rows (no collection joins), it would be better to clear the Session/EntityManager after every X results. Assuming your driver responds appropriately to the 1000-row fetch hint, X=1000 would be a good starting point:

MutableInteger i = new MutableInteger();
query.getResultStream().forEach(
        (o) -> {
            // do something

            if ( i.incrementAndGet() > 1000 ) {
                em.clear();
            }
        }
);

NOTE: I use a custom MutableInteger rather than Java’s AtomicInteger because AtomicInteger carries atomic-operation overhead that is completely unnecessary here.
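
Such a MutableInteger is trivial to write; a minimal sketch of what it could look like (hypothetical, just a plain counter, not the exact class I use):

// A plain, non-thread-safe counter: no volatile or CAS overhead,
// which is fine since the stream is consumed on a single thread.
public class MutableInteger {

	private int value;

	public int incrementAndGet() {
		return ++value;
	}

	public int get() {
		return value;
	}
}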

But other than that, yeah, that would be the gist.

Thanks for your response.

Yes, I think this is the way to go.

And I presume you meant to clear the cache after every X results (as written, your example starts clearing only after the first 1000 results and then clears on every subsequent result).

PS: I usually create a single-element array when I need something effectively final yet mutable to use inside a lambda.

int[] count = new int[] { 0 };
query.getResultStream().forEach(o -> {
	// do something
	if (++count[0] % 1000 == 0)
		em.clear();
});