Hibernate issuing individual insert statements, even though batch insert is enabled

In our Spring Data JPA (2.1.0) application using Hibernate 5.3.7, we use the following settings to enable batch processing.

spring.jpa.properties.hibernate.jdbc.batch_size=15
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
spring.jpa.properties.hibernate.id.new_generator_mappings=false

and in our service,

List<T> entities = new ArrayList<>();
for (T e : sourceRecords) { // sourceRecords: the collection of records to persist
    entities.add(e);
    if (entities.size() % 500 == 0) {
        repository.saveAll(entities);
        repository.flush();
        entities.clear();
    }
}
if (!entities.isEmpty()) { // persist the remaining, partial chunk
    repository.saveAll(entities);
    repository.flush();
}

So the code is essentially calling saveAll for every 500 records, while batch_size is set to 15. Does this difference have any impact on the way Hibernate behaves with respect to insertions?

But what I noticed is that I see 500 individual insert statements when flush is called. Is this expected behavior when hibernate.jdbc.batch_size is set?

How can I verify that Hibernate is actually performing batch inserts, if it is doing so at all?

Also, does Hibernate provide any feature to issue multi-row inserts when an Oracle database is used? I read that PostgreSQL supports this with a setting on the JDBC URL, but can the same be achieved with Oracle, without using a native query?

Thanks

“But what I noticed is that I see 500 individual insert statements when flush is called. Is this expected behavior when hibernate.jdbc.batch_size is set?”

If you are only logging SQL statements through Hibernate, you cannot tell whether batching is used: Hibernate logs every statement individually, regardless of whether the JDBC driver later executes them as a single batch.

You need to use a tool like datasource-proxy or p6spy to see that JDBC batching is actually used. Check out this article for more details.
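
For instance, with datasource-proxy you can wrap the application's DataSource so that every JDBC call is logged together with its batching details. This is only a sketch, assuming the net.ttddyy:datasource-proxy dependency is on the classpath; the wrapper class and the proxy name are placeholders of mine:

import javax.sql.DataSource;

import net.ttddyy.dsproxy.listener.logging.SLF4JLogLevel;
import net.ttddyy.dsproxy.support.ProxyDataSourceBuilder;

public final class BatchLoggingDataSource {

    // Wraps the real DataSource; each statement is then logged with
    // Batch:True/False and BatchSize, which is what actually tells you
    // whether the inserts were sent as a single JDBC batch.
    public static DataSource wrap(DataSource actual) {
        return ProxyDataSourceBuilder
            .create(actual)
            .name("batch-inspector")
            .logQueryBySlf4j(SLF4JLogLevel.INFO)
            .build();
    }
}

Expose the wrapped DataSource as the bean your EntityManagerFactory uses; when batching really happens, the proxy log will contain entries with Batch:True and BatchSize:15.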

“Also, does Hibernate provide any feature to issue multi-row inserts when an Oracle database is used? I read that PostgreSQL supports this with a setting on the JDBC URL, but can the same be achieved with Oracle, without using a native query?”

PostgreSQL supports multi-row inserts via the reWriteBatchedInserts configuration property. However, that’s needed because, although you’re using a single database roundtrip, PostgreSQL processes each statement individually.
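
For reference, the property goes on the JDBC URL; the host and database name below are placeholders:

spring.datasource.url=jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true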

Oracle might not require multi-row inserts, so just make sure you are using SEQUENCE identifiers, since IDENTITY generators disable batched inserts, as explained in this article.
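
To illustrate the difference, an IDENTITY-based mapping like the following generic example (not your mapping) forces Hibernate to execute every INSERT immediately in order to read back the generated key, which silently disables JDBC batching:

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;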

Moreover, the batch processing example can be optimized too. Instead of just flushing and clearing, you want to commit the DB transaction from time to time to avoid a long-running transaction:

// Assumes an EntityManager and its associated EntityTransaction
EntityTransaction entityTransaction = entityManager.getTransaction();
entityTransaction.begin();

for (int i = 0; i < entityCount; i++) {
	if (i > 0 && i % batchSize == 0) {
		// Commit the current batch and start a fresh transaction
		entityTransaction.commit();
		entityTransaction.begin();

		// Detach the flushed entities so the persistence context stays small
		entityManager.clear();
	}

	Post post = new Post(
		String.format("Post %d", i + 1)
	);

	entityManager.persist(post);
}

// Commit the last, possibly partial, batch
entityTransaction.commit();

Check out this article if you want to know the best way to do batch processing with JPA and Hibernate.

Thanks Vlad, I will refer to the links provided. Currently, the ID is defined as below:

@Id
@Basic
@GeneratedValue(generator = "ENT_SEQ", strategy = GenerationType.SEQUENCE)
@SequenceGenerator(name = "ENT_SEQ", sequenceName = "ENT_SID_SEQ")
@Column(name = "ENT_SID")
protected int entSID;

CREATE SEQUENCE  "ENT_SID_SEQ"  MINVALUE 1 MAXVALUE 9999999999999999999999999999 INCREMENT BY 1 START WITH 1 NOCACHE  NOORDER  NOCYCLE  NOPARTITION ;

I hope my sequence definition is correct.

“Moreover, the batch processing example can be optimized too. Instead of just flushing and clearing, you want to commit the DB transaction from time to time to avoid a long-running transaction:”

Unfortunately, we can’t do that: each call requires us to persist several thousand records, and the requirement is to persist all or nothing.

From the initial round of testing, we noticed it takes around 15-20 seconds to persist about 10K records. That seems high, doesn’t it?

You need to profile the batch process and break the response time down across the call stack. Then you will know for sure why it takes so long and which component is causing the slow processing.
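
As a quick first step (only a suggestion, not a replacement for a profiler), you could enable Hibernate’s statistics:

spring.jpa.properties.hibernate.generate_statistics=true

The per-session summary this logs includes the time spent executing JDBC statements and batches (e.g. “nanoseconds spent executing n JDBC batches”), which also confirms whether batching kicked in at all.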

I am sorry, I am just trying to understand the rationale behind what you said. I understood your point about long-running transactions, but in many cases we would want the batch to fail as a whole if it fails at all, so that we can re-run it from the beginning, right?