Should I use an application-generated entity identifier or use the database native generator with Hibernate?


#1

Hi,

I am using the database native identifier generator with the following mapping in the Hibernate hbm file:

<id name="contactId" column="contact_id">
    <generator class="native">
        <param name="sequence">PimContact_seq</param>
    </generator>
</id>

I run an import job that inserts records into that table. The record counts range from several thousand up to a million. The import performance is slow, so I used the SQL Server Query Store to analyze the queries fired against the database. I can see that the select next value for the sequence is executed once per imported record, and there are some more supporting sequence-related queries that are executed just as many times.

I guess this is one of the reasons the import job is slow.

  • Are there any ways to optimize this?
  • Will using an application-generated identifier (and avoiding the database sequence generator) improve the performance of the import job?

#2

For SQL Server, the IDENTITY generator will be used by the native id generator as explained in this article.

Switch to SEQUENCE generator and use the pooled optimizer.
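
As a sketch of that change against your hbm mapping (the sequence name is taken from your post; the increment size of 50 is an assumption you should tune, and it must match the INCREMENT BY of the database sequence when using the pooled optimizer):

```xml
<id name="contactId" column="contact_id">
    <!-- enhanced-sequence is the SEQUENCE-based generator that accepts an optimizer -->
    <generator class="enhanced-sequence">
        <param name="sequence_name">PimContact_seq</param>
        <!-- pooled optimizer: one "next value" round-trip covers increment_size ids -->
        <param name="optimizer">pooled</param>
        <param name="increment_size">50</param>
    </generator>
</id>
```

With this setup, importing a million rows calls the sequence roughly a million divided by increment_size times instead of a million times.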

Check out this article about how to control the Persistence Context size, as well as the transaction length, which can help you speed up your batch processing task.


#3

Thanks Vlad… regarding my second question: will generating the identifier in the application also help? I guess that would be the most optimized alternative.


#4

That’s a bad idea because:

  • the DB sequence generator can be used concurrently by both your application threads and the DBA. If you use an app-generated sequence, the DBA will not know what the next available id is.
  • the DB sequence uses lightweight locks, so generating a new sequence value is very fast, and the pooled optimizer speeds that up even more.
  • if you use a UUID, it will require more space, hence the indexes will take more memory, and that will affect the clustered index as well, which is the typical way of organizing a table in SQL Server.

Therefore, use a DB sequence as it’s much better from a performance perspective.
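
To put rough numbers on the space argument, here is a small standalone sketch (not part of any mapping, just an illustration of key widths):

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// A sequence-generated bigint key fits in 8 bytes, while a UUID needs
// 16 bytes in binary form (or 36 characters as text), so UUID-keyed
// indexes, including the clustered index, are wider.
public class IdSizeComparison {
    public static void main(String[] args) {
        int bigintBytes = Long.BYTES; // 8 bytes for a bigint key

        UUID id = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES);
        buf.putLong(id.getMostSignificantBits());
        buf.putLong(id.getLeastSignificantBits());
        int uuidBytes = buf.capacity(); // 16 bytes in binary form

        int uuidChars = id.toString().length(); // 36 characters as text

        System.out.println(bigintBytes + " vs " + uuidBytes
                + " bytes (" + uuidChars + " chars as text)");
    }
}
```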


#5

Is the DB SEQUENCE generator supported by every DB?
At least, the databases supported by our product are SQL Server, Oracle, PostgreSQL, and, in the current release, MySQL support is being added.


#6

Sequence is supported by all DBs but MySQL. For MySQL, you can define the Ids via orm.xml as explained in this article.
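
As a sketch of such an override (the entity class and attribute names here are assumptions, not from your mapping), an orm.xml fragment for MySQL could fall back to IDENTITY:

```xml
<entity-mappings xmlns="http://xmlns.jcp.org/xml/ns/persistence/orm"
                 version="2.1">
    <entity class="com.example.PimContact">
        <attributes>
            <id name="contactId">
                <!-- MySQL has no sequences, so use IDENTITY there -->
                <generated-value strategy="IDENTITY"/>
            </id>
        </attributes>
    </entity>
</entity-mappings>
```

Note that IDENTITY disables JDBC batching for inserts in Hibernate, which matters for jobs like yours.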


#7

For SQL Server, the IDENTITY generator will be used by the native id generator as explained in this article.

Thanks Vlad… by the way, I am confused a bit… the generator class above is ‘native’, however there is a param named sequence with the value PimContact_seq, and that sequence is queried by SQL Server before each insert; see the following query from the SQL Server Query Store:

select next value for PimContact_seq as seq

So is this the IDENTITY generator or the SEQUENCE generator? From the above query, it looks like the SEQUENCE generator.


#8

The Dialect#getNativeIdentifierGeneratorStrategy looks as follows:

public String getNativeIdentifierGeneratorStrategy() {
	if ( getIdentityColumnSupport().supportsIdentityColumns() ) {
		return "identity";
	}
	else {
		return "sequence";
	}
}

Therefore, it’s not clear why it’s picking up sequence on SQL Server in your case, but you can debug it and find out why.


#9

Anyway, so essentially it’s the SEQUENCE generator that is being used. Hence, it must already be taking advantage of Hibernate’s JDBC batching, right?

Currently, hibernate.jdbc.batch_size is set to 30, but could it be tuned for better performance, or is 30 a standard number that all applications should use?


#10

Anyway, so essentially it’s the SEQUENCE generator that is being used. Hence, it must already be taking advantage of Hibernate’s JDBC batching, right?

It shouldn’t. That’s why you need to debug it and see exactly which identifier generator it picks, or just log the queries and you will understand.

Currently, hibernate.jdbc.batch_size is set to 30, but could it be tuned for better performance, or is 30 a standard number that all applications should use?

There’s no standard number. You need to measure your application to find the right size for your use case.
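
As a starting point for such measurements, these are the batching-related Hibernate settings to vary (the value 50 below is only an example, not a recommendation):

```properties
# How many statements Hibernate groups into one JDBC batch
hibernate.jdbc.batch_size=50
# Group inserts/updates by entity type so batches are not broken up
hibernate.order_inserts=true
hibernate.order_updates=true
```

Run the import with several batch_size values against a production-like data set and compare wall-clock times; the sweet spot depends on row width, driver, and network latency.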