Customized L2 cache does not support pagination in Hibernate 6

In my application, I need to use a custom second-level cache.

In Hibernate 5, the complete SQL was used as the cache key, and all operations worked normally.

In Hibernate 6, this changed to QueryKey. Moreover, currently only the return value of QueryKey's generateHashCode() method can be used as the key. This is not a problem for non-paginated queries.

However, because the generateHashCode() method does not include the values of firstRow and maxRows, paginated queries can end up with identical cache keys.

In practice, the problem shows up as pagination not working at all.

	private int generateHashCode() {
		int result = 13;
		result = 37 * result + sqlQueryString.hashCode();
		// Don't include the firstRow and maxRows in the hash as these values are rarely useful for query caching
//		result = 37 * result + ( firstRow==null ? 0 : firstRow );
//		result = 37 * result + ( maxRows==null ? 0 : maxRows );
		result = 37 * result + parameterBindingsMemento.hashCode();
		result = 37 * result + Arrays.hashCode( enabledFilterNames );
		return result;
	}

My temporary workaround is to uncomment those lines and override the class with the patched code. After that, paginated queries ran normally.

    private int generateHashCode() {
        int result = 13;
        result = 37 * result + sqlQueryString.hashCode();
        // Include firstRow and maxRows so that different pages produce different hash codes
        result = 37 * result + ( firstRow==null ? 0 : firstRow );
        result = 37 * result + ( maxRows==null ? 0 : maxRows );
        result = 37 * result + ( tenantIdentifier==null ? 0 : tenantIdentifier.hashCode() );
        result = 37 * result + parameterBindingsMemento.hashCode();
        result = 37 * result + Arrays.hashCode( enabledFilterNames );
        return result;
    }

Just because two entries share the same hash code doesn’t mean the objects are the same. See Object#hashCode, which explicitly allows this. If this change breaks your cache, then your cache implementation has a big problem, because even completely different queries could theoretically have the same hash code.

We don’t think it’s very useful to query-cache different pages of a paginated result, and we think doing so is rare, which is why we opted not to include firstRow and maxRows in the hash code, so as not to hurt its effectiveness.
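For illustration, a minimal JDK-only sketch of this point: two distinct keys can share a hash code, and a HashMap still tells them apart because it falls back to equals() after matching the hash bucket.

    import java.util.HashMap;
    import java.util.Map;

    public class HashCollisionDemo {
        public static void main(String[] args) {
            // "Aa" and "BB" are different strings with the same hash code (2112)
            System.out.println("Aa".hashCode() == "BB".hashCode()); // true
            System.out.println("Aa".equals("BB"));                  // false

            // A HashMap keeps both entries: after matching the bucket by hash,
            // it uses equals() to tell the keys apart
            Map<String, String> map = new HashMap<>();
            map.put("Aa", "first");
            map.put("BB", "second");
            System.out.println(map.size()); // 2
        }
    }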

With the release of Spring Boot 4, I upgraded to Hibernate 7.

After carefully analyzing and debugging the code, I still believe this is a problem.

My way of implementing the L2 cache is not complicated: I just use my own cache to implement DomainDataStorageAccess and RegionFactoryTemplate.
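Roughly, such an implementation has the shape sketched below. The class name is illustrative and a plain ConcurrentHashMap stands in for my real cache; the signatures follow Hibernate's DomainDataStorageAccess/StorageAccess SPI as I understand it.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.hibernate.cache.spi.support.DomainDataStorageAccess;
    import org.hibernate.engine.spi.SharedSessionContractImplementor;

    // Illustrative sketch: a map-backed DomainDataStorageAccess
    public class MapStorageAccess implements DomainDataStorageAccess {

        private final Map<Object, Object> data = new ConcurrentHashMap<>();

        @Override
        public Object getFromCache(Object key, SharedSessionContractImplementor session) {
            return data.get(key);
        }

        @Override
        public void putIntoCache(Object key, Object value, SharedSessionContractImplementor session) {
            data.put(key, value);
        }

        @Override
        public boolean contains(Object key) {
            return data.containsKey(key);
        }

        @Override
        public void evictData(Object key) {
            data.remove(key);
        }

        @Override
        public void evictData() {
            data.clear();
        }

        @Override
        public void release() {
            data.clear();
        }
    }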

The reason I think this is a problem can be seen by taking the following query as an example.

The query SQL for page two is:

    SELECT sa1_0.attribute_id, sa1_0.attribute_code, sa1_0.class_name, sa1_0.create_by,
           sa1_0.create_time, sa1_0.description, sa1_0.method_name, sa1_0.ranking,
           sa1_0.request_method, sa1_0.is_reserved, sa1_0.reversion, sa1_0.service_id,
           sa1_0.status, sa1_0.update_by, sa1_0.update_time, sa1_0.url, sa1_0.web_expression
    FROM sys_attribute sa1_0
    ORDER BY sa1_0.url
    OFFSET 10 ROWS FETCH FIRST 10 ROWS ONLY;

The query SQL for page four is:

    SELECT sa1_0.attribute_id, sa1_0.attribute_code, sa1_0.class_name, sa1_0.create_by,
           sa1_0.create_time, sa1_0.description, sa1_0.method_name, sa1_0.ranking,
           sa1_0.request_method, sa1_0.is_reserved, sa1_0.reversion, sa1_0.service_id,
           sa1_0.status, sa1_0.update_by, sa1_0.update_time, sa1_0.url, sa1_0.web_expression
    FROM sys_attribute sa1_0
    ORDER BY sa1_0.url
    OFFSET 30 ROWS FETCH FIRST 10 ROWS ONLY;

Please note the difference in the OFFSET values.

I think these should be considered different SQL statements.

In the QueryKey object, the value of sqlQueryString is the same statement for both queries (which is reasonable, and I fully understand why):

    select sa1_0.attribute_id, sa1_0.attribute_code, sa1_0.class_name, sa1_0.create_by,
           sa1_0.create_time, sa1_0.description, sa1_0.method_name, sa1_0.ranking,
           sa1_0.request_method, sa1_0.is_reserved, sa1_0.reversion, sa1_0.service_id,
           sa1_0.status, sa1_0.update_by, sa1_0.update_time, sa1_0.url, sa1_0.web_expression
    from sys_attribute sa1_0
    order by sa1_0.url
    offset ? rows fetch first ? rows only

However, the second-level cache uses QueryKey as the cache key, and the only public method on QueryKey is hashCode(). If firstRow and maxRows are not added to the hash code, the hash codes of these two queries are completely identical and the queries cannot be distinguished.
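To make this concrete, here is a standalone sketch replicating just the sqlQueryString part of the generateHashCode() arithmetic (parameter bindings, filters, and the tenant identifier are omitted; pageHash is my own name for it):

    // Replicates the relevant part of QueryKey.generateHashCode() in isolation
    static int pageHash(String sql, Integer firstRow, Integer maxRows) {
        int result = 13;
        result = 37 * result + sql.hashCode();
        // firstRow and maxRows are intentionally not mixed in, as in Hibernate 6/7,
        // so the result is identical for every page of the same statement
        return result;
    }

With the SQL above, pageHash(sql, 10, 10) == pageHash(sql, 30, 10), so a key derived from the hash code alone cannot tell page two from page four apart.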

My code is below:

    private String secure(Object key) {
        if (key instanceof QueryKey queryKey) {
            // Derive the cache key solely from the QueryKey's hash code
            int hashCode = queryKey.hashCode();
            String hashCodeString = String.valueOf(hashCode);
            String secureKey = SecureUtil.md5(hashCodeString);
            log.trace("[Herodotus] |- SPI - Convert query key hashcode [{}] to secureKey [{}]", hashCode, secureKey);
            return secureKey;
        }
        return String.valueOf(key);
    }


    private String wrapper(Object key) {
        // Prefix the derived key with the current tenant id
        String original = secure(key);
        String tenantId = getTenantId();

        String result = tenantId + SymbolConstants.COLON + original;
        log.trace("[Herodotus] |- SPI - Current cache key is : [{}]", result);
        return result;
    }

You can see the two debug screenshots below: one shows the query on the second page, the other the query on the fourth page.

So I still think the commented-out firstRow and maxRows lines in the hash code should be restored. @beikov

I understand that the hashCode is the same in this scenario, but that shouldn’t be a problem, because hash collisions can happen even for unrelated queries, regardless of the firstRow/maxRows. If you think that hashCode values are unique or enough to tell apart query executions, then you just have wrong expectations. You must use equals to understand if two objects are the same. Your wrapper implementation for generating a key is simply wrong.
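Concretely, a cache keyed on the QueryKey object itself avoids the problem, because a map consults equals() after matching the hash bucket. A minimal sketch of that approach (class and method names are illustrative):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: key the cache on the object itself rather than a stringified hash code.
    // The map uses hashCode() to locate the bucket and equals() to match the key,
    // so two QueryKey instances that merely collide on hashCode() remain distinct.
    public class QueryResultStore {

        private final Map<Object, Object> store = new ConcurrentHashMap<>();

        public void put(Object queryKey, Object results) {
            store.put(queryKey, results); // equals() disambiguates collisions
        }

        public Object get(Object queryKey) {
            return store.get(queryKey);
        }
    }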

I understand that hash collisions might lead to somewhat worse lookup performance if you store multiple pages in your cache, because that will require some linear scanning. Last time we looked into this matter, we decided to remove firstRow/maxRows from the hashCode because it reduced the number of multiplications and improved performance in scenarios with many different queries.

Oh, I understand now. Thank you for your patient explanation.