Join clause in Criteria API left out with versions 6.5.0.Final and above

Hi, we have a Spring Boot application running on 3.0.x which we are currently updating to the Spring Boot 3.3.3.
We are facing problems with Criteria API and join clauses after migration - one join is left out from final query (which should exist).

We tried upgrading Hibernate to 6.5.3.Final and 6.6.1.Final, without any luck. We also upgraded spring-data to 3.3.4, which also did not help. But we managed to pinpoint the Hibernate version which broke everything - 6.5.0.Final. Every other newer hibernate version fails (including 6.5.0), all without any code change.

Lets get back to the original version that comes with spring-data-3.3.3, which is 6.5.2.Final. With 6.5.2.Final version it seems that there is some sort of optimization performed when casting Fetch to Join. If we manually perform fetch and join on root, generated query is the same.

Here are the DTOs which we are using and also a piece of code which produces such behavior.
DTOs:

ProductEntity.java

@Data
@AllArgsConstructor
@NoArgsConstructor
@Builder
@Entity
@Table(name = "product")
public class ProductEntity implements Serializable {

    /** Used to regulate the number of entities to be included in one hibernate fetch operation.*/
    private static final int BATCH_SIZE = 30;

    @Id
    @Column(name = "prod_id")
    private Long prodId;
    @Column(name = "prod_code")
    private String prodCode;
    @Column(name = "prod_type")
    private String prodType;
    @Column(name = "prod_lifecycle")
    private String prodLifecycle;
    @Column(name = "prod_description")
    private String prodDescription;
    @Column(name = "login")
    private String login;
    @Column(name = "comment")
    private String comment;
    @Column(name = "party")
    private String party;
    @Column(name = "system_record_timestamps")
    private LocalDateTime systemRecordTimestamps;
    @Column(name = "system_update_timestamps")
    private LocalDateTime systemUpdateTimestamps;
    @Column(name = "record_modification_timestamp")
    private LocalDateTime recordModificationTimestamp;
    @Column(name = "creation_date")
    private LocalDate creationDate;
    @Column(name = "last_change_date")
    private LocalDate lastChangeDate;
    @Column(name = "managed_by")
    private String managedBy;

    @OneToMany(mappedBy = "product", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
    @BatchSize(size = BATCH_SIZE)
    private List<ProductEntityEntity> ents = new ArrayList<>();
}

ProductEntityEntity.java

@Data
@AllArgsConstructor
@NoArgsConstructor
@Builder
@Entity
@EqualsAndHashCode(onlyExplicitlyIncluded = true)
@Table(name = "product_entity")
public class ProductEntityEntity implements Serializable {

    @Id
    @Column(name = "prod_entity_id")
    private Long prodEntityId;
    @Column(name = "prod_id", insertable = false, updatable = false)
    private Long prodId;

    @EqualsAndHashCode.Include
    @Column(name = "entity_type")
    private String entityType;
    @EqualsAndHashCode.Include
    @Column(name = "entity_id")
    private String entityId;
    @EqualsAndHashCode.Include
    @Column(name = "module")
    private String module;
    @EqualsAndHashCode.Include
    @Column(name = "initiator")
    private String initiator;
    @EqualsAndHashCode.Include
    @Column(name = "party")
    private String party;

    @Column(name = "system_record_timestamps")
    private LocalDateTime systemRecordTimestamps;
    @Column(name = "system_update_timestamps")
    private LocalDateTime systemUpdateTimestamps;
    @Column(name = "record_modification_timestamp")
    private LocalDateTime recordModificationTimestamp;

    @ManyToOne
    @JoinColumn(name = "prod_id")
    @ToString.Exclude
    private ProductEntity product;
}

Criteria API usage

public static Specification<ProductEntity> getProductOverviewSpecification(
        FilterDto filter) {
        return (alert, query, criteriaBuilder) -> {

            List<Predicate> predicates = new ArrayList<>();

            predicates.addAll(getProductPredicates(filter,
                criteriaBuilder, alert));

            Fetch<ProductEntity, ProductEntityEntity> entityFetch;
            Join<ProductEntity, ProductEntityEntity> initiatorEntityJoin;

            if (Long.class != query.getResultType()) {
                entityFetch = alert.fetch("ents");
                initiatorEntityJoin = (Join<ProductEntity, ProductEntityEntity>) entityFetch;
            } else {
                initiatorEntityJoin = alert.join("ents");
            }

            predicates.add(criteriaBuilder.equal(initiatorEntityJoin.get("initiator"),
                AlertEntityInitiator.YES.getFlag()));

            if (StringUtils.isNotBlank(filter.getMemberSign())) {
                predicates.add(criteriaBuilder.equal(initiatorEntityJoin.get("party"),
                    filter.getMemberSign()));
            }

            if (StringUtils.isNotBlank(filter.getPersonId())
                || StringUtils.isNotBlank(filter.getPersonApplicationModule())
                || StringUtils.isNotBlank(filter.getSuperPersonId())) {
                Join<ProductEntity, ProductEntityEntity> searchEntityJoin = alert.join("ents");

                predicates.addAll(getProductEntityPredicates(filter,
                    criteriaBuilder, searchEntityJoin));
            }

            return criteriaBuilder.and(predicates.toArray(new Predicate[0]));
        };
    }

As far as I traced out, it seems that this type cast in ifology behaves differently between versions.
For this purpose, we used 6.4.10.Final as last version which works ok.

Here are the generated queries for two different Hibernate versions:

Hibernate 6.4.10.Final

select
        pe1_0.prod_id,
        pe1_0.comment,
        pe1_0.creation_date,
        e1_0.prod_id,
        e1_0.prod_entity_id,
        e1_0.entity_id,
        e1_0.entity_type,
        e1_0.initiator,
        e1_0.module,
        e1_0.party,
        e1_0.record_modification_timestamp,
        e1_0.system_record_timestamps,
        e1_0.system_update_timestamps,
        pe1_0.last_change_date,
        pe1_0.login,
        pe1_0.managed_by,
        pe1_0.party,
        pe1_0.prod_code,
        pe1_0.prod_description,
        pe1_0.prod_lifecycle,
        pe1_0.prod_type,
        pe1_0.record_modification_timestamp,
        pe1_0.system_record_timestamps,
        pe1_0.system_update_timestamps 
    from
        product pe1_0 
    join
        product_entity e1_0 
            on pe1_0.prod_id=e1_0.prod_id 
    join
        product_entity e2_0 
            on pe1_0.prod_id=e2_0.prod_id 
    where
        pe1_0.creation_date between ? and ? 
        and pe1_0.prod_type=? 
        and pe1_0.prod_lifecycle=? 
        and e1_0.initiator=? 
        and e2_0.module=? 
        and e2_0.entity_id=? 
        and e2_0.entity_type=? 
    order by
        pe1_0.prod_code

Hibernate 6.5.2.Final

select
        pe1_0.prod_id,
        pe1_0.comment,
        pe1_0.creation_date,
        e1_0.prod_id,
        e1_0.prod_entity_id,
        e1_0.entity_id,
        e1_0.entity_type,
        e1_0.initiator,
        e1_0.module,
        e1_0.party,
        e1_0.record_modification_timestamp,
        e1_0.system_record_timestamps,
        e1_0.system_update_timestamps,
        pe1_0.last_change_date,
        pe1_0.login,
        pe1_0.managed_by,
        pe1_0.party,
        pe1_0.prod_code,
        pe1_0.prod_description,
        pe1_0.prod_lifecycle,
        pe1_0.prod_type,
        pe1_0.record_modification_timestamp,
        pe1_0.system_record_timestamps,
        pe1_0.system_update_timestamps 
    from
        product pe1_0 
    join
        product_entity e1_0 
            on pe1_0.prod_id=e1_0.prod_id 
    where
        pe1_0.creation_date between ? and ? 
        and pe1_0.prod_type=? 
        and pe1_0.prod_lifecycle=? 
        and e1_0.initiator=? 
        and e1_0.module=? 
        and e1_0.entity_id=? 
        and e1_0.entity_type=? 
    order by
        pe1_0.prod_code

So, with 6.4.10.Final we get two join clauses, but with 6.5.2.Final, we only get one (this is the missing one: join product_entity e2_0 on pe1_0.prod_id=e2_0.prod_id)

Can someone please check what might cause such behavior?

Thank you in advance.

What makes you think that this is wrong behavior? Hibernate ORM simply avoids adding an additional join on the same table for the same join predicate. Doing that makes no difference.

This might be a very specific situation, but if you closely look at the queries, you’ll see that also where part is a bit different. Last three columns in 6.4.10 are referenced from second join, not from the first one (which is the situation in the 6.5.0 version).

And that is actually the key difference, as this changed WHERE part serves as additional filter on results over first join.

All joins are inner joins of *-to-one associations, so there is no difference between formulating a where condition on one alias vs. the other. Both joins are interchangeable.

But if I drop one join and redirect all conditions to same alias, I get different result compared to old behavior. And that is also obvious as the filter completely changed.

Although additional join on the same table for the same join predicate might be treated as redundant, it widely depends on what was the objective of the query.

Here we wanted to match all records that have same id and column initiator set to 1 but also all rows that fulfill conditions over second alias (regardless of the value in this initiator column).

And in order to fulfill this, we constructed query that uses two inner joins. Some might say that this can also be achieved by rewriting this query, but the the hard truth is this - in 6.4.x version Criteria API generated two inner joins over same table (as we wanted and expected), but with 6.5.x this second join is completely discarded. Discarding the second join resulted in completely different query, which does not perform intended task.

What you are writing here makes no sense, so I assume that you either work on a different example than what you have shared or that you are doing a mistake somewhere.

When you have 1 row in table A joining against table B with a foreign from table A against the primary key of table B, you will always get 0 or 1 row. If you do that join again, this will not change the amount of the rows, but simply “add” the same data under “different” aliases to the result row.
It doesn’t matter if you use the alias of the first join or the second join, the data it uses for comparisons is the same, and when you use a comparison predicate in the where clause, the row will either be filtered or not.

Please provide a self-contained SQL example (tables + insert + query) that shows this misbehavior.

It is an older piece of code and possibly has some hidden mistake. I can share the test data for integration test, via which we observed change in behavior. Tested everything over PostgreSQL. Here are the table definitions:

CREATE TABLE PRODUCT
(PROD_ID             BIGINT NOT NULL,
 PROD_CODE           VARCHAR(20) NOT NULL,
 PROD_TYPE           VARCHAR(3) NOT NULL,
 PROD_LIFECYCLE      VARCHAR(1) NOT NULL,
 SYSTEM_RECORD_TIMESTAMPS         TIMESTAMP NOT NULL,
 PROD_DESCRIPTION         VARCHAR(64),
 LOGIN         VARCHAR(64) NOT NULL,
 COMMENT            VARCHAR(64),
 SYSTEM_UPDATE_TIMESTAMPS         TIMESTAMP NOT NULL,
 PARTY        VARCHAR(4) NOT NULL,
 RECORD_MODIFICATION_TIMESTAMP          TIMESTAMP NOT NULL,
 CREATION_DATE      DATE NOT NULL,
 LAST_CHANGE_DATE   DATE NOT NULL,
 MANAGED_BY      VARCHAR(1)
);
CREATE TABLE PRODUCT_ENTITY
(PROD_ENTITY_ID         BIGINT NOT NULL,
 PROD_ID             BIGINT NOT NULL,
 ENTITY_TYPE        VARCHAR(1) NOT NULL,
 ENTITY_ID          VARCHAR(20) NOT NULL,
 MODULE    VARCHAR(18),
 INITIATOR     VARCHAR(1) NOT NULL,
 SYSTEM_RECORD_TIMESTAMPS         TIMESTAMP NOT NULL,
 PARTY        VARCHAR(4) NOT NULL,
 SYSTEM_UPDATE_TIMESTAMPS         TIMESTAMP NOT NULL,
 RECORD_MODIFICATION_TIMESTAMP          TIMESTAMP NOT NULL
);

Inserts for the tables:

INSERT INTO PRODUCT
(prod_id, prod_code, prod_type, prod_lifecycle, system_record_timestamps, prod_description,
 login, comment, system_update_timestamps, party, record_modification_timestamp,
 creation_date, last_change_date)
VALUES (1, 'PRODUCT001', 'LUA', '3', '2022-11-16 10:00:00.000', 'This is a PRODUCT001 error.',
        'TESTER_1', 'no comment', '2022-11-16 10:00:00.000', 'XX', '2022-11-16 10:00:00.000',
        '2022-11-16', '2022-11-16');
INSERT INTO PRODUCT_ENTITY
(prod_entity_id, prod_id, entity_type, entity_id, module, initiator,
 system_record_timestamps, party, system_update_timestamps, record_modification_timestamp)
VALUES (1, 1, '0', 'PERSON_1', 'DCC', '1',
        '2022-11-16 10:00:00.000', 'XX', '2022-11-16 10:00:00.000', '2022-11-16 10:00:00.000');
		
INSERT INTO PRODUCT
(prod_id, prod_code, prod_type, prod_lifecycle, system_record_timestamps, prod_description,
 login, comment, system_update_timestamps, party, record_modification_timestamp,
 creation_date, last_change_date, managed_by)
VALUES (2, 'PRODUCT002', 'LUA', '0', '2022-11-16 10:00:00.000', 'This is a PRODUCT002 error.',
        'TESTER_1', 'no comment', '2022-11-16 10:00:00.000', 'XX', '2022-11-16 10:00:00.000',
        '2022-11-16', '2022-11-16', '1');
INSERT INTO PRODUCT_ENTITY
(prod_entity_id, prod_id, entity_type, entity_id, module, initiator,
 system_record_timestamps, party, system_update_timestamps, record_modification_timestamp)
VALUES (2, 2, '0', 'PERSON_2', 'DCC', '1',
        '2022-11-16 10:00:00.000', 'XX', '2022-11-16 10:00:00.000', '2022-11-16 10:00:00.000');

INSERT INTO PRODUCT
(prod_id, prod_code, prod_type, prod_lifecycle, system_record_timestamps, prod_description,
 login, comment, system_update_timestamps, party, record_modification_timestamp,
 creation_date, last_change_date)
VALUES (3, 'PRODUCT003', 'LUA', '3', '2022-11-29 10:00:00.000', 'This is a PRODUCT003 error.',
        'TESTER_1', 'no comment', '2022-11-29 10:00:00.000', 'XX', '2022-11-29 10:00:00.000',
        '2022-11-29', '2022-11-29');
INSERT INTO PRODUCT_ENTITY
(prod_entity_id, prod_id, entity_type, entity_id, module, initiator,
 system_record_timestamps, party, system_update_timestamps, record_modification_timestamp)
VALUES (3, 3, '0', 'PERSON_3', 'DCC', '0',
        '2022-11-29 10:00:00.000', 'XX', '2022-11-29 10:00:00.000', '2022-11-29 10:00:00.000');
INSERT INTO PRODUCT_ENTITY
(prod_entity_id, prod_id, entity_type, entity_id, module, initiator,
 system_record_timestamps, party, system_update_timestamps, record_modification_timestamp)
VALUES (4, 3, '0', 'PERSON_4', 'DCC', '1',
        '2022-11-29 10:00:00.000', 'XX', '2022-11-29 10:00:00.000', '2022-11-29 10:00:00.000');
INSERT INTO PRODUCT_ENTITY
(prod_entity_id, prod_id, entity_type, entity_id, module, initiator,
 system_record_timestamps, party, system_update_timestamps, record_modification_timestamp)
VALUES (5, 3, '0', 'PERSON_5', 'DMD', '0',
        '2022-11-29 10:00:00.000', 'XX', '2022-11-29 10:00:00.000', '2022-11-29 10:00:00.000');

INSERT INTO PRODUCT
(prod_id, prod_code, prod_type, prod_lifecycle, system_record_timestamps, prod_description,
 login, comment, system_update_timestamps, party, record_modification_timestamp,
 creation_date, last_change_date)
VALUES (4, 'PRODUCT004', 'LUA', '0', '2022-11-29 10:00:00.000', 'This is a PRODUCT004 error.',
        'TESTER_1', 'no comment', '2022-11-29 10:00:00.000', 'XX', '2022-11-29 10:00:00.000',
        '2022-11-29', '2022-11-29');
INSERT INTO PRODUCT_ENTITY
(prod_entity_id, prod_id, entity_type, entity_id, module, initiator,
 system_record_timestamps, party, system_update_timestamps, record_modification_timestamp)
VALUES (6, 4, '1', 'SPECIAL_3', null, '1',
        '2022-11-29 10:00:00.000', 'XX', '2022-11-29 10:00:00.000', '2022-11-29 10:00:00.000');
INSERT INTO PRODUCT_ENTITY
(prod_entity_id, prod_id, entity_type, entity_id, module, initiator,
 system_record_timestamps, party, system_update_timestamps, record_modification_timestamp)
VALUES (7, 4, '0', 'PERSON_3', 'DCC', '0',
        '2022-11-29 10:00:00.000', 'XX', '2022-11-29 10:00:00.000', '2022-11-29 10:00:00.000');
INSERT INTO PRODUCT_ENTITY
(prod_entity_id, prod_id, entity_type, entity_id, module, initiator,
 system_record_timestamps, party, system_update_timestamps, record_modification_timestamp)
VALUES (8, 4, '0', 'PERSON_6', 'DMD', '0',
        '2022-11-29 10:00:00.000', 'XX', '2022-11-29 10:00:00.000', '2022-11-29 10:00:00.000');

And queries, both generated by Hibernate (I just placed the concrete values that our tests use):

select
        pe1_0.prod_id,
        pe1_0.comment,
        pe1_0.creation_date,
        e1_0.prod_id,
        e1_0.prod_entity_id,
        e1_0.entity_id,
        e1_0.entity_type,
        e1_0.initiator,
        e1_0.module,
        e1_0.party,
        e1_0.record_modification_timestamp,
        e1_0.system_record_timestamps,
        e1_0.system_update_timestamps,
        pe1_0.last_change_date,
        pe1_0.login,
        pe1_0.managed_by,
        pe1_0.party,
        pe1_0.prod_code,
        pe1_0.prod_description,
        pe1_0.prod_lifecycle,
        pe1_0.prod_type,
        pe1_0.record_modification_timestamp,
        pe1_0.system_record_timestamps,
        pe1_0.system_update_timestamps
    from
        product pe1_0
    join
        product_entity e1_0
            on pe1_0.prod_id=e1_0.prod_id
	join
        product_entity e2_0
            on pe1_0.prod_id=e2_0.prod_id
    where
        pe1_0.creation_date between CAST('2022-09-01' as DATE) and CAST('2022-12-05' as DATE)
        and pe1_0.prod_type='LUA'
        and pe1_0.prod_lifecycle='3'
        and e1_0.initiator='1'
        and e2_0.module='DCC'
        and e2_0.entity_type='0'
	    and e2_0.entity_id='PERSON_3'
    order by
        pe1_0.prod_code;

this SQL returns 1 row

This is the Hibernate 6.5.2.Final generated query

select
        pe1_0.prod_id,
        pe1_0.comment,
        pe1_0.creation_date,
        e1_0.prod_id,
        e1_0.prod_entity_id,
        e1_0.entity_id,
        e1_0.entity_type,
        e1_0.initiator,
        e1_0.module,
        e1_0.party,
        e1_0.record_modification_timestamp,
        e1_0.system_record_timestamps,
        e1_0.system_update_timestamps,
        pe1_0.last_change_date,
        pe1_0.login,
        pe1_0.managed_by,
        pe1_0.party,
        pe1_0.prod_code,
        pe1_0.prod_description,
        pe1_0.prod_lifecycle,
        pe1_0.prod_type,
        pe1_0.record_modification_timestamp,
        pe1_0.system_record_timestamps,
        pe1_0.system_update_timestamps
    from
        product pe1_0
    join
        product_entity e1_0
            on pe1_0.prod_id=e1_0.prod_id
    where
        pe1_0.creation_date between CAST('2022-09-01' as DATE) and CAST('2022-12-05' as DATE)
        and pe1_0.prod_type='LUA'
        and pe1_0.prod_lifecycle='3'
        and e1_0.initiator='1'
        and e1_0.module='DCC'
        and e1_0.entity_type='0'
	and e1_0.entity_id='PERSON_3'
    order by
        pe1_0.prod_code;

this SQL returns 0 rows

Her is a simple HQL example that shows the relevant thing that you are trying to do:

select p
from Product p
join fetch p.ents pe1
join p.ents pe2
where pe1.initiator = '1'
  and pe2.module = 'DCC'

You are inner joining a *-to-many collection two times and try to filter the collection contents and the result that you get is not what you expect.
This is the reason why JPA does not allow to specify aliases for join fetch paths, because you’re going to shoot your foot.

Never ever use the alias of a joined fetched association except for further join fetching. Hibernate ORM allows you to do this to allow you to do nested join fetch, but that’s it. The need to cast from Fetch to Join should have already raised a red flag since that means you’re entering unknown territory.

If you want to filter i.e. decide if a ProductEntity is in the result or not based on ProductEntityEntity, then you should use a so called semi join which can be implemented with an exists subquery i.e.

select p
from Product p
join fetch p.ents
where exists (
  select 1
  from p.ents pe
  where pe.initiator = '1'
    and pe.module = 'DCC'
)

If you really need to filter the contents of a collection, I would advise you to use a DTO instead of an entity or if that is too much trouble, at least use Hibernate ORM Filters, since that concept is known to collections. Filtering a collection by using the join fetch alias in a predicate will lead to trouble, because Hibernate ORM believes that the state of the collection is in sync with the database. When you alter the collection in memory and auto-flushing kicks in, Hibernate ORM might delete data that you have filtered out with your query. With filters, you don’t have this problem, because filters are respected during auto-flushing.

Thank You very much for detailed analysis. I also came to an idea of subselect, but was reluctant to go into it. I am just wondering isn’t our problem very similar to the one discussed here: Selection with criteria and entity graph updates data - #8 by Steven_Mahieu

Specifically, last sentence in this thread states

That’s why @beikov confirmed this is a bug, we should still try to optimize query by reusing joins when possible but if used in the where clause create two separate ones.

We also have two joins and reference to the both aliases in WHERE clause.

Based on the communication here and Your suggestions, we will change our codebase but I am wondering if this behavior could have been documented more explicitly :slight_smile: