Hint HINT_PASS_DISTINCT_THROUGH reduces the amount of Entities returned per page for a PageRequest down to below the configured page size (PostgreSQL)

Note: I’ve copied the below question from this unanswered stackoverflow post (it’s not my post but I am interested in any solutions/workarounds):

https://stackoverflow.com/questions/55921415/hint-hint-pass-distinct-through-reduces-the-amount-of-entities-returned-per-page

Original post:

I’m setting up a JPA Specification based repository implementation that utilizes jpa specifications(constructed based on RSQL filter strings) to filter the results, define result ordering and remove any duplicates via “distinct” that would otherwise be returned due to joined tables. The JPA Specification builder method joins several tables and sets the “distinct” flag:

final Join<Object, Object> rootJoinedTags = root.join("tags", JoinType.LEFT);
final Join<Object, Object> rootJoinedLocations = root.join("location", JoinType.LEFT);
...
query.distinct(true);

To allow sorting by joined table columns, I’ve applied the “HINT_PASS_DISTINCT_THROUGH” hint to the relevant repository method(otherwise, sorting by joined table columns returns an error along the lines of “sort column must be included in the SELECT DISTINCT query”).

@QueryHints(value = {
        @QueryHint(name = org.hibernate.jpa.QueryHints.HINT_PASS_DISTINCT_THROUGH, value = "false")
    })
    Page<SomeEntity> findAll(@Nullable Specification<SomeEntity> spec, Pageable pageable);

The arguments for said repository method are constructed as such:

final Sort sort = getSort(searchFilter);
final Specification spec = getSpecificationIfPresent(searchFilter);
final PageRequest pageRequest = PageRequest.of(searchFilter.getPageNumber(), searchFilter.getLimit(), sort);

return eventRepository.findAll(spec, pageRequest);

After those changes, filtering and sorting seems to work as required. However, the hint seems to cause “distinct” filtering to be applied after the result page is already constructed, thus reducing the number of returned entities in the page from the configured “size” PageRequest argument, to whatever is left after the duplicates are filtered out. For example, if we’d make a PageRequest with “page=0” and “pageSize=10”, then the resulting Page may return only 5 “SomeEntity” instances, although the database contains way more entries(177 entities to be exact in this case). If i remove the hint, then the returned entities number is correct again.

Question: is there a way to make the same Specification query setup work with correctly sized Pages(some other hints that might be added to have duplicate filtering performed before the Page object is constructed)? If not, then is there another approach I could use to achieve the required Specification-based filtering, with joined-column sorting and duplicate removal as with “distinct”?

PS: PostgreSQL is the database behind the application in question

My (rmf) additional questions:

Alternative approaches/workarounds?

  1. Use subqueries instead of joins (and avoid the need to use distinct altogether), but I don’t think JPA has support for specifying an order by in a sub-query? (in the case of a parent-child relationship, where we want to filter and sort parents based on fields in their children).

  2. Encapsulating the joins, and sharing them among different Specification instances, instead of each Specification create new instances of the joins (in the case where I’m ANDing separate Specifications instances together)? Not sure if this is possible or would work?

  3. Use separate queries to get counts, then ids, then entities, not sure if this will work with specifications:
    https://stackoverflow.com/questions/9418268/hibernate-distinct-results-with-pagination