Get the total result size of a query with aggregations and groupping

Hello

We are facing a problem that seems to be unsolvable in the current Hibernate implementation. Let’s say we have a query that is using aggregation and grouping:

select u.name, sum(o.amount) from UserEntity u join u.orders o where u.createdAt > ‘2018-01-01’ group by u.name

When we want to display the data in a UI grid, the obvious thing that comes into mind is to paginate the result (that can consist of thousands of rows). Pagination requires us to know the size of the result set, to calculate the amount of pages.

Normally, using native queries we would be able to to something similar to this:

select count(*) from (select u.name, sum(o.amount) from UserEntity u join u.orders o where u.createdAt > ‘2018-01-01’ group by u.name)

Unfortunately Hibernate doesn’t support subqueries in the from clause. Is there another way around this, except loading the entire result set to the memory and paginate on the application side?

Business requirements have ruled out infinite scrolling and using native query in that particular case.

In your case, only native SQL will do that and we advise you to do so.

Now, why is it that the business requirements rule out SQL?

The main reason is that according to the requirements the persistence layer should be totally abstracted away from the underlying DB implementation, as the product in question has to be prepared (to the best possible extent) to support multiple DB engines (including H2 in testing).

Pagination is a very poor candidate for native queries anyway, because of the differences in the way limit and offset are applied across different databases. The query that I’ve posted is just a simple example, aggregations that we use are much more complex and contain dynamic filters. To achieve the db agnosticism and at the same time use native query to get the total size, we would have to construct each aggregation query 2 times:

  • using native syntax to get the total size
  • using HQL/JPQL to apply limit and offset in a db agnostic way

Alternatively we would have to construct a strategy pattern that would choose the appropriate native query according to the dialect in use, but that would be ugly and untestable.

Either way, using a native query seems to be more like a workaround for a missing feature, not like a solution. I know that Hibernate can’t implement all db specific features of every database engine, but a subquery in the from clause is allowed by (I think) all major databases. Even if some of them do not support such a feature it could be validated on a Hibernate Dialect level.

If native query is the only way to get the total size, it means that Hibernate is unequipped to handle pagination in a database agnostic way and is in fact missing a pretty important feature.

Hibernate can handle pagination just fine for both JPQL and SQL queries. However, that works for selecting records not for aggregate results.

As for portability, you can still can do it even with native SQL.

If you think your requirement should be supported by Hibernate, send us a Pull Request with an implementation for it. After all, that’s the beauty of open source software development.

hi Vlad

Your link about portability in SQL is related to primary key generation strategies. It is a part of JPA and I fail to see the analogy to managing the limit/offset differences using native SQL. Can you elaborate?

As for your statement that “Hibernate can handle pagination just fine for both JPQL and SQL queries” - as you can see the pagination support is somewhat limited. Many business applications have reporting features that require both aggregation and pagination at the same time. I thought Hibernate is usable not only for simple CRUD apps :slight_smile:

As for the Pull Request - would be happy to do so, but I am working on a commercial application and my employer wouldn’t be happy if I spent this time doing open source stuff :stuck_out_tongue: Just pointing out that a feature is missing that would be very beneficial to many developers. From your post I can only read “things are good as they are and we see nothing wrong” or more IT specific “it works for me” :slight_smile: Anyways, I think the topic can be closed as there is nothing more to add and there is no way to solve the problem in a clean way. We will try to work around this issue.

Hibernate has been available since 2001 and is used by millions of applications. So far, there was no real demand for your feature.

But sure, if that’s important for your employer, I’m sure they will be interested in investing in it.

As (sadly) most employers and clients, mine too is interested in the results, not how we come up with them. He has no interest in that the solution is coded in a clean manner, as long as it works. That’s why (at least for now) we will have to go with a workaround (deadlines, deadlines everywhere).

I fixed the link which gives you more details about having both portability and native SQL queries.

Just as I’ve thought - you are proposing a Strategy pattern. It’s good but it has one flaw - basically untestable via automated tests (at least without an infrastructure overkill). We need to know that the code that we test during the build is the exact same code that is running on production - hence the requirement for total abstraction and a ban for native queries.

Lack of automatic testing is an infrastructure smell. Anyway, you have two ways to fix it:

  • implement the feature
  • use native SQL with DB-specific DAO

How about using a query like select count(distinct u.name) from UserEntity u join u.orders o where u.createdAt > ‘2018-01-01’ to count the elements?
I would suggest you take a look at Blaze-Persistence Pagination API which will handle this automatically for you. You just write a query like

PagedList<Tuple> pagedList = criteriaBuilderFactory.create(entityManager, Tuple.class)
  .select("u.name")
  .select("sum(u.orders.amount)")
  .from(User.class, "u")
  .where("u.createdAt").gtExpression("DATE '2018-01-01'")
  .page(firstResult, maxResults)
  .getResultList();

pagedList.getTotalSize(); // The total count
pagedList.size(); // The size of the page

And will that cover cases with groupping by many columns? Because this is where the simple count distinct becomes unusable.