Using = any(:array) with Hibernate 6.4.4 on PostgreSQL 16

Hi all,

I’m trying to to improve the handling of some of the queries in our application. Our security framework automatically adds several restrictions to queries, based on the information the user is allowed to see. These restrictions usually boil down to something like property in (x, y, z). In some corner cases however, the lists of these values can become quite long, causing a very large number of bind parameters being added to the query. From what I’ve read so far, it is recommended to use property = any(:array) in these cases, which only requires a single bind parameter and is also handled more efficiently by PostgreSQL. I noticed Hibernate also uses this construction when batch fetching entities and collections.

This same question (or very similar) was asked some time ago in Using PostgreSQL’s = ANY(array) syntax with Hibernate 6.2.9. Here it was suggested to use array_contains (I presume that’s the same as arrayContains in HibernateCriteriaBuilder). However, this does not give me the desired result. It renders as ?@>array[property] in the SQL.

So my question is, is there any way to get hibernate to render property = any (?) or is ?@>array[property] a good replacement?

Best regards,
Emond

What do you mean? This should have the same semantics as property = any(?).

We did some testing, and although ? @> array[p] does have the same functional effect as p = any (?), it is not equivalent from the perspective of the database. For the former, PostgreSQL is unable to use indexes on the properties, whereas for the latter it does use these indexes. For example, compare these 2 very simple queries:

explain analyze select * from audit_record where id = any (array[8243949::bigint]);
---
Index Scan using audit_record_pkey on audit_record  (cost=0.43..8.45 rows=1 width=3751) (actual time=0.010..0.011 rows=1 loops=1)
  Index Cond: (id = ANY ('{8243949}'::bigint[]))
Planning Time: 0.068 ms
Execution Time: 0.019 ms

versus:

explain analyze select * from audit_record where array[8243949::bigint] @> array[id];
---
Gather  (cost=1000.00..159099.25 rows=18754 width=3751) (actual time=1.238..212.281 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on audit_record  (cost=0.00..156223.85 rows=7814 width=3751) (actual time=132.376..201.812 rows=0 loops=3)
        Filter: ('{8243949}'::bigint[] @> ARRAY[id])
        Rows Removed by Filter: 1250943
Planning Time: 0.100 ms
JIT:
  Functions: 6
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.264 ms, Inlining 0.000 ms, Optimization 0.314 ms, Emission 4.009 ms, Total 4.587 ms
Execution Time: 212.393 ms

Both queries will indeed return the same row, but the first is executed way more efficient than the second.

That’s just a limitation of the database, and if you want to use conventional indexes you will have to stay with = any, or implement some sort of limit for the number of parameters after which you start using arrays.

My suggestion is testing the impact on performance of the = any vs array strategies with a varying number of parameters, and find a breakpoint where it might be worth utilizing one strategy against the other.

That’s exactly my point: I do not want to use @>, I want to use = any, but I can’t find a way to do that via the criteria API. In Using PostgreSQL's = ANY(array) syntax with Hibernate 6.2.9 the suggestion was made to use array_contains, but that doesn’t give me = any. So how do I get = any via the criteria API? Hibernate internally does use = any for batch fetching, so the construct exists.

You can use the HibernateCriteriaBuilder extension API which exposes the method:

org.hibernate.query.criteria.HibernateCriteriaBuilder#in(jakarta.persistence.criteria.Expression<? extends T>, T...)

That doesn’t work either. That results in in(?, ......), which is what we currently have and what I want to replace with = any.

According to this SO post, the indexed column must be on the left of the operator, so maybe you just need to swap the first and second argument in your array_contains use to make this work efficiently?

I think that’s only relevant for columns with an array type. This is a normal, single valued, column. I don’t think it’s a very good idea to use a GIN index on a simple BIGINT column. Also, the order cannot be changed, because HibernateCriteriaBuilder requires an array (or array type) on the left side.

It seems there is no “official” way to render the predicate as property = any (?) via the criteria API. I could probably use cb.isTrue(cb.sql("? = any (?)", Boolean.TYPE, path, values)) on HibernateCriteriaBuilder. However, that’s a rather cumbersome way to do such a simple task. I was hoping Hibernate had an official way to build such a predicate, especially since the same construct is used for batch fetching internally.

It’s really unfortunate that the PostgreSQL optimizer is so limited and requires dedicated syntax, but it shouldn’t be too hard to implement. All you have to do is change the last else branch in org.hibernate.dialect.function.array.ArrayContainsOperatorFunction#render to render this special syntax instead.

Please create a bug ticket in our issue tracker and link to this discussion. If you feel confident to work on this, we’d very much appreciate if you could create a PR with a fix for this.

I don’t think it’s appropriate to change an array_contains into an = any, because they can have quite different semantics on the database. They are entirely different operators. I’d much rather see dedicated methods like these (overloaded versions of the existing any with a subquery):

	<T> JpaPredicate<T> any(Expression<? extends T> expression, T... values);
	<T> JpaPredicate<T> any(Expression<? extends T> expression, Collection<T> values);

However, I really don’t know what would be needed to add this to HibernateCriteriaBuilder.

The code currently special cases the rendering to arg0 @> array[arg1] if arg1 is not an array, so as far as I understand, arg1 = any(arg0) should have the same semantics. What are your concerns?

My concerns are that these are different operators with different behavior. Not only can they result in very different query plans, the outcome might not even be the same, especially in the case of null. Take these examples, the first is NULL, the second is False.

select 3 = any (array[1,2,null])
select array[1,2, null] @> array[3];

From the perspective of a developer, I also think it is very unexpected to see = any end up in a query when instructing Hibernate to use array_contains. Also, this would make this very hard to find. HibernateCriteriaBuilder already has an any method for the subquery form. What’s against adding overloaded methods for the array form?

I was not aware of this problem with an array containing null, but you can easily workaround it, just coalesce to false. The only real problem I see though, is that arg1 = any(arg0) can’t use a GIN index on arg0, whereas arg0 @> array[arg1] can. It’s a pity that PostgreSQL isn’t smart enough to figure this out.

There is no native array_contains function in PostgreSQL, so whatever we render is fine as long as the semantics as described in the documentation are met.

How is it hard to find? You want to check if something is part of an array, so you look for a “contains” function that works with arrays, sounds simple enough to me.

The way any is modeled in the AST simply doesn’t play well with this sort of operation.

From what I understand, the best way forward would be to special case parameter/constant arrays i.e. if arg0 is an array of constants or parameters, then render it as coalesce(arg1=any(arg0),false) and otherwise stick to arg0@>array[arg1]. That way, we get maximum index usage and retain the desired semantics.

I would say that @> qualifies as a native operator for array contains, just as = is the native operator for equal. There is no native equal function in PostgreSQL either, but everyone expects CriteriaBuilder.equal to yield =. The PostgreSQL documentation for @> literally says: Does the first array contain the second.

I really don’t know how hard it would be to fix this, but in PostgreSQL, there are two forms of ANY: one takes a subquery, the other takes an array. This is very similar to IN: one takes a subquery, the other takes a list of values. The any with a subquery is already supported by Hibernate and is part of JPA. If it is very difficult to add the array variant, what about modeling it as a function in the AST? Something like any_array?

ANY is a quantifier that can be used around subqueries. This is a SQL standard behavior, so we can just wrap a subquery and render this without knowledge about where this quantified subquery expression is embedded in.
Implementing support for arrays is not so simple though if the database does not support this natively, because an emulation needs to be able to access all comparison operands. This is why modelling this as a function or a dedicated predicate is necessary. Since functions can be used as predicates directly, it is much simpler to just make this a function, hence my suggestion to model this within the scope of the array_contains function.

I made a suggestion that should satisfy your needs, yet keep everything working smoothly or even improve certain existing use cases.

I don’t have a strong opinion on whether we should provide a dedicated inArray method to create this construct directly in addition to the existing array_contains variants, but I don’t see a reason to not implement this through the existing array_contains function infrastructure.