Complex Filter with Criteria API

I use Java 20 with Quarkus 3.0.2 and Hibernate 6.2.

I have customers, each customer can have their own customers, “end_customers”.

A customer usually has 1000-300k customers, I would say 3000-5000 on average.

Each customer can define a series of self-defined, custom fields.
Each field has a field type, like text, number, date, or a constrained text value.

The data is stored in a mariadb database.
Here’s the current SQL DDL:

CREATE TABLE end_customer
(
id varchar(36) NOT NULL,
external_id varchar(36) NOT NULL,
email varchar(255) NOT NULL,
first_name varchar(255) DEFAULT NULL,
end_customer_status varchar(12) NOT NULL,
version int(11) NOT NULL DEFAULT 0,
created_at timestamp NOT NULL DEFAULT current_timestamp(),
updated_at datetime DEFAULT current_timestamp() ON UPDATE current_timestamp(),

PRIMARY KEY (id),
UNIQUE KEY end_customer_email_unique (external_id, email),
INDEX end_customer_idx1 (external_id),
INDEX end_customer_idx2 (email),
INDEX end_customer_idx3 (end_customer_status),
INDEX end_customer_idx4 (created_at)

) ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4
COLLATE utf8mb4_general_ci;

CREATE TABLE end_customer_field
(
id varchar(36) NOT NULL,
end_customer_id varchar(36) NOT NULL,
field_type_id varchar(36) NOT NULL,
value varchar(255) DEFAULT NULL,
version int(11) NOT NULL DEFAULT 0,
created_at timestamp NOT NULL DEFAULT current_timestamp(),
updated_at datetime DEFAULT current_timestamp() ON UPDATE current_timestamp(),

PRIMARY KEY (id),
CONSTRAINT end_customer_field_fk1 FOREIGN KEY (end_customer_id) REFERENCES end_customer (id) ON DELETE CASCADE,
CONSTRAINT end_customer_field_fk2 FOREIGN KEY (field_type_id) REFERENCES field_type (id) ON DELETE CASCADE,
UNIQUE KEY end_customer_field_unique (end_customer_id, field_type_id),
INDEX end_customer_field_fk1 (end_customer_id),
INDEX end_customer_field_fk2 (field_type_id)

) ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4
COLLATE utf8mb4_general_ci;

Since end_customer_field is basically just a stupid key value lookup, there are no indexes on the table,
and the data type of value is simply varchar(255).

Now I want to allow my customers to execute dynamic filter queries on their end customers.
Each customer has 2 default fields which are part of the end_customer table - first name and email - and up to another 28 custom fields,
30 fields in total.

I allow up to 30 search criterias to be defined -
that means I need to join the customer table up to 30 times with the end_customer_field table -
That means that for each search criteria, we need to join the customer table with the end_customer_field table,
and for each data type other than text, we need to convert the text value in the database to a number or date field.
Further, I do allow up to 24 different logical comparison operators to be used, depending on the related data type.

I implemented all this… but for more complex filters, even on my local system, the query runs for several minutes
and then at some point just times out, so I urgently need to adjust and fix this to make it work.

Here’s the pattern-matching switch I use to create the Predicates:

private static Predicate toFieldMatcherPredicate(CriteriaBuilder criteriaBuilder, FieldMatcher matcher, String value, Join<CustomerEntity, CustomerFieldEntity> end_customerFields) {
return switch (matcher) {
case TEXT_EQUALS -> criteriaBuilder.equal(end_customerFields.get("value"), value);
case TEXT_EQUALS_IGNORE_CASE ->
criteriaBuilder.equal(criteriaBuilder.lower(end_customerFields.get("value")), value.toLowerCase());
case TEXT_NOT_EQUALS -> criteriaBuilder.notEqual(end_customerFields.get("value"), value);
case TEXT_STARTS_WITH -> criteriaBuilder.like(end_customerFields.get("value"), value + "%");
case TEXT_ENDS_WITH -> criteriaBuilder.like(end_customerFields.get("value"), "%" + value);
case TEXT_CONTAINS -> criteriaBuilder.like(end_customerFields.get("value"), "%" + value + "%");
case NUMBER_EQUALS ->
criteriaBuilder.equal(toNumberFieldValue(end_customerFields), toNumericValue(criteriaBuilder, value));
case NUMBER_NOT_EQUALS ->
criteriaBuilder.notEqual(toNumberFieldValue(end_customerFields), toNumericValue(criteriaBuilder, value));
case NUMBER_GREATER_THAN ->
criteriaBuilder.greaterThan(toNumberFieldValue(end_customerFields), toNumericValue(criteriaBuilder, value));
case NUMBER_LESS_THAN ->
criteriaBuilder.lessThan(toNumberFieldValue(end_customerFields), toNumericValue(criteriaBuilder, value));
case NUMBER_GREATER_OR_EQUAL ->
criteriaBuilder.greaterThanOrEqualTo(toNumberFieldValue(end_customerFields), toNumericValue(criteriaBuilder, value));
case NUMBER_LESS_OR_EQUAL ->
criteriaBuilder.lessThanOrEqualTo(toNumberFieldValue(end_customerFields), toNumericValue(criteriaBuilder, value));
case DATE_EQUALS ->
criteriaBuilder.equal(toDateFieldValue(criteriaBuilder, end_customerFields), toDateValue(value));
case DATE_NOT_EQUALS ->
criteriaBuilder.notEqual(toDateFieldValue(criteriaBuilder, end_customerFields), toDateValue(value));
case DATE_BEFORE ->
criteriaBuilder.lessThan(toDateFieldValue(criteriaBuilder, end_customerFields), toDateValue(value));
case DATE_AFTER ->
criteriaBuilder.greaterThan(toDateFieldValue(criteriaBuilder, end_customerFields), toDateValue(value));
case DATE_BEFORE_OR_EQUAL ->
criteriaBuilder.lessThanOrEqualTo(toDateFieldValue(criteriaBuilder, end_customerFields), toDateValue(value));
case DATE_EQUAL_OR_AFTER ->
criteriaBuilder.greaterThanOrEqualTo(toDateFieldValue(criteriaBuilder, end_customerFields), toDateValue(value));
case CONSTRAINED_VALUE_EQUALS -> criteriaBuilder.equal(end_customerFields.get("value"), value);
case CONSTRAINED_VALUE_NOT_EQUALS -> criteriaBuilder.notEqual(end_customerFields.get("value"), value);
};
}

private static Expression<Integer> toNumberFieldValue(Join<EndCustomerEntity, EndCustomerFieldEntity> end_customerFields) {
     return end_customerFields.get("value").as(Integer.class);
}

private static Expression<Integer> toNumericValue(CriteriaBuilder criteriaBuilder, String value) {
    return criteriaBuilder.literal(Integer.parseInt(value));
}

private static Expression<LocalDate> toDateFieldValue(CriteriaBuilder criteriaBuilder, Join<EndCustomerEntity, EndCustomerFieldEntity> endCustomerFields) {
    /* FIXME - STR_TO_DATE is mysql/mariadb specific */
    return criteriaBuilder.function("STR_TO_DATE", LocalDate.class, endCustomerFields.get("value"), criteriaBuilder.literal("%Y-%m-%d"));
}

private static LocalDate toDateValue( String value) {
    return LocalDate.parse(value, DateTimeFormatter.ISO_LOCAL_DATE);
}

}

Do you have any suggestions on how I can make the performance fast enough so this could actually be used?
Some ideas from my side:
Remove some boolean operators. GPT suggested removing the text contains, for instance
Limit the number of criterias from 30 to 15 or whatever
Store the data in a different way.
One idea would be one field table with 30 optional varchars. That would mean just one join
One idea would be one field table/data type - that would make the joining/lookup logic more complex,
but would avoid having to cast from varchar to number or date.
One idea would be to use a different storage medium… like a nosql database… at least GPT suggested that -
I don’t really have experience with nosql yet, to be honest.

Ideas and suggestions are highly welcome :slight_smile:

I don’t understand why you need 30 joins. Can’t you just re-formulate the query like this?

from CustomerEntity c
where exists (
  select 1
  from c.fields f
  where f.value = :param1
     or lower(f.value) = :param2
     ...
)

:blush: :pray: Thank you Christian for your reply and support, greatly appreciated!

A filter consists of filter groups, the filter groups can be connected with AND / OR / NOT.
Inside the filters, so far AND is the default between the single logical checksm but that could also be made more dynamic. Finally, we have a large set of logical comparison operators like

TEXT_EQUALS, TEXT_NOT_EQUALS, TEXT_EQUALS_IGNORE_CASE, TEXT_STARTS_WITH,
TEXT_ENDS_WITH,TEXT_CONTAINS, 
NUMBER_EQUALS, NUMBER_NOT_EQUALS, NUMBER_GREATER_THAN, NUMBER_LESS_THAN, NUMBER_GREATER_OR_EQUAL,
NUMBER_LESS_OR_EQUAL,
DATE_EQUALS, DATE_NOT_EQUALS, DATE_BEFORE, DATE_AFTER, DATE_BEFORE_OR_EQUAL, DATE_EQUAL_OR_AFTER,
CONSTRAINED_VALUE_EQUALS, CONSTRAINED_VALUE_NOT_EQUALS


To allow all that, we are currently creating the query dynamically using the criteria api, and it creates those joins. Is there a way to go with those inner queries that you used with the criteria api also, and you’d think that could lead to better performance then the joins?

Another idea I currently have is storing all the field value related data as json in a single column and then using the new Hibernate 6.2 composite aggregate mappings? But unsure if/how that could be combined with the Criteria API… totally new approach to me… and I would love to keep the changes to a minimum - I just need acceptable performance not necessarily perfect performance. But a query that runs over 1-10 seconds and times out would not be acceptable :slight_smile:

Is there a way to go with those inner queries that you used with the criteria api also, and you’d think that could lead to better performance then the joins?

Yeah, reducing joins is going to improve the performance significantly and yes, you can model that with JPA Criteria API.

Subquery<Integer> subquery = query.subquery(Integer.class);
subquery.select( criteriaBuilder.literal(1) );
Join<CustomerEntity, CustomerFieldEntity> end_customerFields = subquery.correlate( root ).join( "fields" );
subquery.where( createPredicate( end_customerFields ) );
query.where( criteriaBuilder.exists( subquery ) );

Thanks, will try that.

Would you think that only by using subqueries performance should be ok, or should I combine it with my approach of storing integers as int_values and dates as date_value in the same field value table row
(causing potentially 3 times the amount of storage data used) - to prevent Hibernate / the database from having to cast the string value.
?

You can always optimize further. Storing things into a JSON column and adding a GIN index or into array columns is always an option to optimize further, but also brings some complexity with it. You said you want acceptable performance. What I am suggesting should give you that.

1 Like

Hi @beikov Could you please suggest how can we use criteria builder query to filter data from Array of JSON column?

Please start a new thread and add more details to your question.