Hey all,
We use Hibernate Search with @IndexedEmbedded on a structure that is three levels deep.
I’m trying to limit the amount of data that is loaded from the relational database (i.e. the Hibernate ORM projection) during indexing.
We have some blobs that we don’t index, and I’m trying to avoid pulling them from the database.
I thought the includePaths attribute would do that automatically, but I don’t see any difference in the query when using it.
Example code:
@Indexed
class PaymentSearch {
...
@OneToOne
@JoinColumn(name = "payment_id")
@IndexedEmbedded(includeEmbeddedObjectId = true)
Payment payment
...
}
class Payment {
...
@OneToMany(mappedBy = "payment")
@IndexedEmbedded(name = "invoices", includePaths = ["invoice.number"])
@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
Set<InvoicePayment> invoicePayment = new HashSet<InvoicePayment>()
...
}
class InvoicePayment {
...
@IndexedEmbedded(includePaths = ["number"])
@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
@ManyToOne
@JoinColumn(name = "invoice_id")
Invoice invoice
...
}
class Invoice {
...
@KeywordField
String number
...
}
Only the top-level class (PaymentSearch) has the @Indexed annotation.
Query that hits the database and that I would like to optimize:
SELECT invoicepay0_.payment_id AS payment_4_7_0_,
invoicepay0_.id AS id1_7_0_,
invoicepay0_.id AS id1_7_1_,
invoicepay0_.invoice_id AS invoice_3_7_1_,
invoicepay0_.payment_id AS payment_4_7_1_,
invoice1_.id AS id1_6_2_,
invoice1_.custom_attributes AS custom_att2_6_2_,
invoice1_.number AS number9_6_2_
FROM invoice_payment invoicepay0_
LEFT OUTER JOIN invoice invoice1_ ON invoicepay0_.invoice_id=invoice1_.id
WHERE invoicepay0_.payment_id='1'
I would like to be able to project only the relevant columns for indexing (i.e. invoice number) and skip everything else.
Interestingly enough, the id and payment_id columns of invoice_payment are even projected twice (though I’m not sure whether Hibernate Search has anything to do with this).
This is Hibernate Search 6.1.7 with Hibernate ORM 5.4.
Thanks in advance to anyone looking into this one.
Best,
Ivan
Hey,
Hibernate Search’s optimizations are mainly at the POJO level: it only accesses the properties it needs for indexing and never touches any others. That works well with Hibernate ORM’s lazy loading, but you have to configure that lazy loading yourself; by default, only associations are lazy-loaded, and then only *ToMany associations.
This means in particular:
- Eager associations will always be loaded, regardless of whether they are useful to Hibernate Search or not. This doesn’t seem to be your problem here, but in general you should try to avoid eager associations unless you have a good reason.
- An entity will be loaded with all its columns, unless you configure Hibernate ORM in a specific way, but that’s not related to Hibernate Search itself.
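To illustrate the point above, here is a minimal plain-Java analogy, not Hibernate code. The `Invoice` class, its `Supplier`-backed loader and the `indexedNumber` method are all hypothetical stand-ins: the loader plays the role of the database round trip, and `indexedNumber` plays the role of an indexer that only reads the property mapped to an index field. It only shows why a lazily loaded attribute is never fetched when nothing accesses it.

```java
import java.util.function.Supplier;

public class LazyAccessSketch {

    /** Hypothetical stand-in for an entity with one cheap and one expensive attribute. */
    static final class Invoice {
        private final String number;
        private final Supplier<byte[]> blobLoader; // stands in for the database round trip
        private byte[] customAttributes;           // stays null until first access
        private int blobLoads = 0;

        Invoice(String number, Supplier<byte[]> blobLoader) {
            this.number = number;
            this.blobLoader = blobLoader;
        }

        String getNumber() {
            return number;
        }

        byte[] getCustomAttributes() {
            if (customAttributes == null) {
                // Loaded on first access only, like a lazy attribute.
                customAttributes = blobLoader.get();
                blobLoads++;
            }
            return customAttributes;
        }

        int blobLoads() {
            return blobLoads;
        }
    }

    /** Stand-in for the indexer: it reads only the property that is mapped to a field. */
    static String indexedNumber(Invoice invoice) {
        return invoice.getNumber();
    }

    public static void main(String[] args) {
        Invoice invoice = new Invoice("INV-1", () -> new byte[1024]);
        System.out.println(indexedNumber(invoice)); // indexing reads the number only
        System.out.println(invoice.blobLoads());    // the blob was never requested
        invoice.getCustomAttributes();              // explicit access triggers the load
        System.out.println(invoice.blobLoads());
    }
}
```

If the attribute were populated eagerly in the constructor instead, the load would happen regardless of what the indexer reads, which is the situation described in the second bullet.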
In general, skipping the loading of a few text or numeric columns is unlikely to give you a large performance boost, so if I were you I would only bother if I had very large columns (large byte arrays, blobs) that are currently loaded eagerly, for some reason.
If that’s your case, I’d argue that your problem goes beyond Hibernate Search, as your application will likely also suffer from loading these blobs unnecessarily from time to time. So what you really need is to customize the Hibernate ORM mapping. I can see two solutions currently:
- Avoid property types that imply eager loading (e.g. byte[], JSON objects, …) and favor intrinsically lazy-loaded types (e.g. java.sql.Blob/java.sql.Clob). Short of some problems with your database/JDBC driver, this should make the relevant properties lazy, meaning if Hibernate Search doesn’t need to access those properties, the lobs won’t be loaded: at most, what will be loaded is an identifier of the lob, which is much lighter. Blob/Clob can, however, be a bit awkward to work with.
- Use Hibernate ORM’s bytecode enhancement and configure lazy groups appropriately. That can also work for non-Blob/Clob properties, so it’s easier to use in your application, but a bit more complicated to set up (unless you use Quarkus, which enables bytecode enhancement by default).
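As a rough sketch of what the two options could look like on the Invoice entity from the question (a mapping fragment, not a tested setup): the `attachment` field is hypothetical, the `String` type of `customAttributes` and the lazy group name are assumptions, and the `javax.persistence` imports assume Hibernate ORM 5.4.

```java
import java.sql.Blob;

import javax.persistence.Basic;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.Lob;

import org.hibernate.annotations.LazyGroup;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;

@Entity
public class Invoice {

    @Id
    private Long id;

    // The only property Hibernate Search needs for indexing.
    @KeywordField
    private String number;

    // Option 1: an intrinsically lazy LOB type.
    // "attachment" is a hypothetical field, used only for illustration.
    @Lob
    private Blob attachment;

    // Option 2: a lazy basic attribute in its own lazy group.
    // Assumption: custom_attributes is mapped as a String here.
    // This only takes effect when bytecode enhancement with lazy
    // initialization is enabled; without it, the attribute is loaded eagerly.
    @Basic(fetch = FetchType.LAZY)
    @LazyGroup("customAttributes")
    private String customAttributes;

    // getters and setters omitted
}
```

With option 2 in place and enhancement enabled, the custom_attributes column should no longer appear in the SELECT issued when the invoice is loaded for indexing, since nothing accesses that property.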
As to future solutions… In Hibernate Search itself, for mass indexing in particular, we plan to add ways to control what’s loaded more finely in Hibernate Search 6.2; see [HSEARCH-4471] - Hibernate JIRA. But as I said above, you’re probably better off addressing your problem in Hibernate ORM.
Thanks for the detailed response, Yoann.
I’ll try to optimize the query at the Hibernate ORM level then, though [HSEARCH-4471] - Hibernate JIRA looks promising.