HSearch Metamodel

Hi guys !

Can we imagine having something like https://docs.jboss.org/hibernate/orm/5.0/topical/html/metamodelgen/MetamodelGenerator.html for the indexed fields of entities, index aliases, etc.?

See ya


At some point, certainly. We don’t plan on working on it for 6.0, though. Not because it’s useless, but because we’ve already added a lot of stuff to the backlog and it’d be nice if we released 6.0.0.Final before the end of the decade :slight_smile:

When we implement such a generator, I expect we’ll have to basically boot Hibernate Search in the generator (to address all the edge cases, including the bridges) and take advantage of the metadata API, which hasn’t been restored in Search 6 yet [EDIT: now it has]

If you want to work on restoring the metadata API, which would be a first step, please come chat with us before you try anything. It may not be very complex, but I expect it will require a significant amount of refactoring, and some time. Better start with all the information in mind :slight_smile:

EDIT: Note however that aliases of indexes would probably not be included in such a metamodel, since they are not needed to build a Hibernate Search query.

Out of curiosity, what do you need that metadata for?

Both metadata solutions mentioned above (the static metamodel and the metadata API) are aimed at helping build queries. The static metamodel gives you a solution for queries that are type-safe at compile time, and the metadata API allows you to programmatically discover fields available for queries, and their capabilities.

Neither would be appropriate if you want, for example, to inspect the Elasticsearch mapping, which is lower level and wouldn’t be completely visible through these APIs (you wouldn’t know whether doc values are enabled, etc.).

Hello @yrodiere,

Thank you for the answers!
I was thinking about a metamodel because I’m currently building a “configurable” statistics engine on top of Elasticsearch. For now I use the High Level REST Client for most of my requests, and HSearch only for the indexing part.
It would be nice to use the field names in a more “type-safe” way, but I can’t rely only on my Hibernate metamodel because I have multiple “virtual” fields in my indexes which are not described in the generated metamodels.

Interesting. By “virtual”, you mean fields that are defined in the Elasticsearch mapping, but not in Hibernate Search? I suppose they are either alias fields or multi-fields? Or maybe meta-fields?

Any reason you don’t want to rely on “native” predicates/sorts/aggregations/projections for these fields, such as what is described here?

There are also ways to post-process the JSON of generated search requests, though that should probably just be used for small changes (the code can easily get messy if you have lots of changes to apply to the search request).

By virtual, I mean fields that are indexed in Elasticsearch but not really mapped in Hibernate.
I described one example a few months ago here => Hibernate Search 6 - Advanced mapping

These kinds of fields are @Transient for Hibernate and for the metamodel.

List<Book> result = searchSession.search( Book.class ) 
        .where( f -> f.matchAll() )
        .sort( f -> f.field( "pageCount" ).desc() 
                .then().field( "title_sort" ) )
        .fetchHits( 20 );

In this example I would like to avoid the string “pageCount” and replace it with a metamodel reference; same thing for the sort field, which is defined by an annotation in the entities.

In my current case I have to build multiple statistics widgets that rely on some entities, filters, orderings, and aggregations. I don’t want strings that depend on my transient fields without a check at compilation time, because shit will happen eventually.
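Until such a generator exists, one stopgap is a hand-maintained constants class. The following is only a minimal sketch: `BookIndexFields` and its constant names are hypothetical and not part of Hibernate Search; the values simply mirror the `pageCount` and `title_sort` field names from the query above.

```java
// Hypothetical, hand-written stand-in for a generated index metamodel.
// Not part of Hibernate Search: the class and constant names are assumptions.
final class BookIndexFields {

    private BookIndexFields() {
        // Constants holder, not meant to be instantiated.
    }

    /** Index field holding the page count, as used in the sort above. */
    static final String PAGE_COUNT = "pageCount";

    /** Sort field whose name differs from the entity property name. */
    static final String TITLE_SORT = "title_sort";
}
```

The sort would then read `f.field( BookIndexFields.PAGE_COUNT ).desc().then().field( BookIndexFields.TITLE_SORT )`. A mistyped constant name fails at compile time, but nothing verifies that the constant values still match the actual mapping, which is the part a generator would have to solve.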

The original reason I wasn’t relying on native HSearch predicates is that I wanted to use some Lucene syntax directly in my API. Now I also have something I don’t know how to do with HSearch yet: searching on multiple entities at the same time with a common field (like title).

Alright, I think I understand: you don’t want to use the Search DSL at all, not because you can’t, but because it requires passing strings for field names.

What you call “virtual” fields are just fields as far as Hibernate Search is concerned, but they do not exist for Hibernate ORM, because the property is @Transient or the @*Field name does not match the name of the property (sort fields, typically). So when you try to rely on the Hibernate ORM static metamodel, you get stuck because some indexed fields are not represented in that metamodel.

Indeed, you’d need a separate static metamodel just for Hibernate Search in order to solve that. Thanks for the explanation.

You might be interested in the simple query string predicates. Or not, since you still have to pass the field name as a string :slight_smile:

Interestingly, that would be much more problematic with a static metamodel… I expect the Search API will have to be lenient when using the static metamodel in search queries targeting multiple entity types, and some problems will only be detected at runtime (such as a field having a different type in two targeted entity types).
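To illustrate the kind of problem that could only be caught at runtime, here is a rough sketch in plain Java. It is not Hibernate Search code: `FieldTypeConflicts` is a hypothetical helper, and the input map (entity type name to field name to value type) is an assumed simplification of whatever metadata would really be available.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hibernate Search.
final class FieldTypeConflicts {

    private FieldTypeConflicts() {
    }

    /**
     * Returns the names of fields that are declared with different value types
     * in at least two of the targeted entity types, in sorted order.
     */
    static Set<String> find(Map<String, Map<String, Class<?>>> fieldTypesByEntity) {
        // First type encountered for each field name, across all entity types.
        Map<String, Class<?>> firstSeen = new HashMap<>();
        Set<String> conflicts = new TreeSet<>();
        for (Map<String, Class<?>> fields : fieldTypesByEntity.values()) {
            for (Map.Entry<String, Class<?>> field : fields.entrySet()) {
                Class<?> previous = firstSeen.putIfAbsent(field.getKey(), field.getValue());
                if (previous != null && !previous.equals(field.getValue())) {
                    conflicts.add(field.getKey());
                }
            }
        }
        return conflicts;
    }
}
```

A field such as `title` declared as a `String` in every targeted type passes this check, while a field declared as `Integer` in one type and `String` in another is reported. A compile-time metamodel per entity type cannot express that cross-type constraint on its own.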

Yes, you got it!
I was also thinking about some way to generate the indexed representation of the entities, maybe extending DocumentReference.
That would make it easier to map and expose the results through an API.

So in the end we could have:
[GENERATED] User_.class => static metamodel for hibernate fields
[GENERATED] UserDocument.class => representation of indexed document
[GENERATED] UserDocument_.class => static metamodel for indexed fields

By the way, with Hibernate Search, can I get the complete _source part of my hits with .select(SearchProjectionFactory::documentReference), or should I select it with something else?

=> found it

.extension( ElasticsearchExtension.get() )
        .select( f -> f.source() )

You’re talking about extracting the whole data from Elasticsearch, right? Because otherwise you can retrieve managed entities from Hibernate ORM as results of Hibernate Search queries, and then transform them easily using tools such as MapStruct.

That’s also an interesting feature, though maybe not to be addressed at the same time.

I’ve thought about something similar, some sort of “reverse” mapping that would allow defining projections statically (with annotations), then ask Hibernate Search to perform the projection:

public class UserProjection {

    public Long id;

    public String firstName;
    public String lastName;

    public UserProjection.Company company;

    public static class Company {
        public String name;
    }
}

A “full” representation of each index could be automatically generated by the static metamodel generation tool. But typically, you’ll also need the ability to write such DTOs manually in order to handle large mappings where you do not want to fetch all the fields, just some of them.

Of course this kind of thing is already possible to some extent with current APIs (https://docs.jboss.org/hibernate/search/6.0/reference/en-US/html_single/#search-dsl-projection-composite), but the static definition is not possible yet and would be a nice addition.

That being said, we’d have to improve support for projections first, in particular by adding support for projections on multi-valued fields.

Yes, exactly. In my case I can rely on the indexed fields to produce results on my REST API directly. So it would be nice to project my hits’ _source directly to my DTO, as you explain, so that the Swagger UI could be documented with minimal effort and I wouldn’t have to write mappers.

When you pull everything together there are so many things to improve, but that’s the really interesting part :smiley:

I will keep diving into my business case and let you know here if I have some other ideas.

Do you think that, with the existing APIs, I can easily manage to convert the source to some class without writing each composite? Maybe using the JSON representation of the source?

Using f.source() you can definitely do it. You can implement a class representing your data and ask Gson to convert the JsonObject to that type using gson.fromJson( theJsonObject, UserDocument.class ). You can even do it transparently using f.composite. Something like that may work:

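import java.util.List;
import javax.persistence.EntityManager;

import com.google.gson.Gson;
import org.hibernate.search.backend.elasticsearch.ElasticsearchExtension;
import org.hibernate.search.mapper.orm.Search;

public class UserDocument {
    public String firstName;
    public String lastName;
}

public class UserRepository {

    private EntityManager entityManager;

    private final Gson gson;

    public UserRepository() {
        gson = new Gson();
        // Workaround for https://github.com/google/gson/issues/764:
        // Concurrent lazy initialization of type adapters can fail... so we initialize them eagerly.
        gson.getAdapter( UserDocument.class );
    }

    public List<UserDocument> search() {
        return Search.session( entityManager ).search( User.class )
                // f.source() is only available through the Elasticsearch extension
                .extension( ElasticsearchExtension.get() )
                .select( f -> f.composite(
                        // Convert the raw _source (a Gson JsonObject) to the DTO
                        source -> gson.fromJson( source, UserDocument.class ),
                        f.source()
                ) )
                .where( f -> f.matchAll() )
                .fetchHits( 20 );
    }
}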

How hard it will be will heavily depend on your model, so I couldn’t say :slight_smile:
