Searching for two Fields of an @IndexedEmbedded List/Collection

I think I’ve arrived at a set of requirements that’s not solvable with Hibernate Search 5.11. That’s ok, I’m not expecting a solution, but a short statement that it’s not possible would be nice.

I have Cases (“Kauffälle”) with a List of Addresses (“KatasterAdressen”). Now I want to search for all Cases with the following input (plus other inputs of case data):

Problem
The @IndexedEmbedded flattens the Address in the search index, so a search for street: ABC && housenumber: 2 yields these results:

  1. { Street: “ABC-Street”, From: 1, To: 2 }
  2. { Street: “ABC-Street”, From: 100, To: 101} (match only on street name, because of the index flattening)
  3. { Street: “XYC”, From: 2, To: 2} (match only on house number, because of the index flattening)

Of course I only want result 1., that is, both the street and the house number should match within the same address.
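Conceptually, the flattening stores all addresses of a Case in shared multi-valued fields (an illustrative sketch, not an actual index dump):

```
# One Case with two addresses, { ABC-Street, 100-101 } and { XYC, 2-2 }, flattened:
adressen.strasse:       ["ABC-Street", "XYC"]
adressen.hausnummerVon: [100, 2]
adressen.hausnummerBis: [101, 2]
```

Both conditions of street: ABC && housenumber: 2 match this single document, even though no single address satisfies both.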

Questions
(1) The recommended hack solution I’ve found somewhere on Stack Overflow is to put all address data into one field (so that the information about which data belongs together is not lost). Unfortunately I can’t figure out a way that keeps the ngram analysis on the street name and the in-range search for the house number all in one field. Do you know another possible hack, or is it simply not possible with Hibernate Search 5.11?
(2) I’ve seen that Hibernate Search 6 supports nested fields, which seems to solve my problem. Is there a way to hack this single feature myself for use with Hibernate Search 5.11? Maybe by accessing the Lucene API directly, or something similar?
(3) I’ve read that you don’t publish dates, but is there a long-term ETA for a first production release of Hibernate Search 6? “Beta 6” suggests it might be somewhat stable. Would you recommend using it in production?

The Case is simple (irrelevant annotations and fields left out):

@Entity
@Indexed
@...

public class Kauffall {

    @ElementCollection
    @IndexedEmbedded
    @Field
    @...
    private List<KatasterAdresse> adressen = new ArrayList<>();

    ...
}

The Address is more complicated, because it expresses a street combined with a range of house numbers. I can’t find a way to reduce these five fields into one single field that would support both completion of street names and a search for house numbers that fall inside the range.

@Embeddable
@...
public class KatasterAdresse implements IPrimeentity {

    
    @Field(name = "strasse_ngram", analyzer = @Analyzer(definition = "edgeNGram"))
    @Field
    private String strasse;

    @Field(analyze = Analyze.NO, indexNullAs = "0000000000")
    private Long hausnummerVon;

    @Field(indexNullAs = "")
    private String hausnummerVonZusatz;

    @Field(analyze = Analyze.NO, indexNullAs = "9999999999")
    private Long hausnummerBis;

    @Field(indexNullAs = "zzzzzzzzzz")
    private String hausnummerBisZusatz;
}

Appendix:

Analyzer Configuration:

@AnalyzerDef(
    name = "edgeNGram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class), // Replace accented characters by their simpler counterpart (è => e, etc.)
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
                    @Parameter(name = "maxGramSize", value = "20"),
                    @Parameter(name = "minGramSize", value = "3"),
            }),
})
@Analyzer(definition = "default")
@AnalyzerDef(name = "default",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
                @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class), // Replace accented characters by their simpler counterpart (è => e, etc.)
                @TokenFilterDef(factory = LowerCaseFilterFactory.class) // Lowercase all characters
        })

That hack will not work in your case, since you’re searching on a tokenized field. There is no way to distinguish between two tokens coming from two different indexed strings in the same field, unless… you use nested documents, which are only available in Search 6.
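For reference, here is a rough sketch of what such a query could look like with the nested predicate in Search 6, assuming `adressen` is mapped with `@IndexedEmbedded(storage = ObjectFieldStorage.NESTED)` and the field names follow the mapping above (a sketch, not a tested implementation):

```java
// Sketch only: requires the Hibernate Search 6 dependencies and a NESTED
// mapping on "adressen"; field names and values are taken from the example above.
SearchSession session = Search.session( entityManager );
List<Kauffall> hits = session.search( Kauffall.class )
        .where( f -> f.nested().objectField( "adressen" )
                .nest( f.bool()
                        // all three conditions must match within the SAME address object
                        .must( f.match().field( "adressen.strasse_ngram" ).matching( "ABC" ) )
                        .must( f.range().field( "adressen.hausnummerVon" ).atMost( 2L ) )
                        .must( f.range().field( "adressen.hausnummerBis" ).atLeast( 2L ) ) ) )
        .fetchHits( 20 );
```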

No, I don’t think it’s possible to backport the feature to Hibernate Search 5.11 without major changes, let alone with a simple hack.

I won’t commit to an ETA, but we’re definitely nearing the end. At this point the work is about implementing the remaining minor features, polishing the documentation, making usability improvements, checking some exotic Hibernate Search 5 use cases can be addressed with the Hibernate Search 6 APIs, etc.

The latest Beta (Beta8) can definitely be used in production. Some people already do.
Hibernate Search 6 is tested at least as well as Hibernate Search 5, and the API is relatively stable. The only changes you can expect before the final release are minor API changes or bugfixes that may change some minor behavior, but they should be very localized.

If you’re upgrading from Hibernate Search 5, the migration guide hasn’t been written yet. But the documentation of Hibernate Search 6 is almost complete, and even more detailed than the documentation of Hibernate Search 5 in some areas, so you should be able to get back on your feet. Some of the more exotic features (such as analyzer discriminators, dynamic sharding or more-like-this queries) haven’t been ported to Search 6.0, and never will be; but you should spot them quickly enough when migrating your mapping and configuration properties.

To sum up:

  • If I had to start a project with Hibernate Search right now, I’d 100% go with 6.
  • If I had a relatively up-to-date project using mostly standard features of Hibernate Search 5, I’d seriously consider at least trying to migrate, be it only to evaluate the cost of migration and prepare for the future.
  • If I were maintaining an ancient project relying on exotic features with hacks all over the place, then yeah, I’d wait for a migration guide, because migrating would probably be a lot of work.

Thanks for the information!!! This really helps me make my decision.

The migration took me around one day for a moderately complex application. Since there’s no migration guide yet, I thought I’d share what I changed.

But first, what I failed to migrate:
I had to delete my EntityIndexingInterceptor (https://docs.jboss.org/hibernate/search/5.6/api/index.html?org/hibernate/search/indexes/interceptor/EntityIndexingInterceptor.html), because I found no counterpart in Hibernate Search 6. As it only kept some entities out of the index, I just adapted the search to always exclude them.

Replace the dependencies:

         <!-- Search -->
         <dependency>
-            <groupId>org.hibernate</groupId>
-            <artifactId>hibernate-search-orm</artifactId>
-            <version>5.11.5.Final</version>
+            <groupId>org.hibernate.search</groupId>
+            <artifactId>hibernate-search-mapper-orm</artifactId>
+            <version>6.0.0.Beta8</version>
+        </dependency>
+        <dependency>
+            <groupId>org.hibernate.search</groupId>
+            <artifactId>hibernate-search-backend-lucene</artifactId>
+            <version>6.0.0.Beta8</version>
         </dependency>

Use the new properties with named backends:

   jpa.properties:
     hibernate.create_empty_composites.enabled: true
-    hibernate.search.default.directory_provider: filesystem
-    hibernate.search.default.indexBase: ./index/default
+    hibernate.search.backends.myBackend.type: lucene
+    hibernate.search.backends.myBackend.directory.root: ./index/default
+    hibernate.search.default_backend: myBackend
+    hibernate.search.backends.myBackend.analysis.configurer: de.muenchen.kps.LuceneAnalyzerConfig

Remove the AnalyzerDefs:

-@AnalyzerDef(
-    name = "edgeNGram",
-    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
-    filters = {
-            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class), // Replace accented characters by their simpler counterpart (è => e, etc.)
-            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
-            @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
-                    @Parameter(name = "maxGramSize", value = "20"),
-                    @Parameter(name ="minGramSize", value = "3"),
-            }),
-})
-@Analyzer(definition = "default")
-@AnalyzerDef(name = "default",
-        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
-        filters = {
-                @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class), // Replace accented characters by their simpler counterpart (è => e, etc.)
-                @TokenFilterDef(factory = LowerCaseFilterFactory.class) // Lowercase all characters
-        })

Add a LuceneAnalysisConfigurer to configure the analyzers:

public class LuceneAnalyzerConfig implements LuceneAnalysisConfigurer {
    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {

        context.analyzer("default").custom()
                .tokenizer(StandardTokenizerFactory.class)
                .tokenFilter(ASCIIFoldingFilterFactory.class)
                .tokenFilter(LowerCaseFilterFactory.class);

        context.analyzer("edgeNGram").custom()
                .tokenizer(StandardTokenizerFactory.class)
                .tokenFilter(ASCIIFoldingFilterFactory.class)
                .tokenFilter(LowerCaseFilterFactory.class)
                .tokenFilter(EdgeNGramFilterFactory.class)
                .param("maxGramSize", "20")
                .param("minGramSize", "3");

    }
}

Use the analyzers with the new FullTextField Annotation:

-    @Field(analyzer = @Analyzer(definition = "edgeNGram"))
+    @FullTextField(analyzer = "edgeNGram", searchAnalyzer = "default")
     private String ort;

EnumBridge is not required anymore:

-    @Field(analyze = Analyze.NO, bridge=@FieldBridge(impl= EnumBridge.class))
+    @GenericField(sortable = Sortable.YES)
     @Column(name = "BEARBEITUNG_AUSGEWERTETNACH")
     @Enumerated(EnumType.STRING)
-    @SortableField
     private AusgewertetNach ausgewertetNach;

BigDecimalBridge is not required anymore:

-    @Field(analyze = Analyze.NO)
-    @NumericField
-    @SortableField
-    @FieldBridge(impl = BigDecimalNumericFieldBridge.class)
+    @ScaledNumberField(decimalScale = 2, sortable = Sortable.YES)
    private BigDecimal nettoflaeche;

Facets now seem to be done with “aggregable” (but I did not test this yet):

-    @Field(analyze = Analyze.NO, bridge = @FieldBridge(impl= EnumBridge.class))
-    @Facet(encoding = FacetEncodingType.STRING)
+    @GenericField(sortable = Sortable.YES, aggregable = Aggregable.YES)
     @Column(name = "FLURSTUECK_GEMARKUNG")
     @Enumerated(EnumType.STRING)
     private Gemarkung gemarkung;
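Untested as well, but based on the Search 6 aggregation DSL, an aggregation on such a field could look roughly like this (a sketch; the key name is arbitrary, the field path `gemarkung` is assumed from the mapping above, and availability in Beta8 may differ from the final API):

```java
// Sketch only: a terms aggregation on the aggregable "gemarkung" field.
AggregationKey<Map<Gemarkung, Long>> byGemarkung = AggregationKey.of( "byGemarkung" );
SearchResult<Kauffall> result = Search.session( entityManager )
        .search( Kauffall.class )
        .where( f -> f.matchAll() )
        // count documents per enum value of the "gemarkung" field
        .aggregation( byGemarkung, f -> f.terms().field( "gemarkung", Gemarkung.class ) )
        .fetch( 20 );
Map<Gemarkung, Long> countsPerGemarkung = result.aggregation( byGemarkung );
```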

Indexing on startup needs to be triggered differently:

     @EventListener(ApplicationReadyEvent.class)
     @Transactional(readOnly = true)
     public void startIndexing() throws InterruptedException {
-        final FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
-        fullTextEntityManager.createIndexer().startAndWait();
+        SearchSession searchSession = Search.session( entityManager );
+        searchSession.massIndexer()
+                .startAndWait();
     }

A bridge I had is now replaced by a getter with @IndexingDependency:

     @ElementCollection
     @LazyCollection(LazyCollectionOption.FALSE)
     @CollectionTable(name = "KATASTER_ADRESSEN")
-    @IndexedEmbedded
-    @Field
-    @FieldBridge(impl = KatasterAdressenFieldBridge.class)
+    @IndexedEmbedded(storage = ObjectFieldStorage.NESTED)
     private List<KatasterAdresse> adressen = new ArrayList<>();
 
+
+    public final static String PATH_PRIMEADRESSE = "primeAdresse";
+    @KeywordField(name = PATH_PRIMEADRESSE, sortable = Sortable.YES)
+    @IndexingDependency(derivedFrom = @ObjectPath(
+            @PropertyValue(propertyName="adressen")
+    ))
+    public String getPrimeAdresse() {
+        return adressen.stream().filter(KatasterAdresse::isPrime).findAny().map(KatasterAdresse::getAdresseReadable).orElse("");
+    }

Searching is no longer done with the FullTextEntityManager and QueryBuilder but with the SearchSession and the lambda API:

-        final FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
-        final QueryBuilder qb = fullTextEntityManager
-                .getSearchFactory()
-                .buildQueryBuilder()
-                .forEntity(KauffallBBG.class)
-                .overridesForField(PATH_STRASSE_NGRAM, "default")
-                .get();
+        final SearchSession searchSession = Search.session(entityManager);
...
+        final SearchResult<KauffallBBG> result = searchSession.search(KauffallBBG.class)
+                .where(f ->
+                        f.bool(b -> {
+                            b.must(f.matchAll());
+                            b.must(filterBasicSearch(kauffallSuche, f));
+                            ...
+                        }))
+                .sort(f -> {
+                    ...
+                })
+                .fetch(pageable.getPageSize() * pageable.getPageNumber(), pageable.getPageSize());
-        return new PageImpl(fullTextQuery.getResultList(), pageable, fullTextQuery.getResultSize());
+        return new PageImpl(result.hits(), pageable, result.totalHitCount());

My factored-out queries for reuse in the application now return a SearchPredicate instead of a Query:

-    public static Query createQuery(QueryBuilder qb, BigDecimal nettoflaeche) {
-        return qb
-                .keyword()
-                .onField(PATH_NETTOFLAECHE)
-                .matching(BigDecimalNumericFieldBridge.convertToQueryValue(nettoflaeche))
-                .createQuery();
+    public static SearchPredicate createPredicate(SearchPredicateFactory f, BigDecimal nettoflaeche) {
+        return f.match().field(PATH_NETTOFLAECHE).matching(nettoflaeche).toPredicate();
     }
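Such a predicate can then be plugged into a larger query via must(), for example (a hypothetical usage sketch; the entity and value are placeholders):

```java
// Sketch only: reusing the factored-out predicate inside a bool query.
SearchSession session = Search.session( entityManager );
List<KauffallBBG> hits = session.search( KauffallBBG.class )
        .where( f -> f.bool()
                .must( createPredicate( f, new BigDecimal( "12.50" ) ) ) )
        .fetchHits( 20 );
```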

Thanks for your feedback! Glad to hear it worked out fine.

A few clarifications below…

Yes, that’s still in the works: https://hibernate.atlassian.net/browse/HSEARCH-3108

By the way, the requirement to name the backend will be dropped in Beta9. And you won’t have to specify the backend type if there’s only one backend in the classpath. So you’ll be able to do this (in Beta9):

   jpa.properties:
     hibernate.create_empty_composites.enabled: true
-    hibernate.search.default.directory_provider: filesystem
-    hibernate.search.default.indexBase: ./index/default
+    hibernate.search.backend.directory.root: ./index/default
+    hibernate.search.backend.analysis.configurer: de.muenchen.kps.LuceneAnalyzerConfig