Aggregation query with no limit using Lucene backend

Using Hibernate Search 6.0.7. We are using aggregation queries to populate a list of filters for our application, using the values of a specific field in the index. Our code is set up the same way as the example in the documentation, but we fetch 0 results (we only care about the values of that single field) and have set maxTermCount to the maximum int value (as we want the filter to display all possible options for the field).

This works as intended when using an Elasticsearch backend. However, when we use the Lucene backend, the query takes several seconds and leads to a huge jump in memory usage. This happens even if the index only has a couple of records in it.

I couldn’t see anything mentioned about this on the issue tracker or in recent patch notes. Not sure if this is a bug, or whether we should be using a different method to gather this data when using Lucene?

I suspect this will involve creating a data structure of size Integer.MAX_VALUE. I’m actually surprised this even works; Lucene has checks in several places that forbid creating priority queues as large as this.
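To make the suspicion above concrete: a top-N collector that sizes its heap from the requested maximum pays for that maximum even when the index holds only two distinct terms. Below is a minimal sketch in plain Java of one way to avoid that, by clamping the capacity to the number of distinct terms before allocating. It is not the Hibernate Search or Lucene code, and the class and method names (TopTermsSketch, clampCapacity, topTerms) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical illustration only; not Hibernate Search or Lucene code. */
final class TopTermsSketch {

    private TopTermsSketch() {
    }

    /** Never size the heap beyond the number of distinct terms actually present. */
    static int clampCapacity(int maxTermCount, int distinctTermCount) {
        if (maxTermCount < 1) {
            throw new IllegalArgumentException("maxTermCount must be positive");
        }
        // PriorityQueue requires an initial capacity of at least 1.
        return Math.max(1, Math.min(maxTermCount, distinctTermCount));
    }

    /** Returns up to maxTermCount terms, most frequent first. */
    static List<String> topTerms(Map<String, Long> countsByTerm, int maxTermCount) {
        int capacity = clampCapacity(maxTermCount, countsByTerm.size());
        // Min-heap on count: the head is always the weakest candidate kept so far.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(capacity, Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> entry : countsByTerm.entrySet()) {
            heap.offer(entry);
            if (heap.size() > capacity) {
                heap.poll(); // drop the least frequent term kept so far
            }
        }
        // Order the survivors by descending count.
        List<Map.Entry<String, Long>> kept = new ArrayList<>(heap);
        kept.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Long> entry : kept) {
            terms.add(entry.getKey());
        }
        return terms;
    }
}
```

With two distinct terms, calling topTerms with a maxTermCount of Integer.MAX_VALUE allocates a heap of capacity 2 rather than one sized for two billion entries.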

Probably; please report it, but I’ll need a reproducer. And I can’t promise this can be solved; this part of the code is relying on relatively exotic parts of Lucene…

Or at least give me an idea of the code that defines the aggregation, the relevant part of the mapping, and a stack trace showing where the memory allocation happens (you can probably get such a stack trace by drastically reducing the amount of memory assigned to your VM; your aggregation should then trigger an OOM error).

If your intent is to retrieve all values of a given field across the whole index, then indeed, you could go for a different approach. You can get access to an index reader, and from there you can leverage low-level APIs to list all values of a field. One way of doing this would be to leverage docvalues as we do in Hibernate Search: see backend/lucene/src/main/java/org/hibernate/search/backend/lucene/types/aggregation/impl/LuceneTextTermsAggregation.java in the hibernate/hibernate-search repository on GitHub, at commit 92c597a1ad693a4b06d7114d4cc631bf5e9cd58c.

There are probably other, possibly more efficient ways, in particular if your field is not analyzed, with org.apache.lucene.index.LeafReader#terms.

Note, however, that those are Lucene APIs, not Hibernate Search APIs, so we cannot guarantee the stability of these APIs in future versions :)

Thanks for the speedy reply, as always. I’ll take a look at the index reader. As the code is internal, I can’t share it, but I’ll create a simple standalone example in the next couple of days and share it here. Thanks again!

Apologies for the delay in getting back. Example code can be found here. My personal machine has 16 GB of memory, so I wasn’t able to run it with more than 12 GB of max heap space. Full output, including the stack trace, is below. For now we have just hard-coded the limit to some stupidly high value (I think 10 million), as that is way above our max count but far below the point where it causes memory issues.

Inserting books
Books inserted

Reading books from search
Book title: Title Example
Book title: The Sequel
Book title: Title Example
All books read

Running aggregation with max term count of 10
Title Example: 2
The Sequel: 1
Aggregation finished in: 63

Running aggregation with max term count of 1000
Title Example: 2
The Sequel: 1
Aggregation finished in: 2

Running aggregation with max term count of 100000
Title Example: 2
The Sequel: 1
Aggregation finished in: 1

Running aggregation with max term count of 1000000
Title Example: 2
The Sequel: 1
Aggregation finished in: 2

Running aggregation with max term count of 10000000
Title Example: 2
The Sequel: 1
Aggregation finished in: 19

Running aggregation with max term count of 100000000
Title Example: 2
The Sequel: 1
Aggregation finished in: 235

Running aggregation with max term count of 1000000000
Title Example: 2
The Sequel: 1
Aggregation finished in: 2311

Running aggregation with max term count of 2147483630
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:100)
at org.apache.lucene.util.PriorityQueue.<init>(PriorityQueue.java:47)
at org.apache.lucene.facet.TopOrdAndIntQueue.<init>(TopOrdAndIntQueue.java:41)
at org.hibernate.search.backend.lucene.lowlevel.facet.impl.TextMultiValueFacetCounts.getTopChildrenSortByCount(TextMultiValueFacetCounts.java:98)
at org.hibernate.search.backend.lucene.lowlevel.facet.impl.TextMultiValueFacetCounts.getTopChildren(TextMultiValueFacetCounts.java:72)
at org.hibernate.search.backend.lucene.types.aggregation.impl.LuceneTextTermsAggregation.getTopChildren(LuceneTextTermsAggregation.java:54)
at org.hibernate.search.backend.lucene.types.aggregation.impl.AbstractLuceneFacetsBasedTermsAggregation.getTopBuckets(AbstractLuceneFacetsBasedTermsAggregation.java:124)
at org.hibernate.search.backend.lucene.types.aggregation.impl.AbstractLuceneFacetsBasedTermsAggregation.extract(AbstractLuceneFacetsBasedTermsAggregation.java:65)
at org.hibernate.search.backend.lucene.types.aggregation.impl.AbstractLuceneFacetsBasedTermsAggregation.extract(AbstractLuceneFacetsBasedTermsAggregation.java:39)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneExtractableSearchResult.extractAggregations(LuceneExtractableSearchResult.java:156)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneExtractableSearchResult.extract(LuceneExtractableSearchResult.java:81)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneExtractableSearchResult.extract(LuceneExtractableSearchResult.java:60)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneSearcherImpl.search(LuceneSearcherImpl.java:74)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneSearcherImpl.search(LuceneSearcherImpl.java:32)
at org.hibernate.search.backend.lucene.work.impl.SearchWork.execute(SearchWork.java:42)
at org.hibernate.search.backend.lucene.orchestration.impl.LuceneSyncWorkOrchestratorImpl$WorkExecution.execute(LuceneSyncWorkOrchestratorImpl.java:152)
at org.hibernate.search.backend.lucene.orchestration.impl.LuceneSyncWorkOrchestratorImpl.doSubmit(LuceneSyncWorkOrchestratorImpl.java:86)
at org.hibernate.search.backend.lucene.orchestration.impl.LuceneSyncWorkOrchestratorImpl.doSubmit(LuceneSyncWorkOrchestratorImpl.java:32)
at org.hibernate.search.engine.backend.orchestration.spi.AbstractWorkOrchestrator.submit(AbstractWorkOrchestrator.java:135)
at org.hibernate.search.backend.lucene.orchestration.impl.LuceneSyncWorkOrchestratorImpl.submit(LuceneSyncWorkOrchestratorImpl.java:58)
at org.hibernate.search.backend.lucene.orchestration.impl.LuceneSyncWorkOrchestrator.submit(LuceneSyncWorkOrchestrator.java:28)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneSearchQueryImpl.doSubmit(LuceneSearchQueryImpl.java:203)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneSearchQueryImpl.doFetch(LuceneSearchQueryImpl.java:178)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneSearchQueryImpl.fetch(LuceneSearchQueryImpl.java:100)
at org.hibernate.search.backend.lucene.search.query.impl.LuceneSearchQueryImpl.fetch(LuceneSearchQueryImpl.java:41)
at org.hibernate.search.engine.search.query.spi.AbstractSearchQuery.fetch(AbstractSearchQuery.java:40)
at org.hibernate.search.engine.search.query.dsl.spi.AbstractSearchQueryOptionsStep.fetch(AbstractSearchQueryOptionsStep.java:146)
at aggregationtesting.AggregationTesting.aggregationQuery(AggregationTesting.java:72)
at aggregationtesting.Main.main(Main.java:22)
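
As a rough sanity check on the output above, the size of the array behind such a queue can be estimated. The sketch below assumes that the queue allocates maxSize + 1 object references up front (which is what the PriorityQueue frame at the top of the stack trace suggests), a 16-byte array header, and 4 bytes per reference with compressed pointers or 8 bytes without. All of these are assumptions about a typical HotSpot JVM, and the class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope estimate; all figures are assumptions, not measurements. */
final class HeapArrayEstimate {

    private HeapArrayEstimate() {
    }

    /** Approximate size in bytes of an Object[] holding (maxSize + 1) references. */
    static long backingArrayBytes(int maxSize, int bytesPerReference) {
        long arrayHeaderBytes = 16L; // assumed array header size
        // Widen to long before adding 1 so that values near Integer.MAX_VALUE do not overflow.
        return arrayHeaderBytes + ((long) maxSize + 1L) * bytesPerReference;
    }

    /** Converts a byte count to gibibytes for easier reading. */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }
}
```

Under these assumptions, backingArrayBytes(2147483630, 4) comes to roughly 8 GiB and backingArrayBytes(1000000000, 4) to roughly 3.7 GiB, which would be consistent with the 1000000000 run completing and the 2147483630 run failing on a heap capped at 12 GB.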

Thanks. I filed HSEARCH-4544 in the Hibernate JIRA. This should be fixed in the next releases of Hibernate Search.