Hello Hibernate Search Community,
I am currently working with Hibernate Search to query large datasets from OpenSearch and have encountered a limitation with the size of the result window: my application needs to process potentially tens of millions of records stored in OpenSearch, but I am hitting the index.max_result_window limit, which defaults to 10,000.
Here is my current method for fetching data:
public Page<EntityDetails> findEntities(SearchRequest searchRequest) {
    var searchResult = entitySearchRepository.search(Entity.class)
            .where(searchRequest.toPredicate())
            .loading(searchLoadingOptionsStep ->
                    entityGraphs.forEach(graph -> searchLoadingOptionsStep.graph(graph, GraphSemantic.FETCH)))
            .sort(searchRequest.toSort())
            .fetch(searchRequest.getOffset(), searchRequest.getLimit());
    var result = searchResult
            .hits()
            .stream()
            .map(mapper::map)
            .toList();
    return new PageImpl<>(result,
            PageRequest.of(searchRequest.getPage(), searchRequest.getSize()),
            searchResult.total().hitCount());
}
When attempting to retrieve data beyond the 10,000th record, I receive the following error:
Response: 400 'Bad Request' with body {
    "error": {
        "root_cause": [{
            "type": "illegal_argument_exception",
            "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [36540]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
        }],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [{
            "shard": 0,
            "index": "entity-index-v1",
            "node": "R8TtsgiuQIKy9Tya8KWvmw",
            "reason": {
                "type": "illegal_argument_exception",
                "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [36540]. See the scroll api for a more efficient way to request large data sets."
            }
        }]
    },
    "status": 400
}
I understand that raising the index.max_result_window setting would resolve this temporarily. However, as our data continues to grow, that is not a viable long-term solution, since we would eventually exceed any statically configured limit. I am looking for guidance on a more scalable approach, possibly using Hibernate Search's (7.1.0.Final) scroll API, to handle large datasets effectively.
Could anyone provide examples or insights on how to adapt my method to use the scroll capabilities for handling very large result sets efficiently?
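For context, here is a rough, untested sketch of what I have pieced together from the reference documentation so far. The chunk size of 1000, the processChunk callback, and the assumption that entitySearchRepository exposes the same fluent API for scroll() as for fetch() are all my own guesses, so please correct anything that looks wrong:

public void processAllEntities(SearchRequest searchRequest) {
    // scroll(chunkSize) replaces fetch(offset, limit); try-with-resources
    // closes the server-side scroll context when iteration finishes.
    try (SearchScroll<Entity> scroll = entitySearchRepository.search(Entity.class)
            .where(searchRequest.toPredicate())
            .sort(searchRequest.toSort())
            .scroll(1000)) { // hits fetched per round trip -- 1000 is a guess
        for (SearchScrollResult<Entity> chunk = scroll.next();
                chunk.hasHits();
                chunk = scroll.next()) {
            // Map and hand off each chunk; processChunk is a placeholder
            // for whatever my application actually does with the results.
            processChunk(chunk.hits().stream().map(mapper::map).toList());
        }
    }
}

One thing I am unsure about is whether I would also need to periodically clear the persistence context between chunks, since each chunk loads managed entities and I worry about memory growth over tens of millions of records.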
Thank you in advance for your help!