Get internal Lucene docId in Hibernate Search 6.1

Good morning,

I want to show the Lucene term vector for a given document. For this, I used the reader.getTermVector(int docID, String field) method in the past (which needs the internal Lucene document id), but right now I can’t get this internal id from a search result.

According to the Migration Guide, the DOCUMENT_ID projection now returns the entity id and not the internal document id, so I don’t know how to get this value right now.

The internal document ID can change from one reader to another, so what you are doing is not guaranteed to work. If the index changes for some reason between when you retrieve the internal doc id and when you open your reader, the internal doc id may change as well and you will retrieve the term vector for the wrong document.
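To illustrate why a stale internal ID is dangerous, here is a toy model in plain Java. It is only a sketch under simplifying assumptions: a "reader" is modeled as an immutable list snapshot, the internal doc ID is the position in that list, and a "merge" simply drops a deleted document and compacts the rest. The class and method names (DocIdShiftDemo, internalDocId, mergeAfterDelete) are made up for the illustration and are not Lucene or Hibernate Search APIs.

```java
import java.util.ArrayList;
import java.util.List;

class DocIdShiftDemo {

    // Toy "reader": the internal doc ID is just the position in the snapshot.
    static int internalDocId(List<String> snapshot, String entityId) {
        return snapshot.indexOf(entityId);
    }

    // Toy "merge": removes the deleted document and compacts the remaining ones,
    // so every document stored after it moves down by one position.
    static List<String> mergeAfterDelete(List<String> snapshot, String deletedEntityId) {
        List<String> merged = new ArrayList<>(snapshot);
        merged.remove(deletedEntityId);
        return merged;
    }

    public static void main(String[] args) {
        List<String> firstReader = List.of("book-1", "book-2", "book-3");
        // Internal ID of book-2 as seen by the first reader.
        int staleDocId = internalDocId(firstReader, "book-2");

        // The index changes before the second reader is opened.
        List<String> secondReader = mergeAfterDelete(firstReader, "book-1");

        // Reusing the stale ID silently points at a different document.
        System.out.println("Stale ID " + staleDocId + " now points at " + secondReader.get(staleDocId));
        // Resolving the ID again in the new reader gives the right position.
        System.out.println("Fresh ID for book-2: " + internalDocId(secondReader, "book-2"));
    }
}
```

In this model the stale ID does not fail; it just resolves to another document, which is the same failure mode as fetching the term vector of the wrong document.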

The right way of retrieving the internal document ID would be to manually run a search query in your new reader, to get the internal doc id for the document with the ID returned by the documentReference() projection. But that’s a bit more complex.

If you really want to continue using internal document IDs across different readers, you can retrieve the TopDocs after a Lucene search, and from there you can get the internal doc ids.

The documentReference() projection returns the typeName and id, but this id is also the entity id, not the internal Lucene document id.

On the other hand, if I use the entity id to look up the entity data, how can I get its related Lucene document id? I need this ID to get the document fields and term vector. You said to run a search query in the reader, but I don’t see any search method in IndexReader.

Yes; that’s why I told you:

Basically do a first search query to retrieve the (Hibernate Search) document ID, then another on your reader with the query __HSEARCH_id:<whatever id was returned by Hibernate Search>, then you get the Lucene document ID.

You’re trying to retrieve term vectors, which is a rather low-level operation. You will have to rely on rather low-level APIs to run search queries too; unfortunately it’s a bit more complex than running a query in Hibernate Search. I’d carefully check whether I really need it before going down that path.

But if you really want to do it, something like this should work:

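import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.Terms;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

String hibernateSearchDocumentId = /* ... */;
IndexReader myReader = /* ... */;
IndexSearcher searcher = new IndexSearcher( myReader );
// TermQuery takes a Term (field name + value); look up the document by its Hibernate Search ID
TopDocs topDocs = searcher.search( new TermQuery( new Term( "__HSEARCH_id", hibernateSearchDocumentId ) ), 1 );
if ( topDocs == null || topDocs.scoreDocs.length == 0 ) {
     throw new IllegalStateException( "Document " + hibernateSearchDocumentId + " was deleted since the Hibernate Search query execution." );
}
int luceneDocId = topDocs.scoreDocs[0].doc;
Terms terms = myReader.getTermVector( luceneDocId, "myField" );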

Also, I’d like to be very clear that all this dance to avoid reusing Lucene document IDs across readers is not something caused by a change in Hibernate Search 6. Reusing Lucene document IDs across readers was already incorrect in Hibernate Search 5, and that’s why we tried to fix the problem by changing the document ID projection to no longer return internal document IDs.

If you’re fine with the level of correctness you had in Hibernate Search 5 (most of the time it works, sometimes it won’t), you can still retrieve the internal doc IDs as I suggested above:

I’ve found some interesting sample code in the Stack Overflow question “What is docID in IndexReader.getTermVector(int docID, String field) in Lucene 8.5.1 and how does it work?”

Anyway, I see that the problem is that I can’t find the entity by its id. I’m not sure why, because this worked in Hibernate Search 5. The entity id is just annotated with @DocumentId, but I wonder if it’s possible to configure other parameters, like the “searchable” and “projectable” parameters of @KeywordField.

The ID is no longer a field, at least not in the way you would expect as a Hibernate Search user.

That’s explained in the migration guide as well: https://docs.jboss.org/hibernate/search/6.0/migration/html_single/#document-id-is-not-a-field

Oh, I see. So if I want to replicate the Hibernate Search 5 behaviour, I need to add @KeywordField(searchable = Searchable.YES, projectable = Projectable.YES) to my entity id field.

Thanks!