Will multivalued VectorFields be possible in the future?


I am currently trying to implement a dense passage retrieval system, and I am evaluating whether I can use Hibernate Search for that purpose. Essentially, I want to perform a kNN search on entities that have multiple embeddings associated with them.

Consider something like this:

public class Book {
        private Integer id;
        @OneToMany(mappedBy = "book")
        private List<Embedding> bookEmbeddings;

        // Other properties ...
}

public class Embedding {
        private Integer id;
        private Book book;
        @VectorField(dimension = 768, vectorSimilarity = VectorSimilarity.COSINE, searchable = Searchable.YES)
        private float[] embedding;

        // Other properties ...
}

float[] queryEmbedding = /*...*/;

List<Book> hits = searchSession.search( Book.class )
        .where( f ->
            f.knn( 5 )
                .field( "bookEmbeddings" )
                .matching( queryEmbedding )
        ).fetchHits( 20 );

This is currently not possible as stated in the documentation:

“It is not allowed to index multiple vectors within the same field, i.e. vector fields cannot be multivalued.”

Is this a temporary limitation because the vector search functionality is currently being developed or are there any underlying limitations that make this impossible to implement?

Thank you for your help!

Hey @felixs

Thanks for reaching out with this question. At the moment this is a limitation of the underlying Lucene implementation of vector search: Lucene allows indexing exactly one vector per document in a vector field, so the field cannot be multi-valued (see, for example, Multi-value Support for KnnVectorField, Issue #12313 in apache/lucene on GitHub).

However, when we say that a vector field cannot be multivalued, we mean that a mapping like this is not possible:

public class Book {
        private Integer id;
        @OneToMany(mappedBy = "book")
        private List<float[]> bookEmbeddings;

        // Other properties ...
}

In your example, though, you’ve wrapped the vector field in another embedded object. That should work fine since it means that the vector field will be located in a nested object containing a single vector.

Note that with such a mapping, the query should be something like this:

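List<Book> hits = searchSession.search( Book.class )
        // k = 5 as in the question; the field path points at the vector inside the nested object
        .where( f -> f.knn( 5 ).field( "bookEmbeddings.embedding" ).matching( queryEmbedding ) )
        .fetchHits( 20 );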

Note the field path: bookEmbeddings.embedding rather than bookEmbeddings.
I’ve also run a quick test with such a schema, and it worked.

NOTE: keep in mind that for it to work the embedded object must have a NESTED structure:

@IndexedEmbedded(structure = ObjectStructure.NESTED)
List<Embedding> bookEmbeddings;
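
To illustrate what "one vector per nested object, many nested objects per book" means for retrieval, here is a rough conceptual sketch in plain Java. It is only a brute-force illustration and not how Lucene or Hibernate Search work internally (Lucene uses an approximate HNSW index rather than a full scan). All names in it (NestedKnnSketch, nearestBooks, sampleBooks) are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch only: brute-force kNN over nested embeddings.
// This is NOT the Lucene / Hibernate Search implementation.
class NestedKnnSketch {

    // Hypothetical stand-in for the nested object: exactly one vector each.
    static final class Embedding {
        final int bookId;
        final float[] vector;
        Embedding(int bookId, float[] vector) { this.bookId = bookId; this.vector = vector; }
    }

    // Hypothetical stand-in for the parent entity: many nested objects.
    static final class Book {
        final int id;
        final List<Embedding> embeddings;
        Book(int id, List<Embedding> embeddings) { this.id = id; this.embeddings = embeddings; }
    }

    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Finds the k embeddings most similar to the query, then returns the ids
    // of the books that own them, best match first, without duplicates.
    static List<Integer> nearestBooks(List<Book> books, float[] query, int k) {
        List<Embedding> all = new ArrayList<>();
        for (Book b : books) {
            all.addAll(b.embeddings);
        }
        all.sort(Comparator.comparingDouble((Embedding e) -> cosine(e.vector, query)).reversed());
        Set<Integer> ids = new LinkedHashSet<>();
        for (Embedding e : all.subList(0, Math.min(k, all.size()))) {
            ids.add(e.bookId);
        }
        return new ArrayList<>(ids);
    }

    // Tiny made-up data set with 2-dimensional vectors.
    static List<Book> sampleBooks() {
        return Arrays.asList(
                new Book(1, Arrays.asList(
                        new Embedding(1, new float[] {1f, 0f}),
                        new Embedding(1, new float[] {0f, 1f}))),
                new Book(2, Arrays.asList(
                        new Embedding(2, new float[] {-1f, 0f}))),
                new Book(3, Arrays.asList(
                        new Embedding(3, new float[] {0.9f, 0.1f}))));
    }

    public static void main(String[] args) {
        // prints [1, 3]
        System.out.println(nearestBooks(sampleBooks(), new float[] {1f, 0f}, 2));
    }
}
```

The point of the sketch is that a book with several embeddings can match through any one of them, while each individual nested object still carries a single vector, which is what the single-valued restriction requires.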

Thanks @mbekhta !

I clearly had a wrong understanding of what multivalued means, and I was not aware that ObjectStructure.NESTED could help me in this situation.

My use case works as expected now that I use ObjectStructure.NESTED.

Thank you for your help!

