Will multivalued VectorFields be possible in the future?

Hello,

I am currently trying to implement a dense passage retrieval system, and I am evaluating whether I can use Hibernate Search for that purpose. Essentially, I want to perform a kNN search on entities that have multiple embeddings associated with them.

Consider something like this:

```java
@Entity
@Indexed
public class Book {
    @Id
    private Integer id;

    @OneToMany(mappedBy = "book")
    @IndexedEmbedded
    private List<Embedding> bookEmbeddings;

    // Other properties ...
}

@Entity
public class Embedding {
    @Id
    private Integer id;

    @ManyToOne
    private Book book;

    @VectorField(dimension = 768, vectorSimilarity = VectorSimilarity.COSINE, searchable = Searchable.YES)
    private float[] embedding;

    // Other properties ...
}
```
```java
float[] queryEmbedding = /*...*/;

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.knn( 5 )
                .field( "bookEmbeddings" )
                .matching( queryEmbedding ) )
        .fetchHits( 20 );
```

This is currently not possible as stated in the documentation:

“It is not allowed to index multiple vectors within the same field, i.e. vector fields cannot be multivalued.”

Is this a temporary limitation because vector search support is still under development, or are there underlying limitations that make this impossible to implement?

Thank you for your help!

Hey @felixs

Thanks for reaching out with this question. At the moment, this is a limitation of the underlying Lucene implementation of vector search: Lucene allows indexing only one vector per vector field in a given document, so a vector field cannot be multivalued (e.g. see Multi-value Support for KnnVectorField · Issue #12313 · apache/lucene · GitHub).

However, when we say that a vector field cannot be multivalued, we mean that you cannot have something like this:

```java
// NOT supported: several vectors in one vector field
@Indexed
public class Book {
    @Id
    private Integer id;

    @OneToMany(mappedBy = "book")
    @VectorField(....)
    private List<float[]> bookEmbeddings;

    // Other properties ...
}
```

In your example, though, you’ve wrapped the vector field in another embedded object. That should work fine, since each vector field then lives in its own nested object containing a single vector.

Note that with such a mapping, the query should look something like this:

```java
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.knn( 5 ).field( "bookEmbeddings.embedding" ).matching( queryEmbedding ) )
        .fetchAllHits();
```

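The key difference is the field path: bookEmbeddings.embedding points at the vector field itself, whereas bookEmbeddings is the embedded object.
I’ve also run a quick test with such a schema, and it worked.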
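To make the semantics of that query concrete, here is a brute-force, plain-Java model of a kNN search over books that each have several embeddings. It is not Hibernate Search or Lucene code: Lucene searches an approximate HNSW graph and reports scores on its own scale, so only the ordering is meant to be illustrative. It also assumes, for illustration, that a book is ranked by its single best-matching embedding (a max-over-children rule). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical brute-force model (not Lucene/Hibernate Search code) of kNN over nested
// embeddings. Assumption: a book is scored by its best-matching embedding.
public class NestedKnnSketch {

    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the ids of the k books whose best embedding is most similar to the query.
    static List<Integer> topK(Map<Integer, List<float[]>> embeddingsByBook, float[] query, int k) {
        List<Map.Entry<Integer, Double>> scored = new ArrayList<>();
        for (Map.Entry<Integer, List<float[]>> book : embeddingsByBook.entrySet()) {
            double best = Double.NEGATIVE_INFINITY;
            for (float[] e : book.getValue()) {
                best = Math.max(best, cosine(e, query)); // keep the best child score
            }
            scored.add(Map.entry(book.getKey(), best));
        }
        // Highest score first.
        scored.sort(Map.Entry.<Integer, Double>comparingByValue().reversed());
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            ids.add(scored.get(i).getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, List<float[]>> books = Map.of(
                1, List.of(new float[] {1f, 0f}, new float[] {0f, 1f}),
                2, List.of(new float[] {0.6f, 0.8f}));
        // Book 1 has an embedding identical to the query, so it ranks first.
        System.out.println(topK(books, new float[] {0f, 1f}, 1));
    }
}
```

Book 1 wins even though its other embedding is orthogonal to the query, because only its best embedding counts under the assumed max rule.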

NOTE: keep in mind that for this to work, the embedded objects must use the NESTED structure:

```java
@IndexedEmbedded(structure = ObjectStructure.NESTED)
private List<Embedding> bookEmbeddings;
```
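To see why the structure matters, the following toy model contrasts the two layouts. It is plain Java, not Hibernate Search internals, and its names are hypothetical. With the default flattened structure, every embedding's vector lands in the same field of the book document, which is exactly the multivalued case Lucene rejects; with NESTED, each embedding becomes its own child document holding one vector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model (not Hibernate Search or Lucene code) of how an embedded
// List<Embedding> is laid out in the index under the two object structures.
public class FlattenVsNested {

    // FLATTENED: all vectors are appended to one field of the parent document,
    // so "bookEmbeddings.embedding" would hold several values (multivalued).
    static Map<String, List<float[]>> flattened(List<float[]> embeddings) {
        Map<String, List<float[]>> doc = new LinkedHashMap<>();
        for (float[] e : embeddings) {
            doc.computeIfAbsent("bookEmbeddings.embedding", k -> new ArrayList<>()).add(e);
        }
        return doc;
    }

    // NESTED: each embedding becomes its own child document with exactly one vector.
    static List<Map<String, float[]>> nested(List<float[]> embeddings) {
        List<Map<String, float[]>> children = new ArrayList<>();
        for (float[] e : embeddings) {
            Map<String, float[]> child = new LinkedHashMap<>();
            child.put("bookEmbeddings.embedding", e);
            children.add(child);
        }
        return children;
    }

    public static void main(String[] args) {
        List<float[]> embeddings = List.of(new float[] {0.1f, 0.2f}, new float[] {0.3f, 0.4f});
        // 2 vectors in a single field: the multivalued case.
        System.out.println("flattened values in one field: "
                + flattened(embeddings).get("bookEmbeddings.embedding").size());
        // 2 child documents, each holding a single vector.
        System.out.println("nested child documents: " + nested(embeddings).size());
    }
}
```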

Thanks @mbekhta !

I clearly misunderstood what "multivalued" means, and I was not aware that ObjectStructure.NESTED could help me in this situation.

My use case works as expected now that I use ObjectStructure.NESTED.

Thank you for your help!
