Hello,
I am currently trying to implement a dense passage retrieval system. Essentially, I want to perform a kNN search on entities that have multiple embeddings associated with them.
I want the entities in the search result to be sorted by the maximum embedding similarity.
My current implementation seems to sort the entities by the average embedding similarity instead.
Consider a scenario like this:
@Entity
@Indexed
public class Book {
    @Id
    private Integer id;

    // Each book owns several embeddings, indexed as nested documents
    @OneToMany(mappedBy = "book", fetch = FetchType.EAGER)
    @IndexedEmbedded(structure = ObjectStructure.NESTED)
    private List<Embedding> bookEmbeddings;
    // Other properties ...
}

@Entity
public class Embedding {
    @Id
    private Integer id;

    @ManyToOne
    private Book book;

    @VectorField(dimension = 768, vectorSimilarity = VectorSimilarity.COSINE, searchable = Searchable.YES)
    private float[] embedding;
    // Other properties ...
}

float[] queryEmbedding = /*...*/;
List<Book> hits = searchSession.search( Book.class )
        .where( f ->
                f.knn( 5 )
                        .field( "bookEmbeddings.embedding" )
                        .matching( queryEmbedding )
        ).fetchHits( 20 );

In this example, one book has multiple embeddings associated with it.
When searching for books with a query embedding, I want the first result to be the book that has the most similar embedding associated with it.
Currently, the average embedding similarity seems to be used for sorting instead.
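To illustrate the difference between the two orderings, here is a minimal plain-Java sketch. It does not use Hibernate Search at all; the class `MaxVsAverageRanking`, its methods, and the 2-dimensional toy vectors are made up for this example and only show why the two scoring strategies can rank books differently.

```java
import java.util.List;

// Plain-Java illustration only; nothing here is Hibernate Search API.
// All names and vectors are hypothetical and exist just for this example.
public class MaxVsAverageRanking {

    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Book score = similarity of its best-matching embedding (the ordering I want).
    static double maxSimilarity(List<float[]> embeddings, float[] query) {
        return embeddings.stream()
                .mapToDouble(e -> cosine(e, query))
                .max()
                .orElse(Double.NEGATIVE_INFINITY);
    }

    // Book score = mean similarity over all its embeddings (the ordering I seem to get).
    static double averageSimilarity(List<float[]> embeddings, float[] query) {
        return embeddings.stream()
                .mapToDouble(e -> cosine(e, query))
                .average()
                .orElse(Double.NEGATIVE_INFINITY);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        // Book A: one embedding identical to the query, one orthogonal to it.
        List<float[]> bookA = List.of(new float[] {1f, 0f}, new float[] {0f, 1f});
        // Book B: two embeddings that are both only moderately similar to the query.
        List<float[]> bookB = List.of(new float[] {1f, 1f}, new float[] {1f, 1f});

        System.out.println("A max=" + maxSimilarity(bookA, query)
                + " avg=" + averageSimilarity(bookA, query));
        System.out.println("B max=" + maxSimilarity(bookB, query)
                + " avg=" + averageSimilarity(bookB, query));
    }
}
```

With these toy vectors, book A scores higher by maximum similarity (it contains an exact match), while book B scores higher by average similarity. I would like book A to come first.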
Is it possible to achieve this behaviour with Hibernate Search?