I have added Hibernate Search to our web application. It is supposed to index `Process`es. That works, but indexing is very slow and creates high load at the same time. According to Windows Resource Monitor, under Processes the application (Tomcat) uses 50% and MySQL 35%, and under Services MariaDB uses 30% and Windows Search 10%. Indexing 10k objects took a whole night. I have the feeling that the framework is doing far more than it needs to.
My bean class `Process`, which is to be indexed, has a field `id` (the ID column of the database) and transient getter methods that provide the index words. Simplified, it looks like this:
```java
@MappedSuperclass
public abstract class BaseBean implements Serializable {
    @Id
    @Column(name = "id")
    @GenericField
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    protected Integer id;
}

@MappedSuperclass
public abstract class BaseTemplateBean extends BaseBean {
    // ... 3 fields, not to be indexed ...
}

@Entity
@Indexed(index = "process")
@Table(name = "process")
public class Process extends BaseTemplateBean {
    // ... 30 fields, with complex dependencies, but nothing to be indexed ...

    @Transient
    @FullTextField(name = "search")
    @IndexingDependency(reindexOnUpdate = ReindexOnUpdate.NO)
    public String getKeywordsForFreeSearch() {
        // returns a space-separated list of indexing terms,
        // like "dog house treatment bell ..."
        return kewordsFromFileFor(id);
    }
}
```
You can see the whole class here: kitodo-production/Kitodo-DataManagement/src/main/java/org/kitodo/data/database/beans/Process.java at a21e0f3f2966d5a36ac4e5a15fa7df759f15e812 · kitodo/kitodo-production · GitHub
Then I search for the IDs like this (this works fine):
```java
public Collection<Integer> searchIds(Class<? extends BaseBean> beanClass,
        String searchField, String value) {
    SearchSession searchSession = Search.session(getSession());
    // project only the "id" field, so no entities are loaded for the hits
    SearchProjection<Integer> idField = searchSession.scope(beanClass)
            .projection().field("id", Integer.class).toProjection();
    List<Integer> ids = searchSession.search(beanClass).select(idField)
            .where(function -> function.match().field(searchField)
                    .matching(value))
            .fetchAll().hits();
    return ids;
}
```
My assumption was that the content of `Process` is indexed into a document with an internal document number, and that the indexed document contains only the ID:
dog → 1234, house → 1234, treatment → 1234, bell → 1234, …
1234 → { id: 42 }
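To make that assumption concrete, here is a toy Java model of the layout I have in mind. It is purely illustrative: `ToyInvertedIndex` and its methods are made-up names, and this is not how Lucene actually stores its data. Each term points to internal document numbers, and each document number points to a stored `id`, so answering a query needs no database access at all.

```java
import java.util.*;

/**
 * Toy model of the assumed index layout (hypothetical, not Lucene's format):
 * an inverted index mapping terms to internal document numbers, plus a
 * stored-field table mapping each document number to the database id.
 */
public class ToyInvertedIndex {
    // term -> internal document numbers that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // internal document number -> stored database id
    private final Map<Integer, Integer> storedIds = new HashMap<>();
    private int nextDocNumber = 0;

    /** Indexes one object: splits the keywords at spaces and stores the id. */
    public void add(int databaseId, String keywords) {
        int docNumber = nextDocNumber++;
        storedIds.put(docNumber, databaseId);
        for (String term : keywords.split(" +")) {
            if (term.isEmpty()) {
                continue; // a leading space yields one empty token
            }
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docNumber);
        }
    }

    /** Returns the stored database ids for one term, sorted, without any DB access. */
    public List<Integer> searchIds(String term) {
        List<Integer> ids = new ArrayList<>();
        for (int docNumber : postings.getOrDefault(term, Set.of())) {
            ids.add(storedIds.get(docNumber));
        }
        Collections.sort(ids);
        return ids;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(42, "dog house treatment bell");
        index.add(43, "cat house");
        System.out.println(index.searchIds("house")); // [42, 43]
        System.out.println(index.searchIds("bell"));  // [42]
    }
}
```

In this model, indexing one object costs a string split and a few map insertions, which is why I would expect the real indexing to be cheap.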
That should be much faster and cause much less load, with virtually no load on the database. I think the framework does a lot in the background, but I don’t know what or why. Is it related to those annotations? Does it load the entire object, including all the lazily loaded collections and referenced objects, just to check whether there is an annotation and then notice: oh, there isn’t one, I can skip this? Can I improve something here? Can I (or do I have to) tell the framework: you don’t have to look here, there’s nothing to do? Or is it performing some extended evaluation of the `@FullTextField` beyond splitting it at spaces? Do I have to disable something?
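To clarify what I mean by "extended evaluation": plain splitting at spaces is not the same as what a typical full-text analyzer does. The plain-Java sketch below (no Lucene involved; `TokenizeSketch` and its methods are made-up names) contrasts the two. The second method is only my rough stand-in, assuming the default analyzer at least lowercases and splits on non-word characters; the real Lucene tokenizer follows more involved rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical comparison: whitespace splitting versus a rough approximation
 * of an analyzer that lowercases and splits on non-letter/non-digit characters.
 */
public class TokenizeSketch {
    /** Splits only at whitespace; punctuation and case are kept. */
    static List<String> whitespaceSplit(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Splits at any run of non-letter/non-digit characters and lowercases. */
    static List<String> lowercaseWordSplit(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String keywords = "Dog house, treatment BELL";
        System.out.println(whitespaceSplit(keywords));    // [Dog, house,, treatment, BELL]
        System.out.println(lowercaseWordSplit(keywords)); // [dog, house, treatment, bell]
    }
}
```

Either way, this per-string work should be small compared to loading 10k entities from the database.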
Any suggestions will be appreciated.