Better and faster mass indexing

Tony · March 26, 2024, 8:44am

Hello. I use Hibernate Search 6 and my PostgreSQL database is 1 GB.
I want to run mass indexing on all entities, but I notice strange behavior: the size of the Lucene directory being created grows to 6 MB, 7 MB, 8 MB, then drops back to 6 MB and climbs again. I have traced the problem to the following entities:

@Indexed(index = "idx_name")
@Table(name = "tableName")
public class ObjectA {
	
	@Id
	@Column(name = "id", nullable = false, precision = 18)
    @ScaledNumberField(decimalScale = 0, sortable = Sortable.YES)
	private BigInteger id;
	
	
	@OneToMany(mappedBy = "objectA", cascade = CascadeType.ALL, fetch = FetchType.LAZY, orphanRemoval = true)
	@IndexedEmbedded(structure = ObjectStructure.NESTED, includePaths = {"relObjectb.id", "fielda", "fieldb", ...})
	@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
	private List<ObjectB> bList = new ArrayList<>();

    .
    .
    .

    @Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "bList"))
	})
	@ScaledNumberField(decimalScale = 2, sortable = Sortable.YES)
	private BigDecimal totalObjectBDebit;
	
	public BigDecimal getTotalObjectBDebit() {
		BigDecimal objectBDebit = BigDecimal.ZERO;
		if (CollectionUtils.isNotNullOrEmpty(bList)) {
			for (final ObjectB b : bList) {
				objectBDebit = objectBDebit.add(b.getDebit() != null ? b.getDebit() : BigDecimal.ZERO);
			}
		}
		
		return objectBDebit;
	}
	
	@Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "bList"))
	})
	@ScaledNumberField(decimalScale = 2, sortable = Sortable.YES)
	private BigDecimal totalObjectBCredit;

	public BigDecimal getTotalObjectBCredit() {
		BigDecimal objectBCredit = BigDecimal.ZERO;
		if (CollectionUtils.isNotNullOrEmpty(bList)) {
			for (final ObjectB b : bList) {
				objectBCredit = objectBCredit.add(b.getCredit() != null ? b.getCredit() : BigDecimal.ZERO);
			}
		}

		return objectBCredit;
	}
	
	@Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "bList"))
	})
	@ScaledNumberField(decimalScale = 2, sortable = Sortable.YES)
	private BigDecimal totalObjectBTotalamount;

	public BigDecimal getTotalObjectBTotalamount() {
		BigDecimal objectBTotalamount = BigDecimal.ZERO;
		if (CollectionUtils.isNotNullOrEmpty(bList)) {
			for (final ObjectB b : bList) {
				objectBTotalamount = objectBTotalamount.add(b.getTotalamount() != null ? b.getTotalamount() : BigDecimal.ZERO);
			}
		}

		return objectBTotalamount;
	}
	
	@Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "bList"))
	})
	@FullTextField
	@GenericField(name = "traderstranFullname_sort", sortable = Sortable.YES)
	private String objectBFullname;

	public String getObjectBFullname() {
		String objectBName = null;
		if (CollectionUtils.isNotNullOrEmpty(bList)) {
			for (final ObjectB b : bList) {
				objectBName = b.getFullname();
			}
		}
		return objectBName;
	}
}

@Indexed(index = "idx_name2")
@Table(name = "tableName2")
public class ObjectB {
	
	@Id
	@Column(name = "id", nullable = false, precision = 18)
    @ScaledNumberField(decimalScale = 0, sortable = Sortable.YES)
	private BigInteger id;

    .
    .
    .

    @ManyToOne(fetch = FetchType.LAZY, optional = false)
	@JoinColumn(name = "objectaid", nullable = false)
	@ParentReference
	@IndexedEmbedded(includePaths = {"fielda", "fieldb", "fieldc"})
	@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
	private ObjectA objectA;

    @OneToOne(fetch = FetchType.LAZY, optional = false)
	@JoinColumn(name = "relObjectbid")
	@IndexedEmbedded(includePaths = {"id"})
	@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
	private ObjectB relObjectb;

    @Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "objectA"))
	})
	@KeywordField(normalizer = Constants.NORMALIZER_LOWERCASE)
	@GenericField(name = "objectADescr_sort", sortable = Sortable.YES)
	private String objectADescr;
	
	public String getObjectADescr() {
		return objectA.getCode() + " " + objectA.getSeqnr();
	}
}

bList has 700k records.

The code I use for mass indexing is:

final SearchSession searchSession = Search.session(em);
final MassIndexer indexer = searchSession.massIndexer()
		.purgeAllOnStart(true)
		.mergeSegmentsAfterPurge(true)
		.batchSizeToLoadObjects(100)
		.threadsToLoadObjects(12)
		.typesToIndexInParallel(1)
		.idFetchSize(250);
indexer.startAndWait();

Could I improve the code above to make indexing better and faster?
Thank you!

yrodiere · March 26, 2024, 8:57am

Hello,

I don’t understand what you mean. In any case, the Lucene directory size is rather irrelevant; more relevant are the logs of your mass indexer.

See Hibernate Search 7.1.0.Final: Reference Documentation

There’s no one-size-fits-all solution; you need to have a look at the queries being sent to the database and understand why they’re slow. Whatever solution you come up with will be specific to your mapping and your dataset.
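
One possible way to see those queries, as a minimal sketch: the three keys below are standard Hibernate ORM settings, but the class name `SqlDiagnosticsSettings` and the idea of collecting them in a map are only illustrative assumptions. How you actually pass them (persistence.xml, Spring properties, a map handed to the entity manager factory) depends on your setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers Hibernate ORM settings that make SQL visible.
final class SqlDiagnosticsSettings {

    static Map<String, Object> settings() {
        Map<String, Object> s = new LinkedHashMap<>();
        // Print every SQL statement Hibernate ORM executes.
        s.put("hibernate.show_sql", "true");
        // Pretty-print the statements so joins and IN lists are readable.
        s.put("hibernate.format_sql", "true");
        // Collect statistics (statement counts, entity/collection fetch counts).
        s.put("hibernate.generate_statistics", "true");
        return s;
    }

    public static void main(String[] args) {
        // Prints the settings in key=value form, e.g. for a properties file.
        settings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Enable this only for a test run on a copy of the data: statement logging is verbose and will itself slow the mass indexer down.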

There are plans to optimize the out-of-the-box experience and to provide more customization in the mass indexer ([HSEARCH-4956] - Hibernate JIRA), but we’re not quite there yet.

Most often (but not always) there’s a problem with unoptimized mapping/SQL, where too many SQL queries are executed when a single one could have been. Have a look at batch fetching if it’s not already enabled: Hibernate ORM User Guide
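
To illustrate why batch fetching can matter here, a rough sketch under stated assumptions: `hibernate.default_batch_fetch_size` is a real Hibernate ORM setting, but the class `BatchFetchSketch` and its query-count formula are a deliberately simplified model (one query per group of lazy collections), not a description of what Hibernate does in every version or mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a batch-fetch setting plus a simplified query-count model.
final class BatchFetchSketch {

    // Setting that lets Hibernate ORM load several lazy associations per query.
    static Map<String, Object> settings(int batchFetchSize) {
        Map<String, Object> s = new LinkedHashMap<>();
        s.put("hibernate.default_batch_fetch_size", batchFetchSize);
        return s;
    }

    // Simplified model: number of queries needed to initialize one lazy
    // collection (such as bList) for every entity in a loaded batch.
    // A batchFetchSize of 1 or less models "no batch fetching".
    static long collectionQueriesPerBatch(int entitiesInBatch, int batchFetchSize) {
        int effective = Math.max(1, batchFetchSize);
        return (entitiesInBatch + effective - 1) / effective; // ceiling division
    }

    public static void main(String[] args) {
        int loaded = 100; // matches batchSizeToLoadObjects(100) from the question
        System.out.println(collectionQueriesPerBatch(loaded, 1));   // prints 100
        System.out.println(collectionQueriesPerBatch(loaded, 16));  // prints 7
        System.out.println(collectionQueriesPerBatch(loaded, 100)); // prints 1
    }
}
```

In this model, each batch of 100 loaded entities costs 100 extra queries for the lazy collection without batch fetching, and a single one when the batch fetch size matches the batch size. The real numbers should be checked against the SQL log of an actual run.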
