Better and faster mass indexing

Hello. I use Hibernate Search 6 and my PostgreSQL database is 1 GB.
I want to mass-index all entities, but I notice strange behavior: the size of the Lucene directory being created grows to 6 MB, 7 MB, 8 MB, then drops back to 6 MB and climbs again. I have traced the problem to the following entities:

@Indexed(index = "idx_name")
@Table(name = "tableName")
public class ObjectA {
	
	@Id
	@Column(name = "id", nullable = false, precision = 18)
    @ScaledNumberField(decimalScale = 0, sortable = Sortable.YES)
	private BigInteger id;
	
	
	@OneToMany(mappedBy = "objectA", cascade = CascadeType.ALL, fetch = FetchType.LAZY, orphanRemoval = true)
	@IndexedEmbedded(structure = ObjectStructure.NESTED, includePaths = {"relObjectb.id", "fielda", "fieldb", ...})
	@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
	private List<ObjectB> bList = new ArrayList<>();

    .
    .
    .

    @Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "bList"))
	})
	@ScaledNumberField(decimalScale = 2, sortable = Sortable.YES)
	private BigDecimal totalObjectBDebit;
	
	public BigDecimal getTotalObjectBDebit() {
		BigDecimal objectBDebit = BigDecimal.ZERO;
		if (CollectionUtils.isNotNullOrEmpty(bList)) {
			for (final ObjectΒ b: bList) {
				objectBDebit = objectBDebit.add(b.getDebit() != null ? b.getDebit() : BigDecimal.ZERO);
			}
		}
		
		return objectBDebit;
	}
	
	@Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "bList"))
	})
	@ScaledNumberField(decimalScale = 2, sortable = Sortable.YES)
	private BigDecimal totalObjectΒCredit;
	
	public BigDecimal getTotalObjectΒCredit() {
		BigDecimal objectΒCredit = BigDecimal.ZERO;
		if (CollectionUtils.isNotNullOrEmpty(bList)) {
			for (final ObjectΒ b: bList) {
				objectΒCredit = objectΒCredit.add(b.getCredit() != null ? b.getCredit() : BigDecimal.ZERO);
			}
		}
		
		return objectΒCredit;
	}
	
	@Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "bList"))
	})
	@ScaledNumberField(decimalScale = 2, sortable = Sortable.YES)
	private BigDecimal totalObjectΒTotalamount;
	
	public BigDecimal getTotalObjectΒTotalamount() {
		BigDecimal objectΒTotalamount = BigDecimal.ZERO;
		if (CollectionUtils.isNotNullOrEmpty(bList)) {
			for (final ObjectΒ b: bList) {
				objectΒTotalamount = objectΒTotalamount.add(b.getTotalamount() != null ? b.getTotalamount() : BigDecimal.ZERO);
			}
		}
		
		return objectΒTotalamount;
	}
	
	@Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "bList"))
	})
	@FullTextField
	@GenericField(name = "traderstranFullname_sort", sortable = Sortable.YES)
	private String objectΒFullname;
	
	public String getObjectΒFullname() {
		String objectΒName = null;
		if (CollectionUtils.isNotNullOrEmpty(bList)) {
			for (final ObjectΒ b : bList) {
				objectΒName = b.getFullname();
			}
		}
		return objectΒName;
	}
}

@Indexed(index = "idx_name2")
@Table(name = "tableName2")
public class ObjectΒ {
	
	@Id
	@Column(name = "id", nullable = false, precision = 18)
    @ScaledNumberField(decimalScale = 0, sortable = Sortable.YES)
	private BigInteger id;

    .
    .
    .

    @ManyToOne(fetch = FetchType.LAZY, optional = false)
	@JoinColumn(name = "objectaid", nullable = false)
	@ParentReference
	@IndexedEmbedded(includePaths = {"fielda", "fieldb", "fieldc"})
	@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
	private ObjectA objectA;

    @OneToOne(fetch = FetchType.LAZY, optional = false)
	@JoinColumn(name = "relObjectbid")
	@IndexedEmbedded(includePaths = {"id"})
	@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
	private ObjectΒ relObjectb;

    @Transient
	@IndexingDependency(derivedFrom = {
			@ObjectPath(@PropertyValue(propertyName = "objectA"))
	})
	@KeywordField(normalizer = Constants.NORMALIZER_LOWERCASE)
	@GenericField(name = "objectADescr_sort", sortable = Sortable.YES)
	private String objectADescr;
	
	public String getComtranDescr() {
		return objectA.getCode() + " " + objectA.getSeqnr();
	}
}

bList has 700k records.

The code I use for mass indexing is:

final SearchSession searchSession = Search.session(em);
final MassIndexer indexer = searchSession.massIndexer()
		.purgeAllOnStart(true)
		.mergeSegmentsAfterPurge(true)
		.batchSizeToLoadObjects(100)
		.threadsToLoadObjects(12)
		.typesToIndexInParallel(1)
		.idFetchSize(250);
indexer.startAndWait();

Could I improve the above code to make indexing better and faster?
Thank you!

Hello,

I don’t understand what you mean. In any case, the Lucene directory size is rather irrelevant; more relevant are the logs of your mass indexer.

See Hibernate Search 7.1.0.Final: Reference Documentation

There’s no one-size-fits-all solution: you need to look at the queries being sent to the database and understand why they’re slow. Whatever solution you come up with will be specific to your mapping and your dataset.
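
To see which queries are sent during mass indexing, one option is to turn on Hibernate ORM's SQL logging and statistics. The sketch below only builds the settings map; the class and method names are made up for illustration, and it assumes you bootstrap JPA yourself and can pass extra properties (for example as the second argument of Persistence.createEntityManagerFactory). In a framework-managed setup the same keys would go into your usual configuration file instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class: only the property keys come from Hibernate ORM.
public class SqlDiagnosticsSettings {

    // Builds settings that make the SQL generated by Hibernate ORM visible.
    public static Map<String, Object> diagnostics() {
        Map<String, Object> settings = new HashMap<>();
        settings.put("hibernate.show_sql", "true");            // print each SQL statement
        settings.put("hibernate.format_sql", "true");          // pretty-print the statements
        settings.put("hibernate.generate_statistics", "true"); // collect query/fetch statistics
        return settings;
    }

    public static void main(String[] args) {
        // Print the settings so they can be copied into a configuration file.
        diagnostics().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With this enabled, a long run of near-identical selects on the ObjectB table for each batch of ObjectA would point at lazy loading of bList as the bottleneck. Turn the logging off again afterwards, since it slows indexing down considerably.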

There are plans to optimize the out-of-the-box experience and to provide more customization in the mass indexer ([HSEARCH-4956] - Hibernate JIRA), but we’re not quite there yet.

Most often (but not always) there’s a problem with unoptimized mapping/SQL, where too many SQL queries are executed when a single one could have been. Have a look at batch fetching if it’s not already enabled: Hibernate ORM User Guide
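
To give a rough idea of why batch fetching matters here, the sketch below estimates how many selects are needed to initialize a lazy collection such as bList for one batch of loaded entities. It is a simplified model, not a measurement: the class and method names are made up, the batch fetch sizes are example values, and the real number of queries depends on your Hibernate ORM version and mapping. The property name hibernate.default_batch_fetch_size is the actual ORM setting.

```java
// Hypothetical estimate class: a simplified model of lazy collection loading.
public class BatchFetchEstimate {

    // Real Hibernate ORM setting; the values tried below are assumptions.
    public static final String BATCH_FETCH_PROPERTY = "hibernate.default_batch_fetch_size";

    // Best-case number of selects needed to initialize one lazy collection
    // on every entity of a loaded batch. A fetch size of 1 (or less) means
    // no batch fetching, i.e. one select per entity.
    public static long lazySelectsPerBatch(int entitiesPerBatch, int batchFetchSize) {
        int effective = Math.max(1, batchFetchSize);
        return (entitiesPerBatch + effective - 1) / effective; // ceiling division
    }

    public static void main(String[] args) {
        int entitiesPerBatch = 100; // matches batchSizeToLoadObjects(100) in the question
        System.out.println("without batch fetching: "
                + lazySelectsPerBatch(entitiesPerBatch, 1) + " selects per batch");
        System.out.println(BATCH_FETCH_PROPERTY + "=16: "
                + lazySelectsPerBatch(entitiesPerBatch, 16) + " selects per batch");
        System.out.println(BATCH_FETCH_PROPERTY + "=100: "
                + lazySelectsPerBatch(entitiesPerBatch, 100) + " selects per batch");
    }
}
```

Under this model, each batch of 100 ObjectA entities costs 100 extra selects for bList without batch fetching, and a single one when the batch fetch size matches the mass indexer's load batch size. The same reasoning applies to the lazy objectA and relObjectb associations on ObjectB.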
