How to mass-index a specific class with a condition

Hi.
I would like to mass-index only a specific class, with a condition like id in (1, 2, 3, …):

final MassIndexer massIndexer = searchSession.massIndexer(clazz);
massIndexer
		.purgeAllOnStart(false)
		.mergeSegmentsAfterPurge(true)
		.batchSizeToLoadObjects(12)
		.threadsToLoadObjects(16)
		.typesToIndexInParallel(1)
		.idFetchSize(100)
		.type(clazz)
		.reindexOnly(condition);
massIndexer.startAndWait();

searchSession.massIndexer(clazz); → is it needed to pass “clazz” here, or is the .type(clazz) call enough?
Also, I would not want to remove all the previously indexed documents of this class. I just want to update the documents that match the condition.
.purgeAllOnStart(false) → does it need to be declared on the mass indexer?

Thank you !

Hey Tony,

Yes. When you create the mass indexer, i.e. searchSession.massIndexer(clazz), you specify its scope. If you do not provide the class, it will assume that it has to index all indexed entities. Then, when you do .type(clazz).reindexOnly(condition), you specify the condition for a type that is expected to be within the scope of your mass indexer, and that condition will be applied when entities of that type are indexed.

The default for purge-on-start is true, so if you do not want to purge the documents, you have to set it explicitly, yes. Also, mergeSegmentsAfterPurge(true) doesn’t make sense if you do not purge the documents.
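Putting that together, a minimal sketch of a scoped, purge-free run could look like this (assuming Hibernate Search 6; MyObject, the searchSession, and the ids set are placeholders standing in for your own code, not a definitive implementation):

```java
// Sketch only: searchSession is an org.hibernate.search.mapper.orm.session.SearchSession
// obtained elsewhere, MyObject is a hypothetical indexed entity, ids a Set of its ids.
final MassIndexer massIndexer = searchSession.massIndexer(MyObject.class);
massIndexer
		.purgeAllOnStart(false)        // keep the existing documents
		// no mergeSegmentsAfterPurge: it is pointless when nothing is purged
		.type(MyObject.class)
		.reindexOnly("id in (:ids)")   // condition applied when loading entities of this type
		.param("ids", ids);
massIndexer.startAndWait();
```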

final MassIndexer massIndexer = searchSession.massIndexer(clazz);
massIndexer
		.purgeAllOnStart(false)
		.batchSizeToLoadObjects(12)
		.threadsToLoadObjects(16)
		.typesToIndexInParallel(1)
		.idFetchSize(100)
		.type(clazz)
		.reindexOnly(condition);
massIndexer.startAndWait();

I see that it creates duplicate documents for the ids that exist in the condition! :confused:

Hello again. First, I use version 6.1.8.Final.
Because I manage processes with a large volume of data, I have disabled automatic indexing → hibernate.search.automatic_indexing.strategy=none.

So I have added a boolean skipIndexing, false by default, and when I run the complex processes mentioned above I set it to true.
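As a minimal sketch of what such a flag could look like on the base entity (the MyBaseEntity name and the isSkipIndexing accessor come from the listener shown below; in the real entity the field would presumably be marked @Transient so it is not mapped to a database column — persistence annotations are omitted here to keep the sketch self-contained):

```java
// Sketch only: a base entity carrying a runtime-only skipIndexing flag.
// Mapping annotations (e.g. @MappedSuperclass, @Transient) are assumed and omitted.
public abstract class MyBaseEntity {

	private boolean skipIndexing = false; // false by default

	public boolean isSkipIndexing() {
		return skipIndexing;
	}

	public void setSkipIndexing(final boolean skipIndexing) {
		this.skipIndexing = skipIndexing;
	}
}
```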

I use a listener that, depending on the value of skipIndexing, updates the index or not.

public class MyListener {

	@PostUpdate
	@PostPersist
	public void post(final Object object) {
		final boolean skip = object instanceof MyBaseEntity && ((MyBaseEntity) object).isSkipIndexing();
		if (!skip) { // index via the listener only when the entity is not flagged to skip indexing
			final IndexingService indexingService = ApplicationContextHolder.applicationContext.getBean(IndexingService.class);
			indexingService.indexObjectManually(object, true);
		}
	}

	@PostRemove
	public void remove(final Object object) {
		final boolean skip = object instanceof MyBaseEntity && ((MyBaseEntity) object).isSkipIndexing();
		if (!skip) {
			final IndexingService indexingService = ApplicationContextHolder.applicationContext.getBean(IndexingService.class);
			indexingService.indexObjectManually(object, false);
		}
	}

}

My object:

@Entity
@Indexed(index = "...")
@Table(name = "...")
public class MyObject extends MyBaseEntity {
	
	@Id
	@Column(name = "id", nullable = false, precision = 18)
	@ScaledNumberField(decimalScale = 0, sortable = Sortable.YES)
	private BigInteger id;
	
	@ManyToOne(fetch = FetchType.LAZY, optional = false)
	@JoinColumn(name = "objectBId", nullable = false)
	@IndexedEmbedded(includePaths = {"id", "field1", "field2"})
	@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
	private MyObjectB objectB;

    .
    .
    .
}

I have a process in which I have to manage a large volume of data:

for (final MyObject object : myObjectList) {
	object.setSkipIndexing(true);
}
final int batchSize = 1000;
for (int i = 0; i < myObjectList.size(); i += batchSize) {
	final List<MyObject> batchList = myObjectList.subList(i, Math.min(i + batchSize, myObjectList.size()));
	myObjectRepository.saveAllAndFlush(batchList); // save the current batch, not the whole list

	System.out.println(i + batchSize);
	try {
		final Set<BigInteger> ids = batchList.stream().map(MyObject::getId).collect(Collectors.toSet());
		indexingService.indexingByClassAndCondition(MyObject.class, ids);
	} catch (final Exception e) {
		throw new RuntimeException(e);
	}
}

MyService:

public void indexObjectManually(final Object object, final boolean isUpdate) {
	if (AnnotationChecker.hasAnnotation(object.getClass(), Indexed.class)) {
		final SearchSession searchSession = Search.session(em);
		final SearchIndexingPlan indexingPlan = searchSession.indexingPlan();
		if (isUpdate) {
			indexingPlan.addOrUpdate(object);
		} else {
			indexingPlan.delete(object);
		}
	}
}

public void indexingByClassAndCondition(final Class<?> clazz, final Set<BigInteger> conditionIds) throws InterruptedException {
	final SearchSession searchSession = Search.session(em);

	final MassIndexer massIndexer = searchSession.massIndexer(clazz);
	massIndexer
			.purgeAllOnStart(false)
			.type(clazz)
			.reindexOnly("id in (:ids)") // no need to concatenate the parameter name
			.param("ids", conditionIds);
	massIndexer.startAndWait();

	em.clear();
}

The result is that the documents for the specific ids are indexed again, but the previous ones are not deleted, resulting in duplicates. How can I avoid this?

Thank you ! :slight_smile: