How to mass-index a specific class with a condition

Hi.
I would like to mass-index only a specific class, with a condition like id in (1, 2, 3, …):

final MassIndexer massIndexer = searchSession.massIndexer(clazz);
massIndexer
		.purgeAllOnStart(false)
		.mergeSegmentsAfterPurge(true)
		.batchSizeToLoadObjects(12)
		.threadsToLoadObjects(16)
		.typesToIndexInParallel(1)
		.idFetchSize(100)
		.type(clazz)
		.reindexOnly(condition);
massIndexer.startAndWait();

searchSession.massIndexer(clazz); → is it needed to pass “clazz” here, or is the .type(clazz) call enough?
Also, I would not want to remove all the previously indexed documents of this class. I just want to update the documents that match the condition.
.purgeAllOnStart(false) → does it need to be declared on the mass indexer?

Thank you !

Hey Tony,

Yes. When you create the mass indexer, i.e. searchSession.massIndexer(clazz), you specify its scope. If you do not provide the class, it will assume that it has to index all indexed entities. Then, when you do .type(clazz).reindexOnly(condition), you specify the condition for a type that is expected to be within the scope of your mass indexer, and that condition will be applied when entities of that type are indexed.

The default for purge-on-start is true, so if you do not want to purge the documents, you have to set it explicitly, yes. Also, mergeSegmentsAfterPurge(true) doesn’t make sense if you do not purge the documents.
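Putting that together, a minimal sketch of a scoped, purge-free run could look like this (assuming Hibernate Search 6; MyObject, the searchSession, and the ids set are placeholders standing in for your own code, not a definitive implementation):

```java
// Sketch only: searchSession is an org.hibernate.search.mapper.orm.session.SearchSession
// obtained elsewhere, MyObject is a hypothetical indexed entity, ids a Set of its ids.
final MassIndexer massIndexer = searchSession.massIndexer(MyObject.class);
massIndexer
		.purgeAllOnStart(false)        // keep the existing documents
		// no mergeSegmentsAfterPurge: it is pointless when nothing is purged
		.type(MyObject.class)
		.reindexOnly("id in (:ids)")   // condition applied when loading entities of this type
		.param("ids", ids);
massIndexer.startAndWait();
```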

final MassIndexer massIndexer = searchSession.massIndexer(clazz);
massIndexer
		.purgeAllOnStart(false)
		.batchSizeToLoadObjects(12)
		.threadsToLoadObjects(16)
		.typesToIndexInParallel(1)
		.idFetchSize(100)
		.type(clazz)
		.reindexOnly(condition);
massIndexer.startAndWait();

I see that it creates duplicate documents for the ids that exist in the condition! :confused:

Hello again. First, I use version 6.1.8.Final.
Because I manage processes with a large volume of data, I have disabled automatic indexing → hibernate.search.automatic_indexing.strategy=none.

So I have added a boolean skipIndexing, false by default, and when I run the complex processes mentioned above I set it to true.
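As a minimal sketch of what such a flag could look like on the base entity (the MyBaseEntity name and the isSkipIndexing accessor come from the listener shown below; in the real entity the field would presumably be marked @Transient so it is not mapped to a database column — persistence annotations are omitted here to keep the sketch self-contained):

```java
// Sketch only: a base entity carrying a runtime-only skipIndexing flag.
// Mapping annotations (e.g. @MappedSuperclass, @Transient) are assumed and omitted.
public abstract class MyBaseEntity {

	private boolean skipIndexing = false; // false by default

	public boolean isSkipIndexing() {
		return skipIndexing;
	}

	public void setSkipIndexing(final boolean skipIndexing) {
		this.skipIndexing = skipIndexing;
	}
}
```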

I use a listener that, depending on the value of skipIndexing, updates the index or not.

public class MyListener {

	@PostUpdate
	@PostPersist
	public void post(final Object object) {
		final boolean skip = object instanceof MyBaseEntity && ((MyBaseEntity) object).isSkipIndexing();
		if (!skip) { // index via the listener only when the entity is not flagged to skip indexing
			final IndexingService indexingService = ApplicationContextHolder.applicationContext.getBean(IndexingService.class);
			indexingService.indexObjectManually(object, true);
		}
	}

	@PostRemove
	public void remove(final Object object) {
		final boolean skip = object instanceof MyBaseEntity && ((MyBaseEntity) object).isSkipIndexing();
		if (!skip) {
			final IndexingService indexingService = ApplicationContextHolder.applicationContext.getBean(IndexingService.class);
			indexingService.indexObjectManually(object, false);
		}
	}

}

My object:

@Entity
@Indexed(index = "...")
@Table(name = "...")
public class MyObject extends MyBaseEntity {
	
	@Id
	@Column(name = "id", nullable = false, precision = 18)
	@ScaledNumberField(decimalScale = 0, sortable = Sortable.YES)
	private BigInteger id;
	
	@ManyToOne(fetch = FetchType.LAZY, optional = false)
	@JoinColumn(name = "objectBId", nullable = false)
	@IndexedEmbedded(includePaths = {"id", "field1", "field2"})
	@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.SHALLOW)
	private MyObjectB objectB;

    .
    .
    .
}

I have a process in which I have to manage a large volume of data:

for (final MyObject object : myObjectList) {
	object.setSkipIndexing(true);
}
final int batchSize = 1000;
for (int i = 0; i < myObjectList.size(); i += batchSize) {
	final List<MyObject> batchList = myObjectList.subList(i, Math.min(i + batchSize, myObjectList.size()));
	myObjectRepository.saveAllAndFlush(batchList); // save the current batch, not the whole list

	System.out.println(i + batchSize);
	try {
		final Set<BigInteger> ids = batchList.stream().map(MyObject::getId).collect(Collectors.toSet());
		indexingService.indexingByClassAndCondition(MyObject.class, ids);
	} catch (final Exception e) {
		throw new RuntimeException(e);
	}
}

MyService:

public void indexObjectManually(final Object object, final boolean isUpdate) {
	if (AnnotationChecker.hasAnnotation(object.getClass(), Indexed.class)) {
		final SearchSession searchSession = Search.session(em);
		final SearchIndexingPlan indexingPlan = searchSession.indexingPlan();
		if (isUpdate) {
			indexingPlan.addOrUpdate(object);
		} else {
			indexingPlan.delete(object);
		}
	}
}

public void indexingByClassAndCondition(final Class<?> clazz, final Set<BigInteger> conditionIds) throws InterruptedException {
	final SearchSession searchSession = Search.session(em);

	final MassIndexer massIndexer = searchSession.massIndexer(clazz);
	massIndexer
			.purgeAllOnStart(false)
			.type(clazz)
			.reindexOnly("id in (:ids)") // no need to concatenate the parameter name
			.param("ids", conditionIds);
	massIndexer.startAndWait();

	em.clear();
}

The result is that the documents for the specific ids are indexed again, but the previous ones are not deleted, resulting in duplicates. How can I avoid this?

Thank you ! :slight_smile: