Polymorphic Mass Indexing with Discriminator

I have looked at Discriminator is ignored during massindexing and Problem using Multitenancy Discriminator with Hibernate Search I don’t think these pertain although maybe are related, not sure?

I am having issues with polymorphic mass indexing with highly nested objects here is a simplified version:

@Entity
@DiscriminatorColumn(name = "baseEntityType")
@Inheritance(strategy = InheritanceType.JOINED)
public class BaseEntity implements Serializable {
	@Id
	@DocumentId
	@GenericField(aggregable = Aggregable.YES, sortable = Sortable.YES, searchable = Searchable.YES)
	@GeneratedValue(strategy = GenerationType.SEQUENCE)
	protected Long id;
}
@Entity
public class BaseVehicle extends BaseEntity {
	private String bodyType;
}
@Entity
public class DooredVehicle extends BaseVehicle {
	private String doorType;
}
@Entity
public class BedVehicle extends DooredVehicle {
	private String bedType;
}
@Entity
@Indexed
public class Car extends DooredVehicle {
	public String carHood;
}
@Entity
@Indexed
public class Truck extends BedVehicle {
	private String truckRoof;
}

When doing a mass indexing on Car’s I get the following two queries:

Car Id Loading Query:

select
   b1_0.id
from
   BaseEntity b1_0
      join BaseVehicle b1_1 on b1_0.id=b1_1.id
      join DooredVehicle b1_2 on b1_0.id=b1_2.id
      left join BedVehicle b1_3 on b1_0.id=b1_3.id
      left join Car b1_4 on b1_0.id=b1_4.id
      left join Truck b1_5 on b1_0.id=b1_5.id
where
   b1_0.baseEntityType in (?)

Car Entity Loading Query:

select
   b1_0.id,
   b1_0.baseEntityType,
   b1_1.bodyType,
   b1_2.bedType,
   b1_3.doorType,
   b1_4.carHood,
   b1_5.truckRoof
from
   BaseEntity b1_0
      left join BaseVehicle b1_1 on b1_0.id=b1_1.id
      left join BedVehicle b1_2 on b1_0.id=b1_2.id
      left join DooredVehicle b1_3 on b1_0.id=b1_3.id
      left join Car b1_4 on b1_0.id=b1_4.id
      left join Truck b1_5 on b1_0.id=b1_5.id
where b1_0.id = any (?)

The truck queries are very similar:

Truck Id Loading Query:

select
   b1_0.id
from
   BaseEntity b1_0
      join BaseVehicle b1_1 on b1_0.id=b1_1.id
      join DooredVehicle b1_2 on b1_0.id=b1_2.id
      left join BedVehicle b1_3 on b1_0.id=b1_3.id
      left join Car b1_4 on b1_0.id=b1_4.id
      left join Truck b1_5 on b1_0.id=b1_5.id
where
   b1_0.baseEntityType in (?)

Truck Entity Loading Query:

select
   b1_0.id,
   b1_0.baseEntityType,
   b1_1.bodyType,
   b1_2.bedType,
   b1_3.doorType,
   b1_4.carHood,
   b1_5.truckRoof
from
   BaseEntity b1_0
      left join BaseVehicle b1_1 on b1_0.id=b1_1.id
      left join BedVehicle b1_2 on b1_0.id=b1_2.id
      left join DooredVehicle b1_3 on b1_0.id=b1_3.id
      left join Car b1_4 on b1_0.id=b1_4.id
      left join Truck b1_5 on b1_0.id=b1_5.id
where
   b1_0.id = any (?)

Where as when doing a regular hibernate query for a Car or Truck I get the following two queries:

entityManager.find(Car.class, id);
select
   c1_0.id,
   c1_1.bodyType,
   c1_2.doorType,
   c1_0.carHood
from
   Car c1_0
      join BaseVehicle c1_1 on c1_0.id=c1_1.id
      join DooredVehicle c1_2 on c1_0.id=c1_2.id
where
   c1_0.id=?
entityManager.find(Truck.class, id);
select
   t1_0.id,
   t1_1.bodyType,
   t1_2.doorType,
   t1_3.bedType,
   t1_0.truckRoof
from
   Truck t1_0
      join BaseVehicle t1_1 on t1_0.id=t1_1.id
      join DooredVehicle t1_2 on t1_0.id=t1_2.id
      join BedVehicle t1_3 on t1_0.id=t1_3.id
where
   t1_0.id=?

Car hibernate query does not include the BedVehicle table, nor the Truck table, and the issue is the mass indexing query does include those tables. As in I would expect the Car query for mass indexing to not include the BedVehicle nor the Truck tables as part of the query, for loading entities.

This is not a problem for small datasets but when you have hundreds of objects and they all extend the “BaseEntity” class, then the mass indexing query, basically joins ALL tables in the database together. Here is a link to the sample project running almost the latest quarkus.

Olin Blodgett / Inheritance · GitLab branch hibernate_search

Hey,

I suspect Hibernate Search is trying to “optimize” things: it sees it needs to index both Car and Truck, and probably a few other entities, and ends up querying BaseEntity directly, to load them all in a single query and avoid having too many concurrent queries that would target the same tables (baseentity, basevehicle, …).

In general it helps, because this setup where all entities extend a single one is quite extreme. Usually you’d make BaseEntity a @MappedSuperclass, and you wouldn’t have this problem.

So, it’s working as intended. Now, can we improve things?

We could remove the optimization altogether, but that would likely affect most users negatively.

We could provide a switch to disable the optimization, but… more tests, more corner cases… I’d rather not if I can avoid it.

Maybe we could adjust the optimization to only do this kind of things when we see that targeting a supertype will result in the exact same scope as targeting entities individually (i.e. we don’t need to exclude any other subtype)? I.e. in your case we’d end up querying Car and Truck separately, because their parent classes (BedVehicle/DooredVehicle) are non-indexed and non-abstract (they can have instances).
If you made BedVehicle/DooredVehicle abstract, and assuming those don’t have any other subclasses, we’d end up querying BedVehicle directly (which would essentially just query Car and Truck).

Does that make sense?

@mbekhta any opinion?

This is not a problem for small datasets but when you have hundreds of objects and they all extend the “BaseEntity” class, then the mass indexing query, basically joins ALL tables in the database together. Here is a link to the sample project running almost the latest quarkus.

This is weird, I’d expect the DB query plan to be optimized to avoid this impacting performance too much, e.g. by applying the where clause first and doing the join after that. At least on PostgreSQL I’ve seen that sort of things happen.
Did you try running these problematic queries manually, and then without the extra joins, to see if this really improves performance?

I’ve looked into this a bit. In the shared example, there’s

MassIndexer indexer = searchSession.massIndexer(Car.class)

In this case, I guess there’s also no need to do the optimization. And then we can also try to make HibernateOrmUtils#targetsAllConcreteSubTypes to look only at indexed types (if I got your idea correctly :smiley:).

So yes the @MappedSuperclass is what we were doing, but it takes all the fields in “BaseEntity” and duplicates them across all the tables. This accentuates the issue, postgres’ setting of: “columns in a result set 1,664” when joining basically the whole database into one query, hibernate puts all the columns in the select part of the query and we can’t perform the query. At least with all the duplicated fields in the same table “BaseEntity” is helps on this front. I suppose we could change this number.

After this change @MappedSuperclass@Entity now the hibernate queries entityManager.find(Truck.class, id); seem to be very optimized but mass indexing fails with the column number issue, and for other queries they just don’t return.

I guess my question is when mass indexing a “Truck” why does it need to bring in other tables “Car” that are not even related to the Truck and all the columns in the select that are not going to be used in Truck objects inside ES? I could understand this if I was mass indexing say DooredVehicle and it would need to get all its subclasses.

My assumption would be that which ever object is being indexed they query would join all its subclass tables, and only its ancestor tables for the query. Not uncle and cousin tables which are not part of the resulting object.

Here is a class diagram so you can see my struggle.

I guess maybe I don’t understand how “abstract” effects the query, but I will play around with that and see what happens.

As a rule, I would use entity inheritance sparingly, only for cases where it makes sense from a business model perspective, and using abstract classes wherever possible. Entity inheritance for technical reasons is a big no-no in my books: that’s what mapped superclasses are for. And if there is technical data that you need to store in the same table for all entities, well that should probably be a dedicated entity. Of course that’s all just general guidelines, and there may be exceptions, but still… those are reasonable defaults.

IMO, given your model, EntityBase and AuditedObject definitely should be mapped superclasses. If you need e.g. audit information to be in a single table for all entities, maybe you need a dedicated entity for that, one that would be the target of an association from AuditedObject?

EDIT: And given the number of classes, in some entity hierarchies you might want to opt for single-table or table-per-class inheritance. You’d need to check, but performance tests might show you that works better in your specific case (you’d really need tests, as results will depend on your model and dataset).

Regardless… As Marko mentioned, there seems to be sub-optimal things going on in Hibernate Search, so we’ll look into that on our side, too: [HSEARCH-5062] - Hibernate JIRA

As seen in the class hierarchy we deal with biological data. And there are lots of relationships between objects. So for instance Gene’s may have a AnatomicalTerm vs having multiple fields for each possible subclass of AnatomicalTerm.

Also we have to track changes to every object that we catalog hence why we are using the “AuditedObject”. Every object could have a One2One relationship with AuditedObject i’m not sure what the impact would be on the queries / column counts. The one thing that is painful in the MappedSuperclass was when there is a change to the AuditedObject the db migration is huge.

Another use case that we will have to deal with is a lookup of a “Curie” at the CurieObject level. Someone provides us with an curie identifier and we have to return the full object for whatever subclass it is. Fortunately that subclass structure is only about half the size of the whole.

I think switching the OntologyTerm structure over to a single table makes a lot of sense, but not sure if we can mix inheritance types.

Anyway thanks for looking into this.