Avoiding unnecessary Join with Many-to-many relation


#1

I have two entities, Article and Category, every article can have multiple categories, so this is a classic many-to-many relationship.

My code looks like this (written in Kotlin, but should not be relevant):

@Entity(name = "ArticleEntity")
@Table(name = "article", uniqueConstraints = [
    UniqueConstraint(name = ArticleEntity.SLUG_KEY_NAME, columnNames = ["slug"])
])
class ArticleEntity(
    @Id
    @GeneratedValue
    var id: UUID? = null,

    @ManyToMany
    @JoinTable(
        name = "article_categories",
        joinColumns = [JoinColumn(name = "article_id", foreignKey = ForeignKey(name = "article_categories_article_id_fk"))],
        inverseJoinColumns = [JoinColumn(name = "category_id", foreignKey = ForeignKey(name = "article_categories_category_id_fk"))]
    )
    var categories: MutableSet<CategoryEntity>?
)

@Entity
@Table(name = "category")
class CategoryEntity(
    @Id
    @GeneratedValue
    val id: UUID?,

    @ManyToMany(fetch = FetchType.LAZY, mappedBy = "categories")
    val articles: Set<ArticleEntity>?
)

I am trying to select only articles, which are in a list of categories (i.e. “give me only articles in categories a, b and c”).
I turned to the JPA CriteriaQuery API (through Spring Specifications, not really relevant either):

override fun createPredicates(predicates: MutableList<Predicate>, root: Root<ArticleEntity>, query: CriteriaQuery<*>, cb: CriteriaBuilder) {
    val category = listOf(categoryId1, categoryId2, ...)
    val sq = query.subquery(Int::class.java).select(cb.literal(1))
    val sqRoot = sq.correlate(root)
    val sqJoin = sqRoot.join<ArticleEntity, CategoryEntity>("categories")
    sq.where(
        sqJoin.get<UUID>("id").`in`(category)
    )

    predicates += cb.exists(sq)
}

This should only generate an inner join of the JoinTable (article_categories), since my where condition is only based on the ID (which is present in the JoinTable). However Hibernate generates two joins, also joining the entire category table:

select
  articleent0_.id              as id1_0_,
from article articleent0_
where exists(select 1
             from article_categories categories1_, category categoryen2_
             where articleent0_.id = categories1_.article_id and categories1_.category_id = categoryen2_.id and (categories1_.id in (?)))
limit ?;

Is there a way to avoid this unnecessary join? I assume it does not matter too much, but I’d like to avoid it if possible. Curiously, it does not happen if I perform a similar query on a many-to-one relation.

Thanks for any insights.


#2

It’s because you explicitly told Hibernate to do a join:

val sqJoin = sqRoot.join<ArticleEntity, CategoryEntity>("categories")

#3

Well, okay, touché, I should have worded my question better.
How can I avoid this join then? I only need it to access the ID column, but that already exists in the join table, but I can’t access the column in it (or can I?).


#4

The default many-to-many association hides the intermediary table, so I’m not sure if you can reference it without joining the Category.

You can avoid that extra join by mapping the intermediary table as an entity: CategoryArticle. This article explains how to do that.

Then, you query will be as easy as this:

select a
from Article a
join a.categoryArticles ca
where ca.category.id in :categoryIds

This should avoid the extra join to Category.


#5

Thanks, I see.
Are there any performance implications when switching from a @ManyToMany to this “manual many-to-many”?


#6

For Lists, you will gain better performance. For Sets, there’s no difference.


#7

Thanks again, will definitely keep an eye on your blog :slight_smile:
I was concerned, since Hibernate now needs to manage an additional entity type, but that probably falls into the category of premature optimization.
Case closed! :slight_smile:


#8

That’s not a problem. It always boils down to what SQL statements are generated.