Django threaded comments (MPTT) optimization issue

I’m using MPTT model to build a nested comments system in my Django Rest project.
Everything works so far but I experience a huge performance issue when querying a database for API response.

The schema of comments layout:

Root Comment 
    /      \
Comment   Comment
  /   \
Child Child

Whenever I’m querying all comments as an array in API response recursively:

{
 "comments": [
    { 
      "id": 1,
      "children": [
                 "id": 4,
                 "children": [
                               "id": 5,
                               "children": ...
                              ]
                   ]
    }

In this case even with “prefetch_related” and “select_related” I’m getting over 50 queries for 6 comments total (2 parent(root) comment and 4 nested comments).

Is there an efficient way to optimize this query?
Even 2 levels comments system is acceptable for me but I can’t figure out the right implementation for this:

Comment
     \ 
   Child1, Child2, Child3, Child4

My code for reference:

class Comment(MPTTModel):

    profile = models.ForeignKey(Profile, related_name="comments", on_delete=CASCADE)
    post = models.ForeignKey(Post, related_name="comments", on_delete=CASCADE)
    parent = TreeForeignKey("self", related_name="children", null=True, on_delete=models.CASCADE)
    text = models.TextField(max_length=350)
    date_published = models.DateTimeField(auto_now_add=True, verbose_name="Date published")


comments = Comment.objects.filter(level=0) \
                .select_related("profile", "post", "profile__specialization") \
                .prefetch_related("children", "parent") \
                .order_by("-date_published")
                                         

UPDATE 1:
I suspect the problem is not only in nested comments only but in prefetch/select related method themselves. Django Debug Tool shows a lot of similar queries for profile and its ForeignKey fields as well.

 class Profile(models.Model):
    first_name = models.CharField(max_length=32, blank=True)
    last_name = models.CharField(max_length=32, blank=True)
    username = models.CharField(max_length=32, unique=True)
    country = models.CharField(max_length=50, blank=True)
    city = models.CharField(max_length=50, blank=True)
    role = models.ForeignKey(
    user = models.OneToOneField(
        User, related_name="user_profile", on_delete=models.CASCADE
    )
    specialization = models.ForeignKey(
        Specialization,
        related_name="user_specialization",
        on_delete=models.SET_NULL,
        null=True,
    )

For optimization, try using a hybrid approach, in this approach orf the main hierarchical structure, you utilize MPTT, but you additionally keep a denormalized field for quicker querying.

Your Comment model should be updated to include a parent_comment field that explicitly references the parent comment. Then, anytime a new comment is added or the comment hierarchy changes, update this field. By doing so, retrieval can be improved while MPTT is still used for other characteristics like tree traversal.

class Comment(MPTTModel):
   profile = models.ForeignKey(Profile, related_name="comments", on_delete=CASCADE)
   post = models.ForeignKey(Post, related_name="comments", on_delete=CASCADE)
   parent = TreeForeignKey("self", related_name="children", null=True, on_delete=models.CASCADE)
   parent_comment = models.ForeignKey("self", related_name="nested_comments", null=True, on_delete=models.CASCADE)
   text = models.TextField(max_length=350)
   date_published = models.DateTimeField(auto_now_add=True, verbose_name="Date published")

# Retrieve all comments and their children efficiently
comments = Comment.objects.filter(post=my_post, parent_comment=None) \
          .select_related("profile") \
          .prefetch_related("nested_comments") \
          .order_by("date_published")

Hope it works 🙂

Leave a Comment