How to address performance issues when using Entity Framework Core and Include() for large amounts of data?

Question 1

I’m developing an application using Entity Framework Core (.NET 6) and I’m encountering performance issues when retrieving data using the GetQuestionByTest() method. I want to load data from TblMultipleChoices related to the TblQuestionTest entity, but the data is quite large, and this is slowing down the application’s response time.

Here is a portion of the code that is causing the problem:

public IEnumerable<TblQuestionTest> GetQuestionByTest(int tesId)
{
    return context.TblQuestionTests
                  .Include(e => e.Test)
                  .Include(e => e.TblMultipleChoices)
                  .Where(e => e.TestId == tesId);
}

How can I address this performance issue given the substantial amount of data? Are there any strategies or techniques I can apply to load the data more efficiently or avoid the delay that’s occurring?

Thank you

Question 2

If you actually need all of the entities then you can use AsSplitQuery() to remove the Cartesian Product. Otherwise consider using Projection /w Select() to reduce the amount of data you retrieve to just what your consuming view needs instead of returning Entities. One risk of returning Entities is that if the consumption causes them to be serialized this can trip you up with lazy loading as other non-eagerly loaded properties get “touched” and loaded.

For instance in MVC using the dreaded:

var model = @Html.Raw(Json.Encode(Model));

When working with view models something like that is useful and fine. When passing entities to a view that is a lazy loading landmine.

Edit: The next step would be to hook up a profiler to the database and run through the scenario where this data is loaded. Inspect the SQL that is generated by EF. If this is SQL Server, run those generated SQL statements with an Execution Plan enabled as this may suggest missing indexes in your schema. Looking at what the profiler reports can also give hints into other problems like unexpected lazy loading hits. For instance, without .AsSplitQuery() something like that should generate a single SQL statement with JOINs between the three tables. With .AsSplitQuery() you would see 3 separate queries. If you are seeing several queries scrolling by after this call as the page loads etc. then your code is tripping lazy load round trips to load relations that aren’t eager loaded. If you are using Select() then you should also be sure to include a .ToList() to materialize the query results.

Beyond that, what are the typical number of questions, and the # of multiple choices per question? Typically most multiple choice questions have 4 or 5 choices, so even if you have 100 questions, that is only 400-500 “choices” which EF should be able to fetch and materialize in a second or so. However, if MultipleChoices references another table that you “touch” when iterating through the questions and choices, the lazy loads would be kicked off for each individual choice, one by one, which can seriously slow down querying.

If you are still having issues, expand your question with the entity definitions in question, Test, QuestionTest, and MultipleChoice so we can see how these are related, and hints at what might be misconfigured or could be optimized. So far there seems to be a possible missing link as “QuestionTest” implies a joining table between a “Question” and a “Test”.

Question 3

u can try to improve the performance of loading
Use explicit loading instead of eager loading with Include(). Retrieve the TblQuestionTest entities first without related data, then use context.Entry(entity).Collection(x => x.TblMultipleChoices).Load() to load the TblMultipleChoices only when needed. This avoids loading unnecessary data upfront.

use projection to select only the needed data properties instead of entire entities. For example:

var questions = context.TblQuestionTests
                      .Where(e => e.TestId == tesId)
                      .Select(q => new {
                          q.Id,
                          q.Text,
                          Choices = q.TblMultipleChoices
                                      .Select(c => new {c.Id, c.Text}) 
                      });

use AsNoTracking() when you don’t need to track entity changes. This avoids the overhead of change tracking.

use explicit compilation of queries to cache and reuse query plans.

batch loading of related entities by ID can be more efficient than loading entire collections.

for very large datasets, consider using a stateless, read-optimized model with DTOs instead of EF entity classes.

set maximum depths for recursive relationships to avoid over-fetching data.

Leave a Comment Cancel reply