LINQ Include slowing down performance when searching

asp.net-core c# entity-framework-core linq sql

Question

We have the following method that allows us to search a table of Projects for a DataGrid:

public async Task<IEnumerable<Project>> GetFilteredProjects(string searchString)
{
    var projects = _context.Projects.Where(p => p.Current);

    projects = projects
        .Include(p => p.Client)
        .Include(p => p.Architect)
        .Include(p => p.ProjectManager);

    if (!string.IsNullOrEmpty(searchString))
    {
        projects = projects
            .Where(p => p.NormalizedFullProjectName.Contains(searchString)
                    || p.Client.NormalizedName.Contains(searchString)
                    || p.Architect.NormalizedFullName.Contains(searchString)
                    || p.ProjectManager.NormalizedFullName.Contains(searchString));
    }

    projects = projects.OrderBy(p => p.Name).Take(10);

    return await projects.ToListAsync();
}

If we do not use Include on the projects, the search is instantaneous. After adding the Includes, however, the search can take over 3 seconds.

We need to include the other Entities to allow the Users to search on them should they want to.

How are we able to improve performance but still keep the Include to allow searching on them?

Without Include, the method looks like this:

public async Task<IEnumerable<Project>> GetFilteredProjects(string searchString)
{
    var projects = _context.Projects.Where(p => p.Current);

    if (!string.IsNullOrEmpty(searchString))
    {
        projects = projects
            .Where(p => p.Name.Contains(searchString));
    }

    projects = projects.OrderBy(p => p.Name).Take(10);

    return await projects.ToListAsync();
}

Without Include the performance looks like so:

[screenshot: query timings without Include]

With Include:

[screenshot: query timings with Include]

5/30/2018 10:44:55 AM

Popular Answer

The short answer is that including all the extra entities takes time and effort, thus increasing the load times.

However, there is a flaw in your assumption:

We need to include the other Entities to allow the Users to search on them should they want to.

That is not (necessarily) correct. Filtering happens at the database level; Include tells Entity Framework to load the related records into memory. These are two separate things.

Look at the following examples:

_context.Projects
        .Include(p => p.Architect)
        .Where(p => p.Architect.Name == "Bob")
        .ToList()

This will give you a list of projects (and their architects) that have an architect named Bob.

_context.Projects
        .Where(p => p.Architect.Name == "Bob")
        .ToList()

This will give you a list of projects (without architects) that have an architect named Bob; but it does not actually load the Architect objects into memory.

_context.Projects
        .Include(p => p.Architect)
        .ToList()

This will give you a list of projects (and their architects). It will contain every project; the list is not filtered.


You only need Include when you want to do in-memory filtering, i.e. filtering a collection that was already loaded from the database.

Whether that is the case for you depends on this part:

    projects = projects
        .Where(p => p.NormalizedFullProjectName.Contains(searchString)
                || p.Client.NormalizedName.Contains(searchString)
                || p.Architect.NormalizedFullName.Contains(searchString)
                || p.ProjectManager.NormalizedFullName.Contains(searchString));

If NormalizedFullProjectName (and the other properties) are database columns, then EF is able to perform the filtering at the database level. You do not need the Include for filtering the items.

If NormalizedFullProjectName (and the other properties) are not database columns, then EF will first have to load the items in memory before it can apply the filter. In this case, you do need the Include, because the architects (and others) need to be loaded in memory.


If you are only loading the related entities for filtering purposes (not display purposes), and the filtering happens at the database level, then you can simply remove the Include statements.
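Applied to the method from the question, that means the Include calls can simply be dropped while the navigation properties stay in the Where clause (a sketch, assuming the Normalized* properties are mapped database columns):

```csharp
// Sketch: the original method with the Includes removed. EF translates the
// navigation property accesses in the Where clause into SQL JOINs, so the
// related tables are still searched even though they are never loaded
// into memory. Assumes the Normalized* properties are mapped columns.
public async Task<IEnumerable<Project>> GetFilteredProjects(string searchString)
{
    var projects = _context.Projects.Where(p => p.Current);

    if (!string.IsNullOrEmpty(searchString))
    {
        projects = projects
            .Where(p => p.NormalizedFullProjectName.Contains(searchString)
                    || p.Client.NormalizedName.Contains(searchString)
                    || p.Architect.NormalizedFullName.Contains(searchString)
                    || p.ProjectManager.NormalizedFullName.Contains(searchString));
    }

    return await projects.OrderBy(p => p.Name).Take(10).ToListAsync();
}
```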

If you need those related entities to be loaded (for in-memory filtering, or for display purposes), then you can't easily remove the Include statements, unless you write a Select that specifies the fields you need.

For example:

_context.Projects
        .Select(p => new { Project = p, ArchitectName = p.Architect.Name })
        .ToList()

This will load the project entities (in their entirety) but only the name of the architect and none of the other properties. This can be a significant performance boost if your related entities have many properties that you currently do not need.

Note: The current example uses an anonymous type. I generally advocate creating a custom type for this; but that's unrelated to the performance issue we're addressing here.
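For instance, the anonymous type could be replaced with a small named result type (the ProjectSearchResult name here is hypothetical):

```csharp
// Hypothetical result type: only the fields the grid actually shows.
public class ProjectSearchResult
{
    public Project Project { get; set; }
    public string ArchitectName { get; set; }
}

// Same projection as the anonymous-type example, but with a named type
// that can be returned from a method or serialized to the client.
var results = _context.Projects
        .Select(p => new ProjectSearchResult
        {
            Project = p,
            ArchitectName = p.Architect.Name
        })
        .ToList();
```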


Update

Based on your update, you seem to imply that the intended filtering happens after the objects have been loaded from the database.

This is the source of your performance problems. You are fetching a lot of data but only showing part of it. The data that is not shown still has to be loaded, which is wasted effort.

There are two competing approaches here, each with its own performance trade-off:

  • Load everything once - Load all the data once (which might take a long time), but then allow the user to filter the loaded data (which is very fast)
  • Load chunks - Only load the data that matches the applied filters. If the user changes the filters, you load the data again. The first load will be much faster, but the subsequent filtering actions will take longer compared to in-memory filtering.
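The "load chunks" option can be sketched as a paged variant of the original method (the pageIndex and pageSize parameters are hypothetical additions):

```csharp
// Sketch of the "load chunks" approach: the database applies the filter
// and returns only one page of results per request. pageIndex/pageSize
// are hypothetical parameters; re-querying with new values loads the
// next chunk instead of holding the whole dataset in memory.
public async Task<IEnumerable<Project>> GetProjectPage(
    string searchString, int pageIndex, int pageSize)
{
    var projects = _context.Projects.Where(p => p.Current);

    if (!string.IsNullOrEmpty(searchString))
    {
        projects = projects.Where(p => p.Name.Contains(searchString));
    }

    return await projects
        .OrderBy(p => p.Name)        // a stable order is required for paging
        .Skip(pageIndex * pageSize)  // skip the earlier pages
        .Take(pageSize)              // load only this chunk
        .ToListAsync();
}
```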

What you should do here is not my decision. It's a matter of priorities. Some customers prefer one over the other. I would say that in most cases, the second option (loading chunks) is the better option here, as it prevents needlessly loading a massive dataset if the user never looks through 90% of it. That's a waste of performance and network load.

The answer I gave applies to the "load chunks" approach.

If you decide to take the "load everything once" approach, then you will have to accept the performance hit of that initial load. The best you can do is severely limit the returned data columns (like I showed with the Select) in order to minimize the performance/network cost.

I see no reasonable argument to mix these two approaches. You'll end up with both drawbacks.

5/30/2018 11:00:32 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow