EF Core multiple left joins

entity-framework-core

Question

I'm using EF Core in a asp.net mvc core 1.1.0 project and have a rather complex query.

_context
.Profiles
.Include(p => p.Blog)
 .ThenInclude(b => b.Network)
.Include(p => p.Blog)
 .ThenInclude(i => i.AgeDistributions)
  .ThenInclude(i => i.AgeRange)
.Include(p => p.Blog)
 .ThenInclude(b => b.GenderDistributions)
.Include(p => p.Instagram)
 .ThenInclude(i => i.Network)
.Include(p => p.Instagram)
 .ThenInclude(i => i.AgeDistributions)
  .ThenInclude(i => i.AgeRange)
.Include(p => p.Instagram)
 .ThenInclude(b => b.GenderDistributions)
.Include(p => p.Youtube)
 .ThenInclude(y => y.Network)
.Include(p => p.Youtube)
 .ThenInclude(i => i.AgeDistributions)
  .ThenInclude(i => i.AgeRange)
.Include(p => p.Youtube)
 .ThenInclude(b => b.GenderDistributions)
.Include(p => p.Snapchat)
 .ThenInclude(s => s.Network)
.Include(p => p.Musically)
.Include(p => p.ProfileCategories)
 .ThenInclude(pc => pc.Category)
.Include(p => p.Tags)
 .ThenInclude(tag => tag.Tag)
.Where(p => !p.Deleted);

Each social platform can have any kind of statistics. For example AgeDistributions are modelled using a base class that has a PlatformId and each derived {Platform}AgeDistribution specifies the navigation property so to set up foreign keys correctly.

public class AgeInterval {
  public int Id { get; set; }
  // At most five length. -18, 18-24, ..., 65-
  public string Interval { get; set; }
}

public class PlatformAgeStatistics {
  public int PlatformId { get; set; }
  public int IntervalId { get; set; } 
  public AgeInterval Interval { get; set; } 
  public decimal Distribution { get; set; }
}

public class InstagramAgeStatistics : PlatformAgeStatistics  {
  [ForeignKey("PlatformId")]
  public Instagram Platform { get; set; } // 
}

Above query sometimes very long time (db execution timeout after 30 seconds) and inspecting the sql make me think either I have a modelling issue that EF cannot determine correctly or EF is just generating suboptimal SQL. The result set is paginated using skip and take and currently fetching ten records take time.

This is the first SQL that is executed

SELECT -- Emitted
FROM [Profiles] AS [p]
LEFT JOIN [BlogChannels] AS [b] ON [b].[ProfileId] = [p].[Id]
LEFT JOIN [InstagramChannels] AS [i] ON [i].[ProfileId] = [p].[Id]
LEFT JOIN [YoutubeChannels] AS [y] ON [y].[ProfileId] = [p].[Id]
LEFT JOIN [BlogChannels] AS [b2] ON [b2].[ProfileId] = [p].[Id]
LEFT JOIN [InstagramChannels] AS [i2] ON [i2].[ProfileId] = [p].[Id]
LEFT JOIN [YoutubeChannels] AS [y2] ON [y2].[ProfileId] = [p].[Id]
LEFT JOIN [BlogChannels] AS [b4] ON [b4].[ProfileId] = [p].[Id]
LEFT JOIN [Networks] AS [n] ON [b4].[NetworkId] = [n].[Id]
LEFT JOIN [InstagramChannels] AS [i4] ON [i4].[ProfileId] = [p].[Id]
LEFT JOIN [Networks] AS [n0] ON [i4].[NetworkId] = [n0].[Id]
LEFT JOIN [YoutubeChannels] AS [y4] ON [y4].[ProfileId] = [p].[Id]
LEFT JOIN [Networks] AS [n1] ON [y4].[NetworkId] = [n1].[Id]
LEFT JOIN [SnapchatChannels] AS [s] ON [s].[ProfileId] = [p].[Id]
LEFT JOIN [Networks] AS [n2] ON [s].[NetworkId] = [n2].[Id]
LEFT JOIN [MusicallyChannels] AS [m] ON [m].[ProfileId] = [p].[Id]
WHERE [p].[Deleted] = 0
ORDER BY [p].[FullName], [p].[Id], [b].[Id], [i].[Id], [y].[Id], [b2].[Id], [i2].[Id], [y2].[Id]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY

Then follows more queries that doesn't look quite right

SELECT [y3].[Gender], [y3].[ChannelId], [y3].[Distribution]
FROM [YoutubeGenderDistribution] AS [y3]
INNER JOIN (
    SELECT DISTINCT [t7].*
    FROM (
        SELECT [p].[FullName], [p].[Id], [b].[Id] AS [Id0], [i].[Id] AS [Id1], [y].[Id] AS [Id2], [b2].[Id] AS [Id3], [i2].[Id] AS [Id4], [y2].[Id] AS [Id5]
        FROM [Profiles] AS [p]
        LEFT JOIN [BlogChannels] AS [b] ON [b].[ProfileId] = [p].[Id]
        LEFT JOIN [InstagramChannels] AS [i] ON [i].[ProfileId] = [p].[Id]
        LEFT JOIN [YoutubeChannels] AS [y] ON [y].[ProfileId] = [p].[Id]
        LEFT JOIN [BlogChannels] AS [b2] ON [b2].[ProfileId] = [p].[Id]
        LEFT JOIN [InstagramChannels] AS [i2] ON [i2].[ProfileId] = [p].[Id]
        LEFT JOIN [YoutubeChannels] AS [y2] ON [y2].[ProfileId] = [p].[Id]
        WHERE [p].[Deleted] = 0
        ORDER BY [p].[FullName], [p].[Id], [b].[Id], [i].[Id], [y].[Id], [b2].[Id], [i2].[Id], [y2].[Id]
        OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY
    ) AS [t7]
) AS [y20] ON [y3].[ChannelId] = [y20].[Id5]
ORDER BY [y20].[FullName], [y20].[Id], [y20].[Id0], [y20].[Id1], [y20].[Id2], [y20].[Id3], [y20].[Id4], [y20].[Id5]

Another one that "looks" more correct

SELECT [b0].[AgeRangeId], [b0].[ChannelId], [b0].[Distribution], [a].[Id], [a].[Range]
FROM [BlogAgeDistribution] AS [b0]
INNER JOIN (
    SELECT DISTINCT [t2].*
    FROM (
        SELECT [p].[FullName], [p].[Id], [b].[Id] AS [Id0]
        FROM [Profiles] AS [p]
        LEFT JOIN [BlogChannels] AS [b] ON [b].[ProfileId] = [p].[Id]
        WHERE [p].[Deleted] = 0
        ORDER BY [p].[FullName], [p].[Id], [b].[Id]
        OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY
    ) AS [t2]
) AS [b1] ON [b0].[ChannelId] = [b1].[Id0]
LEFT JOIN [AgeRanges] AS [a] ON [b0].[AgeRangeId] = [a].[Id]
ORDER BY [b1].[FullName], [b1].[Id], [b1].[Id0]

Any idea why EF joins all the other platforms when requesting the statistics for e.g. Instagram.

Thanks!

Edit:

Interestingly the first query for Age generates a join with all three

SELECT [y0].[AgeRangeId], [y0].[ChannelId], [y0].[Distribution], [a1].[Id], [a1].[Range]
FROM [YoutubeAgeDistribution] AS [y0]
INNER JOIN (
    SELECT DISTINCT [t4].*
    FROM (
        SELECT [p].[FullName], [p].[Id], [b].[Id] AS [Id0], [i].[Id] AS [Id1], [y].[Id] AS [Id2]
        FROM [Profiles] AS [p]
        LEFT JOIN [BlogChannels] AS [b] ON [b].[ProfileId] = [p].[Id]
        LEFT JOIN [InstagramChannels] AS [i] ON [i].[ProfileId] = [p].[Id]
        LEFT JOIN [YoutubeChannels] AS [y] ON [y].[ProfileId] = [p].[Id]
        WHERE [p].[Deleted] = 0
        ORDER BY [p].[FullName], [p].[Id], [b].[Id], [i].[Id], [y].[Id]
        OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY
    ) AS [t4]
) AS [y1] ON [y0].[ChannelId] = [y1].[Id2]
LEFT JOIN [AgeRanges] AS [a1] ON [y0].[AgeRangeId] = [a1].[Id]
ORDER BY [y1].[FullName], [y1].[Id], [y1].[Id0], [y1].[Id1], [y1].[Id2]
1
2
3/1/2017 2:24:50 PM

Popular Answer

The reason EF Core is querying all platforms despite you wishing it'd only query a specific platform is due to how the query is coded. You have combined all of them in the same IQueryable. Take advantage of building your IQueryable over multiple steps in C#, before executing the IQueryable.

var query = _context
.Profiles.Where(p => searching.Contains(p.Name) && !p.Deleted)

if(searching.Contains("Blog"))
{
  query.Include(p => p.Blog)
   .ThenInclude(b => b.Network)
  .Include(p => p.Blog)
   .ThenInclude(i => i.AgeDistributions)
    .ThenInclude(i => i.AgeRange)
  .Include(p => p.Blog)
   .ThenInclude(b => b.GenderDistributions)
}

if(searching.Contains("Instagram"))
{
 .Include(p => p.Instagram)
  .ThenInclude(i => i.Network)
 .Include(p => p.Instagram)
  .ThenInclude(i => i.AgeDistributions)
   .ThenInclude(i => i.AgeRange)
 .Include(p => p.Instagram)
  .ThenInclude(b => b.GenderDistributions)
}

...

var results = query.ToList();

The last thing to remember is to filter as early as possible. Which is why I put a "searching.Contains(p.Name)" at the very beginning. The smaller the memory footprint your query needs to execute. The faster it should execute.

The final note I can add, is that EF Core is still fairly new and not everything will execute entirely within the database. In some cases, it builds a set of queries to execute independently and then combine them into a final result set in the calling client context.

1
5/16/2017 5:36:36 PM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow