Analyzing slow lazy loading in Entity Framework 6

c# entity-framework entity-framework-6 lazy-loading performance

Question

I have a POCO class with a lazy-loaded collection, which currently takes 10 seconds to load, with about 10.000 entries; using Entity Framework 6.3 and SQL Server 2016.

I realize that loading 10.000 entries takes some time, and perhaps these times are to be expected. But I cannot figure out where the time is actually spent.

I know I can disable lazy-loading. But right now I simply want to understand this problem better. How can I see where the time is lost, and which exact SQL statements take a long time, or are executed too often?


Loading the property, e.g. via Console.WriteLine(parent.Children.First().ID) takes 10 seconds.

If I enable logging via dbContext.Database.Log = s => Console.WriteLine(s);, I only see a single SQL command:

SELECT
    [Extent1].[Foo] AS [Foo], ...
    [Extent1].[Parent_ID] AS [Parent_ID],
    FROM [dbo].[Child] AS [Extent1]
    WHERE [Extent1].[Parent_ID] = @EntityKeyValue1

-- Completed in 1 ms with result: SqlDataReader

The command completes in a millisecond. There do not appear to be 10.000 follow-up SQL commands that would explain the long loading times.

When I query the database in SQL Management Studio, I get similar times: displaying all 10.000 rows takes only a few milliseconds.

I implemented a DbCommandInterceptor, to find any long-running or repeated sql commands I may have missed, but found nothing either.

The property is configured like this:

public class Parent {
    ...
    public virtual ICollection<Children> Children { get; set; }
}
public class Child {
    ...
    public int Parent_ID { get; set; }
    public virtual Parent Parent{ get; set; }
}

...
modelBuilder.Entity<Child>()
            .HasRequired(a => a.Parent)
            .WithMany(b => b.Children)
            .HasForeignKey(c => c.Parent_ID)
            .WillCascadeOnDelete(false);

The database table "child" used for testing contains only the 10.000 rows that should be returned by the lazy-loaded property, and no other rows with any other foreign key.

1
0
9/25/2019 10:56:23 AM

Popular Answer

When EF needs to pull back a large set of data, either eager or lazy loaded, it not only needs to execute the SQL query, but then allocate those entities and resolve any potential references contained in each of those entities against any tracked entities it already knows about. I suspect the crux of time you are seeing being spent may be due to the circular reference from Child back to Parent. When you "touch" the Children collection on the parent, EF's proxy kicks in and triggers the SELECT * from Children WHERE ParentId = 1 which executes quite fast. It then goes and starts allocating those 10k entries, and for each one, it's checking any references to link to a known tracked entity. Since Parent #1 is tracked, each of those entities needs to be wired up to the Parent entity. Obviously if your model includes more references it will check each and every one of them on every entity against it's cache to assign them if it has them. That takes time.

The best advice I can give is:

  • Don't use circular references unless you really need them. If you want to get Parents having a particular child, you can still do that through Parent without the circular reference.

    • Avoid lazy loading in general. Leverage projection with Select and eager loading where necessary. Lazy loading can, and will lead to various issues. One serious one is when dealing with collections. For instance if you were to load a subset of children:

.

var children = context.Children.Where(x => x.Date > start && x.Date < end);

which returned 100 children and then had some code that went to serialize that collection, what would end up happening is there would be up to 100 lazy load SQL calls fired, 1 for each unique parent. The serializer would touch the first child, then SELECT * FROM Parents WHERE ParentId = 31, then touch the second child, SELECT * FROM Parents Where ParentId = 12, then the 3rd, then the 4th, and so on. Each reference on each row would potentially trigger a SQL query. As opposed to an eager load which would have leveraged a JOIN to bring back the Child data and it's Parent. With more complex objects, this can be an absolute killer. Even just having code tripping lazy loading while evaluating logic against a set of entities can be a killer once the code is in production and there are multiple concurrent requests.

Lazy loading is an interesting concept, but a dangerous one from a performance point of view. Projection to ViewModels/Anonymous types will save numerous headaches down the road.

For instance if you need all 10k children, but only need a few properties to send to a view or perform a calculation...

var parent = context.Parents.Where(x => x.ParentId == parentId)
   .Select( new 
   {
      x.ParentId,
      x.ParentName,
      Children = x.Children.Select(c => new 
      {
          c.ChildId,
          c.ChildName
      }).ToList()
   }).Single();

This is a very simple example that loads a parent ID & Name, along with a simple list of children (Id and Names) into anonymous types. If you want something that can be returned to a view or API call then define and populate a POCO ViewModel/DTO class. The data can be summarized, sorted, paged, however you like.

The advantage here is you execute a faster query because the necessary data is fetched, and only the data you need. This can leverage indexes better to pull the necessary data. You also avoid any serialization issues /w circular references or lazy load hits to other referenced entities. You also minimize the amount of storage needed on the server and clients to represent the results, and reduce the data over the wire.

1
9/25/2019 12:38:56 PM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow