Why is Entity Framework so slow to add multiple items in one SaveChanges()?

c# entity-framework entity-framework-6

Question

This is a follow-up from a previous question, in which I was trying to figure out the main cause for my code running slowly. I think I've narrowed it down to a minimal example below. I have a basic database structure as follows:

public class Foo
{
    public int Id { get; set; }
    public string Bar { get; set; }
}

public class FooContext : DbContext
{
    public DbSet<Foo> Foos { get; set; }
}

Now, if I had a list of Foo objects, and wanted to add them to the database, the suggested way would be to use AddRange(). But I noticed it was taking a long time, and is affected poorly by the number of items in the collection, even with a small amount like 200. So I wrote it out manually, and viola, it runs faster!

class Program
{
    static void Main(string[] args)
    {
        var foos = Enumerable.Range(0, 200).Select(index => new Foo { Bar = index.ToString() });

        // Make sure the timing doesn't include the first connection
        using (var context = new FooContext())
        {
            context.Database.Connection.Open();
        }

        var s1 = Stopwatch.StartNew();
        using (var context = new FooContext())
        {
            context.Foos.AddRange(foos);
            context.SaveChanges();
        }
        s1.Stop();

        var s2 = Stopwatch.StartNew();
        using (var context = new FooContext())
        {
            // Ignore the lack of sanitization, this is for demonstration purposes
            var query = string.Join(";\n", foos.Select(f => "INSERT INTO Foos ([Bar]) VALUES (" + f.Bar + ")"));
            context.Database.ExecuteSqlCommand(query);
        }
        s2.Stop();

        Console.WriteLine("Normal way: {0}", s1.Elapsed);
        Console.WriteLine("Hard way  : {0}", s2.Elapsed);
        Console.ReadKey();
    }
}

My initial thought was that Entity Framework might be using a separate transaction for each entry, but logging the SQL shows that's not the case. So why is there such a difference in execution time?

1
4
7/21/2017 3:20:36 AM

Accepted Answer

While doing some research on your question I came across this enlightening article: http://www.codinghelmet.com/?path=howto/bulk-insert

Here's a quote:

Each object that was inserted required two SQL statements - one to insert a record, and additional one to obtain identity of the new record

This becomes a problem when inserting multiple records. A problem which is intensified by the fact that each record is inserted one at a time (But this is outside the context of your question since you're already testing the one by one insert). So if you're inserting 200 records, that's 400 sql statements being executed one by one.

So from my understanding EF is simply not built for bulk insertion. Even if it's as simple as inserting 200 records. Which to me seems like a big let down.

I started thinking, "Then what is EF good for anyway. It can't even insert a couple of records". Well i'll give EF props in two areas:

  1. Select Queries: It's very easy to write your queries and get data into your application quickly.
  2. Simplifying insertion of complex records. If you've ever had a table with a lot of foreign keys and you've tried inserting all the linked records in one transaction, you know what I'm talking about. Thankfully EF inserts each record in order and links all the related records in one single transaction for you. But as mentioned above, this comes at a cost.

So simply put, it seems like, if you have an operation that requires inserting a bunch of records, it might be best to use SqlBulkCopy. Which can insert thousands of records in seconds.

I know this might not be the answer you want to hear, because believe me it upsets me as well since I use EF alot, but I don't see any way around it

5
8/1/2017 2:06:39 AM

Popular Answer

Since you can't live with it and can't live without it, have you considered calling SaveChangesAsync() instead?

I searched far and wide to find a way to disable primary key synchronization but could not find any for EF 6 and lesser.

EF core passes a true from DBContext.SaveChanges() to what I believe eventually triggers this synchronization. The other overload allows callers to pass false as the controlling parameter.



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow