EF Core 3.0: duplicate insert with huge amount of data

.net-core c# entity-framework-core

Question

I am using EF Core with .NET Core 3.0. I am facing a very weird case. I am transferring data from SQLite to SQL Server.

using (var context1 = new Context1())
using (var context2 = new Context2())
{
    foreach(var tran in context1.Transactions.Distinct())
    {
        var trans = new Model{
            PersonID = tran.PersonID,
            CreatedDate = tran.TranDate,
            TranDate = tran.TranDate,
            CreatedBy = tran.CreatedBy
        };
        context2.Transactions.Add(trans);

    }

    int cc = context2.SaveChanges();
    int count1 = context1.Transactions.Count();
    int count2 = context2.Transactions.Count();
}

The problem is when inserting a small bulk of data (1,000 rows), data integrity is fine, but when inserting 200,000 records at once, the data becomes duplicated but the number of records that should be inserted is correct.

Also I noticed the duplicates happen in different cycles as their IDs are not in sequence and the inserts occur randomly!

ID      TranDate                   PersonID CreatedBy CreatedDate 
-------------------------------------------------------------------------------    
513842  2019-06-17 10:29:11.3368419 93596   NULL    2019-06-17 10:29:11.3368419 
516055  2019-06-17 10:29:11.3368419 93596   NULL    2019-06-17 10:29:11.3368419 
516342  2019-06-17 10:29:11.3368419 93596   NULL    2019-06-17 10:29:11.3368419

UPDATE: just to confirm the duplicates

SQLite Data

enter image description here

SqlServer:

enter image description here

Update2:

Console.WriteLine(context1.Transactions.OrderBy(t => t.TranDate).GroupBy(t => t.TranDate).Count());

194735

However;

foreach(var tran in context1.Transactions.OrderBy(t => t.TranDate).AsEnumerable().GroupBy( x => x.TranDate).Select(g => g.First()))

the total number transfered to SqlServer is

enter image description here

1
-1
10/17/2019 5:18:21 AM

Accepted Answer

I think the problem might be related to the duplication of db context. Since the both db structures are same almost, I duplicated the exiting db context and changed the connection strings. it must be something inside still refering to each other?! I don't know. the problem appeard only after three thousands entries.

However, what I did to solve the issue was I deleted all db context related files including the models and regenrate new ones using scaffold commands for each database

dotnet ef dbcontext scaffold "Data Source=data.db" Microsoft.EntityFrameworkCore.Sqlite --output-dir Models

dotnet ef dbcontext scaffold "connectionstring" Microsoft.EntityFrameworkCore.SqlServer --output-dir Models

the result I did not have duplicates in the data but they were not inserted in order and it seems the normal behavior of EF.

0
10/18/2019 9:33:26 AM

Popular Answer

I think the problem is that Distinct() doesn't do do what you think it does.

I suspect that the original table has 'duplicates' with different ids.

I suggest you use GroupBy instead of Distinct:

context1.Transactions
    .GroupBy( x => 
       new { x.PersonID,
             x.TranDate,
             x.CreatedBy} )
     .Select( g=> g.First());


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow