Check which messages already exist without loading entire table to memory

c# entity-framework entity-framework-core

Question

I want to check if any message already exists before adding it to database, but my current query loads the entire table into memory. Query generated from my code is basically just select * from tableName.

How can I rewrite this query to be evaluated in database?

public void AddMessages(IEnumerable<Message> messages)
{
    if (messages == null)
        throw new ArgumentNullException(nameof(messages));

    var duplicates = (from currMsg in context.Messages
                      where messages.Any(msg =>
                                                msg.Prop1 == currMsg.Prop1 &&
                                                msg.Prop2 == currMsg.Prop2 &&
                                                msg.Prop3 == currMsg.Prop3)
                      select currMsg);

    var messagesWithoutDuplicates = messages.Except(duplicates);

    context.Messages.AddRange(messagesWithoutDuplicates);
    context.SaveChanges();
}

I could also run it in a loop, but then I would create many db calls instead of 1 and I would prefer to do this in a single call.

1
2
8/24/2018 7:58:32 AM

Accepted Answer

Since there is no easy way of doing this in a single call I decided to sacrifice performance and retain readability and testability. This is my solution:

using (var transaction = context.Database.BeginTransaction())
{
    try
    {
        foreach (var message in messages)
        {
            var exists = context.Messages.Any(msg => msg.Prop1 == message.Prop1 &&
                                                     msg.Prop2 == message.Prop2 &&
                                                     msg.Prop3 == message.Prop3 &&);

            if (!exists)
            {
                context.Messages.Add(message);
            }
        }

        context.SaveChanges();
        transaction.Commit();
    }
    catch (Exception ex)
    {
        _logger.Error(ex);
        transaction.Rollback();
        throw;
    }
}
0
8/24/2018 9:46:39 AM

Popular Answer

Depending on your use-case, you may need to insert them one-by-one and trust on the database unique-index (you have one, right?) to throw it back in your face if it's a duplicate.

There is two weaknesses in your code besides memory consumption: concurrency (what if somebody else inserts while you check for duplicates) and the fact that your records to insert might themselves be duplicates that you did not check for.



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow