I want to check if any message already exists before adding it to database, but my current query loads the entire table into memory. Query generated from my code is basically just select * from tableName
.
How can I rewrite this query to be evaluated in database?
public void AddMessages(IEnumerable<Message> messages)
{
if (messages == null)
throw new ArgumentNullException(nameof(messages));
var duplicates = (from currMsg in context.Messages
where messages.Any(msg =>
msg.Prop1 == currMsg.Prop1 &&
msg.Prop2 == currMsg.Prop2 &&
msg.Prop3 == currMsg.Prop3)
select currMsg);
var messagesWithoutDuplicates = messages.Except(duplicates);
context.Messages.AddRange(messagesWithoutDuplicates);
context.SaveChanges();
}
I could also run it in a loop, but then I would create many db calls instead of 1 and I would prefer to do this in a single call.
Since there is no easy way of doing this in a single call I decided to sacrifice performance and retain readability and testability. This is my solution:
using (var transaction = context.Database.BeginTransaction())
{
try
{
foreach (var message in messages)
{
var exists = context.Messages.Any(msg => msg.Prop1 == message.Prop1 &&
msg.Prop2 == message.Prop2 &&
msg.Prop3 == message.Prop3 &&);
if (!exists)
{
context.Messages.Add(message);
}
}
context.SaveChanges();
transaction.Commit();
}
catch (Exception ex)
{
_logger.Error(ex);
transaction.Rollback();
throw;
}
}
Depending on your use-case, you may need to insert them one-by-one and trust on the database unique-index (you have one, right?) to throw it back in your face if it's a duplicate.
There is two weaknesses in your code besides memory consumption: concurrency (what if somebody else inserts while you check for duplicates) and the fact that your records to insert might themselves be duplicates that you did not check for.