Entity Framework Core: traversing large blob data without running out of memory (best practice)

blob c# entity-framework-core out-of-memory traversal

Question

I'm writing code that traverses large amounts of picture data, preparing one big delta block that contains it all, compressed, for sending.

Here's a sample of what this data could look like:

[MessagePackObject]
public class Blob : VersionEntity
{
    [Key(2)]
    public Guid Id { get; set; }
    [Key(3)]
    public DateTime CreatedAt { get; set; }
    [Key(4)]
    public string Mediatype { get; set; }
    [Key(5)]
    public string Filename { get; set; }
    [Key(6)]
    public string Comment { get; set; }
    [Key(7)]
    public byte[] Data { get; set; }
    [Key(8)]
    public bool IsTemporarySmall { get; set; }
}

public class BlobDbContext : DbContext
{
    public DbSet<Blob> Blob { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blob>().HasKey(o => o.Id);
    }
}

When working with this I process everything into a file stream, and I want to keep as little as possible in memory at any given time.

Is it enough to do it like this?

foreach (var b in context.Blob.Where(o => somefilters).AsNoTracking())
    MessagePackSerializer.Serialize(stream, b);

Will this still fill up memory with all the blob records, or will they be processed one by one as I iterate over the enumerator? I'm not calling ToList(), only iterating, so Entity Framework should be able to process them on the fly, but I'm not sure that's what it actually does.
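To make it concrete, here is the fuller picture of what I'm doing. The file path and the filter are placeholders, and the context is assumed to be configured elsewhere:

using (var context = new BlobDbContext())
using (var stream = File.OpenWrite("delta.block")) // placeholder path
{
    var query = context.Blob
        .Where(o => somefilters)   // placeholder filter
        .AsNoTracking();

    // The question: does this materialize one Blob at a time?
    foreach (var b in query)
        MessagePackSerializer.Serialize(stream, b);
}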

Are there any Entity Framework experts here who can give some guidance on how to handle this properly?


Popular Answer

In general, when you write a LINQ filter on an entity it is like writing a SQL statement in code form. It returns an IQueryable that has not actually executed against the database. The query only runs when you enumerate it: calling ToList() executes the SQL and buffers every result in memory at once, while iterating the IQueryable with a foreach executes the same SQL but materializes one entity at a time as the underlying data reader advances (the default behavior in EF Core).

https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/ef/language-reference/query-execution
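A minimal sketch of that difference, reusing the BlobDbContext from the question (the filter is a placeholder):

// Nothing has hit the database yet: this only builds an expression tree.
var query = context.Blob
    .Where(b => b.Filename.EndsWith(".png")) // placeholder filter
    .AsNoTracking();

// Executes the SQL and buffers every matching Blob (Data included) at once.
var buffered = query.ToList();

// Executes the same SQL, but materializes one Blob per iteration while the
// data reader advances, so only the current entity is held in memory.
foreach (var blob in query)
{
    // process blob; it becomes collectable after this iteration
}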

While EF may not be the best choice for raw performance, there is a relatively simple way to handle this without worrying too much about memory usage.

Consider the following:

var filteredIds = context.Blob
                      .Where(b => b.SomeProperty == "SomeValue")
                      .Select(x => x.Id)
                      .ToList();

Now you have filtered the blobs according to your requirements and executed the query against the database, but only the Id values have been pulled into memory.

Then:

foreach (var id in filteredIds)
{
    var blob = context.Blob.AsNoTracking().Single(x => x.Id == id);
    // Do your work here against a single in-memory blob
}

The large blob should be available for garbage collection once you are finished with it, and you shouldn't run out of memory.

Obviously you can sense-check the number of records in the id list, or you could add metadata to the first query to help you decide how to process it if you want to refine the idea.
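Tying this back to the serialization in the question, the loop could write each blob straight to the output stream. A rough sketch, assuming the filteredIds list from above and a placeholder output path:

using (var stream = File.OpenWrite("delta.block")) // placeholder path
{
    foreach (var id in filteredIds)
    {
        // Only one blob (and its byte[] Data) is held at a time; it
        // becomes eligible for collection after each iteration.
        var blob = context.Blob.AsNoTracking().Single(x => x.Id == id);
        MessagePackSerializer.Serialize(stream, blob);
    }
}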



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow