Out of memory exception when adding data to the database using Entity Framework

c# entity-framework-core

Question

I have 27,000 images stored in a folder that need to be added to the database using Entity Framework. Here is my code:

        var files = Directory.GetFiles(path, "*", SearchOption.AllDirectories);

        foreach(var file in files)
        {
            using (ApplicationContext db = new ApplicationContext())
            {
                Image img = Image.FromFile(file);

                var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height);

                MemoryStream memoryStream = new MemoryStream();

                img.Save(memoryStream, ImageFormat.Png);

                var label = Directory.GetParent(file).Name;
                var bytes = memoryStream.ToArray();

                memoryStream.Close();

                db.Add(new ImageData { Image = bytes, Label = label });

                img.Dispose();
                memoryStream.Dispose();
                imgRes.Dispose();
            }
        }

It only works when there are fewer than 10,000 images; otherwise I get an out-of-memory exception. How can I upload all 27,000 images to the database?

1/22/2020 1:33:07 PM

Popular Answer

First of all, this code is a bulk import - it doesn't work with entities in any meaningful way, so using an ORM adds overhead without helping. That isn't what causes the OOM, though; it only makes the code a lot slower.

The real problem is that MemoryStream is actually a wrapper around a byte array. Once the buffer is full, a new one is allocated with double the size, the existing data is copied over, and the old buffer is left for the garbage collector. Growing a buffer to 50 MB this way (starting from the 256-byte minimum) takes roughly log2(50M / 256) ≈ 18 reallocations, each of which leaves an orphaned buffer behind. This fragments the free memory until the runtime can no longer allocate a large enough contiguous buffer. The same thing happens with List&lt;T&gt;, not just MemoryStream.
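The doubling behavior is easy to observe directly. A small sketch (counting how often the backing array is replaced while writing 1 MB in small chunks - the exact counts assume the standard 256-byte minimum capacity and doubling policy of .NET's MemoryStream):

```csharp
using System;
using System.IO;

class CapacityDemo
{
    static void Main()
    {
        var ms = new MemoryStream();   // default capacity: 0
        var chunk = new byte[64];
        int reallocations = 0;
        int lastCapacity = ms.Capacity;

        // Write 1 MB in 64-byte chunks and count how often the
        // backing array grows (i.e. is reallocated and copied).
        while (ms.Length < 1_000_000)
        {
            ms.Write(chunk, 0, chunk.Length);
            if (ms.Capacity != lastCapacity)
            {
                reallocations++;
                lastCapacity = ms.Capacity;
            }
        }

        Console.WriteLine($"Final capacity: {ms.Capacity}, reallocations: {reallocations}");
    }
}
```

Each of those reallocations copies everything written so far and discards the old array - harmless for 1 MB, but a fragmentation hazard when thousands of multi-megabyte images go through the same pattern.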

The quick fix is to pass the expected size as the stream's capacity through the MemoryStream(Int32) constructor. This cuts down on reallocations and saves a lot of CPU cycles. The number doesn't have to be exact, just large enough to avoid most of the regrowing:

using (Image img = Image.FromFile(file))
using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
using (var memoryStream = new MemoryStream(10_000_000))
{
    // Save the resized image, not the original - the original code
    // resized the image but then saved the full-size one
    imgRes.Save(memoryStream, ImageFormat.Png);
    var label = Directory.GetParent(file).Name;
    var bytes = memoryStream.ToArray();

    db.Add(new ImageData { Image = bytes, Label = label });
}

There's no need to Close the MemoryStream explicitly; it's just a wrapper over a managed array. That version still allocates a big buffer for each file, though.

If we know the maximum file size, we can allocate a single buffer before the loop and reuse it in every iteration. In this case the size matters: a stream created over an existing array cannot grow beyond it.

// Allocate once, before the loop, and reuse for every file
var buffer = new byte[100_000_000];

using (Image img = Image.FromFile(file))
using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
using (var memoryStream = new MemoryStream(buffer))
{
    // MemoryStream(byte[]) sets Length to the full buffer size, so
    // reset it first - otherwise ToArray() copies the entire 100 MB
    memoryStream.SetLength(0);
    imgRes.Save(memoryStream, ImageFormat.Png);
    var label = Directory.GetParent(file).Name;
    var bytes = memoryStream.ToArray();

    db.Add(new ImageData { Image = bytes, Label = label });
}
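One more thing worth pointing out: neither the question's loop nor the snippets above ever call SaveChanges, so nothing actually reaches the database. A sketch of the complete loop (assuming the ApplicationContext, ImageData, ResizeImage, and ImageSettings from the question; the batch size of 500 is an arbitrary choice):

```csharp
var files = Directory.GetFiles(path, "*", SearchOption.AllDirectories);
var buffer = new byte[100_000_000];   // reused across all iterations
const int batchSize = 500;            // hypothetical batch size

using (var db = new ApplicationContext())
{
    for (int i = 0; i < files.Length; i++)
    {
        using (var img = Image.FromFile(files[i]))
        using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
        using (var memoryStream = new MemoryStream(buffer))
        {
            memoryStream.SetLength(0);   // avoid copying the whole buffer in ToArray()
            imgRes.Save(memoryStream, ImageFormat.Png);

            db.Add(new ImageData
            {
                Image = memoryStream.ToArray(),
                Label = Directory.GetParent(files[i]).Name
            });
        }

        // Persist in batches so the change tracker doesn't accumulate
        // all 27,000 entities (and their byte arrays) in memory
        if ((i + 1) % batchSize == 0)
        {
            db.SaveChanges();
            db.ChangeTracker.Clear();   // EF Core 5+; on older versions, dispose and recreate the context
        }
    }
    db.SaveChanges();   // flush the final partial batch
}
```

Batching keeps memory flat and is far faster than one context per file, since each SaveChanges round-trip covers hundreds of rows instead of one.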
1/22/2020 2:04:10 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow