I'm confused about IEnumerable memory usage, especially when comparing an IEnumerable data source that comes from a database with one generated in code that yield returns constant values.
I have a Memory function for checking the memory usage:
static string Memory()
{
    return (Process.GetCurrentProcess().WorkingSet64 / (1024 * 1024)).ToString();
}
Example 1 reads the users from the database through EF Core:
using DataContext context = new DataContext();
Console.WriteLine(Memory()); // 21
IEnumerable<User> users = context.Users;
foreach (var i in users) {}
Console.WriteLine(Memory()); // 101
Console.WriteLine(GC.GetTotalMemory(true)); // 46620032
For some reason I cannot upload pictures, so I typed the results into the code as comments. Sorry about that.
Example 2 uses yield return to generate the IEnumerable data:
static IEnumerable<User> Generator(int max)
{
    for (int i = 0; i < max; i++)
    {
        yield return new User { Id = 1, Name = "test" };
    }
}
Here is the result:
Console.WriteLine(Memory());// 21
IEnumerable<User> users = Generator(150000);
foreach (var i in users){}
Console.WriteLine(Memory());// 24
Console.WriteLine(GC.GetTotalMemory(true)); // 658040
Now I'm very confused by examples 1 and 2. My understanding is that an IEnumerable data source is read one item at a time rather than as a whole collection, which reduces memory usage, just as in example 2. When using EF Core (I know this is not specific to EF Core, but I need a concrete example), I think it is still pulling records one by one, so why does it use so much more memory than the second example? Is it really pulling each record one by one? And at the end, are all the records from the database held in memory? If so, why does the second example use so much less memory when I'm yielding the same records? If someone could explain this, it would be much appreciated. Thanks!
This is indeed EF (Core) specific behavior called (change) tracking, explained in Tracking vs. No-Tracking Queries. Tracking is the default behavior unless you change it explicitly with
context.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
or call AsNoTracking() on the query source.
The essential point is that even though the query result is evaluated one by one, the DbContext instance adds each created entity instance, plus some additional information such as its state and a snapshot of the original values, to an internal list. So even ignoring the key, state, and original-values snapshot, the equivalent code for the generator would be something like this:
IEnumerable<User> users = Generator(150000);
var trackedUsers = new List<User>();
foreach (var i in users)
{
trackedUsers.Add(i);
}
So at the end of the loop, every instance created during the iteration is still referenced and therefore still held in memory.
That's why you might consider using the AsNoTracking option when all you need is to execute an entity query and iterate it once. Note that non-entity (projection) queries and keyless entities do not track their results, so this is really behavior specific to entity queries.
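To see the effect without a database, here is a minimal sketch that runs the same generator twice: once just iterating, and once storing every instance in a list as a rough stand-in for the change tracker. The TrackingDemo class and the IterateOnly and IterateAndRetain helpers are names made up for this illustration, and the User class is an assumed shape matching the question. The real tracker stores more per entity than a plain reference, so this only approximates the lower bound of what tracking retains.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the question's User entity.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class TrackingDemo
{
    // Same generator as in the question: one new User per iteration.
    public static IEnumerable<User> Generator(int max)
    {
        for (int i = 0; i < max; i++)
        {
            yield return new User { Id = 1, Name = "test" };
        }
    }

    // Iterates without keeping references, similar to a no-tracking query:
    // each User becomes collectible as soon as the loop moves on.
    public static int IterateOnly(IEnumerable<User> users)
    {
        int count = 0;
        foreach (var user in users)
        {
            count++;
        }
        return count;
    }

    // Iterates while storing every instance, a simplified stand-in for
    // what the change tracker does with tracked entities.
    public static List<User> IterateAndRetain(IEnumerable<User> users)
    {
        var tracked = new List<User>();
        foreach (var user in users)
        {
            tracked.Add(user);
        }
        return tracked;
    }

    public static void Main()
    {
        long baseline = GC.GetTotalMemory(true);

        int seen = IterateOnly(Generator(150000));
        long afterIterateOnly = GC.GetTotalMemory(true);

        List<User> tracked = IterateAndRetain(Generator(150000));
        long afterRetain = GC.GetTotalMemory(true);

        Console.WriteLine($"Iterated only: {seen} users, delta {afterIterateOnly - baseline} bytes");
        Console.WriteLine($"Retained:      {tracked.Count} users, delta {afterRetain - afterIterateOnly} bytes");

        // Keep the list reachable until after the last measurement.
        GC.KeepAlive(tracked);
    }
}
```

The exact byte counts depend on the runtime and platform, but the retained run should report a much larger delta, because the list keeps all 150000 instances reachable while the iterate-only run leaves nothing behind for the garbage collector to keep.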