I am using ASP.NET 4.5, MVC5, C#, LINQ, EF6, SQL Server 2012/SQL Azure.
I need to significantly improve the efficiency of a complex query. In essence the purpose of the task is to copy a "sample" recordset with many child records. I am currently doing this via C# and LINQ. I suspect that I am requerying the database within multiple Foreach blocks, thus I am causing many calls to the database. While each query is small, the number of calls is not. It could be a 200+. I believe they call this is "N+1" problem.
The following layout gives an idea of the relationships and hence queries.
Table1-<Table1.1 -<Table1.2-<Table1.2.1 -<Table1.2.2-<Table126.96.36.199 -<Table188.8.131.52
Instead of using "Foreach" to bring back "Table1.1" etc, I want to bring back all related data in one hit to minimise the number of calls to the DB. I understand I need to use "Include". I have got as far as:
However I am not sure how to alter this statement to bring back data down to "Table184.108.40.206". This is my question.
Thank you in advance.
I found an initial answer.
db.Table1.Include(x=>x.Table1.1) .Include(x=>x.Table1.2) .Include(x=>x.Table1.2.Select(y=>y.Table1.2.1)
However I may not need the middle line, so the following may be fine.
I also need to go to 5 levels down. I am finding that this one retrieve times out !! Whether this is because EF is confused on the compilation, or that the retrieve is too complicated or both I am unsure. There is possibly a limit on how many levels one can use "Include" ? Also I am unsure that by specifying the path to the grandchildren then the parent are automatically retrieved, or do you have to specify the parents separately?
There is possibly a limit on how many levels one can use "Include"?
There is! As I explained here about the generated SQL statement -
SELECTclause is the sum of all columns in all involved tables
That's potentially a huge (long and wide) result set returned from the database. Apart from that, it's hard for the db engine's query optimizer to find a good query plan. The database will have a hard time crunching all the data and it's not surprising that the command times out.
The alternative is to load data in chunks. But that's easier said than done. In a way, you already load data in chunks, but these chunks are too small and the queries too many (yes, N + 1). The chunks should be larger. There's no clear-cut strategy how to do that. It depends on your table structure and the numbers of data. But let me try to point you in the right direction.
5 levels down
For brevity's sake, let's say the tables and associations are
E ("<" meaning 1:n). The root query is something like
var query = As.Where(a => a.Property == value).ToList();
[So you don't want all
As, because that would be easy: then you may as well load all children too.]
Let's suppose you can
Bs without any problems, but that including the
Cs already gets too much. So the query becomes:
var query = As.Where(a => a.Property == value) .Include(a => a.Bs).ToList();
Cs, etc. should be loaded in a chunks of data.
A nice feature of Entity Framework is that it auto-connects all entities that are loaded into a context, by a process that's known as relationship fixup. So if you load the
Cs separately, the collections in their parent
B objects will be populated. That makes it easy to load the required
var cs = Cs.Where(c => c.B.A.Property == value).ToList();
(supposing that the back-references are also part of your model)
No if you can safely include the
Ds, we're almost done:
var cs = Cs.Where(c => c.B.A.Property == value) .Include(c => c.Ds).ToList();
And the last level is loaded by:
var es = Es.Where(e => e.D.C.B.A.Property == value).ToList();
This level of nesting (the dots) may look scary. It will create a query with four joins. However, the great difference with 4
Incudes is that now only
E columns and rows are queried. The query result doesn't explode. And database engines are optimized for executing joins.
So this gives you some handles to play with
Include levels and separate queries until you have a configuration that works well (enough).
One last thing: remember to turn off lazy loading. EF does auto-populate the collections alright, but it doesn't mark them as loaded. If lazy loading is enabled, accessing the collections will still trigger N + 1 queries.