EFCore returning too many columns for a simple LEFT OUTER join

entity-framework entity-framework-core


I am currently using EFCore 1.1 (preview release) with SQL Server.

I am doing what I thought was a simple OUTER JOIN between an Order and OrderItem table.

      var orders = from order in ctx.Order
                   join orderItem in ctx.OrderItem

                   on order.OrderId equals orderItem.OrderId into tmp

                   from oi in tmp.DefaultIfEmpty()

                   select new
                       Sku = (oi == null) ? null : oi.Sku,
                       Qty = (oi == null) ? (int?) null : oi.Qty

The actual data returned is correct (I know earlier versions had issues with OUTER JOINS not working at all). However the SQL is horrible and includes every column in Order and OrderItem which is problematic considering one of them is a large XML Blob.

SELECT [order].[OrderId], [order].[OrderStatusTypeId], [order].[OrderSummary], [order].[OrderTotal], [order].[OrderTypeId], [order].[ParentFSPId], [order].[ParentOrderId], [order].[PayPalECToken], [order].[PaymentFailureTypeId] ....

...[orderItem].[OrderId], [orderItem].[OrderItemType], [orderItem].[Qty], [orderItem].[SKU] FROM [Order] AS [order] LEFT JOIN [OrderItem] AS [orderItem] ON [order].[OrderId] = [orderItem].[OrderId] ORDER BY [order].[OrderId]

(There are many more columns not shown here.)

On the other hand - if I make it an INNER JOIN then the SQL is as expected with only the columns in my select clause:

SELECT [order].[OrderDt], [orderItem].[SKU], [orderItem].[Qty] FROM [Order] AS [order] INNER JOIN [OrderItem] AS [orderItem] ON [order].[OrderId] = [orderItem].[OrderId]

I tried reverting to EFCore 1.01, but got some horrible nuget package errors and gave up with that.

Not clear whether this is an actual regression issue or an incomplete feature in EFCore. But couldn't find any further information about this elsewhere.

Edit: EFCore 2.1 has addressed a lot of issues with grouping and also N+1 type issues where a separate query is made for every child entity. Very impressed with the performance in fact.

3/14/18 - 2.1 Preview 1 of EFCore isn't recommended because the GROUP BY SQL has some issues when using OrderBy() but it's fixed in nightly builds and Preview 2.

3/14/2018 7:10:46 PM

Accepted Answer

The following applies to EF Core 1.1.0 (release).

Although shouldn't be doing such things, tried several alternative syntax queries (using navigation property instead of manual join, joining subqueries containing anonymous type projection, using let / intermediate Select, using Concat / Union to emulate left join, alternative left join syntax etc.) The result - either the same as in the post, and/or executing more than one query, and/or invalid SQL queries, and/or strange runtime exceptions like IndexOutOfRange, InvalidArgument etc.

What I can say based on tests is that most likely the problem is related to bug(s) (regression, incomplete implementation - does it really matter) in GroupJoin translation. For instance, #7003: Wrong SQL generated for query with group join on a subquery that is not present in the final projection or #6647 - Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling etc.

Until it get fixed (when?), as a (far from perfect) workaround I could suggest using the alternative left outer join syntax (from a in A from b in B.Where(b = b.Key == a.Key).DefaultIfEmpty()):

var orders = from o in ctx.Order
             from oi in ctx.OrderItem.Where(oi => oi.OrderId == o.OrderId).DefaultIfEmpty()
             select new
                 OrderDt = o.OrderDt,
                 Sku = oi.Sku,
                 Qty = (int?)oi.Qty

which produces the following SQL:

SELECT [o].[OrderDt], [t1].[Sku], [t1].[Qty]
FROM [Order] AS [o]
    SELECT [t0].*
    FROM (
        SELECT NULL AS [empty]
    ) AS [empty0]
        SELECT [oi0].*
        FROM [OrderItem] AS [oi0]
        WHERE [oi0].[OrderId] = [o].[OrderId]
    ) AS [t0] ON 1 = 1
) AS [t1]

As you can see, the projection is ok, but instead of LEFT JOIN it uses strange CROSS APPLY which might introduce another performance issue.

Also note that you have to use casts for value types and nothing for strings when accessing the right joined table as shown above. If you use null checks as in the original query, you'll get ArgumentNullException at runtime (yet another bug).

11/21/2016 2:24:47 PM

Related Questions


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow