EF sorting & paging - slow as ordered twice?

c# entity-framework entity-framework-6

Question

I have a simple entity, and I have 100,000 of them in my SQL Sever 2012 database:

public class Entity
{
    public int Id { get; set; }
    public string Field1 { get; set; }
    public string Field2 { get; set; }
}

I want to show these in a grid, paged as 100,000 is far too many to show on one screen (and not very efficient). The grid should allow sorting and filtering - obviously all 3 of these operations are best done on the server, and EF should translate these.

So, lets get the second page of 500 sorted by Field1:

var items = context.Entities.OrderBy(e => e.Field1).Skip(500).Take(500);

When this query is executed, it takes 12 seconds! So I dug into it and found it's translated as below:

SELECT TOP (500) [Extent1].[Id]     AS [Id],
                 [Extent1].[Field1] AS [Field1],
                 [Extent1].[Field2] AS [Field2]               
FROM   (SELECT [Extent1].[Id]     AS [Id],
               [Extent1].[Field1] AS [Field1],
               [Extent1].[Field2] AS [Field2],
               row_number() OVER (ORDER BY [Extent1].[Field1] ASC) AS [row_number]
        FROM   [dbo].[Costs] AS [Extent1]) AS [Extent1]
WHERE  [Extent1].[row_number] > 500
ORDER  BY [Extent1].[Field1] ASC        

Surely this is being sorted twice? I'm no SQL expert, but the sub-query orders by Field1 and assigns this order to row_number. Then we take the TOP 500 row_numbers over 500 to get up to 500 rows for page 2. We don't need to order the results by Field1 again.

If I take out the final ORDER BY [Extent1].[Field1] ASC, the query results seem to be the same and the execution time drops to a circa 150 milliseconds.

So, obviously 150ms is preferable to 12s - is there anything I'm doing wrong? Is there anything I can do to fix this?

Update

The query plan is the same for both. The only difference on the tooltip for the Sort is 'Actual Number of Rows' of 4,604 for the 12s query and 1,134 for the 150ms query. I would add that this is generated data from a fixed list of 15 words (for this test) - i.e. Field1 contains 1 of 15 values, so there are essentially 15 groups of 6,666 rows.

Query Plan

(click for larger image)

SQL Server 2012 backup

1
1
6/24/2014 1:11:47 PM

Accepted Answer

This is due to a bug/idiosyncrasy in SQL Server when TOP and Gather Streams are combined. An index will fix it, as will disabling parallelism (globally, or for this user, or for the query). The clue was that Gather Streams spilled to tempdb which is an exceedingly rare condition. http://web.archive.org/web/20180220120719/http://sqlblog.com:80/blogs/paul_white/archive/2012/05/03/parallel-row-goals-gone-rogue.aspx This is 500 level stuff.

Note, that you can't leave out the final ORDER BY because this makes the order of the results undefined.

6
11/13/2018 11:31:27 PM

Popular Answer

I think your problem because of doing order on a large and not indexed column.

Please make sure that you do Indexing for that column.

Anyway the order by clause would be better with not nvarchar values



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow