SQL - Combine search queries instead of multiple DB trips

asp.net entity-framework entity-framework-core linq sql

Question

A search term comes from UI to search entities of a table. The order that these search results should show up in UI is like this:

  • First: exact match
  • Second: starts with that term
  • Third: contains a word of that term
  • Forth: ends with that term
  • Fifth: contains the term in any matter

So I first got the entities from DB:

result = entities.Where(e => e.Name.Contains(searchTerm)).ToList();

And then I rearranged them in memory:

var sortedEntities = result.Where(e => e.Name.ToLower() == searchTerm.ToLower())
    .Union(result.Where(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase)))
    .Union(result.Where(e => e.Name.Contains($" {searchTerm} ")))
    .Union(result.Where(e => e.Name.EndsWith(searchTerm, StringComparison.OrdinalIgnoreCase)))
    .Union(result.Where(e => e.Name.Contains(searchTerm)));

It was working fine until I added paging. Now if an exact match is on page 2 (in data coming from DB) it won't show up first.

The only solution I can think of is to separate the requests (so 5 requests in this case) and keep track of page size manually. My question is that is there a way to tell DB to respect that order and get the sorted data in one DB trip?

1
0
1/20/2019 9:17:35 AM

Accepted Answer

It took me some time to realize that you use Union in an attempt to order data by "match strength": first the ones that match exactly, then the ones that match with different case, etc. When I see Unions with predicates my Pavlov-conditioned mind translates it into ORs. I had to switch from thinking fast to slow.

So the problem is that there is no predictable sorting. No doubt, the chained Union statements do produce a deterministic final sort order, but it's not necessarily the order of the Unions, because each Union also executes an implicit Distinct. The general rule is, if you want a specific sort order, use OrderBy methods.

Having said that, and taking...

var result = entities
    .Where(e => e.Name.Contains(searchTerm))
    .Skip((pageNumber - 1) * pageSize)
    .Take(pageSize).ToList();

...the desired result seems to be obtainable by:

var sortedEntities = result
    .OrderByDescending(e => e.Name == searchTerm)
    .ThenByDescending(e => e.Name.ToLower() == searchTerm.ToLower())
    .ThenByDescending(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase))
    ... etc.

(descending, because false orders before true)

However, if there are more matches than pageSize the ordering will be too late. If pageSize = 20 and item 21 is the first exact match this item will not be on page 1. Which means: the ordering should be done before paging.

The first step would be to remove the .ToList() from the first statement. If you remove it, the first statement is an IQueryable expression and Entity Framework is able to combine the full statement into one SQL statement. The next step would be to move Skip/Take to the end of the full statement and it'll also be part of the SQL.

var result = entities.Where(e => e.Name.Contains(searchTerm));

var sortedEntities = result
    .OrderByDescending(e => e.Name == searchTerm)
    .ThenByDescending(e => e.Name.ToLower() == searchTerm.ToLower())
    .ThenByDescending(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase))
    ... etc
    .Skip((pageNumber - 1) * pageSize)
    .Take(pageSize).ToList();

But now a new problem blows in.

Since string comparison with StringComparison.OrdinalIgnoreCase isn't supported Entity Framework will auto-switch to client-side evaluation for part of the statement. All of the filtered results will be returned from the database, but most of the the ordering and all of the paging will be done in memory.

That may not be too bad when the filter is narrow, but very bad when it's wide. So, ultimately, to do this right, you have to remove StringComparison.OrdinalIgnoreCase and settle with a somewhat less refined match strength. Bringing us to the

End result:

var result = entities.Where(e => e.Name.Contains(searchTerm));
var sortedEntities = result
    .OrderByDescending(e => e.Name == searchTerm)
    .ThenByDescending(e => e.Name.StartsWith(searchTerm))
    .ThenByDescending(e => e.Name.Contains($" {searchTerm} "))
    .ThenByDescending(e => e.Name.EndsWith(searchTerm))
    .ThenByDescending(e => e.Name.Contains(searchTerm))
    .Skip((pageNumber - 1) * pageSize)
    .Take(pageSize).ToList();

Why "less refined"? Because, according your comments, the database collation isn't case sensitive, so SQL can't distinguish exact matches by case without adding COLLATE statements. That's something we can't do with LINQ.

1
1/20/2019 9:07:11 PM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow