I have an issue now that a table has had an id removed.
First I had this query below where entity (table) 'RecordsProduct' has a 'DefendnatId' that was mapped to a Defendant table. It was fine!
records = records
.Include(r => r.Employer)
.Include(r => r.Contractor)
.Include(r => r.RecordProducts)
.ThenInclude(rp => rp.Defendant)
.Where(r => EF.Functions.Like(r.Employer.DefendantCode, "%" + input.DefendantCode + "%")
|| EF.Functions.Like(r.Contractor.DefendantCode, "%" + input.DefendantCode + "%")
|| r.RecordProducts.Any(rp => EF.Functions.Like(rp.Defendant.DefendantCode, "%" + input.DefendantCode + "%") && rp.IsActive == true));
That DefendantId
has been removed from the table and replaced with a DefendantProductId
from a translation table called ProductDefendant
like this
ProductDefendant
table:
So I can't do this anymore:
rp.Defendant.DefendantCode
now I have to do this
rp.ProductDefendant.Defendant.DefendantCode
and now my query is blowing up! Is there anything I can do to modify it to make it faster? Or change the way the joins are working?
records = records
.Include(r => r.Employer)
.Include(r => r.Contractor)
.Include(r => r.RecordProducts)
.ThenInclude(rp => rp.ProductDefendant.Defendant)
.Where(r => EF.Functions.Like(r.Employer.DefendantCode, "%" + input.DefendantCode + "%")
|| EF.Functions.Like(r.Contractor.DefendantCode, "%" + input.DefendantCode + "%")
|| r.RecordProducts.Any(rp => EF.Functions.Like(rp.ProductDefendant.Defendant.DefendantCode, "%" + input.DefendantCode + "%")
&& rp.IsActive == true));
here is the generated SQL below. I think the problem is in the "Where" clause
SELECT [t].[Id], [t].[StartDate], [t].[EndDate], [t].[WitnessName], [t].[SourceCode], [t].[JobsiteName], [t].[ShipName], [t].[EmployerCode]
FROM (
SELECT DISTINCT [r].[RecordID] AS [Id], [r].[StartDate], [r].[EndDate], [r.Witness].[FullName] AS [WitnessName], CASE
WHEN [r].[SourceID] IS NOT NULL
THEN [r.Source].[SourceCode] ELSE N'zzzzz'
END AS [SourceCode], CASE
WHEN [r].[JobsiteID] IS NOT NULL
THEN [r.Jobsite].[JobsiteName] ELSE N'zzzzz'
END AS [JobsiteName], CASE
WHEN [r].[ShipID] IS NOT NULL
THEN [r.Ship].[ShipName] ELSE N'zzzzz'
END AS [ShipName], CASE
WHEN [r].[EmployerID] IS NOT NULL
THEN [r.Employer].[DefendantCode] ELSE N'zzzzz'
END AS [EmployerCode]
FROM [Records] AS [r]
LEFT JOIN [Ships] AS [r.Ship] ON [r].[ShipID] = [r.Ship].[ShipID]
LEFT JOIN [Jobsites] AS [r.Jobsite] ON [r].[JobsiteID] = [r.Jobsite].[JobsiteID]
LEFT JOIN [Sources] AS [r.Source] ON [r].[SourceID] = [r.Source].[SourceID]
LEFT JOIN [Witnesses] AS [r.Witness] ON [r].[WitnessID] = [r.Witness].[WitnessID]
LEFT JOIN [Defendants] AS [r.Contractor] ON [r].[ContractorID] = [r.Contractor].[DefendantID]
LEFT JOIN [Defendants] AS [r.Employer] ON [r].[EmployerID] = [r.Employer].[DefendantID]
WHERE ([r].[IsActive] = 1) AND (([r.Employer].[DefendantCode] LIKE (N'%' + @__input_DefendantCode_1) + N'%' OR [r.Contractor].[DefendantCode] LIKE (N'%' + @__input_DefendantCode_3) + N'%') OR EXISTS (
SELECT 1
FROM [Records_Products] AS [rp]
INNER JOIN [Product_Defendant] AS [rp.ProductDefendant] ON [rp].[DefendantProductID] = [rp.ProductDefendant].[DefendantProductID]
INNER JOIN [Defendants] AS [rp.ProductDefendant.Defendant] ON [rp.ProductDefendant].[DefendantID] = [rp.ProductDefendant.Defendant].[DefendantID]
WHERE ([rp.ProductDefendant.Defendant].[DefendantCode] LIKE (N'%' + @__input_DefendantCode_5) + N'%' AND ([rp].[IsActive] = 1)) AND ([r].[RecordID] = [rp].[RecordID])))
) AS [t]
ORDER BY [t].[SourceCode]
OFFSET @__p_6 ROWS FETCH NEXT @__p_7 ROWS ONLY
It's hard to give you a good advice because the generated SQL query looks fine for that model, and nowadays SQL query optimizers (CBO) should not be affected by the way you write the query like the old RBOs (CBO stands for Cost Based Optimizer, RBO - Rule Based Optimizer). They should be able to turn EXISTS
or IN
into JOIN
(produce the same execution plan as JOIN
). The only difference between the current SQL and the original is one additional join, which with clustered PK index seek should not affect the performance significantly.
But since you say that, apparently something unknown is causing the CBO to choose a bad plan. And since the plan depends on the data which I don't have, all I can do is to suggest trying two alternative functionally equivalent queries.
First, your current (slow) query seems to be like this:
var input = new { DefendantCode = "Abc", Skip = 4, Take = 2 };
var defendantCodePattern = "%" + input.DefendantCode + "%";
var query = db.Set<Record>()
.Where(r => r.IsActive)
.Where(r => EF.Functions.Like(r.Employer.DefendantCode, defendantCodePattern)
|| EF.Functions.Like(r.Contractor.DefendantCode, defendantCodePattern)
|| r.RecordProducts.Any(rp => EF.Functions.Like(rp.ProductDefendant.Defendant.DefendantCode, defendantCodePattern))
)
.Select(r => new
{
ID = r.RecordID,
StartDate = r.StartDate,
EndDate = r.EndDate,
WitnessName = r.Witness.FullName,
SourceCode = r.Source != null ? r.Source.SourceCode : "zzzzz",
JobsiteName = r.Jobsite != null ? r.Jobsite.JobsiteName : "zzzzz",
ShipName = r.Ship != null ? r.Ship.ShipName : "zzzzz",
EmployeeCode = r.Employer != null ? r.Employer.DefendantCode : "zzzzz",
})
//.Distinct()
.OrderBy(t => t.SourceCode)
.Skip(input.Skip).Take(input.Take);
Some things to mention. First, the query is using projection (Select
), so the Include
/ ThenInclude
are not needed (because they are ignored). Second, the common search pattern is created and stored outside the query, thus ending up with a sing parameter rather than 3. Third, the Distinct
is not needed for this query, so I've removed it.
Now the potential attempts to improve the generated SQL query executing speed.
(1) If the Defendant
related table is not big, you can prefetch the DefendantID
s matching the search filter, and then use Contains
(translated to SQL IN
) for filtering This would help eliminating some of the joins. e.g.
var defendantIds = db.Set<Defendant>()
.Where(d => EF.Functions.Like(d.DefendantCode, defendantCodePattern))
.Select(d => d.DefendantID)
.ToList();
and then (the second Where
):
.Where(r => defendantIds.Contains(r.Employer.DefendantID)
|| defendantIds.Contains(r.Contractor.DefendantID)
|| r.RecordProducts.Any(rp => defendantIds.Contains(rp.ProductDefendant.Defendant.DefendantID))
)
(2) The following trick will replace the EXISTS
with LEFT JOIN
. Replace the second Where
with:
.SelectMany(r => r.RecordProducts.DefaultIfEmpty(), (r, rp) => new { r, rp })
.Where(x => EF.Functions.Like(x.r.Employer.DefendantCode, defendantCodePattern)
|| EF.Functions.Like(x.r.Contractor.DefendantCode, defendantCodePattern)
|| EF.Functions.Like(x.rp.ProductDefendant.Defendant.DefendantCode, defendantCodePattern)
)
.Select(x => x.r)
and uncomment the .Distinct()
(here it is needed because the LEFT JOIN
(from SelectMany
) multiplies the source records). The generated SQL in this case looks like this:
SELECT [t].[ID], [t].[StartDate], [t].[EndDate], [t].[WitnessName], [t].[SourceCode], [t].[JobsiteName], [t].[ShipName], [t].[EmployeeCode]
FROM (
SELECT DISTINCT [r].[RecordID] AS [ID], [r].[StartDate], [r].[EndDate], [r.Witness].[FullName] AS [WitnessName], CASE
WHEN [r].[SourceID] IS NOT NULL
THEN [r.Source].[SourceCode] ELSE N'zzzzz'
END AS [SourceCode], CASE
WHEN [r].[JobsiteID] IS NOT NULL
THEN [r.Jobsite].[JobsiteName] ELSE N'zzzzz'
END AS [JobsiteName], CASE
WHEN [r].[ShipID] IS NOT NULL
THEN [r.Ship].[ShipName] ELSE N'zzzzz'
END AS [ShipName], CASE
WHEN [r].[EmployerID] IS NOT NULL
THEN [r.Employer].[DefendantCode] ELSE N'zzzzz'
END AS [EmployeeCode]
FROM [Records] AS [r]
LEFT JOIN [Ships] AS [r.Ship] ON [r].[ShipID] = [r.Ship].[ShipID]
LEFT JOIN [Jobsites] AS [r.Jobsite] ON [r].[JobsiteID] = [r.Jobsite].[JobsiteID]
LEFT JOIN [Sources] AS [r.Source] ON [r].[SourceID] = [r.Source].[SourceID]
LEFT JOIN [Witnesses] AS [r.Witness] ON [r].[WitnessID] = [r.Witness].[WitnessID]
LEFT JOIN [Defendants] AS [r.Contractor] ON [r].[ContractorID] = [r.Contractor].[DefendantID]
LEFT JOIN [Defendants] AS [r.Employer] ON [r].[EmployerID] = [r.Employer].[DefendantID]
LEFT JOIN [Records_Products] AS [r.RecordProducts] ON [r].[RecordID] = [r.RecordProducts].[RecordID]
LEFT JOIN [Product_Defendant] AS [r.RecordProducts.ProductDefendant] ON [r.RecordProducts].[DefendantProductID] = [r.RecordProducts.ProductDefendant].[DefendantProductID]
LEFT JOIN [Defendants] AS [r.RecordProducts.ProductDefendant.Defendant] ON [r.RecordProducts.ProductDefendant].[DefendantID] = [r.RecordProducts.ProductDefendant.Defendant].[DefendantID]
WHERE ([r].[IsActive] = 1) AND (([r.Employer].[DefendantCode] LIKE @__defendantCodePattern_1 OR [r.Contractor].[DefendantCode] LIKE @__defendantCodePattern_1) OR [r.RecordProducts.ProductDefendant.Defendant].[DefendantCode] LIKE @__defendantCodePattern_1)
) AS [t]
ORDER BY [t].[SourceCode]
OFFSET @__p_2 ROWS FETCH NEXT @__p_3 ROWS ONLY
As I said at the very beginning, normally this should not affect the CBO plan. But I definitely see different estimated execution plan from the original, so it's worth trying (although the LINQ query looks ugly).