I'm not sure the subject actually makes sense but I wasn't sure how to word it. Here's the set up: I have an Item
, which has many ItemLogic
, each of which has one Field
. Each Item
has, say, 25 ItemLogic
entities. The logic determines whether that Item
matches the given input from the form. For instance, Field X has a value greater than A
and Field Y has a value equal to B
and so on for each of the 25 fields.
In the current version of the app, all related entities are queried out and looped over, returning the first matching Item where all the ItemLogic were true
. It's a bit more expensive, but simple code, and there were never that many Items to look at. Until now.
Now the app needs to filter 3000 Items to find a match. The previous query had at least two joins and takes about 45 seconds on our SQL instance. This is far too long.
A stored procedure seems a natural fit, but here's the catch: the data is dynamic for each set of Items, it comes in as a string value and often needs to be cast as a different type (DateTime or int most commonly) to perform the actual comparisons, and some logic is ignored rather than compared. That's a lot of extra overhead in a stored procedure, at least that's how it strikes me.
Alternately, I could chunk the data, but that doesn't save much for the poor bloke trying to match the last Item in the collection.
What are some approaches that could be taken to speed up the match?
Schema and some sample data:
CREATE TABLE [dbo].[Items](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](255) NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
CREATE TABLE [dbo].[ItemLogic](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ItemId] [int] NOT NULL,
[FieldId] [int] NOT NULL,
[Value] [nvarchar](max) NULL,
[Comparison] [int] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
CREATE TABLE [dbo].[Fields](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Value] [nvarchar](max) NOT NULL,
[Type] [int] NOT NULL,
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
INSERT INTO [dbo].[Fields] (Value, Type) VALUES ('abc', 0), ('def', 0), ('123', 1)
INSERT INTO [dbo].[Items] (Name) VALUES ('Item 1'), ('Item 2'), ('Item 3')
INSERT INTO [dbo].[ItemLogic] (ItemId, FieldId, Value, Comparison) VALUES (1, 1, 'xyz', 1), (1, 2, 'qrs', 1), (1, 3, '200', 0), (2, 1, 'abc', 1), (2, 2, 'xyz', 1), (2, 3, '123', 2), (3, 1, 'abc', 1), (3, 2, 'def', 1), (3, 2, '100', 0)
For the Comparison
field, it is an enum matching: 0 = Greater Than, 1 = Equal, 2 = Ignore. For the Type
field, it is an enum matching: 0 = string, 1 = int.
The expected result of the above match should be that Item 3
is returned.
It will never be fast. However here is the most simple and compact solution I can imagine:
SELECT *
FROM
Items
WHERE
Id NOT IN (
SELECT IL.ItemId
FROM
Fields F
INNER JOIN ItemLogic IL
ON F.Id = IL.FieldId
WHERE
NOT (
IL.Comparison = 2 -- Ignore
OR
F.Type = 0 AND ( -- string types
IL.Comparison = 0 AND F.Value > IL.Value
OR
IL.Comparison = 1 AND F.Value = IL.Value
)
OR
F.Type = 1 AND ( -- integer types
IL.Comparison = 0 AND TRY_CAST(F.Value AS int) > TRY_CAST(IL.Value AS int)
OR
IL.Comparison = 1 AND TRY_CAST(F.Value AS int) = TRY_CAST(IL.Value AS int)
)
)
)
As an alternative solution, you can retrieve all records and apply the rules on the memory. You can also try to store related records on the memory as well.