We have an events table that is reasonably large (~35 million rows)
The original table is like this,
CREATE TABLE [dbo].[Events]( [EventID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL, [EventTypeID] int, [SourceID] int, [EventTitle] [varchar](max) NOT NULL, [EventContent] [varchar](max) NULL, [EventDate] [datetime] NOT NULL, CONSTRAINT [PK_Events] PRIMARY KEY CLUSTERED ([EventID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
and it has an index that looks like
CREATE NONCLUSTERED INDEX [ix_EventDate] ON [dbo].[Events] ( [EventTypeID] ASC, [SourceID] ASC, [EventDate] ASC ) INCLUDE (EventTitle, EventContent)
When adding new events, there is a business requirement to check to see if there's a "duplicate" event filed within the last 7 days (duplicate defined as having the same title and content). The above index is used in the check query, but it is still very slow and only rarely finds any duplicates.
The query looks like
SELECT * FROM #NewEvents WHERE EXISTS (SELECT * FROM [dbo].[Events] e WHERE DATEDIFF(dd, #NewEvents.EventXML.value('(/Event/EventDate)', 'DATETIME'), e.EventDate) BETWEEN -7 AND 0 AND e.EventTitle = #NewEvents.EventXML.value('(/Event/EventTitle)', 'VARCHAR(MAX)') AND e.EventContent = #NewEvents.EventXML.value('(/Event/EventContent)', 'VARCHAR(MAX)') )
I had thought maybe a couple of computed columns with hashes on the title and content might speed the check (by making the indexes much more compact). Not wanting to alter the underlying table on a theory (possibly mucking with the existing replication setup), I tried to create an indexed view with those computed columns.
I created an index on the view like the one above INCLUDING the computed columns, but for whatever reason I can't get the query to use that index on the view; it always defaults back to the base table index, and to make matters worse doesn't use the INCLUDE values for the computed values; it joins with the PK to fetch the original columns.
The view index looked like this (where TitleHash and ContentHash are HASHBYTES of the full column text)
CREATE NONCLUSTERED INDEX [ix_viewEvents_EventDate] ON [dbo].[viewEvents] ( [EventTypeID] ASC, [SourceID] ASC, [EventDate] ASC ) INCLUDE (TitleHash, ContentHash)
Anyway, I was wondering if anyone had any suggestions to speed up the query given the business constraints.
EDIT: The first thing I tried was to change that DATEDIFF expression on the idea that it might have been blocking the use of the old index, but the plan for the old query showed that it was doing an Index Seek (not scan).
As Josh pointed out below, yeah the old query was doing an Index Seek – but only on the 2 leading columns. The date was an output column and that filtering was done later. But that stays that way no matter if I phrase the date filter as DATEDIFF() or "e.EventDate BETWEEN DATEADD(DAY, -7, #NewEvents.EventXML.value('(/Event/EventDate)1', 'DATETIME')) AND #NewEvents.EventXML.value('(/Event/EventDate)1', 'DATETIME')"
EDIT 2: Using WITH (NOEXPAND) did, indeed, get sql server to use the new view and the new index, though to my disappointment running the new view-based query and the old query in the same batch came back with the Actual Plan saying the view was 99% of the 2 query batch. So the experiment doesn't appear to have been successful. Changing the phrasing of the date filtering here, though, does seem to be reflected in the Index Seek node.