Added 7/11 The issue is deadlocks occur because of index scan during MERGE JOIN. In this case a transaction trying to get S lock on the whole index at the FK parent table, but previously another transaction puts X lock on a key value of the index.
Let me start with a small example (TSQL2012 DB from 70-461 cource used):
CREATE TABLE [Sales].[Orders]( [orderid] [int] IDENTITY(1,1) NOT NULL, [custid] [int] NULL, [empid] [int] NOT NULL, [shipperid] [int] NOT NULL, ... )
[custid], [empid], [shipperid] are corelated parameters for
[Sales].[Customers], [HR].[Employees], [Sales].[Shippers] accordingly. In each case we have a clustered index on a referred column in a parrent table.
ALTER TABLE [Sales].[Orders] WITH CHECK ADD CONSTRAINT [FK_Orders_Customers] FOREIGN KEY([custid]) REFERENCES [Sales].[Customers] ([custid]) ALTER TABLE [Sales].[Orders] WITH CHECK ADD CONSTRAINT [FK_Orders_Employees] FOREIGN KEY([empid]) REFERENCES [HR].[Employees] ([empid]) ALTER TABLE [Sales].[Orders] WITH CHECK ADD CONSTRAINT [FK_Orders_Shippers] FOREIGN KEY([shipperid])REFERENCES [Sales].[Shippers] ([shipperid])
I am trying to
INSERT [Sales].[Orders] SELECT ... FROM another table called
[Sales].[OrdersCache] which has the same structure as the
[Sales].[Orders] except foreign keys.
Another thing might be important to mention the table
[Sales].[OrdersCache] is a clustered index.
CREATE CLUSTERED INDEX idx_c_OrdersCache ON Sales.OrdersCache ( custid, empid )
As expected when I`m trying to insert a low volumes of data LOOP JOIN works fine making index seek on the foreign keys.
With high volumes of data MERGE JOIN is used by query optimizer as a most efficient way to maintain foregn key in the query.
And there is nothing to do with it exept using OPTION (LOOP JOIN) in our case with foreign keys or INNER LOOP JOIN in explicit JOIN case.
Below is the query I`m trying to run in my environment:
INSERT Sales.Orders ( custid, empid, shipperid, ... ) SELECT custid, empid, 2, ... FROM Sales.OrdersCache
Looking at the plan we can see that all 3 foreing keys validated with MERGE JOIN. It`s not an appropriate way for me since it uses INDEX SCAN with whole index locking.
Using OPTION (LOOP JOIN) isn`t suitable since it costs almost 15% more than MERGE JOIN (I think the regression will be greater with data volumes growing).
In the SELECT statement you can see single value for
shipperid attribute for the entire inserted set. In my opinion there must be a way to make the validation phase for the inserted set faster at least for the immutable attribute.
- make LOOP JOIN,MERGE JOIN,HASH JOIN if we have undefined subset for
- if there is just a single explicit value of the validated column we make the validation only once (INDEX SEEK).
Is there any common pattern to overpass the situation above using code structures, additional DDL objects, etc?
Added 20/07. Solution. Query Optimizer already makes a 'single key – foreign key' validation optimization by using MERGE JOIN. And makes only for Sales.Shippers table, leaving LOOP JOIN for another joins in the query at the same time. Since I have a few rows in the parent table Query Optimizer uses Sort-merge join algorithm and compare each row in the inner table with parent table only once. So that is the answer on my question if there is any particular mechanism to effectively process single values in a set during single key validation. That is not so perfect decision but that`s the way SQL Server optimizes the case.
Performance affect investigation revealed that in my case MERGE JOIN and LOOP JOIN insert statement became approximately equal with 750 simultaneously inserted rows with the following superiority of the MERGE JOIN (in CPU time resource).
So using OPTION (LOOP JOIN) is an appropriate solution for my business process.