For many years we had a simple queue system, where a regularly executed loop would select and lock a next available row and call a stored procedure to actually process it, which, in turn would also select the same row with the same locking hints.
In a simplified form:
begin tran; select top (1) @current_queue_row_id = QueueTable.row_id from dbo.QueueTable with (updlock, rowlock, readpast) inner join <other tables> where QueueTable.status = 'Placed' and <other tables>.something = 'OK'; if @current_queue_row_id is not null exec dbo.ProcessQueueItem @current_queue_row_id; commit tran;
create procedure dbo.ProcessQueueItem @queue_row_id int as begin ... select <@variables> = <columns> from dbo.QueueTable with(rowlock, updlock, readpast) where row_id = @queue_row_id; ... end;
This worked for years, but after updating to SQL Server 2016 we intermittently (several times a week) see a situation where the outer
select top (1) would find a row, call
ProcessQueueItem, and it would fail because its
select returns nothing, so all
null. We have put in all sorts of logging and determined that the value passed to
ProcessQueueItem is a valid one, and that
current_transaction_id() returns the same value in the outer code and inside
The isolation level is
READ COMMITTED, but no one is even trying to delete rows from anywhere, so non-repeatable reads should not be a problem.
Is there something fundamentally wrong with our setup in the first place, so that it's no wonder it broke after a server upgrade, or does this look like a problem in SQL Server 2016?