Delete and insert in transaction. SELECT does not return all records.

  • Question

  • Hello,

    Recently I came across a very interesting situation. It seems like a bug.


    I have table:

                CREATE TABLE [dbo].[_TestTable](
                      [ID] [uniqueidentifier] NOT NULL,
                      [DataID] [uniqueidentifier] NOT NULL,
                      [Value1] [varchar](50) NOT NULL,
                      [Value2] [varchar](50) NOT NULL,
                 CONSTRAINT [PK__TestTable] PRIMARY KEY CLUSTERED 
                (
                        [ID] ASC
                )  ON [PRIMARY]
                ) ON [PRIMARY]

    And .NET code which does the following:

                var cn = new SqlConnection();
                cn.ConnectionString = "Server=myserver;Database=mydatabase;User Id=myuser;Password=mypass";
                cn.Open();

                while (true)
                {
                    var dataId = new Guid("53767E35-57B1-4729-9363-D14C2BC5D9AE");

                    using (var tr = cn.BeginTransaction())
                    {
                        var cmd = new SqlCommand($"delete from _TestTable WHERE DataID = '{dataId}'", cn, tr);
                        cmd.ExecuteNonQuery();
                        cmd.Dispose();

                        for (var i = 0; i < 10; i++)
                        {
                            var sqlQuery = $"insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), '{dataId}', 'Some value 1 - {i}', 'Some value 2 - {i}')";
                            cmd = new SqlCommand(sqlQuery, cn, tr);
                            cmd.ExecuteNonQuery();
                            cmd.Dispose();
                        }

                        tr.Commit();
                    }
                }

    Then I open SSMS and execute the simple query 

    select * from _TestTable

    In most cases I get 10 records. But sometimes I get 9, 8 or 7 records.

    Thursday, June 20, 2019 5:56 PM

Answers

  • Warning: this is a fairly long post. Yet the explanation is somewhat incomplete, because I don't know all internals well enough to give an exact explanation.

    To start with, I made some changes to your script so that I could determine which rows are missing, and it is always the rows with the lowest guids. See the scripts at the end of the post.

    There are at least two things to consider here: 1) Should it be possible for this to happen at all? and 2) How does it actually happen?

    The answer to the first question is yes. This is by no means a bug; the promises of the READ COMMITTED isolation level are simply not stronger than this. If at some point there are nine committed rows, you can get back nine rows.

    If you think this is wrong, consider what happens if you turn on READ_COMMITTED_SNAPSHOT. With this setting, you can never get back anything but ten rows (except at the very beginning, before the first ten have been committed). However, the ten rows you get back may all have been deleted, and you don't see this, since you are reading from the snapshot. Keep in mind that turning on READ_COMMITTED_SNAPSHOT does not change the isolation level, only the implementation of it. So if it is correct to return ten deleted rows, it is also correct to return nine non-deleted rows and ignore the tenth, deleted one.
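    If you want to try the snapshot semantics described above, the database-level option can be inspected and flipped like this (a sketch; "mydatabase" is a placeholder from the connection string in the question):

    ```sql
    -- Check whether READ_COMMITTED_SNAPSHOT is on for the current database
    SELECT name, is_read_committed_snapshot_on
    FROM   sys.databases
    WHERE  database_id = db_id();

    -- Turn it on. The ALTER needs near-exclusive access to the database;
    -- WITH ROLLBACK IMMEDIATE kicks out other sessions rather than waiting.
    ALTER DATABASE mydatabase SET READ_COMMITTED_SNAPSHOT ON
          WITH ROLLBACK IMMEDIATE;
    ```

    With this in effect, the SELECT reads the last committed version of each row and always sees a consistent, if possibly stale, set of ten rows.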

    As a matter of fact, I am not sure that it would be incorrect to return nine rows even under the SERIALIZABLE isolation level. All that isolation level really guarantees is that if you run the query a second time in the same transaction, no rows will have been added. Thus, if the DELETE were to roll back, that tenth row must not be displayed. In practice, it will not happen, at least not in SQL Server, due to how the isolation level is implemented.

    So to the next question: what is actually happening under the hood? Let me first observe that there are a number of changes to the script which prevent the situation from occurring within 30 seconds when I test. (In contrast: I ran the scripts as posted below on a VM with SQL 2019. It typically did not take even a second before I had a result set with missing rows.)

    * I change the default to newsequentialid().
    * I make the primary key nonclustered.
    * I add a TABLOCKX for the DELETE.
    * I add a TRUNCATE TABLE after the DELETE. (But in this case I got one deadlock).
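    As a sketch of the TABLOCKX mitigation: it is a one-line change to the loop further down, taking an exclusive table lock up front so the concurrent SELECT has to wait for the commit.

    ```sql
    begin tran
    -- Exclusive table lock, held to the end of the transaction; blocks readers
    delete from _TestTable WITH (TABLOCKX) WHERE DataID = @dataId
    -- ... the ten inserts go here, as in the original loop ...
    commit tran
    ```

    The price is concurrency: every reader now queues behind the writer for the duration of the transaction.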

    Next, there are a few things we need to know about the internals for the operation:

    * The DELETE operation takes locks row by row, since there are so few rows.

    * Only rows (or more exactly keys) are locked to the end of the transaction. Changes to higher levels in the index are performed under latches that are released when the operation is completed.

    * Rows are not physically deleted, but marked for ghost cleanup. This makes the index wider than just 10 rows.

    When the SELECT operation starts running, it finds its starting point in the clustered index, and then traverses the CI at leaf level. Since the DELETE has released its latches, the SELECT takes the next row as its starting point, and it is thus not blocked by the lock held on the deleted row.

    Let's look at the modifications I made above, one by one:

    newsequentialid() - this reduces the number of ghost records, and the ghost records will all be before the ids in the current batch. Whether this prevents the issue from happening at all, or just makes the window a whole lot smaller, I don't know.

    nonclustered PK - this changes the implementation of the scan to an IAM scan. Again, I think this is a matter of just reducing the window where the issue can exhibit itself.

    TABLOCKX - this prevents the issue from appearing at all, since the SELECT will be blocked until the transaction commits.

    TRUNCATE TABLE - this deallocates all pages, and takes the ghost records out of the equation.
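    The ghost records mentioned above can be observed directly. This query picks specific columns out of the same DMV the script below already queries with SELECT *:

    ```sql
    -- Ghost records per index level; record_count includes ghosts
    SELECT index_level, page_count, record_count, ghost_record_count
    FROM   sys.dm_db_index_physical_stats(db_id(), object_id('_TestTable'),
                                          NULL, NULL, 'DETAILED');
    ```

    While the loop is running, you can see a record_count well above 10, the remainder being deleted rows waiting for ghost cleanup.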

    Here is a modified version of your first script:

    DROP TABLE IF EXISTS _TestTable
    DROP TABLE IF EXISTS _LogTable
    go
    CREATE TABLE [dbo].[_TestTable](
                      [ID] [uniqueidentifier] NOT NULL DEFAULT newid(),
                      [DataID] [uniqueidentifier] NOT NULL,
                      [Value1] [varchar](50) NOT NULL,
                      [Value2] [varchar](50) NOT NULL,
    --                  filler char(7000) NOT NULL DEFAULT ' '
                 CONSTRAINT [PK__TestTable] PRIMARY KEY CLUSTERED ([ID] ) 
    )
    SELECT * INTO _LogTable FROM _TestTable
    go
    CREATE TRIGGER logtri ON _TestTable FOR INSERT AS     INSERT _LogTable SELECT * FROM inserted
    go
    declare @dataId uniqueidentifier, @loop int = 1
    set @dataId = '53767E35-57B1-4729-9363-D14C2BC5D9AE'
    declare @started datetime set @started = getdate()
    while @started > DATEADD(ss, -30, getdate())
    begin 
    begin tran
    delete from _TestTable WHERE DataID = @dataId
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 1', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 2', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 3', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 4', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 5', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 6', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 7', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 8', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 9', convert(varchar(10), @loop))
    insert into _TestTable(DataID, Value1, Value2)    values(@dataId, 'Some value 1 - 10', convert(varchar(10), @loop))  
    SET @loop += 1
    commit tran
    WAITFOR DELAY '00:00:00.05'
    end 
    SELECT * FROM sys.dm_db_index_physical_stats(db_id(), object_id('_TestTable'), NULL, NULL, 'DETAILED')
    go

    And here is the second script:
    declare @tbl table(ID uniqueidentifier, DataID uniqueidentifier, Value1 varchar(50), Value2 varchar(50))
    declare @started datetime, @rowCount int set @started = getdate()
    while 1=1
    begin
    delete from @tbl
    insert into @tbl select * from _TestTable
    set @rowCount = @@ROWCOUNT
    if @rowCount <> 10
    begin
    select * from  _LogTable LT
    LEFT   JOIN @tbl t ON LT.ID = t.ID
    WHERE  EXISTS (SELECT *
                   FROM   @tbl t
                   WHERE  t.Value2 = LT.Value2)
    ORDER BY LT.ID
    break
    end
    WAITFOR DELAY '00:00:00.013'
    end


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Saturday, June 22, 2019 1:05 PM

All replies

  • When you don't see 10 rows, which values of i are missing?

    If you run DBCC USEROPTIONS in the same query window as you ran the SELECT, what does the row with isolation level return?

    And for that matter, what does this query return:
    SELECT is_read_committed_snapshot_on FROM sys.databases WHERE database_id = db_id()


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Thursday, June 20, 2019 9:33 PM
    Which i values are missing is random.

    Isolation level is read committed.

    is_read_committed_snapshot_on = 0

    If I remove the primary key from the table, everything is OK.

    If I set the isolation level to SERIALIZABLE in the SELECT query, I get 10 records or a deadlock.
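    For reference, a SERIALIZABLE variant of the reader looks like this (a sketch; the range locks this level takes are also what makes the deadlock possible):

    ```sql
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRAN;
    SELECT * FROM _TestTable;   -- blocks on, or deadlocks with, the writer
    COMMIT;
    ```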

    Friday, June 21, 2019 3:32 AM
  • Hi Akharin,

     

    Thank you for reporting this issue.

     

    Per your description, you do an insert operation in your .NET code, right? I am sorry that I am not good at .NET; could you please share a repro that is more directly related to SQL Server?

     

    Without your environment, it might be difficult to find the right direction and run some tests. By the way, please check whether there are other sessions using this table.

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, June 21, 2019 6:19 AM
  • Thanks for your reply.

    I reproduced it using SSMS only.

    In one tab SSMS run this query

    --------------------------------------------------------------------------------------------------------------

    declare @dataId uniqueidentifier
    set @dataId = '53767E35-57B1-4729-9363-D14C2BC5D9AE'
    declare @started datetime set @started = getdate()
    while @started > DATEADD(ss, -30, getdate())
    begin 
    begin tran
    delete from _TestTable WHERE DataID = @dataId
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 1', 'Some value 2 - 1')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 2', 'Some value 2 - 2')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 3', 'Some value 2 - 3')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 4', 'Some value 2 - 4')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 5', 'Some value 2 - 5')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 6', 'Some value 2 - 6')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 7', 'Some value 2 - 7')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 8', 'Some value 2 - 8')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 9', 'Some value 2 - 9')
    insert into _TestTable(ID, DataID, Value1, Value2) values(newid(), @dataId, 'Some value 1 - 10', 'Some value 2 - 10')
    commit tran
    WAITFOR DELAY '00:00:00.05'
    end 

    --------------------------------------------------------------------------------------------------------------

    And in another tab run this

    --------------------------------------------------------------------------------------------------------------

    declare @rowCount int
    declare @tbl table(ID uniqueidentifier, DataID uniqueidentifier, Value1 varchar(50), Value2 varchar(50))
    declare @started datetime set @started = getdate()
    while 1=1
    begin
    delete from @tbl
    insert into @tbl select * from _TestTable
    set @rowCount = @@ROWCOUNT
    if @rowCount <> 10
    begin
    select * from @tbl
    break
    end
    WAITFOR DELAY '00:00:00.013'
    end

    --------------------------------------------------------------------------------------------------------------

    Friday, June 21, 2019 6:33 AM
  • Hi Akharin,

     

    Thank you for your reply.

     

    As you said, 'is_read_committed_snapshot_on = 0': 0 means the READ_COMMITTED_SNAPSHOT option is OFF (the default). I then tried your script; it worked well and I could not reproduce the issue.

     

    The first script deletes and re-inserts ten rows in the table until the @started > DATEADD(ss, -30, getdate()) condition stops the loop.

     

    And the second one did not stop; it stayed in its loop (the row count was always 10).

     

    Best Regards,

    Rachel


    MSDN Community Support


    Friday, June 21, 2019 8:59 AM
  • Rachel,

    which version of SQL Server are you using? 2019?

    On SQL Server 2016 and earlier it reproduces very often. On SQL Server 2017 rarely, but it happens.

    Maybe increase the time in DATEADD(ss, -30, getdate()) to several minutes?

    DATEADD(ss, -120, getdate())


    • Edited by AKharin Friday, June 21, 2019 9:32 AM
    Friday, June 21, 2019 9:32 AM