none
Changing two publications at the same time causes deadlock in distribution database

    Question

  • I have transactional replication setup.  Let's say three databases on server A replicate to corresponding three databases on Server B.  The databases are identical (same structure for different clients).  Server B also acts as a distributor.  Everything is working fine.  Now I need to change one of the tables (which is also a publication article).  I need to do it in all databases at the same time.  I follow this process (it is scripted):


    This is where I run into deadlocks.  The deadlock takes place in the distribution database when this stored procedure is executed: sp_MSdrop_subscription.  I don't call this procedure directly, but apparently it gets executed when I run sp_dropsubscription (see step #1 above).  Once again, I modify all publication databases at the same time
    and they all use the same distribution database, so this is where the problem lies.  The two conflicting statements are captured by the deadlock trace (1222) and apparently come from inside sp_MSdrop_subscription. 

    First connection runs:

    if not exists (select * from dbo.MSsubscriptions    where
    publisher_id = @publisher_id and
    publisher_db = @publisher_db and
    (publication_id = @publication_id or @publication_id is NULL or @independent_agent = 0 ) and
    independent_agent = @independent_agent and
    subscriber_id = @subscriber_id and
    subscriber_db = @subscriber_db and
    subscription_type = @subscription_type)

    Second connection runs:

    delete from dbo.MSsubscriptions where
    publisher_id = @publisher_id and
    publisher_db = @publisher_db and
    (publication_id = @publication_id or
    @publication_id is NULL) and
    (@article_id is NULL or
    article_id = @article_id) and
    Message
    ((subscriber_id = @subscriber_id and
    (subscriber_db = @subscriber_db or @subscriber_id = @virtual)) or
    -- Delete virtual anonymous subscription
    -- if deleting virtual subscription for a anonymous publication
    (@subscriber_id = @virtual and subscriber_id = @virtual_anonymous))

    and here is the resource list of what's being locked:

    resource-list

    keylock hobtid=72057594039959552 dbid=34
    objectname=distribution.dbo.MSsubscriptions indexname=ucMSsubscriptions
    id=lock13563b300 mode=X associatedObjectId=72057594039959552

    owner-list
    owner id=process2d7241498 mode=X
    waiter-list
    waiter id=process212b58188 mode=S requestType=wait

    keylock hobtid=72057594040025088 dbid=34
    objectname=distribution.dbo.MSsubscriptions indexname=iMSsubscriptions
    id=lock17ff3af80 mode=X associatedObjectId=72057594040025088

    owner-list
    owner id=process212b58188 mode=X
    waiter-list
    waiter id=process2d7241498 mode=U requestType=wait

    I appreciate any pointers on how to avoid this.  The only work-around  I can think of right now is to have separate distribution databases per publication database, but I can't believe that replication doesn't support modifying multiple publishers at the same time.

    Thank you!

    Friday, March 28, 2014 5:01 PM

Answers

  • To close this topic,I just wanted to share what I did to work-around this problem in case someone else runs into it. Once again, the fundamental issue is caused by simultaneous changes being made to multiple publications, which share the same distribution database. After analyzing the code of the Microsoft queries, which are reported by the deadlock trace (see my original post above), I found out that MSsubscriptions table is missing an index, so I went ahead an added it:
        CREATE NONCLUSTERED INDEX [iMSsubscriptions3] ON [dbo].[MSsubscriptions]
    (
        [publisher_id] ASC,
        [publisher_db] ASC,
        [publication_id] ASC,
        [independent_agent] ASC,
        [article_id] ASC,
        [subscriber_id] ASC,
        [subscriber_db] ASC,
        [subscription_type] ASC,
        [publisher_database_id] ASC
    )

    You can check the execution plan of the queries and verify that this index is actually being leveraged.  In addition, as suggested by Tracy, the indexes should be defragmented regularly.

    The index and defragmentation help, but since I have to make changes to dozens of publications at once, it still was not enough. So the second part of the fix was to simply introduce a random delay. I added up to 3 min delay to my deployment script and this scattered the queries enough to significantly lower probability of deadlocks (but not eliminate it completely :( Something like this:

    DECLARE     @Delay VARCHAR(8),
        @Delay_Sec VARCHAR(2),
        @Dealay_Min VARCHAR(2)
    
    SET @Delay_Sec = RIGHT('0' + CAST(1 + FLOOR(59 * RAND(CONVERT(varbinary, NEWID()))) AS VARCHAR(2)), 2) -- 1 to 59
    SET @Dealay_Min = RIGHT('0' + CAST(0 + FLOOR(3 * RAND(CONVERT(varbinary, NEWID()))) AS VARCHAR(2)), 2) -- 0 to 2
    
    SET  @Delay = '00:' + @Dealay_Min + ':' + @Delay_Sec
    WAITFOR DELAY @Delay

    I hope this helps somebody at some point.

    Monday, April 07, 2014 4:50 PM

All replies

  • I have never seen this kind of problem.  Please post the results of SELECT @@VERSION.

    Also, why are you dropping the article?  There are only a few reasons to actually drop the article when making changes to a table.

    Friday, March 28, 2014 5:09 PM
  • All of the SQL Servers in question have the following version:

    Microsoft SQL Server 2012 (SP1) - 11.0.3339.0 (X64) Jan 14 2013 19:02:10 Copyright (c) Microsoft Corporation
        Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)

    The reason we are dropping an article is because we need to turn one of the columns into an IDENTITY column.  As a result there is some DROP/CREATE manipulations we have to do.

    Thank you!

    Friday, March 28, 2014 5:49 PM
  • Instead of doing it in all databases at the same time, try following your same process, but 1 database at a time.  Do you still get the deadlock?  How many articles do you have per publication?

    There should be no need to have a separate distribution database per publication database.

    Replication does support modifying multiple publishers but this is a unique situation.  There is a chance you may need to refine the process.


    Brandon Williams (blog | linkedin)

    Friday, March 28, 2014 5:50 PM
    Moderator
  • We have around 100 articles per publication, but only one of them is being changed (in this case).

    I have already tried running sequentially and know that it works fine.  It's definitely a deadlock, which happens when more than one process accesses underlying system tables in the "distribution" database (as shown in the deadlock output above).  The problem for me is that we actually have hundreds of databases where we need to roll-out changes.  Doing it sequentially is unacceptable from the deployment window perspective.

    Friday, March 28, 2014 6:36 PM
  • Hi,

    It indicates that the deadlock table is MSsubscriptions. You may check the fragmentation of the index and rebuild the corresponding index and see if it reduces the deadlock of this occurring.

    USE distribution;

    GO

    SELECT a.index_id, name, avg_fragmentation_in_percent

    FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'MSsubscriptions'),

         NULL, NULL, NULL) AS a

        JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;

    GO

    -- REBUILD INDEX

    USE distribution;

    GO

    ALTER INDEX ucMSsubscriptions ON MSsubscriptions

    REBUILD;

    GO

    Reorganize and Rebuild Indexes

    http://technet.microsoft.com/en-us/library/ms189858.aspx

    Thanks.

    Tracy Cai
    TechNet Community Support


    Tuesday, April 01, 2014 5:29 AM
    Moderator
  • Thank you, Tracy, the indexes were fragmented and rebuilding them did help, but didn't eliminate the deadlocks.
    • Proposed as answer by Tom Phillips Monday, April 07, 2014 10:18 PM
    Monday, April 07, 2014 4:48 PM
  • To close this topic,I just wanted to share what I did to work-around this problem in case someone else runs into it. Once again, the fundamental issue is caused by simultaneous changes being made to multiple publications, which share the same distribution database. After analyzing the code of the Microsoft queries, which are reported by the deadlock trace (see my original post above), I found out that MSsubscriptions table is missing an index, so I went ahead an added it:
        CREATE NONCLUSTERED INDEX [iMSsubscriptions3] ON [dbo].[MSsubscriptions]
    (
        [publisher_id] ASC,
        [publisher_db] ASC,
        [publication_id] ASC,
        [independent_agent] ASC,
        [article_id] ASC,
        [subscriber_id] ASC,
        [subscriber_db] ASC,
        [subscription_type] ASC,
        [publisher_database_id] ASC
    )

    You can check the execution plan of the queries and verify that this index is actually being leveraged.  In addition, as suggested by Tracy, the indexes should be defragmented regularly.

    The index and defragmentation help, but since I have to make changes to dozens of publications at once, it still was not enough. So the second part of the fix was to simply introduce a random delay. I added up to 3 min delay to my deployment script and this scattered the queries enough to significantly lower probability of deadlocks (but not eliminate it completely :( Something like this:

    DECLARE     @Delay VARCHAR(8),
        @Delay_Sec VARCHAR(2),
        @Dealay_Min VARCHAR(2)
    
    SET @Delay_Sec = RIGHT('0' + CAST(1 + FLOOR(59 * RAND(CONVERT(varbinary, NEWID()))) AS VARCHAR(2)), 2) -- 1 to 59
    SET @Dealay_Min = RIGHT('0' + CAST(0 + FLOOR(3 * RAND(CONVERT(varbinary, NEWID()))) AS VARCHAR(2)), 2) -- 0 to 2
    
    SET  @Delay = '00:' + @Dealay_Min + ':' + @Delay_Sec
    WAITFOR DELAY @Delay

    I hope this helps somebody at some point.

    Monday, April 07, 2014 4:50 PM