DB deadloacks causing crashes? RRS feed

  • Question

  • I've seen a lot of DPM crashes that coincide with DBCC TRACEON 1204 and 1222 deadlock events on the SQL server. I think this may be causing the deadlock victim to get a bad response back which causes the crash.
    The DPM database is the only database on the SQL install which is on the DPM server.
    They seem to occur especially often when there's another job e.g. a consistency check also running and during tape writes.

    Crash log extract.

    1BA8    2744    06/12    14:05:50.460    02    EventManager.cs(98)            NORMAL    Publishing event from ShadowCopy.cs(497): ShadowCopyValidityChanged, [ShadowCopyId=4dd3af0a-f031-42b4-9695-0283cba843c3]
    1BA8    2144    06/12    14:05:50.491    02    EventManager.cs(98)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    NORMAL    Publishing event from Replica.cs(2367): ReplicaStatusChange, [DataSourceID=1e9a5c51-4510-409a-89d0-c76f501e0d24]
    1BA8    2144    06/12    14:05:50.491    27    OperationTypeLock.cs(902)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    NORMAL    In Unlock() of ArchiveFromSCOperationTypeLock, returning
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING    Caught unhandled exception : Microsoft.Internal.EnterpriseStorage.Dls.DB.DBCorruptionException: Unable to retrieve ShadowCopy '00000000-0000-0000-0000-000000000000' from the database
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at Microsoft.Internal.EnterpriseStorage.Dls.PRMCatalog.Replica.ShadowCopy.GetInstance(DbContext ctx, Guid shadowCopyId)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at Microsoft.Internal.EnterpriseStorage.Dls.PRMCatalog.PrmCatalog.GetShadowCopyInstance(Guid shadowCopyId)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at Microsoft.Internal.EnterpriseStorage.Dls.Prm.SetShadowCopyContextBlock.LoadShadowCopyProperties()
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at Microsoft.Internal.EnterpriseStorage.Dls.Prm.SetShadowCopyContextBlock.NewStorageShadowCopyCreated(Message msg)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.ConnectionPoint.Execute(Message msg)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Engine.ChangeState(Message msg)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.TaskInstance.Process(Object dummy)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.FsmThreadFunction.Function(Object taskThreadContextObj)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(121)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING       at System.Threading.ThreadPoolWorkQueue.Dispatch()
    1BA8    2144    06/12    14:05:50.538    68    DpmThreadPool.cs(149)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING    Attempting to write error to the event log Unable to retrieve ShadowCopy '00000000-0000-0000-0000-000000000000' from the database
    1BA8    2144    06/12    14:05:50.538    02    EventManager.cs(143)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    NORMAL    Logging event to NT log from DpmThreadPool.cs(163): FatalServiceError
    1BA8    2144    06/12    14:05:50.553    68    DpmThreadPool.cs(170)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    WARNING    Attempting to invoke Watson
    1BA8    2144    06/12    14:05:50.569    09    everettexception.cpp(762)        7FA8EF2B-A075-4278-8608-D2A3A3CD733C    CRITICAL    Exception Message = Unable to retrieve ShadowCopy '00000000-0000-0000-0000-000000000000' from the database of type Microsoft.Internal.EnterpriseStorage.Dls.DB.DBCorruptionException, process will terminate after generating dump
    Monday, June 12, 2017 3:37 PM

All replies

  • Hello, I am experiencing pretty much the same behaviour. 

    DPM crashes every 6 hours after restart with DBCC TRACEON 1204 and 1222 deadlocks. There are no visible scheduled jobs in DPM or SQL at this time, DPMAM hangs with event 940, all other DBs and services working fine.

    This started after update to RU13. I've completly reinstalled DPM and updated it to RU13 again, will be able to tell if it fixes the problem in a few hours. 

    Tuesday, June 13, 2017 11:53 AM
  • Forgot to mention, I'm using it on Server 2016 with SQL 2014 locally and DPM version 5.0.322.0 which corresponds to Update Rollup 2 for System Center 2016 Data Protection Manager. As I'm currently unable to write to tape, I've raised a support call for this, I'll post back here if I get an answer.
    Tuesday, June 13, 2017 3:20 PM
  • Than we have different problems, though i see shadow copy mentioned in your case aswell.

    I was able to fix the issue by removing two registry keys from  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VSS\Providers\**** (VSS Null Provider keys as stated in default record)

    Once I have removed these keys and reinstalled DPM with latest update - everything started working again. 

    Wednesday, June 14, 2017 4:12 AM
  • Exact same issue here, on a brand new install of DPM, same SQL and Windows versions too. One particular client backup to tape takes down the entire DPM service.
    Friday, June 16, 2017 6:25 AM
  • Same issue here. DPM 2016 UR2, SQL 2016. DB locks are happening daily. Tapes backup takes ages, if I kill DBCC process DPM crashes, if i kill DELETE process DPM don't crash, backups are continuing.

    Friday, June 16, 2017 8:28 AM
  • It turned out that there was a particular problem machine which support advised I take out the PG, delete the recovery points and try the backup to tape again. This worked and the machine was added back in but I don't know why it happened to start with. Hoping for a fix for this in future updates.
    Tuesday, June 20, 2017 3:37 PM
  • Hmm, yes I thought that was the issue, but the problem came back within a week, on the same machine too. Looks like I may need to open a support case too. I'd strongly recommend trying to leave yours open for a week or so until you can make sure it actually is resolved.
    Tuesday, June 20, 2017 3:39 PM
  • Update on my issue: the protection group in question only has 3 client computers in it. 2 out of 3 are now causing a DPM service crash when attempting a tape backup (one of them being the one which started this whole problem).

    SystemInsider, has your issue recurred with the machine that was removed and added back in?

    Out of interest, what sort of volume of data are you dealing with on this machine?

    Friday, June 23, 2017 3:22 PM
  • Hi, We've not seen a re occurrence of the issue since making the change just over a week ago. The machine in question is an end user machine and the weeks replicas consume less than 10GB.

    If it's any consolation to anyone else seeing this issue, Microsoft Support assured me that the case had been passed to their Engineers so that it can at least be handled better when it does occur.

    Monday, June 26, 2017 1:29 PM
  • Well unfortunately it seems the Engineers have done nothing about it as we are still having the exact same issue months later, on another fresh install, as I detailed on a new thread.
    Friday, January 19, 2018 10:15 AM
  • Unfortunately we are also still having this issue on a fully updated DPM 2016 UR4.
    Friday, January 19, 2018 3:36 PM