DPM Service Crashing during tape backup

  • Question

  • Once again we have a major problem with the DPM service crashing every time it tries to do a scheduled tape backup.

    The whole DPM 2016 server was wiped and reinstalled 2 weeks ago with a fresh DPM database; it was not restored from the previous version. Once again, every time it tries to back up data from one particular client workstation to tape, the whole service crashes, complaining about database corruption with missing shadow copies. It is possible to get that client to back up to tape by manually creating a recovery point to tape, but because the whole service crashes, the entire tape backup job has to be restarted. If that client is not the last one to run, the job keeps crashing and has to be cancelled.

    We are running UR4, which appears to be completely unusable. Do we need to start all over again with the machine and go back to UR3?

    Unable to connect to the DPM database because the database is in an inconsistent state.
    Problem Details:
    <FatalServiceError><__System><ID>19</ID><Seq>188697</Seq><TimeCreated>12/01/2018 1:32:57 PM</TimeCreated><Source>DpmThreadPool.cs</Source><Line>163</Line><HasError>True</HasError></__System><ExceptionType>DBCorruptionException</ExceptionType><ExceptionMessage>Unable to retrieve ShadowCopy '00000000-0000-0000-0000-000000000000' from the database</ExceptionMessage><ExceptionDetails>Microsoft.Internal.EnterpriseStorage.Dls.DB.DBCorruptionException: Unable to retrieve ShadowCopy '00000000-0000-0000-0000-000000000000' from the database
       at Microsoft.Internal.EnterpriseStorage.Dls.PRMCatalog.Replica.ShadowCopy.GetInstance(DbContext ctx, Guid shadowCopyId)
       at Microsoft.Internal.EnterpriseStorage.Dls.PRMCatalog.PrmCatalog.GetShadowCopyInstance(Guid shadowCopyId)
       at Microsoft.Internal.EnterpriseStorage.Dls.Prm.SetShadowCopyContextBlock.LoadShadowCopyProperties()
       at Microsoft.Internal.EnterpriseStorage.Dls.Prm.SetShadowCopyContextBlock.NewStorageShadowCopyCreated(Message msg)
       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.ConnectionPoint.Execute(Message msg)
       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Engine.ChangeState(Message msg)
       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.TaskInstance.Process(Object dummy)
       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.FsmThreadFunction.Function(Object taskThreadContextObj)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()</ExceptionDetails></FatalServiceError>

    An unexpected error caused a failure for process 'DPMAMService'.  Restart the DPM process 'DPMAMService'.
    Problem Details:
    <FatalServiceError><__System><ID>19</ID><Seq>0</Seq><TimeCreated>8/01/2018 1:39:37 AM</TimeCreated><Source>DpmThreadPool.cs</Source><Line>163</Line><HasError>True</HasError></__System><ExceptionType>DlsException</ExceptionType><ExceptionMessage>exception</ExceptionMessage><ExceptionDetails>Microsoft.Internal.EnterpriseStorage.Dls.Utils.DlsException: exception ---&gt; System.Runtime.InteropServices.COMException: Server execution failed (Exception from HRESULT: 0x80080005 (CO_E_SERVER_EXEC_FAILURE))
    Server stack trace:
       at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
       at Microsoft.Internal.EnterpriseStorage.Dls.Engine.CProxyUtils.HandleErrors(Int32 hr, tagSAFEARRAY* exceptionResult)
       at Microsoft.Internal.EnterpriseStorage.Dls.Engine.EngineServicesProxy.CheckForPendingReboot()
       at Microsoft.Internal.EnterpriseStorage.Dls.EngineProxyWrapper.EngineServiceProxyWrapper.CheckForPendingReboot()
       at Microsoft.Internal.EnterpriseStorage.Dls.EngineProxyWrapper.EngineServiceProxyWrapper.ConnectAsAdmin(String dpmServerName, AsyncOperation asyncOperation)
       at Microsoft.Internal.EnterpriseStorage.Dls.EngineProxyWrapper.EngineServiceProxyWrapper.GetInstance(String dpmServerName, AsyncOperation asyncOperation)
       at Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.DpmServer.ReadRegistryKeyOnDPMServer(String registryKeyPath, String registryKeyName, RegistryValueKind registryValueKind)
       at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]&amp; outArgs)
       at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink)
    Exception rethrown at [0]:
       at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase)
       at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData&amp; msgData)
       at Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.DpmServer.ReadRegistryKeyAsync.EndInvoke(IAsyncResult result)
       at Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.DpmServer.ReadRegistryKey(String registryKeyPath, String registryKeyName, RegistryValueKind registryValueKind)
       at Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.DpmServer.InitializeIgnorableSqlErrorNumbersList()
       at Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.DpmServer.GetDpmServerObject(String serverName, AsyncOperation asyncOperation, DpmServerScope dpmServerScope)
       at Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.DpmServerFactory.GetServer(String serverName, AsyncOperation asyncOperation, DpmServerScope dpmServerScope)
       --- End of inner exception stack trace ---
       at Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.DpmServerFactory.GetServer(String serverName, AsyncOperation asyncOperation, DpmServerScope dpmServerScope)
       at Microsoft.Internal.EnterpriseStorage.Dls.UI.AutoHeal.AutoHeal.DpmStarted()
       at Microsoft.Internal.EnterpriseStorage.Dls.EngineUICommon.DpmThreadPool.Function(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()
    *** Mojito error was: ConnectionToServerFailed; 0; None</ExceptionDetails></FatalServiceError>
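
    For comparing alerts like the two above across crashes, the key fields can be pulled out of the Problem Details text with a short script. This is only a rough sketch, not a DPM tool: the function name `summarize_alert` is made up for illustration, the sample is an abridged copy of the first alert above, and it assumes the pasted text is well-formed XML.

```python
import xml.etree.ElementTree as ET


def summarize_alert(problem_details: str) -> dict:
    """Extract the headline fields from a <FatalServiceError> blob.

    Hypothetical helper for illustration; not part of DPM.
    """
    root = ET.fromstring(problem_details)
    system = root.find("__System")
    details = root.findtext("ExceptionDetails", default="")
    # Stack frames are the lines that start with "at " once stripped.
    frames = [
        line.strip()[3:]
        for line in details.splitlines()
        if line.strip().startswith("at ")
    ]
    return {
        "time": system.findtext("TimeCreated") if system is not None else None,
        "type": root.findtext("ExceptionType"),
        "message": root.findtext("ExceptionMessage"),
        "top_frame": frames[0] if frames else None,
    }


# Abridged copy of the first alert in this post (stack trace cut to two frames).
sample = (
    "<FatalServiceError><__System><ID>19</ID><Seq>188697</Seq>"
    "<TimeCreated>12/01/2018 1:32:57 PM</TimeCreated>"
    "<Source>DpmThreadPool.cs</Source><Line>163</Line>"
    "<HasError>True</HasError></__System>"
    "<ExceptionType>DBCorruptionException</ExceptionType>"
    "<ExceptionMessage>Unable to retrieve ShadowCopy "
    "'00000000-0000-0000-0000-000000000000' from the database</ExceptionMessage>"
    "<ExceptionDetails>Microsoft.Internal.EnterpriseStorage.Dls.DB."
    "DBCorruptionException: Unable to retrieve ShadowCopy "
    "'00000000-0000-0000-0000-000000000000' from the database\n"
    "   at Microsoft.Internal.EnterpriseStorage.Dls.PRMCatalog.Replica."
    "ShadowCopy.GetInstance(DbContext ctx, Guid shadowCopyId)\n"
    "   at Microsoft.Internal.EnterpriseStorage.Dls.PRMCatalog.PrmCatalog."
    "GetShadowCopyInstance(Guid shadowCopyId)"
    "</ExceptionDetails></FatalServiceError>"
)

if __name__ == "__main__":
    for key, value in summarize_alert(sample).items():
        print(f"{key}: {value}")
```

    Run against both alerts, this would make it easy to see that the first is a DBCorruptionException raised while loading a shadow copy, and the second is a DlsException raised while the service was starting up again.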


    • Edited by eforgacs Friday, January 12, 2018 11:57 PM
    Friday, January 12, 2018 11:56 PM

All replies

  • Look at this thread:

    https://social.technet.microsoft.com/Forums/en-US/43dedbb9-993d-4d6c-acab-159dc120e9c7/regular-dpm-2016-service-crash?forum=dataprotectionmanager

    This SQL Script should fix the issue:

    ------------------------------------------------------

    IF EXISTS (SELECT * FROM dbo.sysobjects
               WHERE id = OBJECT_ID(N'prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica')
               AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
    DROP PROCEDURE dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica
    GO

    -- Returns the number of valid datasets for that datasource on the physical replica for the given datasource
    CREATE PROCEDURE dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica
    (
      @DatasourceId GUID,
      @ReplicaId GUID,
      @IncludeDatasetsWithoutSC BIT = 1     -- Default value 1
    )
    AS
        DECLARE @error int
        SET @error = 0
       
        SET NOCOUNT ON

        IF @IncludeDatasetsWithoutSC = 1
        BEGIN
            SELECT COUNT(Dataset.DatasetId)
            FROM dbo.tbl_RM_ReplicaDataset AS Dataset
                LEFT OUTER JOIN tbl_RM_ShadowCopy ShadowCopy
                    ON ShadowCopy.ShadowCopyId = Dataset.ShadowCopyId
                JOIN dbo.tbl_RM_RecoverySource AS RecoverySource
                    ON Dataset.DatasetId = RecoverySource.DatasetId
                JOIN tbl_PRM_LogicalReplica ReplicaDS
                    ON Dataset.ReplicaId = ReplicaDS.ReplicaId
            WHERE ReplicaDS.PhysicalReplicaId = @ReplicaId AND
                ReplicaDS.DatasourceId = @DatasourceId AND
                ReplicaDS.Validity <> 4 AND     -- not destroyed
                (ShadowCopy.Validity = 2 OR Dataset.ShadowCopyId IS NULL) AND
                RecoverySource.IsValid = 1 AND
                RecoverySource.IsGCed = 0 AND
                ReplicaDS.IsGCed = 0 AND
                (ShadowCopy.IsGCed = 0 OR ShadowCopy.IsGCed IS NULL) AND
                Dataset.IsGCed = 0
        END
        ELSE
        BEGIN
            SELECT COUNT(Dataset.DatasetId)
            FROM dbo.tbl_RM_ReplicaDataset AS Dataset
                JOIN tbl_RM_ShadowCopy ShadowCopy
                    ON ShadowCopy.ShadowCopyId = Dataset.ShadowCopyId
                JOIN dbo.tbl_RM_RecoverySource AS RecoverySource
                    ON Dataset.DatasetId = RecoverySource.DatasetId
                JOIN tbl_PRM_LogicalReplica ReplicaDS
                    ON Dataset.ReplicaId = ReplicaDS.ReplicaId
            WHERE ReplicaDS.PhysicalReplicaId = @ReplicaId AND
                ReplicaDS.DatasourceId = @DatasourceId AND
                ReplicaDS.Validity <> 4 AND     -- not destroyed
                ShadowCopy.Validity = 2 AND
                RecoverySource.IsValid = 1 AND
                RecoverySource.IsGCed = 0 AND
                ReplicaDS.IsGCed = 0 AND
                ShadowCopy.IsGCed = 0 AND
                Dataset.IsGCed = 0
        END
         
        SET @error = @@ERROR
        SET NOCOUNT OFF

        RETURN @error
    GO

    ------------------------------------------------------


    • Edited by Michael-CM Tuesday, January 16, 2018 5:06 PM
    • Proposed as answer by Michael-CM Tuesday, January 16, 2018 5:06 PM
    • Unproposed as answer by eforgacs Wednesday, January 17, 2018 12:07 AM
    Tuesday, January 16, 2018 5:05 PM
  • The error messages referred to there appear to be quite different (not referring to DBCorruptionException) and my logs have no reference to the stored procedure prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica.

    Wednesday, January 17, 2018 12:11 AM
  • Just to make sure there is no doubt from my earlier response, the issue was not resolved by modifying the stored procedure. This was confirmed by a tape backup job attempting to run today and crashing the DPM service. We are therefore back to square one.

    Hello DPM team, are you there?

    Friday, January 19, 2018 10:04 AM