Assistance with MSDPM error 945

    Question

  • Hello,

    Last week, we built a new bare-metal server, hosting DPM 1801, with only a handful of protection groups/members. Every night, at midnight, we receive the following error:

    Unable to connect to the DPM database because of a general database failure.  Make sure that SQL Server is running and that it is configured correctly.  
    Problem Details:<FatalServiceError>
    	<__System>
    		<ID>19</ID>
    		<Seq>8765</Seq>
    		<TimeCreated>6/4/2018 7:00:05 AM</TimeCreated>
    		<Source>DpmThreadPool.cs</Source>
    		<Line>163</Line>
    		<HasError>True</HasError>
    	</__System>
    	<ExceptionType>SqlException</ExceptionType>
    	<ExceptionMessage>Violation of PRIMARY KEY constraint 'PK_tbl_SM_Disk_Usage_Trend'. Cannot insert duplicate key in object 'dbo.tbl_SM_Disk_Usage_Trend'. The duplicate key value is (727b4c80-ee5e-41ed-bd18-5b98ad93123d, 6289e98c-8f05-4e2d-b308-073df8e90f65).The statement has been terminated.</ExceptionMessage>
    	<ExceptionDetails>System.Data.SqlClient.SqlException (0x80131904): Violation of PRIMARY KEY constraint 'PK_tbl_SM_Disk_Usage_Trend'. Cannot insert duplicate key in object 'dbo.tbl_SM_Disk_Usage_Trend'. The duplicate key value is (727b4c80-ee5e-41ed-bd18-5b98ad93123d, 6289e98c-8f05-4e2d-b308-073df8e90f65).The statement has been terminated.   at Microsoft.Internal.EnterpriseStorage.Dls.SummaryManager.SummaryManagerMachine.OnStart(ErrorInfo&amp; eInfo)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.SimpleStateMachine.Start(Message msg)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Transition.Execute(Message msg)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Engine.ChangeState(Message msg)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.TaskInstance.Process(Object dummy)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.FsmThreadFunction.Function(Object taskThreadContextObj)   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()   at System.Threading.ThreadPoolWorkQueue.Dispatch()ClientConnectionId:af7601e9-6435-4843-9648-866fb2ca51b8Error Number:2627,State:1,Class:14</ExceptionDetails>
    </FatalServiceError>

    This error causes the DPM service to terminate, which in turn causes our midnight backups to fail. Any assistance on this issue would be greatly appreciated.

    Server information:

    • O/S: Windows Server 2016 Build 14393.2273
    • SQL version: SQL 2016 SP2
    • DPM version: DPM 1801

    Thank you,

    Mathew


    Monday, June 4, 2018 3:05 PM

All replies

  • Hi!

    Is your DPM server joined to a domain?

    If yes, there might be some registry keys that are incorrectly formatted using UPN syntax.

    The registry keys are found here:

    HKLM\software\microsoft\microsoft dpm\setup


    Make sure that the registry values listed below are in the format domain\username and not username@domain.com.


    Registry Keys:

    • SqlAgentAccoutName 
    • SchedulerJobOwnerName
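    A related read-only check on the SQL side, as a T-SQL sketch. It assumes (not confirmed in this thread) that SchedulerJobOwnerName ends up as the owner of the SQL Server Agent jobs DPM creates, so an owner shown in username@domain form would be suspect. The registry values themselves still have to be checked in regedit.

```sql
-- Read-only: list SQL Server Agent jobs with their owners and flag UPN-style names.
-- ASSUMPTION: SchedulerJobOwnerName maps to the owner of DPM's Agent jobs.
SELECT j.name                   AS JobName,
       SUSER_SNAME(j.owner_sid) AS OwnerLogin,
       CASE
           WHEN SUSER_SNAME(j.owner_sid) LIKE N'%@%' THEN N'UPN (username@domain)'
           WHEN SUSER_SNAME(j.owner_sid) LIKE N'%\%' THEN N'domain\username'
           ELSE N'other (e.g. a SQL login such as sa)'
       END                      AS OwnerFormat
FROM msdb.dbo.sysjobs AS j
ORDER BY OwnerFormat, j.name;
```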


    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com  LinkedIn:   

    Monday, June 4, 2018 10:36 PM
  • Thank you for your reply, Leon.

    Unfortunately, the two Registry Keys have the correct domain\username information.

    It should be noted that the backup jobs resume at their next run time (01:00) without issue. So I'm guessing some kind of cleanup runs at midnight and fails, causing the error.

    There is some kind of integrity error when entries are inserted into 'dbo.tbl_SM_Disk_Usage_Trend':

    Error 1: Violation of PRIMARY KEY constraint 'PK_tbl_SM_Disk_Usage_Trend'. Cannot insert duplicate key in object 'dbo.tbl_SM_Disk_Usage_Trend'.

    I do not want to modify the SQL database manually without guidance first.
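    A primary key violation means a row with the same key already exists. One way to look into this without modifying anything is to read the constraint definition from the SQL Server catalog views. A minimal read-only sketch; [DPMDB_NAMEOFYOURDPMDB] is a placeholder for your DPM database name:

```sql
USE [DPMDB_NAMEOFYOURDPMDB];  -- placeholder: your DPM database name
GO

-- Read-only: which columns make up PK_tbl_SM_Disk_Usage_Trend, in key order?
SELECT kc.name        AS ConstraintName,
       c.name         AS ColumnName,
       ic.key_ordinal AS KeyPosition
FROM sys.key_constraints AS kc
JOIN sys.index_columns   AS ic
  ON ic.object_id = kc.parent_object_id
 AND ic.index_id  = kc.unique_index_id
JOIN sys.columns         AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE kc.name = N'PK_tbl_SM_Disk_Usage_Trend'
  AND kc.type = 'PK'
ORDER BY ic.key_ordinal;
```

    Once the key columns are known, a SELECT filtered on the two GUIDs from the error message would show the row that already exists.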

    Thank you,

    Mathew

    Monday, June 4, 2018 11:41 PM
  • Try moving your SQL Server to its own box. I was having similar issues, and that's what the Microsoft tech had me do; it might work for you.
    Monday, June 18, 2018 8:21 PM
  • As mentioned above, you could try backing up your DPM database and restoring it on another server.

    The steps on how to do it can be found in the link below:

    https://docs.microsoft.com/en-us/system-center/dpm/upgrade-dpm?view=sc-dpm-1801
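    Before moving or touching the DPM database, a plain SQL Server copy-only backup is a cheap safety net. This is a generic sketch, not the DPM-specific procedure from the linked article; the database name and the D:\Backup path are placeholders:

```sql
-- Copy-only full backup: does not reset the differential base.
BACKUP DATABASE [DPMDB_NAMEOFYOURDPMDB]   -- placeholder database name
TO DISK = N'D:\Backup\DPMDB_copy.bak'     -- placeholder path
WITH COPY_ONLY,
     CHECKSUM,
     INIT,          -- overwrites any existing backup sets in this file
     STATS = 10;    -- progress message every 10 percent

-- Check that the backup file is complete and readable.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backup\DPMDB_copy.bak'
WITH CHECKSUM;
```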


    Blog: https://thesystemcenterblog.com  LinkedIn:   

    Monday, June 18, 2018 11:12 PM
  • Thank you everyone for the advice and assistance!

    I was unable to move the SQL Server instance to another server; however, I seem to have resolved the issue. I modified several protection groups, renaming them and changing what they protected. Since I made these modifications, the errors have not returned.

    Take care,

    Mathew

    Monday, June 25, 2018 4:27 PM
  • I have the same problem after upgrading from DPM 2016 to DPM 1801 and then to DPM 1807: the MSDPM service crashes every day at midnight, usually because of a PRIMARY KEY or FOREIGN KEY constraint violation.

    Tornado

    Thursday, September 13, 2018 6:34 AM
  • Similar thing here, after upgrading to 1801 and then to 1807:

    The INSERT statement conflicted with the FOREIGN KEY constraint "FK_tbl_SM_Disk_Usage_Trend_tbl_SM_Statistics". The conflict occurred in database "DPMDB_SG_DPM155f3e7cf_f177_4105_813e_ac64cdc545f0", table "dbo.tbl_SM_Statistics", column 'SMStatsID'.
    The statement has been terminated.</ExceptionMessage><ExceptionDetails>System.Data.SqlClient.SqlException (0x80131904): The INSERT statement conflicted with the FOREIGN KEY constraint "FK_tbl_SM_Disk_Usage_Trend_tbl_SM_Statistics". 
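    A foreign key conflict on INSERT means the new row references an SMStatsID that does not exist in dbo.tbl_SM_Statistics at that moment. This read-only sketch, run in the DPM database, lists which columns the constraint ties together:

```sql
-- Read-only: which columns does the foreign key link?
SELECT fk.name                                                      AS ForeignKey,
       OBJECT_NAME(fkc.parent_object_id)                            AS ChildTable,
       COL_NAME(fkc.parent_object_id, fkc.parent_column_id)         AS ChildColumn,
       OBJECT_NAME(fkc.referenced_object_id)                        AS ParentTable,
       COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id) AS ParentColumn
FROM sys.foreign_keys        AS fk
JOIN sys.foreign_key_columns AS fkc
  ON fkc.constraint_object_id = fk.object_id
WHERE fk.name = N'FK_tbl_SM_Disk_Usage_Trend_tbl_SM_Statistics';
```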

    Thanks for any hints.

    Markus

    Thursday, September 13, 2018 10:51 AM
  • Same thing here: we upgraded from DPM 2016 to 1801 and then 1807, and DPM crashes at midnight. We also have a remote SQL Server setup.

    I'm going to open a case with Microsoft; it seems like a bug.

    Marc

    Sunday, September 16, 2018 6:50 PM
  • This does indeed seem like a bug: it looks like DPM is trying to insert a key that already exists in the table, hence the duplicate key error.

    In this case, I believe it's best to open a ticket directly with Microsoft.

    If any of you get it solved, it would be very helpful if you could share the solution or workaround with the community!


    Blog: https://thesystemcenterblog.com LinkedIn:

    Sunday, September 16, 2018 6:56 PM
  • I checked the jobs running at midnight in the SQL Server Agent Job Activity Monitor:
    • There is one SQL job per protection group,
    • plus two more jobs, which are (I guess) maintenance jobs (pruning, statistics, garbage collection). Run the job and check with SQL Profiler what it does.
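    The same check can be scripted against msdb instead of clicking through Job Activity Monitor. A read-only sketch, assuming "midnight" means 00:00 in the SQL Server's local time; active_start_time is stored as an HHMMSS integer, so 0 means 00:00:00:

```sql
-- Read-only: SQL Server Agent jobs with a schedule starting at 00:00:00.
SELECT j.name              AS JobName,
       j.enabled           AS JobEnabled,
       s.name              AS ScheduleName,
       s.enabled           AS ScheduleEnabled,
       s.active_start_time AS StartTimeHHMMSS
FROM msdb.dbo.sysjobs         AS j
JOIN msdb.dbo.sysjobschedules AS js ON js.job_id     = j.job_id
JOIN msdb.dbo.sysschedules    AS s  ON s.schedule_id = js.schedule_id
WHERE s.active_start_time = 0   -- HHMMSS integer: 0 = midnight
ORDER BY j.name;
```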

    On a test server I installed DPM 2016, and there was only one maintenance job, with JobDefinitionId 1ebebb67-1a20-4e7d-8f54-b79641dc1583 (DPM 2016). After the upgrade to 1801, a second maintenance job is added, with JobDefinitionId 807ef66c-1db0-44cd-af6b-5cc088e15642 (DPM 1801). When they run together, the MSDPM service crashes because both jobs do the same work. In my case, the problem was in executing the stored procedure prc_SMTE_ComputeDiskUsageTrend.

    I also tried installing DPM 1801 on a clean machine, and there was only one maintenance job, with ID 807ef66c-1db0-44cd-af6b-5cc088e15642.

    This led me to the conclusion that there should be only one maintenance job.

    SQL to get the maintenance jobs in DPMDB:

    SELECT [JobDefinitionId]
          ,[ProtectedGroupId]
          ,[Type]
          ,[IsDeleted]
          ,[ContinueOnTaskFailure]
          ,[MaxDuration]
          ,[CreationTime]
          ,[Xml]
          ,[ServerId]
          ,[ServerTimeZone]
          ,[RetryAttemptNumber]
          ,[DatasourceId]
      FROM [DPMDB_DPMTEST].[dbo].[tbl_JM_JobDefinition]
      WHERE [Type] = '282FAAC6-E3CB-4015-8C6D-4276FCCA11D4'
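    A variation of the query above that only returns maintenance job definitions not marked as deleted, oldest first, with the total count on every row, which makes a leftover DPM 2016 job easy to spot. [DPMDB_DPMTEST] is the test database name from this post; replace it with yours:

```sql
-- Active (not deleted) maintenance job definitions, oldest first,
-- with the total count repeated on every row.
SELECT [JobDefinitionId],
       [CreationTime],
       COUNT(*) OVER () AS ActiveMaintenanceJobs
FROM [DPMDB_DPMTEST].[dbo].[tbl_JM_JobDefinition]
WHERE [Type] = '282FAAC6-E3CB-4015-8C6D-4276FCCA11D4'
  AND [IsDeleted] = 0
ORDER BY [CreationTime];
```

    Going by the observations in this post, an upgraded server with the problem would show two rows here (the DPM 2016 and DPM 1801 job definitions), while a clean 1801 install would show one.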

    PLEASE DON'T DO THIS IN A PRODUCTION ENVIRONMENT:

    I deleted the row with JobDefinitionId 1ebebb67-1a20-4e7d-8f54-b79641dc1583 in the table tbl_JM_JobDefinition (and some rows in other tables, because of constraints). Now the MSDPM service isn't crashing any more.

    PLEASE DON'T DO THIS until Microsoft confirms that it is OK.


    Tornado

    Monday, September 17, 2018 11:00 AM
  • Thanks for sharing this, Tornado. I will see if I can replicate this issue in one of my labs and try out your workaround. Let's see if Microsoft can give us an answer on this.

    Blog: https://thesystemcenterblog.com LinkedIn:

    Monday, September 17, 2018 3:22 PM
  • Thanks for the information, Tornado!

    Just wanted to add that we are seeing the same 945 error nightly at midnight on our DPM server, which was upgraded from DPM 2016 to 1801 and then 1807.

    Current server info:

    • Server 2016
    • SQL 2016 SP2
    • DPM 2016 1807

    I'll be waiting to hear what Microsoft says on this.

    Monday, September 17, 2018 8:50 PM
  • Hi all!

    Many thanks for sharing your results and insights!

    I'd suggest a less intrusive way for testing: Just disable the schedule that fires the job with the JobDefinitionId 1ebebb67...

    SELECT *
      FROM [DPMDB_SG_DPM155f3e7cf_f177_4105_813e_ac64cdc545f0].[dbo].[tbl_SCH_ScheduleDefinition]
      where JobDefinitionId = '1ebebb67-1a20-4e7d-8f54-b79641dc1583'

    On my server, the result is one row with the ScheduleID CB727E56-837D-4B98-8B88-D08AF57C8F0A

    Just find this ID in the SQL Server Agent job list and disable the job. This can easily be reversed.
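    The same disable step in T-SQL, using the documented msdb.dbo.sp_update_job procedure. The job name below is the ScheduleID from this post and only an example; use the value your own query returns:

```sql
-- Disable the SQL Server Agent job (example name from this thread).
EXEC msdb.dbo.sp_update_job
     @job_name = N'CB727E56-837D-4B98-8B88-D08AF57C8F0A',
     @enabled  = 0;   -- 0 = disabled, 1 = enabled

-- To reverse it later:
-- EXEC msdb.dbo.sp_update_job
--      @job_name = N'CB727E56-837D-4B98-8B88-D08AF57C8F0A',
--      @enabled  = 1;
```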

    I have not tested this yet; I'll see tomorrow whether it prevents the crash or whether DPM automatically reactivates or recreates the job.

    Because of the daily service crash, I have not been able to get a complete tape backup for external storage for weeks, so I need a solution urgently.

    Markus

     


    • Edited by Markus_R. _ Tuesday, September 18, 2018 7:38 AM
    Tuesday, September 18, 2018 7:37 AM
  • I tried disabling the SQL job, but it somehow got enabled again :). It's still worth a try; maybe it will behave differently in your environment.

    BTW: to get the name of the job you need to disable, use this SQL:

    SELECT [ScheduleId]
      FROM [DPMDB_DPMTEST].[dbo].[tbl_JM_JobTrail]
      WHERE JobDefinitionId = '1ebebb67-1a20-4e7d-8f54-b79641dc1583'

    The ScheduleId is the name of the job in SQL Server Agent\Job Activity Monitor.
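    A read-only sketch that does both steps at once and shows whether each matching SQL Server Agent job is enabled and when it was last modified (useful for seeing when it gets re-enabled). It assumes ScheduleId converts to the same GUID string that DPM uses as the job name; a NULL AgentJobName means no Agent job with that name exists:

```sql
-- Read-only: ScheduleIds used by the legacy job definition and the
-- matching SQL Server Agent jobs (NULL AgentJobName = no such job).
-- ASSUMPTION: ScheduleId holds the GUID that DPM uses as the Agent job name.
SELECT DISTINCT
       t.[ScheduleId],
       j.[name]          AS AgentJobName,
       j.[enabled]       AS AgentJobEnabled,
       j.[date_modified] AS AgentJobLastModified
FROM [DPMDB_DPMTEST].[dbo].[tbl_JM_JobTrail] AS t
LEFT JOIN msdb.dbo.sysjobs AS j
       ON j.[name] COLLATE DATABASE_DEFAULT
        = CONVERT(nvarchar(36), t.[ScheduleId])
WHERE t.[JobDefinitionId] = '1ebebb67-1a20-4e7d-8f54-b79641dc1583';
```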


    Tornado

    Tuesday, September 18, 2018 8:26 AM
  • Hm, it remains disabled here, at least for the last 2 hours. Maybe changing the execution time could be another option?

    Markus

    Tuesday, September 18, 2018 9:35 AM
  • When I run the maintenance job with JobDefinitionId 807ef66c-1db0-44cd-af6b-5cc088e15642, the job with JobDefinitionId 1ebebb67-1a20-4e7d-8f54-b79641dc1583 is re-enabled after a few minutes :).

    POSSIBLE SOLUTION:

    PLEASE DON'T DO THIS IN A PRODUCTION ENVIRONMENT UNTIL MICROSOFT CONFIRMS IT AS A SOLUTION:

    This disables and then permanently deletes the SQL job: execute the stored procedure prc_SCH_Schedule_DeleteJobDefinitionSchedules (parameter: 1ebebb67-1a20-4e7d-8f54-b79641dc1583), then run the maintenance job with JobDefinitionId 807ef66c-1db0-44cd-af6b-5cc088e15642 twice. The first run disables the SQL job and the second run deletes it; wait 20 minutes between runs.


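    USE [DPMDB_NAMEOFYOURDPMDB]  -- replace with the name of your DPM database
    GO

    DECLARE @return_value int

    -- Delete the schedule(s) of the legacy (DPM 2016) maintenance job definition
    EXEC @return_value = [dbo].[prc_SCH_Schedule_DeleteJobDefinitionSchedules]
        @JobDefinitionId = '1ebebb67-1a20-4e7d-8f54-b79641dc1583'

    -- Show the procedure's return code
    SELECT 'Return Value' = @return_value

    GO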
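    A sketch for checking the result afterwards. It presumes, going only by the procedure's name, that the schedule rows for the legacy JobDefinitionId are removed from tbl_SCH_ScheduleDefinition; the job name is the example ScheduleID from this thread:

```sql
USE [DPMDB_NAMEOFYOURDPMDB]  -- placeholder: your DPM database name
GO

-- Schedule definitions still linked to the legacy job definition
-- (presumably 0 once the procedure has run).
SELECT COUNT(*) AS RemainingSchedules
FROM [dbo].[tbl_SCH_ScheduleDefinition]
WHERE JobDefinitionId = '1ebebb67-1a20-4e7d-8f54-b79641dc1583';

-- Is the SQL Server Agent job still there, and is it enabled?
-- (Example ScheduleID from this thread; substitute your own.)
SELECT name, enabled, date_modified
FROM msdb.dbo.sysjobs
WHERE name = N'CB727E56-837D-4B98-8B88-D08AF57C8F0A';
GO
```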


    Tornado

    Tuesday, September 18, 2018 10:21 AM
  • Quick feedback after 24hrs:

    - The DPM service is still running fine, and the tape backup job is still active. No service crash.

    - The SQL Server Agent job for JobDefinitionId 1ebebb67-1a20-4e7d-8f54-b79641dc1583 is still disabled.

    Update: the job gets re-enabled during the scheduled maintenance run at 10:00. :-(

    Markus


    • Edited by Markus_R. _ Wednesday, September 19, 2018 9:44 AM
    Wednesday, September 19, 2018 7:20 AM
  • Just a quick update...

    I disabled the SQL Server Agent job for JobDefinitionId 1ebebb67-1a20-4e7d-8f54-b79641dc1583 and the services did NOT crash. When I checked on it the next morning, though, the maintenance job had been re-enabled, just as Markus_R experienced. Presumably the services would crash again at midnight, so we manually disabled it again. We're looking into automating this until a more permanent fix can be implemented.
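    One possible way to automate this, sketched under assumptions: a separate, self-created SQL Server Agent job (not something DPM provides) runs this step shortly before midnight and disables the legacy job only if it has been re-enabled. The job name is an example value from this thread:

```sql
-- Hypothetical watchdog step: run from a separate Agent job before midnight.
DECLARE @legacyJob sysname = N'CB727E56-837D-4B98-8B88-D08AF57C8F0A';  -- your ScheduleId

IF EXISTS (SELECT 1
           FROM msdb.dbo.sysjobs
           WHERE name = @legacyJob
             AND enabled = 1)
BEGIN
    -- The job has been re-enabled (in this thread, apparently by DPM's
    -- maintenance run): switch it off again before the midnight run.
    EXEC msdb.dbo.sp_update_job @job_name = @legacyJob, @enabled = 0;
END;
```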

    I guess a call to Microsoft will be necessary to resolve this.

    Thursday, September 20, 2018 1:50 PM
  • I followed Tornado's procedure to remove the job in question. It took more than two runs of the maintenance job to delete it; apparently 20 minutes between runs was too short for my server. On the third run the service crashed again (!), and after that the 1ebe-- job was gone.

    Since then I have seen no issues or crashes.

    Many thanks to Tornado for your investigations!

    Markus

    PS: By the way, the query for the ScheduleId of job 1ebb... still returns about 30 results (all of them CB727E56-837D-4B98-8B88-D08AF57C8F0A, which is gone). It seems that some references to the job are still there.
    • Edited by Markus_R. _ Friday, September 21, 2018 6:15 AM
    Friday, September 21, 2018 6:06 AM