Assistance with MSDPM error 945

    Question

  • Hello,

    Last week, we built a new bare-metal server, hosting DPM 1801, with only a handful of protection groups/members. Every night, at midnight, we receive the following error:

    Unable to connect to the DPM database because of a general database failure.  Make sure that SQL Server is running and that it is configured correctly.  
    Problem Details:<FatalServiceError>
    	<__System>
    		<ID>19</ID>
    		<Seq>8765</Seq>
    		<TimeCreated>6/4/2018 7:00:05 AM</TimeCreated>
    		<Source>DpmThreadPool.cs</Source>
    		<Line>163</Line>
    		<HasError>True</HasError>
    	</__System>
    	<ExceptionType>SqlException</ExceptionType>
    	<ExceptionMessage>Violation of PRIMARY KEY constraint 'PK_tbl_SM_Disk_Usage_Trend'. Cannot insert duplicate key in object 'dbo.tbl_SM_Disk_Usage_Trend'. The duplicate key value is (727b4c80-ee5e-41ed-bd18-5b98ad93123d, 6289e98c-8f05-4e2d-b308-073df8e90f65).The statement has been terminated.</ExceptionMessage>
    	<ExceptionDetails>System.Data.SqlClient.SqlException (0x80131904): Violation of PRIMARY KEY constraint 'PK_tbl_SM_Disk_Usage_Trend'. Cannot insert duplicate key in object 'dbo.tbl_SM_Disk_Usage_Trend'. The duplicate key value is (727b4c80-ee5e-41ed-bd18-5b98ad93123d, 6289e98c-8f05-4e2d-b308-073df8e90f65).The statement has been terminated.   at Microsoft.Internal.EnterpriseStorage.Dls.SummaryManager.SummaryManagerMachine.OnStart(ErrorInfo&amp; eInfo)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.SimpleStateMachine.Start(Message msg)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Transition.Execute(Message msg)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Engine.ChangeState(Message msg)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.TaskInstance.Process(Object dummy)   at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.FsmThreadFunction.Function(Object taskThreadContextObj)   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()   at System.Threading.ThreadPoolWorkQueue.Dispatch()ClientConnectionId:af7601e9-6435-4843-9648-866fb2ca51b8Error Number:2627,State:1,Class:14</ExceptionDetails>
    </FatalServiceError>

    This error causes the DPM service to terminate, which in turn causes our midnight backups to fail. Any assistance on this issue would be greatly appreciated.

    Server information:

    • O/S: Windows Server 2016 Build 14393.2273
    • SQL version: SQL 2016 SP2
    • DPM version: DPM 1801

    Thank you,

    Mathew


    Monday, June 4, 2018 15:05

All replies

  • Hi!

    Is your DPM server joined to a domain?

    If yes, there might be some registry keys that are incorrectly formatted using UPN syntax.

    The registry keys are found in here:

    HKLM\software\microsoft\microsoft dpm\setup


    Make sure that the registry keys above have the values in the format of domain\username and not username@domain.com.


    Registry Keys:

    • SqlAgentAccountName
    • SchedulerJobOwnerName


    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    Monday, June 4, 2018 22:36
  • Thank you for your reply, Leon.

    Unfortunately, the two Registry Keys have the correct domain\username information.

    It should be noted that the backup jobs resume at their next run time (01:00) without issue. So I'm guessing some kind of cleanup occurs at midnight and fails, causing the error.

    There appears to be a key violation when entries are added to 'dbo.tbl_SM_Disk_Usage_Trend':

    Error 1: Violation of PRIMARY KEY constraint 'PK_tbl_SM_Disk_Usage_Trend'. Cannot insert duplicate key in object 'dbo.tbl_SM_Disk_Usage_Trend'.

    I do not want to modify the SQL database manually, without guidance first. 
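
    As a read-only first step that changes nothing, the SQL Server catalog views can show which columns make up the primary key named in the error. This is only a minimal sketch; the database name is a placeholder (an assumption), so substitute the name of your own DPMDB.

    ```sql
    -- Read-only: list the columns behind PK_tbl_SM_Disk_Usage_Trend.
    -- [DPMDB_NAMEOFYOURDPMDB] is a placeholder, not a real database name.
    USE [DPMDB_NAMEOFYOURDPMDB];
    GO

    SELECT kc.name        AS ConstraintName,
           c.name         AS ColumnName,
           ic.key_ordinal AS KeyPosition
    FROM sys.key_constraints AS kc
    JOIN sys.index_columns AS ic
      ON ic.object_id = kc.parent_object_id
     AND ic.index_id  = kc.unique_index_id
    JOIN sys.columns AS c
      ON c.object_id = ic.object_id
     AND c.column_id = ic.column_id
    WHERE kc.name = N'PK_tbl_SM_Disk_Usage_Trend'
    ORDER BY ic.key_ordinal;
    ```

    The two GUIDs in the "duplicate key value" part of the error should correspond, in order, to the columns this query returns.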

    Thank you,

    Mathew

    Monday, June 4, 2018 23:41
  • Try moving your SQL Server to its own box. I was having similar issues and that's what the MS tech had me do; it might work for you.
    Monday, June 18, 2018 20:21
  • As mentioned above, you could try backing up your database and restoring it on another server.

    The steps on how to do it can be found in the link below:

    https://docs.microsoft.com/en-us/system-center/dpm/upgrade-dpm?view=sc-dpm-1801


    Blog: https://thesystemcenterblog.com

    Monday, June 18, 2018 23:12
  • Thank you everyone for the advice and assistance!

    I was unable to move the SQL Server instance to another server; however, I did seem to resolve the issue. I had to modify several protection groups - renaming them, and modifying what was protected as well. After I made these modifications, the errors have not returned.

    Take care,

    Mathew

    Monday, June 25, 2018 16:27
  • I have the same problem after upgrading from DPM 2016 to DPM 1801 and then to DPM 1807: the MSDPM service crashes every day at midnight, usually because of PRIMARY KEY or FOREIGN KEY constraint violations.

    Tornado

    Thursday, September 13, 2018 06:34
  • Similar thing here - after upgrade to 1801 and then to 1807.

    The INSERT statement conflicted with the FOREIGN KEY constraint "FK_tbl_SM_Disk_Usage_Trend_tbl_SM_Statistics". The conflict occurred in database "DPMDB_SG_DPM155f3e7cf_f177_4105_813e_ac64cdc545f0", table "dbo.tbl_SM_Statistics", column 'SMStatsID'.
    The statement has been terminated.</ExceptionMessage><ExceptionDetails>System.Data.SqlClient.SqlException (0x80131904): The INSERT statement conflicted with the FOREIGN KEY constraint "FK_tbl_SM_Disk_Usage_Trend_tbl_SM_Statistics". 

    Thanks for any hints.

    Markus

    Thursday, September 13, 2018 10:51
  • Same thing here: we upgraded from DPM 2016 to 1801 and then 1807, and DPM crashes at midnight. We also have a remote SQL Server setup.

    I'm going to open a call with MS, seems like a bug.

    Marc

    Sunday, September 16, 2018 18:50
  • This does indeed seem like a bug. Basically, it seems to be trying to insert a key that already exists in the database, hence the complaint about duplication.

    In this case I believe it's best to open a ticket directly with Microsoft.

    If any of you gets it solved, it would be very helpful if you could share the solution/workaround with the community!


    Blog: https://thesystemcenterblog.com

    Sunday, September 16, 2018 18:56
  • I checked the jobs running at midnight in the SQL Server Agent Job Activity Monitor:

    • There is one SQL job per protection group,
    • plus two more jobs, which are (I guess) maintenance jobs (pruning, statistics, garbage collection). To see what a job does, run it and watch it with SQL Profiler.

    On a test server I installed DPM 2016, and there was only one maintenance job, with JobDefinitionId 1ebebb67-1a20-4e7d-8f54-b79641dc1583 (DPM 2016). After the upgrade to 1801, one more maintenance job was added, with JobDefinitionId 807ef66c-1db0-44cd-af6b-5cc088e15642 (DPM 1801). When they run together, the MSDPM service crashes because both jobs do the same work. In my case the problem was in the execution of the stored procedure prc_SMTE_ComputeDiskUsageTrend.

    I also tried installing DPM 1801 on a clean machine, and there was only one maintenance job, with ID 807ef66c-1db0-44cd-af6b-5cc088e15642.

    Which led me to the conclusion: there should be only one maintenance job.

    SQL to get the maintenance jobs in DPMDB:

    SELECT [JobDefinitionId]
          ,[ProtectedGroupId]
          ,[Type]
          ,[IsDeleted]
          ,[ContinueOnTaskFailure]
          ,[MaxDuration]
          ,[CreationTime]
          ,[Xml]
          ,[ServerId]
          ,[ServerTimeZone]
          ,[RetryAttemptNumber]
          ,[DatasourceId]
      FROM [DPMDB_DPMTEST].[dbo].[tbl_JM_JobDefinition]
      WHERE [Type] = '282FAAC6-E3CB-4015-8C6D-4276FCCA11D4'
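
    A narrower variant of the query above, as a sketch: it lists only the maintenance job definitions that are not marked as deleted, oldest first. It uses only columns that appear in the query above; whether IsDeleted alone decides which job DPM treats as active is an assumption. More than one row would match the duplicate described here.

    ```sql
    -- [DPMDB_DPMTEST] is the test database name used above; replace it with yours.
    SELECT [JobDefinitionId]
          ,[CreationTime]
      FROM [DPMDB_DPMTEST].[dbo].[tbl_JM_JobDefinition]
      WHERE [Type] = '282FAAC6-E3CB-4015-8C6D-4276FCCA11D4'
        AND [IsDeleted] = 0
      ORDER BY [CreationTime];
    ```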

    PLEASE DON'T DO THIS IN PRODUCTION ENVIRONMENT:

    I deleted the row with JobDefinitionId 1ebebb67-1a20-4e7d-8f54-b79641dc1583 in table tbl_JM_JobDefinition (and some rows in other tables, because of constraints). Now the MSDPM service no longer crashes.

    PLEASE DON'T DO THIS until it's confirmed by Microsoft that it is OK.


    Tornado

    • Proposed as answer by Joerg Ott on Tuesday, October 2, 2018 14:21
    Monday, September 17, 2018 11:00
  • Thanks for sharing this Tornado, I will see if I can replicate this issue in one of my labs and try out your workaround. Let's see if Microsoft could give us some answer on this.

    Blog: https://thesystemcenterblog.com

    Monday, September 17, 2018 15:22
  • Thanks for the information, Tornado!

    Just wanted to add that we are seeing the same 945 error nightly at midnight on our DPM 2016 server that was upgraded from 2016 to 1801 then 1807.

    Current server info:

    • Server 2016
    • SQL 2016 SP2
    • DPM 2016 1807
    I'll be waiting to hear what Microsoft says on this.
    Monday, September 17, 2018 20:50
  • Hi all!

    Many thanks for sharing your results and insights!

    I'd suggest a less intrusive way for testing: Just disable the schedule that fires the job with the JobDefinitionId 1ebebb67...

    SELECT *
      FROM [DPMDB_SG_DPM155f3e7cf_f177_4105_813e_ac64cdc545f0].[dbo].[tbl_SCH_ScheduleDefinition]
      where JobDefinitionId = '1ebebb67-1a20-4e7d-8f54-b79641dc1583'

    On my server, the result is one row with the ScheduleID CB727E56-837D-4B98-8B88-D08AF57C8F0A

    Just find this ID in the Job List of the SQL Server Agent and modify/disable the job. This can easily be reversed. 
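
    For reference, the same change can be scripted with the documented msdb procedure sp_update_job instead of the Agent GUI. This is a minimal sketch, assuming the Agent job is named exactly after the ScheduleID shown above; on another server the ID will likely differ.

    ```sql
    -- Disable the SQL Server Agent job named after the ScheduleID.
    -- The GUID is the one from this post; look up your own value first.
    EXEC msdb.dbo.sp_update_job
         @job_name = N'CB727E56-837D-4B98-8B88-D08AF57C8F0A',
         @enabled  = 0;

    -- To reverse the change, run the same call with @enabled = 1.
    ```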

    I have not tested this yet, will see tomorrow if it prevents the crash or if the job gets automatically reactivated or recreated by dpm.

    Because of the daily service crash, I have a serious problem not being able to get a complete tape backup for external storage for weeks, so I need a solution urgently. 

    Markus

     


    • Edited by Markus_R. _ on Tuesday, September 18, 2018 07:38
    Tuesday, September 18, 2018 07:37
  • I tried disabling the SQL job, but it got enabled again somehow :). It's worth a try; maybe it will behave differently in your environment.

    BTW: to get the name of the job you need to disable use this SQL:

    SELECT [ScheduleId]
      FROM [DPMDB_DPMTEST].[dbo].[tbl_JM_JobTrail]
      WHERE JobDefinitionId = '1ebebb67-1a20-4e7d-8f54-b79641dc1583'

    ScheduleId is the name of the job in SQL Server Agent\Job Activity Monitor


    Tornado

    Tuesday, September 18, 2018 08:26
  • Hm, it remains disabled here, at least for the last 2 hours. Maybe changing the execution time could be another option?

    Markus

    Tuesday, September 18, 2018 09:35
  • When I run the maintenance job with JobDefinitionId 807ef66c-1db0-44cd-af6b-5cc088e15642, the job with JobDefinitionId 1ebebb67-1a20-4e7d-8f54-b79641dc1583 is re-enabled after a few minutes :).

    POSSIBLE SOLUTION:

    PLEASE DON'T DO THIS IN PRODUCTION ENVIRONMENT UNTIL IT IS CONFIRMED AS SOLUTION BY MICROSOFT:

    This disables and then deletes the SQL job for good: execute the stored procedure prc_SCH_Schedule_DeleteJobDefinitionSchedules (parameter: 1ebebb67-1a20-4e7d-8f54-b79641dc1583) and then run the maintenance job with JobDefinitionId 807ef66c-1db0-44cd-af6b-5cc088e15642 twice. The first run disables the SQL job and the second run deletes it; wait 20 minutes between runs.


    USE [DPMDB_NAMEOFYOURDPMDB]
    GO

    DECLARE @return_value int

    EXEC @return_value = [dbo].[prc_SCH_Schedule_DeleteJobDefinitionSchedules]
    @JobDefinitionId = '1ebebb67-1a20-4e7d-8f54-b79641dc1583'

    SELECT 'Return Value' = @return_value

    GO


    Tornado

    Tuesday, September 18, 2018 10:21
  • Quick feedback after 24hrs:

    - DPM Service is still running fine, Tape backup job still active. No service crash. 

    - SQL Server Agent Job for the JobDefinitionID 1ebebb67-1a20-4e7d-8f54-b79641dc1583 is still disabled.

    Update: The job gets re-enabled during the scheduled maintenance run at 10:00. :-(

    Markus


    • Edited by Markus_R. _ on Wednesday, September 19, 2018 09:44
    Wednesday, September 19, 2018 07:20
  • Just a quick update...

    I disabled the SQL Server Agent job for jobdefinitionid 1ebebb67-1a20-4e7d-8f54-b79641dc1583 and the services did NOT crash. When I checked back in on it the next morning though the maintenance job was re-enabled just as Markus_R experienced. Presumably at midnight the services would crash again so we manually disabled it again. We're looking into automating this until a more permanent fix can be implemented.
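
    One way to automate that stopgap, as a sketch only: an idempotent batch that disables the job only when it is found enabled. It could run as a step of a separate Agent job scheduled shortly before midnight. The job name below is a placeholder, and whether DPM tolerates this in the long run is untested here.

    ```sql
    -- Placeholder: replace with the ScheduleId that names the duplicate job.
    DECLARE @job sysname = N'00000000-0000-0000-0000-000000000000';

    -- Act only if the job exists and is currently enabled.
    IF EXISTS (SELECT 1
               FROM msdb.dbo.sysjobs
               WHERE name = @job
                 AND enabled = 1)
    BEGIN
        EXEC msdb.dbo.sp_update_job @job_name = @job, @enabled = 0;
    END
    ```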

    I guess a call to Microsoft will be necessary to resolve this.

    Thursday, September 20, 2018 13:50
  • I followed Tornado's procedure to finally remove the job in question. It took more than two attempts to delete it by running the maintenance job. Obviously the 20 minutes in between were too short for my server. At the third run, the service crashed again (!); after that the 1ebe-- job was gone.

    Since then I see no issues or crashes any more. :-)

    Many thanks to Tornado for your investigations!

    Markus

    PS: By the way, the query for the ScheduleId of the job 1ebb... still returns about 30 results (all are CB727E56-837D-4B98-8B88-D08AF57C8F0A, which is gone). It seems that some references to the job are still there.
    • Edited by Markus_R. _ on Friday, September 21, 2018 06:15
    Friday, September 21, 2018 06:06
  • Gentlemen, thank you, this is a nice one (again!).

    Already was in doubt about myself and was thinking back and forth where I made a mistake during migration... turns out, I did not. What a relief. ;-)

    Migrated from 2016 to 1801 to 1807, encountering the same issue. Now removed the schedule from the maintenance job and disabled it. Will see tomorrow if that helped...

    @Microsoft: Come on guys, get this f***ing backup software running again. Took months to remediate the ReFS bug(s), now here comes the next shot in the foot. Already curious what will happen next... :-\

    Tuesday, October 2, 2018 14:16
  • I opened a call on September 17th for this issue and there is no solution yet. The only mails I receive have this in their body:

    Hope you have a great day, blablabla (I would if this problem gets fixed because now every "great" day I have to connect to my customers dpm servers and disable this job)

    and then this:

    In case there are no updates, the next contact date will be on the 25th September

    In case there are no updates, the next contact date will be on the 28th September.

    In case there are no updates, the next contact date will be on the 03th October.

    In case there are no updates, the next contact date will be on the 04th October.

    In case there are no updates, the next contact date will be on the 10th October

    and finally this

    In case there are no updates, the next contact date will be on the 11th October

    It makes me wonder if Microsoft still cares about their on-prem install base.

    Monday, October 8, 2018 08:52
  • I'm sorry to hear that this issue has still not been resolved. If a ticket to Microsoft isn't helping either, you could create a feedback/bug report at the link below, and everyone who is having this issue could go there and vote.

    https://feedback.azure.com/forums/258995-azure-backup-and-scdpm


    Blog: https://thesystemcenterblog.com

    Monday, October 8, 2018 08:58
  • Hi

    To disable the duplicated Garbage Collection job, first update the "IsDeleted" field in tbl_SCH_ScheduleDefinition and then disable the job named after the ScheduleId in SQL Server Agent.

    Example (please do this at your own risk):

    /* Check the Garbage Collection tasks */

    select TaskID, TaskDefinitionID, JobID, LastStateName, StartedDateTime, StoppedDateTime
    from tbl_TE_TaskTrail
    where VerbID = '282faac6-e3cb-4015-8c6d-4276fcca11d4'
    order by StartedDateTime

    /* This shows two jobs running at the same time when there should be one:

    TaskID                                  TaskDefinitionID                           JobID                                   LastStateName StartedDateTime          StoppedDateTime
    1BE9DBB1-0EAE-485A-8127-9A2B1028F4E3    E5F56C04-CB8E-4F3C-8792-D3D7BDD72DF7       86A69426-AEB3-4391-87BE-B278923B025C    Failure       2018-09-09 22:00:01.560  2018-09-09 22:07:07.370
    4A9C008E-1024-492A-A641-29B0B0CB7829    3D8E0A11-1B79-4378-B1C1-9F3EFBC3C477       3B6E15EE-9D76-40C0-BD05-9BD39DEA5F3C    Failure       2018-09-09 22:00:01.577  2018-09-09 22:07:07.357
    */


    ----------------------------------------------------------------------------

     

    /* Get the JobDefinitionId values for Garbage Collection */

    select JobDefinitionId from tbl_JM_JobDefinition where Type = '282faac6-e3cb-4015-8c6d-4276fcca11d4'

    /* JobDefinitionId
       807EF66C-1DB0-44CD-AF6B-5CC088E15642
       1EBEBB67-1A20-4E7D-8F54-B79641DC1583
       9B30D213-B836-4B9E-97C2-DB03C3EB39D7 */

     

    ----------------------------------------------------------------------------

     

    /* Get the ScheduleId of every schedule for these jobs that is not marked as deleted (IsDeleted: 0 = false, 1 = true). If more than one row comes back, the jobs are duplicated, as reflected in the Garbage Collection check above. */

    select ScheduleId from tbl_SCH_ScheduleDefinition where JobDefinitionId in (
    '807EF66C-1DB0-44CD-AF6B-5CC088E15642',
    '1EBEBB67-1A20-4E7D-8F54-B79641DC1583',
    '9B30D213-B836-4B9E-97C2-DB03C3EB39D7') and IsDeleted <> 1

    /* ScheduleID
       BE69EE80-B593-4FEC-A354-605B8232D2AD
       F87C4438-622F-49CA-B767-E07785690B72 */

     

    ----------------------------------------------------------------------------

      

    /* Mark the schedule of the oldest JobDefinitionId as deleted in tbl_SCH_ScheduleDefinition */

    UPDATE tbl_SCH_ScheduleDefinition
    SET IsDeleted = 1
    WHERE JobDefinitionId = '1EBEBB67-1A20-4E7D-8F54-B79641DC1583'

     

    ----------------------------------------------------------------------------

     

    /* Disable the ScheduleId job in SQL Server Agent.
       The ScheduleId is the name of the SQL job under SQL Server Agent. */

    UPDATE MSDB.dbo.sysjobs
    SET Enabled = 0
    WHERE [Name] = 'EC4DDFAD-A450-4BD7-B543-DC66DECE5BAA'

    Note: To re-enable the job, change the value from 0 to 1 in "SET Enabled".
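
    If you do run the update, one cautious pattern (a sketch, not a Microsoft-confirmed procedure) is to wrap it in a transaction and commit only when exactly one row changed. The database name is a placeholder, and the expectation of a single matching row is an assumption based on the results reported earlier in this thread.

    ```sql
    -- [DPMDB_NAMEOFYOURDPMDB] is a placeholder, not a real database name.
    USE [DPMDB_NAMEOFYOURDPMDB];
    GO

    DECLARE @rows int;

    BEGIN TRANSACTION;

    UPDATE dbo.tbl_SCH_ScheduleDefinition
    SET IsDeleted = 1
    WHERE JobDefinitionId = '1EBEBB67-1A20-4E7D-8F54-B79641DC1583'
      AND IsDeleted = 0;

    -- Capture the row count immediately after the UPDATE.
    SET @rows = @@ROWCOUNT;

    IF @rows = 1
    BEGIN
        COMMIT TRANSACTION;
    END
    ELSE
    BEGIN
        -- Unexpected number of rows: undo the change and report it.
        ROLLBACK TRANSACTION;
        RAISERROR('Expected 1 row, found %d. Nothing was changed.', 16, 1, @rows);
    END
    ```

    Taking a backup of the DPMDB before running it keeps the change reversible.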

    • Proposed as answer by Tome Lopes on Sunday, October 21, 2018 11:23
    Sunday, October 21, 2018 11:23
  • Hi all,

    It seems that the problem on my DPM server has been resolved by MS Support; they actually used the same procedure that Tome Lopes posted. If anyone has the problem and wants to go through MS Support to resolve this issue on production servers, you can refer to case number 118091719025441.

    The issue is not considered a bug(???), so be aware that it's possible you will get charged for the call.

    Best regards,

    Marc

    Friday, October 26, 2018 09:53