DPM 2012 R2 synchronizations fail after 1 hour 20 min. Error Code 60

  • Question

  • Clean install of Server 2012 R2, DPM 2012 R2, and SQL 2012 SP1.

    Everything is patched and current.  

    However, all backup jobs fail after 1 hour and 20 min with: The protection agent on MYDPMSERVER.WTF was temporarily unable to respond because it was in an unexpected state. (ID 60 Details: Internal error code: 0x809909B0)

    This happens regardless of the server OR client I'm attempting to back up.  Backup jobs WILL complete successfully if the amount of data being backed up can be moved in under 1 hour and 20 min.

    Anyone having a similar issue?  I'm at a total loss.

    Friday, January 10, 2014 7:43 PM

Answers

  • I resolved the issue.

    It required me to set up SQL 2012 SP1 on another piece of hardware (an older Dell 2950).  I installed MS Server 2012 R2 as the OS.  

    Once this was up and running, I reinstalled DPM 2012 R2 on the actual Backup Server Hardware, and had it use the remote SQL server instead of running locally.

    The local SQL database was NOT on the same array as the backup data and was installed exactly as the setup documentation dictates.  So I'm still not sure why the database couldn't run locally.  The ONLY thing I can think of: DPM 2012 R2 generates a ton of logging errors in C:\Program Files\Microsoft System Center 2012 R2\DPM\DPM\Temp.  See Thread Here

    Maybe the rapid log writes on C: were too much for the SQL install, also on C:?  Seems silly, as the system has 64GB of RAM (only 10% was ever in use) and should have been able to handle some disk IO delay.

    Regardless, it's working now.  Wish I didn't have to start up another piece of hardware though...



    • Marked as answer by Terafloppy Monday, January 27, 2014 7:52 PM
    • Edited by Terafloppy Monday, January 27, 2014 7:56 PM
    Monday, January 27, 2014 7:52 PM

All replies

  • Hi,

    I have not heard of any issue like this before where jobs fail after 90 minutes like clockwork.  The error 0x809909B0 = E_AGENT_WORKITEM_NOT_ACTIVE:  The operation failed because the agent WorkItem does not exist.

    This usually means that the dpmra agent crashed.

    Do you see any .crash logs on the protected server ?

    CD "C:\Program Files\Microsoft Data Protection Manager\DPM\Temp"
    Dir DPMRA*.Crash
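
    The same check can be scripted when several protected servers need to be inspected. The following is a minimal Python sketch, not a DPM tool: it assumes the agent Temp folder quoted above and the DPMRA*.Crash name pattern from the Dir command, and simply lists matching files newest first. The function name find_crash_logs is made up for illustration.

```python
from pathlib import Path

# Assumed agent Temp folder, copied from the path quoted in this post.
# Adjust it if the agent is installed elsewhere.
DPM_TEMP = Path(r"C:\Program Files\Microsoft Data Protection Manager\DPM\Temp")


def find_crash_logs(folder, pattern="DPMRA*.Crash"):
    """Return crash logs matching the pattern in folder, newest first."""
    folder = Path(folder)
    if not folder.is_dir():
        # Folder missing (for example, no agent installed): nothing to report.
        return []
    return sorted(folder.glob(pattern),
                  key=lambda p: p.stat().st_mtime,
                  reverse=True)


if __name__ == "__main__":
    logs = find_crash_logs(DPM_TEMP)
    if not logs:
        print("No DPMRA crash logs found in", DPM_TEMP)
    for log in logs:
        print(log.name, log.stat().st_size, "bytes")
```

    An empty result only means no crash log was written in that folder; it does not rule out a crash on the other machine.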


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.


    Friday, January 10, 2014 9:13 PM
    Moderator
  • That is my issue.  Every 90 min, the process fails on the DPM server, NOT on the target of the backup.

    DPMRA.exe never crashes on my file server.  Nothing in the logs.  However, on the DPM server you can see this error: 

    Faulting application name: DPMRA.exe, version: 4.2.1205.0, time stamp: 0x5226e15e
    Faulting module name: KERNELBASE.dll, version: 6.3.9600.16408, time stamp: 0x523d557d
    Exception code: 0x80070057
    Fault offset: 0x000000000000ab78
    Faulting process id: 0x404
    Faulting application start time: 0x01cf106f2f6a99a6
    Faulting application path: C:\Program Files\Microsoft System Center 2012 R2\DPM\DPM\bin\DPMRA.exe
    Faulting module path: C:\Windows\system32\KERNELBASE.dll
    Report Id: ffa692d5-7c6e-11e3-80be-002590d797bd
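
    To line the crash up with the job history, it helps to turn the "Faulting application start time" value into a readable date. The sketch below assumes that field is a Windows FILETIME written in hex (100-nanosecond ticks since 1601-01-01 UTC), which is how Application Error events normally report it. The helper name filetime_to_utc is only illustrative.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks from this instant (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(hex_value):
    """Convert a hex FILETIME string such as '0x01cf...' to a UTC datetime."""
    ticks = int(hex_value, 16)          # accepts an optional 0x prefix
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    # Value copied from the event text above.
    started = filetime_to_utc("0x01cf106f2f6a99a6")
    print("DPMRA.exe started (UTC):", started.isoformat())
```

    The result is in UTC, so it has to be shifted to the server's local time zone before comparing it with job start times in the DPM console.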

    Monday, January 13, 2014 5:01 PM
  • Hi,

    We have a known code defect in DPM 2012 R2 that can lead to backups failing with (ID 104 Details: The parameter is incorrect (0x80070057)) 

    I don't know whether the crash with the same exception code (0x80070057) has the same cause.  See the following post and check whether it matches.

    http://social.technet.microsoft.com/Forums/en-US/b53b82b1-639b-4561-9ed2-f9749abd84f8/dpm-2012-r2-an-unexpected-error-occurred-while-the-job-was-running-id-104-details-the-parameter?forum=dpmfilebackup
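
    For reference, the codes in this thread can be split into their parts. This is a small sketch that treats each value as a standard HRESULT (failure bit, 11-bit facility, 16-bit code); whether the crash's exception code really is an HRESULT is an assumption. Facility 7 is FACILITY_WIN32, and Win32 error 87 is ERROR_INVALID_PARAMETER, which matches "The parameter is incorrect". The DPM-specific facility in 0x809909B0 is printed as a number only, since its name is not given here.

```python
FACILITY_WIN32 = 7  # HRESULTs that wrap a plain Win32 error code


def decode_hresult(value):
    """Split a 32-bit HRESULT into failure flag, facility and code."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),        # top bit set means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility field
        "code": value & 0xFFFF,              # low 16 bits
    }


if __name__ == "__main__":
    for raw in (0x80070057, 0x809909B0):
        parts = decode_hresult(raw)
        note = ""
        if parts["facility"] == FACILITY_WIN32:
            note = " (Win32 error %d)" % parts["code"]
        print("0x%08X -> facility %d, code %d%s"
              % (raw, parts["facility"], parts["code"], note))
```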


    Regards, Mike J. [MSFT]

    Monday, January 13, 2014 11:07 PM
    Moderator
  • I have DPM 2012 R2 currently running in a lab environment and it's started exhibiting the same fault over the past few weeks. I've uninstalled the agent, re-created the protection group, and re-installed the agent, to no effect. The error on the target server during synchronisation is:

    Log Name:      Application
    Source:        Application Error
    Date:          13/03/2014 08:08:09
    Event ID:      1000
    Task Category: (100)
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      [removed]
    Description:
    Faulting application name: DPMRA.exe, version: 4.2.1217.0, time stamp: 0x52de699f
    Faulting module name: ntdll.dll, version: 6.2.9200.16579, time stamp: 0x51637f77
    Exception code: 0xc0000374
    Fault offset: 0x00000000000ebd59
    Faulting process id: 0x34
    Faulting application start time: 0x01cf3e9353cfd959
    Faulting application path: C:\Program Files\Microsoft Data Protection Manager\DPM\bin\DPMRA.exe
    Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
    Report Id: 9d0bd234-aa86-11e3-942c-00241dd64b21
    Faulting package full name:
    Faulting package-relative application ID:

    DPM itself fails with this error:

    Type: Synchronization
    Status: Failed
    Description: The protection agent on [removed] was temporarily unable to respond because it was in an unexpected state. (ID 60 Details: Internal error code: 0x809909B0)
     More information
    End time: 13/03/2014 08:09:00
    Start time: 13/03/2014 08:07:50
    Time elapsed: 00:01:10
    Data transferred: 0 MB
    Cluster node -
    Source details: E:\
    Protection group: Servers
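
    A quick way to confirm that the crash event belongs to this job is to compare the timestamps. The sketch below assumes both excerpts use day/month/year and come from clocks that agree (same time zone, reasonably in sync). The function names are made up for illustration.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"  # day/month/year, as in the excerpts above


def parse(stamp):
    """Parse a 'dd/mm/yyyy hh:mm:ss' timestamp."""
    return datetime.strptime(stamp, FMT)


def within_job(event_time, start_time, end_time):
    """True if the event falls inside the job's start/end window."""
    return parse(start_time) <= parse(event_time) <= parse(end_time)


def seconds_into_job(event_time, start_time):
    """Seconds between job start and the event."""
    return int((parse(event_time) - parse(start_time)).total_seconds())


if __name__ == "__main__":
    crash = "13/03/2014 08:08:09"   # Event ID 1000 on the target server
    start = "13/03/2014 08:07:50"   # job start reported by DPM
    end = "13/03/2014 08:09:00"     # job end reported by DPM
    print("crash inside job window:", within_job(crash, start, end))
    print("seconds after job start:", seconds_into_job(crash, start))
```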

    Mostly posted as a "for your information", as it is only a lab. The only thing that prompted me to reply was that the lab is relatively low powered and the comment about performance was interesting.

    Thursday, March 13, 2014 8:28 AM
  • Downgraded to agent 4.2.1205.0 - no difference. I've just looked at the logs and the errors started occurring exactly one month ago on 13th Feb 2014. There were no updates on the target server however there *were* updates on 12th Feb 2014 on the DPM server. Hmm....

    Thursday, March 13, 2014 9:22 AM
  • Are you using a local instance of SQL on the same machine running DPM?  If so, I would recommend you host the SQL database on a separate SQL server.  This resolved the issue for me.

    Thursday, March 13, 2014 9:24 PM
  • Yes, actually I am. In this lab set-up, DPM 2012 with SP1 was first installed with the local SQL installation option (a version of SQL Express, I think). It was then upgraded to R2. I do know that this configuration isn't supported out of the box anymore, i.e. if you install DPM 2012 R2 from scratch, there is no longer an option to install a local copy of SQL and you have to use a standalone SQL server or clustered system.

    As this is the lab, I'll try provisioning a separate SQL Server and DPM R2 installation and see what happens. Right now, the lab is pretty much dead anyway. In the newly created protection group, it can't even create the replica...

    I do still feel something has changed though as it had worked fine for several months.

    Thursday, March 13, 2014 9:50 PM
  • I've rebuilt the DPM lab server using Windows 2012 R2, SQL 2012 Standard and DPM 2012 R2 - yes, all on the same virtual server at the moment. Exactly the same problem. I'm going to try again with DPM 2012 SP1 just to remove the target server from the equation.

    My gut feeling on this is that it isn't the fact that SQL is on the same server but I will try separating it out later.

    Friday, March 14, 2014 2:57 PM
  • Bit more information: another system built using DPM 2012 with SP1 and local SQL shows the same problem.

    As an experiment, I copied the data off the failing disk to another disk (all this is virtual) using robocopy, and that disk backs up fine. The copy isn't perfect: the original disk is running SQL and IIS, so some files were locked.

    This may suggest there is a problem with the disk structure itself. I've run chkdsk and it reports all is well (not tried the deep check). I'm sure I read somewhere about bad ACLs causing problems - orphan SIDs maybe? I can't recall where I read that. So I've used setacl to check the disk and it did find a problem with a primary group on one folder (will have to read up on that!) and I've resolved that. So will try adding in the failing drive again.

    Monday, March 17, 2014 9:16 AM
  • My gut instinct is now more and more that there is a fault on the disk system which the DPMRA.EXE/ntdll.dll combo cannot handle, e.g. DPMRA.EXE is passing NTDLL.DLL some bad structure because it's not error checking enough or something like that.
    Monday, March 17, 2014 9:17 AM
  • A little bit more information on this. I went into "divide and conquer" mode to see whether part of the disk system was causing the DPMRA.EXE crash. I added sub-folders one at a time, leaving it for a while in between, and got to the point where every sub-folder was selected but not the drive itself. That worked fine. So I basically have this:

    Ignore the "Replica inconsistent" as that's because I've just switched from "E:\" to sub-folders. I'm pretty much sure it will sort out the replica and backup fine.

    Switch back to just E:\ and DPMRA.EXE will start crashing. So it fails when I try and do the entire drive but works when I select sub-folders.

    Does that help?
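
    For anyone repeating the experiment, the folder-by-folder approach can be written down generically. This is only a sketch of the procedure: backup_succeeds is a hypothetical callback standing in for "change the protection group, wait for a sync, and see whether it fails", which in practice is done by hand or with your own tooling.

```python
def first_failing_selection(folders, backup_succeeds):
    """Add folders one at a time; return the folder whose addition breaks
    the backup, or None if every cumulative selection works."""
    selected = []
    for folder in folders:
        selected.append(folder)
        # backup_succeeds is hypothetical: it receives the folders selected
        # so far and reports whether a synchronization completed.
        if not backup_succeeds(list(selected)):
            return folder
    return None


if __name__ == "__main__":
    # Stand-in check for demonstration only: pretend one folder is bad.
    def fake_check(selection):
        return r"E:\Bad" not in selection

    folders = [r"E:\Data", r"E:\Bad", r"E:\Logs"]
    print(first_failing_selection(folders, fake_check))
```

    In the case described here the equivalent result was None: every sub-folder selection worked, and only selecting E:\ as a whole triggered the crash.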



    Tuesday, March 25, 2014 3:38 PM
  • We had a similar issue (ID 3115 and ID 60) with DPMRA.exe crashes on the protected servers. Today we applied the update at

    http://support.microsoft.com/kb/2934897/EN-US

    and it looks like that took care of our issue.

    Another issue I ran into: I had a Windows 2012 R2 server with Dedup and volumes over a terabyte each. On some of the volumes I could no longer back up to tape (ID 3311 and ID 2019: an existing connection was forcibly disconnected by a remote host).  To fix this, I had to make sure the April 2014 Windows 2012 R2 patch was installed, then delete and recreate the replica before tape backups worked again.

    Thursday, April 10, 2014 3:05 AM
  • That looks hopeful - will give it a go. Thanks, Rob.
    Thursday, April 10, 2014 8:15 AM
  • Early indications look good! Switched back to backup of entire E: drive (as opposed to lots of individual folders) and the full backups have completed twice so far without complaint.
    Friday, April 11, 2014 9:04 AM
  • Hi all,

    I found this via Google while searching for the same error message. This could be a permission problem, too.

    In my case I wasn't able to back up the system databases on a SQL Server 2012 R2 instance on Windows Server 2012 R2 with Data Protection Manager 2012 R2 Update Rollup 7.

    I added NT AUTHORITY\SYSTEM as an additional login for this instance with the 'sysadmin' role, which fixed the issue.

    Thursday, August 20, 2015 11:10 AM