Secondary DPM Servers' Recovery Points Fail for Primary DPM Server's File Volumes

  • Question

  • This problem seems quite unusual...

    Every secondary DPM server we've tried fails to create recovery points for file volumes from one of our primary DPM servers. Error:

           "Description: The job completed successfully with the following warning:
           The synchronization job failed due to an internal error. (ID 117)"

    Details:

    • Six DPM 2010 RTM servers, three primaries and three secondaries. Each primary is backed up by a secondary.
    • All six servers are configured identically: Windows Server 2008 R2, latest WSUS updates, same hardware, drivers, firmware, etc.
    • Two of the primaries are backed up by their secondaries with no problems.
    • One of the secondaries (DPM4) reports the above error every time it tries to create a recovery point for file volumes on its primary (DPM1).

    The problem is with the primary server, DPM1, and not its secondary server, DPM4. This was determined as follows:

    • We added a protected server (SRV1) to a different primary server, DPM2. This worked fine. Recovery points were created successfully for SRV1.
    • We then added SRV1 to DPM4 for secondary protection. This worked fine. DPM4 successfully created recovery points for SRV1 from DPM2.
    • We then deleted SRV1 from DPM2 and DPM4.
    • Next we added SRV1 to primary server DPM1. This worked fine. Recovery points were created successfully for SRV1 (just like when SRV1 was on DPM2).
    • We then added SRV1 to DPM4 for secondary protection. The replica was created successfully, but with the ID 117 warning shown above. DPM4 failed to create recovery points for SRV1.
    • Reboots of DPM1 and DPM4 did not help. Running an extra consistency check for SRV1 on DPM1 and DPM4 did not help.
    • Syncs for SRV1 from DPM1 to DPM4 are always successful, but always with the ID 117 warning, and the recovery points always fail to be created for SRV1 on DPM4.
    • To be sure the problem is not DPM4, we added SRV1 from DPM1 to another secondary server, DPM6. It had the same problem as DPM4. Yet, DPM6 successfully creates recovery points for protected servers on its normal primary server (DPM3).
    • SRV1 is just one of a few dozen protected servers on DPM1 whose recovery points on DPM4 (or any other secondary server) always fail with ID 117.
    • The above problem only applies to recovery points for file volumes. System state protection for servers from DPM1 to DPM4 works fine.
    • The problem persists regardless of protection group configuration.

    So, the problem is definitely specific to DPM1. Yet, what's weird is that DPM1 has zero errors with its own recovery points for protected servers. The errors only show up on the secondary server that protects DPM1.

    Our three primary servers are all configured identically. There are no relevant errors in the event logs on any of the servers involved. The DPM 2010 Error Code Catalog says for ID 117: "Create a valid recovery point on the primary DPM server and rerun the synchronization job" (http://technet.microsoft.com/en-us/library/ff399290.aspx). Yet, the recovery points are valid on DPM1. We've tested restores from its recovery points and they work fine.

    So, what could be the cause of this problem?!? Is there some obscure VSS problem on DPM1? Are there DPM logs that I can examine to dig deeper into this? Any help you can provide would be much appreciated.

    -Taylorbox

    • Moved by Praveen D [MSFT] Monday, July 19, 2010 6:56 AM Moving to DPM Disaster Protection Forum (From:Data Protection Manager)
    Monday, July 12, 2010 8:32 PM

Answers

  • I think this needs to be investigated by the Microsoft Support team. Can you please open a case for this so that we can nail down the issue and unblock your deployment?

    One more thing I want to know: do you have case sensitivity enabled anywhere (SRV1 / DPM1)?


    Thanks, NikhilKumar.R [MSFT] - This posting is provided "AS IS" with no warranties, and confers no rights.
    • Marked as answer by Taylorbox Friday, March 11, 2011 3:05 PM
    Thursday, February 10, 2011 7:08 PM

All replies

  • Could someone please comment on this thread? We're still having this issue and any help you could provide would be greatly appreciated...

    -Taylorbox

    Monday, July 26, 2010 6:17 PM
  • Any progress?

    I have a similar error: on our secondary DPM 2010 server, recovery point creation fails, but only for file volumes. The primary DPM server seems OK.

    The event log has one error for every volume:

    Creation of recovery points for C:\ on DPM.xxxxxx.cz have failed. The last recovery point creation failed for the following reason: (ID: 3114)

    Ludek

    Monday, August 9, 2010 5:52 PM
  • The only thing I can think of is that DPM1 might have run into a disk space limitation. DPM requires 128 MB of free space per 100,000 files tracked on the NTFS volume of the server being protected. So if DPM4 is protecting SRV1 via DPM1, then for every 100,000 files being protected you need 128 MB of disk space on DPM1.
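    To make that arithmetic concrete, here is a minimal Python sketch of the 128 MB per 100,000 files rule of thumb quoted in this reply. The function name, the example file count, and the choice to round a partial block up are our own assumptions, not documented DPM behavior.

```python
# Rule of thumb quoted in this reply: 128 MB of free space on the primary
# DPM server per 100,000 files tracked on the protected NTFS volume.
MB_PER_BLOCK = 128
FILES_PER_BLOCK = 100_000


def required_tracking_space_mb(file_count: int) -> int:
    """Free space (MB) the rule of thumb calls for, rounded up to whole blocks.

    Rounding a partial block up is our own conservative assumption.
    """
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    blocks = -(-file_count // FILES_PER_BLOCK)  # integer ceiling division
    return blocks * MB_PER_BLOCK


if __name__ == "__main__":
    # Hypothetical volume holding 2.5 million files -> 25 blocks of 128 MB.
    print(required_tracking_space_mb(2_500_000), "MB")
```

    For the hypothetical 2.5 million files this prints `3200 MB`.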

    The other situation I've seen with similar (not identical) errors is where the VSS snap volume on DPM1 (which requires a minimum of 300 MB of available space to run) has a limit set that is greater than the free disk space on DPM1. For example, if DPM1's C: has 6 GB of free space and the VSS snap volume is on C: with a limit of 9 GB, I've seen a similar error.
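    The two conditions in this reply can be expressed as a simple check. This is a minimal Python sketch: the helper name is hypothetical, the 300 MB floor is the figure given in this reply rather than an official threshold, and the free-space and limit numbers are assumed to be read by hand from something like `vssadmin list shadowstorage` on DPM1.

```python
# Figures taken from this reply; the helper itself is hypothetical.
MIN_FREE_MB = 300  # minimum free space the reply says VSS needs to run


def shadow_storage_problems(free_mb: float, max_limit_mb: float) -> list[str]:
    """Flag the two misconfigurations described in the reply.

    free_mb: free space on the volume hosting the VSS snap volume.
    max_limit_mb: the maximum size configured for the VSS snap volume.
    """
    problems = []
    if free_mb < MIN_FREE_MB:
        problems.append(f"only {free_mb:g} MB free; VSS needs at least {MIN_FREE_MB} MB")
    if max_limit_mb > free_mb:
        problems.append(
            f"configured limit of {max_limit_mb:g} MB exceeds the {free_mb:g} MB actually free"
        )
    return problems


if __name__ == "__main__":
    # The reply's example: C: has 6 GB free but the snap volume limit is 9 GB.
    for problem in shadow_storage_problems(free_mb=6 * 1024, max_limit_mb=9 * 1024):
        print(problem)
```

    The reply's 6 GB / 9 GB example trips only the second check, since 6 GB is well above the 300 MB floor.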

    Tuesday, August 10, 2010 11:45 PM
  • Wow, this problem has finally been fixed...

    The cause was that file system case sensitivity had been enabled on DPM1, the primary DPM server in question. For reasons we still don't understand, this caused the secondary DPM servers to fail to create recovery points for protected file volumes from DPM1.

    We had enabled case sensitivity on a file server protected by DPM1 due to some issues with special file types on the file server. Enabling case sensitivity had helped avoid synchronization errors and subsequent inconsistent replicas. DPM cannot sync a server's volumes if that server has case sensitivity enabled unless case sensitivity is also enabled on the DPM server. So, we had enabled it on DPM1 too.
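    Before turning case sensitivity off on a file server, it may be worth checking whether the protected volume holds any names that differ only in letter case, since such names can only coexist while case sensitivity is enabled. Here is a rough Python sketch, assuming Python can be run against the share. The function names and the `D:\Shares` path are hypothetical, and this is not how DPM itself compares names.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def relative_paths(root: str) -> Iterator[str]:
    """Yield every file and folder under root as a '/'-separated relative path."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root).replace(os.sep, "/")


def find_case_collisions(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group paths that differ only in letter case.

    str.casefold() approximates, but is not identical to, the uppercase
    table NTFS uses for case-insensitive comparisons.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical share path; point this at the protected file server volume.
    for key, names in find_case_collisions(relative_paths(r"D:\Shares")).items():
        print(key, "->", names)
```

    An empty result suggests that switching the volume back to case-insensitive behavior will not make any files ambiguous.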

    We've now disabled case sensitivity on DPM1 and on the file server. To do so, we set the following registry value back to 1 (the default setting):
    HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel, DWORD obcaseinsensitive
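    To confirm the current setting on each server, here is a minimal Python sketch using the standard-library `winreg` module. The helper names are hypothetical. The sketch only reads the value and changes nothing, and it returns `None` when run on a non-Windows machine.

```python
import sys
from typing import Optional

KERNEL_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\kernel"
VALUE_NAME = "obcaseinsensitive"


def read_obcaseinsensitive() -> Optional[int]:
    """Return the DWORD under HKLM, or None if not on Windows or the value is absent."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KERNEL_KEY) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
        except FileNotFoundError:
            return None
    return value


def describe(value: Optional[int]) -> str:
    """Interpret the value: 1 is the default (case-insensitive), 0 enables case sensitivity."""
    if value is None:
        return "obcaseinsensitive not found"
    if value == 1:
        return "case-insensitive (default)"
    if value == 0:
        return "case-sensitive"
    return f"unexpected value {value}"


if __name__ == "__main__":
    print(describe(read_obcaseinsensitive()))
```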

    Recovery points on the secondary DPM server for the protected file volumes from DPM1 now work properly.

    References: http://technet.microsoft.com/en-us/library/cc725747.aspx and http://support.microsoft.com/kb/929110

    Thanks to Nikhil Kumar for asking about case sensitivity, which sparked the thought to check into that on our DPM server.

    -Taylorbox

    Friday, March 11, 2011 3:05 PM