Protection of random databases fails every time DPM runs

  • Question

  • Hey all,

    I'm using DPM 2012 R2 with UR5 backing up a SharePoint 2013 farm. I have the DPM agent installed on one web server in the farm (there are 2 web servers, 3 app servers) and the SQL backend. When I created the protection group, DPM was able to take the first backup without issues. However, every night, we get alerts from SCOM about how a backup for a particular database failed on the SQL box. Sometimes when I check DPM, I do see the replica is inconsistent, but sometimes it's just fine.

    Lately, however, we have been having far more critical issues. Almost every night, DPM says the replica is inconsistent and gives this reason:

    Change Tracking has been marked inconsistent due to one of the following reasons
    1. Unexpected shutdown of the protected server
    2. Unforeseen issue in DPM Bitmap failover during cluster failover of one or more datasources sharing the tracked volume. (ID 30501 Details: Unknown error (0xe0062040) (0xE0062040))

    I verified that neither server is crashing, but I'm not sure what to check for #2. On the SQL server, we see many VSS timeouts, errors about shadow copy databases left mounted, write and flush timeouts on the data volume, and errors like:

    BackupVirtualDeviceFile::SendFileInfoBegin:  failure on backup device '{97BCAB2B-4637-441D-B686-A39206584141}1'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.)

    Volume Shadow Copy Service error: The I/O writes cannot be held during the shadow copy creation period on volume \\?\Volume{a4828730-fb21-4b94-893a-9b62c2cfb3e7}\. The volume index in the shadow copy set is 0. Error details: Open[0x00000000, The operation completed successfully.], Flush[0x00000000, The operation completed successfully.], Release[0x80042314, The shadow copy provider timed out while holding writes to the volume being shadow copied. This is probably due to excessive activity on the volume by an application or a system service. Try again later when activity on the volume is reduced.], OnRun[0x00000000, The operation completed successfully.].

     

    Operation:

       Executing Asynchronous Operation

     

    Context:

       Current State: DoSnapshotSet

     

    Volume Shadow Copy Service error: The shadow copy could not be committed - operation timed out. Error context: DeviceIoControl(\\?\Volume{a4828730-fb21-4b94-893a-9b62c2cfb3e7} - 0000000000000344,0x0053c010,000000F33CBF1E00,0,000000F33CBF3E20,4096,[0]).

     Operation:

       Committing shadow copies

     Context:

       Execution Context: System Provider

     

    So I'm trying to figure out if I'm looking at disk performance issues here or if it could be something else causing the failures? I have our storage team checking disk activity for when DPM runs but I figure I'd also reach out here and see what others think.
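
    For anyone triaging the same kind of event, here is a rough sketch of how the per-phase results in the VSS error text above could be pulled apart to show which phase actually failed. The function name `failed_phases` and the regular expression are illustrative assumptions, not part of DPM or any VSS tooling, and the sample string is abbreviated from the event quoted in this post.

```python
import re

# Assumed pattern for entries of the form "Phase[0xHRESULT, message]" as they
# appear in the VSS event text; the message may span several lines.
PHASE_RE = re.compile(r"(\w+)\[(0x[0-9A-Fa-f]{8}),\s*([^\]]*?)\s*\]")


def failed_phases(event_text):
    """Return (phase, hresult, message) for each phase with a non-zero HRESULT."""
    results = []
    for phase, code, message in PHASE_RE.findall(event_text):
        if int(code, 16) != 0:
            # Collapse the line breaks the event log puts inside the message.
            results.append((phase, code, " ".join(message.split())))
    return results


if __name__ == "__main__":
    # Abbreviated copy of the event text from this post.
    sample = (
        "Error details: Open[0x00000000, The operation completed successfully.\n\n"
        "], Flush[0x00000000, The operation completed successfully.\n\n"
        "], Release[0x80042314, The shadow copy provider timed out while holding "
        "writes to the volume being shadow copied.\n\n"
        "], OnRun[0x00000000, The operation completed successfully.\n\n]."
    )
    for phase, code, message in failed_phases(sample):
        print(f"{phase}: {code} - {message}")
```

    In the event above, only the Release phase carries a non-zero code (0x80042314, the hold-writes timeout), which is consistent with the write and flush timeouts seen on the data volume but does not by itself prove a disk performance problem.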

    Thanks in advance,

    Aaron

    Friday, May 22, 2015 1:29 PM

All replies

  • Hi Aaron,

    I'm discussing this issue with a related team and will post back when I have an update.



    Monday, May 25, 2015 11:07 AM
    Moderator
  • I opened a case for this. No results yet, mostly because I've had limited time to work on it, but we've involved the SQL support guys as well.
    Friday, July 24, 2015 1:58 PM
  • Hello, did you find a solution to this? We're facing the same issue.
    Monday, June 27, 2016 5:43 AM