Online Recovery Point creation errors - DCOM errors

  • Question

  • Hello,

    I suddenly started getting a bunch of "Online recovery point creation failed" errors.  When I went to the Monitoring screen, several protected servers seemed to have simply vanished.  When I try to re-add them, either by connecting to the existing DPM agent or by installing a new agent on those servers, it fails.

    Further research suggests this might be a DCOM error.  When I refresh the DPM console so that it contacts the agent on the target server, the System log on the target server shows Event ID 10027: "The machine wide limit settings do not grant Remote Activation permission for COM Server applications to the user DOMAIN\DPMSERVER$ SID (S-1-5-21-1670625886-1014624808-1798593698-2806) from address 10.200.100.89. This security permission can be modified using the Component Services administrative tool."
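The Event 10027 message names the principal that was refused Remote Activation: here DOMAIN\DPMSERVER$, the DPM server's computer account (the trailing $ marks a computer account). A minimal Python sketch for pulling that account, its SID and the source address out of the message text, assuming the English wording quoted above; `parse_event_10027` is a hypothetical helper, not part of any DPM or Windows tool:

```python
import re

# Assumes the English wording of the Event 10027 message quoted above.
EVENT_10027 = re.compile(
    r"to the user (?P<account>\S+) SID \((?P<sid>S-\d+(?:-\d+)+)\) "
    r"from address (?P<address>\d{1,3}(?:\.\d{1,3}){3})"
)


def parse_event_10027(message: str) -> dict:
    """Extract account, SID, domain SID, RID and source IP from the message."""
    match = EVENT_10027.search(message)
    if match is None:
        raise ValueError("message does not look like a DCOM 10027 event")
    fields = match.groupdict()
    # The last sub-authority of a SID is the relative identifier (RID);
    # everything before it identifies the domain.
    domain_sid, _, rid = fields["sid"].rpartition("-")
    fields["domain_sid"] = domain_sid
    fields["rid"] = int(rid)
    return fields
```

Passing the quoted message to `parse_event_10027` yields the account `DOMAIN\DPMSERVER$`, RID 2806 and address 10.200.100.89, which is the principal whose Remote Activation permission is in question.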

    Googling this information leads to many confusing pages that don't adequately explain what's happening and offer fragmented solutions that skip steps.  I'm assuming that a recent Windows update hosed something, but I don't know how to proceed.  This is affecting some (but not all) of my Windows Server 2012 R2 machines, and also a Windows Server 2008 machine.

    Any advice would be greatly appreciated.

    Thanks,

    Zack Hamilton

    Wednesday, June 26, 2019 8:49 PM

All replies

  • Hello,

    Please check the DPM logs for more clues; you'll find them here:

    DPM 2016 log location:

    • %ProgramFiles%\Microsoft System Center 2016\DPM\DPM\Temp\MSDPMCurr.errlog

    DPM 2019 log location:

    • %ProgramFiles%\Microsoft System Center\DPM\DPM\Temp\MSDPMCurr.errlog

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com LinkedIn:

    Thursday, June 27, 2019 4:53 AM
  • I'm running DPM 2012.  It clearly says it can't reach the agent, but the other information is more cryptic.

    It looks like it tried to redeploy the agent in the wee hours of the morning, but failed:

    WARNING Failed: Hr: = [0x80070005] CCommandProcessor::CCIE failed on server server.domain.net, mqi.hr = 0x80070005, No further retry

    WARNING CCommandProcessor::SendOutboundCommand this:[0000000019F871A0], ServerName: server.domain.net
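The 0x80070005 in these log lines is an HRESULT wrapping a Win32 error: facility 7 (FACILITY_WIN32) with code 5, ERROR_ACCESS_DENIED, which lines up with the DCOM permission denial in the System log. A small Python sketch that decodes the fields using the standard HRESULT bit layout; `decode_hresult` is a hypothetical name:

```python
# HRESULT layout (winerror.h): bit 31 = severity (1 = failure),
# bits 16-26 = facility, bits 0-15 = code.
FACILITY_WIN32 = 7
# Only the Win32 code that appears in this thread; extend as needed.
WIN32_ERROR_NAMES = {5: "ERROR_ACCESS_DENIED"}


def decode_hresult(hr: int) -> dict:
    """Split an HRESULT into its severity, facility and code fields."""
    facility = (hr >> 16) & 0x7FF
    code = hr & 0xFFFF
    return {
        "failure": bool((hr >> 31) & 0x1),
        "facility": facility,
        "code": code,
        "win32_name": WIN32_ERROR_NAMES.get(code) if facility == FACILITY_WIN32 else None,
    }


print(decode_hresult(0x80070005))
# {'failure': True, 'facility': 7, 'code': 5, 'win32_name': 'ERROR_ACCESS_DENIED'}
```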

    There are other lines, but it's so cryptic I'm not even sure if they pertain to this server or this issue.
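One way to cut the log down to lines that concern a particular server is to filter on the server name plus a few keywords. A rough Python sketch; the keyword list and the UTF-16 default encoding are assumptions, not documented properties of MSDPMCurr.errlog, and both helpers are hypothetical:

```python
from pathlib import Path
from typing import Iterable, Iterator


def relevant_lines(lines: Iterable[str], server: str,
                   keywords: tuple[str, ...] = ("WARNING", "Failed")) -> Iterator[str]:
    """Yield log lines that name the server and contain at least one keyword."""
    needle = server.lower()  # server names are matched case-insensitively
    for line in lines:
        if needle in line.lower() and any(k in line for k in keywords):
            yield line.rstrip("\r\n")


def scan_errlog(path: Path, server: str, encoding: str = "utf-16") -> list[str]:
    # The encoding is an assumption; try "utf-8" if the output looks garbled.
    with path.open(encoding=encoding, errors="replace") as fh:
        return list(relevant_lines(fh, server))
```

For example, `scan_errlog(Path("MSDPMCurr.errlog"), "server.domain.net")` run against a copy of the log returns only the lines that mention that server and a warning or failure.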

    I've rebooted both the DPM server and one of the target servers having issues, but that didn't fix anything.  Both servers are in the same subnet and VLAN and are able to talk to each other.  Nothing has changed with our network infrastructure or configuration.

    Thanks,

    Zack

    Thursday, June 27, 2019 2:00 PM
  • So the agent status in the Management tab of the DPM console is showing errors?

    What workloads are you trying to protect (server operating system, role, etc.)?

    Please do as follows:

    1. Move out all the logs from %ProgramFiles%\Microsoft System Center 2012 R2\DPM\DPM\Temp.

    2. Try creating an online recovery point in the DPM console and wait for the error to appear.

    3. A new log file should be created; it should contain only the recent entries, which makes it easier to read.

    4. Upload the MSDPMCurr.errlog to OneDrive and share the link here.
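Step 1 can be scripted so that files DPM still holds open (as happens with MSDPMCurr.errlog later in this thread) are skipped instead of stopping the move. A minimal Python sketch, assuming the archive folder is on the same volume as Temp; it moves only top-level files and leaves subfolders in place, which matters because the thread later traces a problem to a missing MTA subfolder under Temp. `evacuate_logs` and the file name in the demo are hypothetical:

```python
import tempfile
from pathlib import Path


def evacuate_logs(temp_dir: Path, archive_dir: Path) -> tuple[list[str], list[str]]:
    """Move top-level files out of temp_dir; return (moved, skipped) names."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved, skipped = [], []
    for entry in sorted(temp_dir.iterdir()):
        if not entry.is_file():
            continue  # subfolders stay where they are
        try:
            # Path.replace is a plain rename: it needs archive_dir on the same
            # volume and fails instead of half-copying a locked file.
            entry.replace(archive_dir / entry.name)
            moved.append(entry.name)
        except OSError:
            skipped.append(entry.name)  # e.g. a log still held open by DPM
    return moved, skipped


if __name__ == "__main__":
    # Demo on a scratch directory instead of the real DPM Temp folder.
    with tempfile.TemporaryDirectory() as scratch:
        temp = Path(scratch, "Temp")
        (temp / "MTA").mkdir(parents=True)
        (temp / "MSDPM0001.errlog").write_text("old log")
        print(evacuate_logs(temp, Path(scratch, "archive")))
```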


    Thursday, June 27, 2019 2:10 PM
  • There are different workloads depending on which server you're talking about.  Some file shares, some SQL databases, etc.

    I wasn't able to evacuate the folder because DPM has the files open.  I got a couple of them moved after shutting down some services, but I can't get the file lock on MSDPMCurr.errlog to release.  I copied the whole thing here:

    https://thorneresearch-my.sharepoint.com/:u:/g/personal/zhamilton_thorne_com/ESPcqx4iEp1FjBMGpTlimocB3AtNJRaleYsMWr8t3UxIgA?e=RbA4A0

    Thanks,

    Zack

    Thursday, June 27, 2019 3:43 PM
  • Thank you for the log, could you provide a screenshot of the errors and the agent's status in the Management tab?

    Are you running the latest update rollup for DPM 2012 R2?

    Do you recall any changes that have been done recently? A few days/hours before or during the same day these errors started to appear?

    Have you tried running the command below on a protected computer to see if it helps?

    SetDpmServer.exe -dpmServerName <DPMServerName>
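SetDpmServer.exe is run on the protected computer, typically from an elevated command prompt. If you want to script it, a small Python wrapper could look like this; the install path below is an assumption about a default agent installation, and both helper names are hypothetical:

```python
import subprocess
import sys
from pathlib import PureWindowsPath

# Assumption: default agent install location on the protected computer.
DEFAULT_TOOL = PureWindowsPath(
    r"C:\Program Files\Microsoft Data Protection Manager\DPM\bin\SetDpmServer.exe"
)


def setdpmserver_command(dpm_server: str,
                         tool: PureWindowsPath = DEFAULT_TOOL) -> list[str]:
    """Build the argv for: SetDpmServer.exe -dpmServerName <DPMServerName>."""
    if not dpm_server or any(ch.isspace() for ch in dpm_server):
        raise ValueError("expected a single DPM server name")
    return [str(tool), "-dpmServerName", dpm_server]


def run_setdpmserver(dpm_server: str) -> int:
    """Run the tool; only works on the protected Windows computer, elevated."""
    if sys.platform != "win32":
        raise RuntimeError("SetDpmServer.exe is only available on Windows")
    result = subprocess.run(setdpmserver_command(dpm_server), check=False)
    return result.returncode
```

Building the argument list separately keeps the command easy to inspect before it is actually run on the server.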


    Thursday, June 27, 2019 5:25 PM
  • Partly successful!  I ran this on one of the servers and got a positive result back.  I then refreshed that computer in DPM Management and it shows as OK.  It also now appears when I try to modify the protection group, which it didn't before.

    However, DPM doesn't seem to be enumerating everything for me to add back.  I had removed this server from the protection group a couple days ago and couldn't add it back.  Now at least it shows up, but I can't get the right folders enumerated.  They are technically on the E: drive, which doesn't show up at all, and they didn't previously enumerate that way anyway.  I added a Before and After screenshot to OneDrive, and I also created a folder that I shared for all the files:

    https://thorneresearch-my.sharepoint.com/:f:/g/personal/zhamilton_thorne_com/ElXl_bc9ZtxGm9Z5GXtslJIBVG5I3d8fdMZFYRuukEBZ5A?e=RDU8mE

    Thanks!

    Zack

    Thursday, June 27, 2019 7:25 PM
  • Thanks for the screenshots. Did you try refreshing the data sources?


    Thursday, June 27, 2019 7:46 PM
  • I get this:

    DPM is unable to enumerate contents in \\?\Volume{1307ecf2-7cd2-11e5-80d4-00155d697537}\ on the protected computer vdovcetq300.thorneresearch.net. Recycle Bin, System Volume Information folder, non-NTFS volumes, DFS links, CDs, Quorum Disk (for cluster) and other removable media cannot be protected. (ID: 38)
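To find out which drive or mount point a volume GUID path like the one in this error refers to, the built-in `mountvol` command (run without arguments on the protected computer) lists each `\\?\Volume{...}\` path with its mount points. A Python sketch that parses that listing, assuming mountvol's usual layout of a volume line followed by more deeply indented mount-point lines; `parse_mountvol` is hypothetical:

```python
import re

# Matches a volume GUID path such as \\?\Volume{...}\ on a line of its own.
VOLUME_LINE = re.compile(r"^\s*(\\\\\?\\Volume\{[0-9A-Fa-f-]+\}\\)\s*$")


def parse_mountvol(output: str) -> dict[str, list[str]]:
    """Map each volume GUID path in mountvol output to its mount points."""
    volumes: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines():
        match = VOLUME_LINE.match(line)
        if match:
            current = match.group(1)
            volumes[current] = []
        elif current is not None and line.strip():
            # Indented lines under a volume are its mount points.
            if "NO MOUNT POINTS" not in line:
                volumes[current].append(line.strip())
        else:
            current = None  # a blank line (or other text) ends the entry
    return volumes
```

Looking up the GUID path from the error message as a key in the returned dictionary gives the drive letter(s) or folder mount points for that volume, or an empty list if it has none.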

    Thursday, June 27, 2019 7:52 PM
  • This can occur if the data source is already protected in another protection group, in which case it cannot be included in this one.

    If you click Close, it should still list shares, volumes, and system protection under the volumes. Is the drive letter of the volume in question listed? If it isn't, you can test by creating a share on the root of that volume, re-enumerating, and then checking whether the share is listed.


    Thursday, June 27, 2019 8:02 PM
  • The data source is not and has never been part of another protection group.

    After clicking "Close", it does list some shares and volumes, but it does not list the E: drive of that server.  The E: drive has three folders on it, all of which are shared, but it also does not list the shares.  Finally, I tried creating a share on the root, but it gives me the same error when I refresh data sources.  I am able to see these shares from File Explorer of multiple other computers as well as the DPM server.


    Friday, June 28, 2019 12:31 PM
  • Just to make sure I understand: in the "Before issue" picture you have the following shared folders backed up in the protection group "Protection Group: Tier I (Long Term)":

    • Files
    • Index
    • Logs

    This is currently protected via the "Cluster Network Name: EtQ.VDOVCETQ200.domain.net", so you're using the cluster name object there, whereas in the "After issue" picture you seem to be looking at a single cluster node, "VDOVCETQ300"?

    Judging by your "E drive of target server" picture, the shares appear to be located on a clustered shared disk, so you need to query the cluster object within DPM to see the E: drive, not a single cluster node.


    Sunday, June 30, 2019 10:05 PM
  • I see what you're getting at, but I'm not sure how to proceed.  My previous experience is with Backup Exec, NetWorker and Veeam.  I thought NetWorker was bad, but DPM is a completely different animal.

    I went to the Management area, and added vdovcetq200, but I'm not sure that was the correct course of action.  I'm now having issues getting the DPM agent to respond on either vdovcetq200 or vdovcetq300.

    Also from the Management area, I tried adding a computer.  There is an ETQ in the list, but it won't let me add it.  I'm not even sure this is what I should be trying to do.

    

    Monday, July 1, 2019 2:22 PM
  • I agree that DPM is a bit different, though a lot has changed since DPM 2012 R2. I've also used all of the products you mentioned above.

    When protecting a cluster, make sure the DPM agent is installed on all cluster nodes. When you add workloads to a new or existing protection group, select the cluster resource name rather than the individual cluster nodes.

    What errors are you now receiving from your cluster nodes: vdovcetq200 and vdovcetq300?

    Could you also tell me how it was set up from the beginning? (I mean the cluster workloads)


    Monday, July 1, 2019 2:49 PM
  • I've only been working here since April, so I'm not entirely sure why things are set up the way they are.

    The actual host name is vdovcetq201.  I can RDP to it via that name, or also vdovcetq200 and vdovcetq300.  It is the only host in the cluster, so I'm not really sure why it's clustered in the first place.

    From Failover Cluster Manager, it lists vdovcetq200 as a failed cluster.

    If I click on "1 failed, 1 total" it shows me this.  Aha!

    But the node is up:

    I think I'm on to something, but I don't know what.  :-)

    Thanks.


    Monday, July 1, 2019 3:21 PM
  • I've never used DPM to back up a cluster with only one node, but I don't believe it should matter: DPM goes by the cluster name, just as your RDP sessions do (they connect to the active owner node).

    So you have a cluster role that is a generic service, and it's in a "failed" state. Is it still in use? If not, it could simply be removed.

    I assume you want to back up the files/folders that reside on the clustered storage?


    Monday, July 1, 2019 3:27 PM
  • Yes, I need to back up the folders in that original "Before" screenshot.  Unfortunately, no matter how I try to add a group member (by any of its three names), I get this error:

    DPM is unable to enumerate contents in \\?\Volume{1307ecf2-7cd2-11e5-80d4-00155d697537}\ on the protected computer vdovcetq201.thorneresearch.net. Recycle Bin, System Volume Information folder, non-NTFS volumes, DFS links, CDs, Quorum Disk (for cluster) and other removable media cannot be protected. (ID: 38)

    I'm coming up empty so far on how to address this.

    Monday, July 1, 2019 6:24 PM
  • If this cluster has not been protected before, could you try uninstalling the DPM agents from the DPM console and then checking on the cluster node (the only one left in the cluster) that everything was uninstalled, including the DPM agent folder?

    Then restart the DPM server and the cluster node (if possible), reinstall the DPM agent from the DPM console, and finally try to create a protection group.


    Monday, July 1, 2019 6:29 PM
  • This cluster was protected before by this DPM server.  This all started because I started getting synchronization failures and recovery point creation failures.  In the past, I fixed this by removing the affected machine/object from the protection group, then re-adding it.  This time, that process has caused me a great deal of trouble since I'm not able to re-add the affected shares.

    Unfortunately this is a production server so my ability to work with it during the week is limited.  I worked on it this weekend and tried uninstalling and reinstalling DPM and rebooting.

    Whoa, hold on.  Strangely, I now have access to those shares on vdovcetq200.  I was able to enumerate those shares again and DPM is now running a replica consistency check.  We will see how it goes.

    Monday, July 1, 2019 8:47 PM
  • Removing a data source from the protection group and re-adding it usually solves many problems, so this is indeed very strange behavior.

    Let us know how it goes!


    Monday, July 1, 2019 8:52 PM
  • Well, I tried "Run a synchronization job with consistency check" twice, and both times it has come back and said the replica is inconsistent.  Since I just removed and re-added it to the protection group, I'm guessing I need to try something else?
    Tuesday, July 2, 2019 1:12 PM
  • I tried following the directions here:

    https://social.technet.microsoft.com/Forums/en-US/6ce64707-e1d6-47d0-bd67-68b52f9a04f7/replica-is-inconsistent-corrupt-folder-on-replica-volume?forum=dataprotectionmanager

    The chkdsk ran successfully and looks like it cleaned up a few minor things, but I'm still getting "Replica is inconsistent".

    Tuesday, July 2, 2019 6:20 PM
  • Is the cluster disk a "normal" clustered disk or a cluster shared volume (CSV)?


    Tuesday, July 2, 2019 11:31 PM
  • I'm not sure.  Does this look like a CSV?

    Wednesday, July 3, 2019 1:31 PM
  • It doesn't look like a CSV. If you have the option to "Add to Cluster Shared Volumes" when you select the ETQDB disk, then it is a normal clustered disk.

    You can also tell whether the disk is a CSV by checking its path: if it's located under C:\ClusterStorage\VolumeX\, it's a CSV disk.
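The path rule above can be expressed as a quick check. A Python sketch, assuming CSVs are mounted under the default C:\ClusterStorage namespace on the system drive; `is_csv_path` is a hypothetical helper:

```python
from pathlib import PureWindowsPath

# Assumption: the default CSV namespace on the system drive.
CLUSTER_STORAGE = PureWindowsPath(r"C:\ClusterStorage")


def is_csv_path(path: str) -> bool:
    r"""True if path lives somewhere below C:\ClusterStorage\."""
    # PureWindowsPath comparisons are case-insensitive, like Windows paths.
    return CLUSTER_STORAGE in PureWindowsPath(path).parents
```

Applied to the VHD path reported later in this thread, C:\ClusterStorage\VMPCL1\VDOVCETQ201\VDOVCETQ201_Disk0.vhd, it returns True; a path such as E:\Files returns False.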


    Wednesday, July 3, 2019 1:35 PM
  • "Add to Cluster Shared Volumes" is not an option when I right-click it.  It's not grayed out - it's just not there.

    OK, I found the path in Hyper-V Manager.  It is pointing to C:\ClusterStorage\VMPCL1\VDOVCETQ201\VDOVCETQ201_Disk0.vhd.

    It looks like there's a checkpoint there from the end of March, which was just before I started here.  I'm *guessing* it could probably be deleted if it's causing a problem.


    Wednesday, July 3, 2019 1:56 PM
  • Okay, you should never leave checkpoints lingering around. I would recommend deleting it (from Hyper-V Manager), as it can also cause issues for your backups.

    The disk appears to be a CSV (Cluster Shared Volume). Please note that while DPM does support CSVs, it only supports protecting Hyper-V workloads stored on CSV disks, so normal file shares/folders on a CSV won't be supported and will fail.

    Wednesday, July 3, 2019 2:09 PM
  • I deleted checkpoints on a few different servers, so none of my VM servers have checkpoints anymore, but I'm still getting "Replica is inconsistent".  I've looked in the Windows logs but I'm having trouble finding events that might correlate to what's going on.
    Wednesday, July 3, 2019 5:06 PM
  • Is your problem only related to this cluster which has one cluster node?

    Wednesday, July 3, 2019 5:50 PM
  • Leon,

    It's all fixed now.  I'm going to mark this as answer as well as some of your posts above.

    You'll recall that earlier you had me remove everything from the TEMP folder.  Apparently that caused some issues which were fixed by creating an MTA subfolder as outlined here (it's a short read):

    http://backupexec-hell.blogspot.com/2016/12/dpm-replica-is-inconsistent-error-3106.html
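If the MTA subfolder goes missing again after the Temp folder is cleaned out, recreating it can be scripted. A minimal Python sketch, assuming the folder belongs directly under the DPM 2012 R2 Temp path mentioned earlier in the thread (with %ProgramFiles% read from the environment); check the linked post for the exact location before relying on it. `ensure_mta_folder` is hypothetical:

```python
import os
import tempfile
from pathlib import Path

# Assumed location: the DPM 2012 R2 Temp folder named earlier in the thread,
# with %ProgramFiles% read from the environment (default C:\Program Files).
PROGRAM_FILES = os.environ.get("ProgramFiles", r"C:\Program Files")
DPM_TEMP = Path(PROGRAM_FILES, "Microsoft System Center 2012 R2", "DPM", "DPM", "Temp")


def ensure_mta_folder(temp_dir: Path = DPM_TEMP) -> bool:
    """Create the MTA subfolder of temp_dir if missing; True if created."""
    mta = temp_dir / "MTA"
    if mta.is_dir():
        return False
    # parents=False: if temp_dir itself is missing, raise instead of
    # silently building a directory tree in the wrong place.
    mta.mkdir(parents=False)
    return True


if __name__ == "__main__":
    # Demo against a scratch folder rather than the real DPM install.
    with tempfile.TemporaryDirectory() as scratch:
        print(ensure_mta_folder(Path(scratch)))
        print(ensure_mta_folder(Path(scratch)))
```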

    I also ran a chkdsk on the affected volume, and it ran for quite some time (unlike the previous one).

    As for the other things we tried, I'm not sure which ones made a difference.  Thanks for the help, though.  It's been a learning experience!

    Zack

    Monday, July 8, 2019 12:15 PM
  • I'm glad to hear that your issue is fixed!

    Monday, July 8, 2019 1:34 PM