DPM 2010 fails to back up virtual guests on a Hyper-V Cluster Shared Volume using hardware VSS provider

  • Question

  • We have been battling with this problem for 2 years now and have raised calls with both Microsoft and EMC, neither of which has resulted in a resolution.

    We have an 8-node CSV with about 40 virtual servers on it. The Hyper-V hosts are Windows Server 2008 R2 SP1 servers with the Hyper-V role installed, the SAN is a CLARiiON CX3-10c, and we are using the EMC 4.7.1 hardware provider for snapshots. The problem is that snapshots are not always created on the SAN, and when they are not, the recovery points in DPM fail. It often takes several attempts at re-running the job before it succeeds.

    The MaxParallelBackups registry key is set to 1 on the DPM server, so we are running jobs serially on each node. We have also aligned all the VMs so that the Hyper-V owner of each VM is the same node that owns the CSV holding that VM's resources. We did this to avoid ownership changes during backup and so reduce the risk of failures. This has had some success, but it is not always the same servers that fail, which gives this problem an unfortunate degree of intermittency!
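For anyone wanting to verify the same alignment, the check can be sketched roughly as follows in Python. This is only an illustration: the three dictionaries are a hypothetical inventory (VM to owning host, VM to CSV, CSV to owning host) that would have to be exported from the cluster by whatever means you prefer, and the function name is made up for this post.

```python
from typing import Dict, List, Tuple


def find_misaligned_vms(
    vm_owner: Dict[str, str],
    vm_csv: Dict[str, str],
    csv_owner: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (vm, vm_host, csv_host) for every VM whose owning host
    differs from the host that owns the CSV holding its resources.

    All three inputs are hypothetical inventory mappings, not the
    output of any real DPM or cluster API.
    """
    misaligned = []
    for vm, host in sorted(vm_owner.items()):
        csv_host = csv_owner[vm_csv[vm]]
        if host != csv_host:
            misaligned.append((vm, host, csv_host))
    return misaligned


if __name__ == "__main__":
    # Made-up example data, not taken from our environment.
    owners = {"vm1": "host1", "vm2": "host2"}
    volumes = {"vm1": "csv1", "vm2": "csv1"}
    csv_owners = {"csv1": "host1"}
    for vm, host, csv_host in find_misaligned_vms(owners, volumes, csv_owners):
        print(f"{vm}: VM owner {host}, CSV owner {csv_host}")
```

An empty result would mean every VM is owned by the same node as its CSV, which is the state we try to maintain before the nightly jobs run.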

    There is no DataSourceGroups.xml in place anymore; I tried implementing one previously, but its effectiveness was never proven. We sometimes get a few days' grace where backups run with no problems. The longest run of success we have had is about 2-3 weeks, which was our most recent run, and its cessation has led me to post this on the forum. Since Wednesday last week we have had regular failures, with between 9 and 12 failures each night. The annoying part is that we hadn't made any changes to either DPM or the cluster, so the sudden return of the failures makes no real sense.

    One thing I could do is split the Protection Groups so that all the VMs on Hyper-V host1 are in one group, all the ones on Hyper-V host2 are in another, and so forth. However, if anyone could advise me whether this is likely to help before I do it, that would be much appreciated. I don't want to do this unless I have to, in case I hit capacity problems (most VMs are in one large protection group at present).
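To make the proposed split concrete, here is a minimal Python sketch of how the VMs would be bucketed into one group per host. It assumes a hypothetical mapping of VM name to owning Hyper-V host; the names are placeholders, and nothing here talks to DPM itself.

```python
from collections import defaultdict
from typing import Dict, List


def group_vms_by_host(vm_owner: Dict[str, str]) -> Dict[str, List[str]]:
    """Bucket VM names by owning host, one candidate protection group per host.

    vm_owner is a hypothetical inventory mapping (VM name -> host name).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for vm, host in sorted(vm_owner.items()):
        groups[host].append(vm)
    return dict(groups)


if __name__ == "__main__":
    # Made-up example data, not taken from our environment.
    inventory = {"vm1": "host1", "vm2": "host1", "vm3": "host2"}
    for host, vms in group_vms_by_host(inventory).items():
        print(f"{host}: {len(vms)} VMs -> {', '.join(vms)}")
```

The per-host counts give a rough idea of how uneven the resulting groups would be, which is relevant to the capacity concern above.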

    Any help would be much appreciated. I have logs I can attach if anyone would like to see a sample of when the failures occur.



    Thursday, May 1, 2014 2:30 PM