TEST disaster recovery

    Question

  • We offer disaster recovery (DR) to our second datacenter for our customers. All VMs are replicated to the second DC using synchronous storage replication. Previously we had a single SCVMM instance that managed both DCs and all their Hyper-V clusters, but whenever we tested DR for a customer, SCVMM would get very confused. 

    With 2012 R2, the procedure was to snapshot the storage, mount the snapshot CSV on the DR cluster, scan the CSV for VM files, register those VMs, connect them to the (isolated) DR network and boot them. However, because the VM files contain the VM's ID, SCVMM would get confused and see duplicate VMs. The VM in the production cluster would then disappear from SCVMM until we finished the DR test and removed the VM from the DR cluster.
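    The "scan the CSV for VM files" step can be sketched in Python. This assumes the default Hyper-V layout, where each VM's configuration file sits in a "Virtual Machines" folder and is named after the VM's GUID (<VMId>.xml on 2012 R2, <VMId>.vmcx on 2016); checkpoint configurations in "Snapshots" folders are skipped. The function name is hypothetical. Because the GUID in the file name is the same VM ID that production already uses, listing these IDs before registration shows which test-failover VMs are likely to collide with production.

```python
from __future__ import annotations

import uuid
from pathlib import Path

# Assumed default Hyper-V layout: <root>\...\Virtual Machines\<VMId>.xml (2012 R2)
# or <VMId>.vmcx (2016). Checkpoint configs live under "Snapshots" and are ignored.
CONFIG_SUFFIXES = {".xml", ".vmcx"}
CONFIG_FOLDER = "virtual machines"


def find_vm_configs(volume: Path) -> dict[uuid.UUID, Path]:
    """Return {VM ID: config file path} for each VM configuration file under volume."""
    found: dict[uuid.UUID, Path] = {}
    for path in volume.rglob("*"):
        if path.suffix.lower() not in CONFIG_SUFFIXES or not path.is_file():
            continue
        if path.parent.name.lower() != CONFIG_FOLDER:
            continue  # e.g. a checkpoint config in a "Snapshots" folder
        try:
            # Only files named after a GUID are VM configurations.
            vm_id = uuid.UUID(path.stem)
        except ValueError:
            continue  # some other .xml file on the volume
        found[vm_id] = path
    return found
```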

    Registering the VMs was also sometimes difficult: we would read the XML file, find the disks, register the VM and re-attach the disks. The problem is that the same CSV could be named "disk15" in the production cluster and "disk10" in the DR cluster, which is very difficult to script.
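    Once the production-to-DR volume mapping is known, re-pointing each disk is a prefix substitution on its path. A minimal Python sketch, assuming the disks live under a mount root such as C:\ClusterStorage\<volume> and that the mapping is supplied by some other means; the function name, mount root and volume names are hypothetical. PureWindowsPath is used so the path handling behaves the same regardless of where the script runs.

```python
from __future__ import annotations

from pathlib import PureWindowsPath

# Hypothetical CSV mount root, assumed identical on the production and DR clusters.
CSV_ROOT = PureWindowsPath(r"C:\ClusterStorage")


def remap_disk_path(path: str, volume_map: dict[str, str]) -> str:
    """Rewrite a production disk path to point at the same file on the DR cluster.

    volume_map maps a production volume folder name (e.g. "disk15") to the
    name the same LUN received on the DR cluster (e.g. "disk10").
    """
    # relative_to raises ValueError if the path is not under the CSV root.
    rel = PureWindowsPath(path).relative_to(CSV_ROOT)
    prod_volume, *rest = rel.parts
    if prod_volume not in volume_map:
        # Refuse to guess rather than attach a disk from the wrong volume.
        raise KeyError(f"no DR mapping for production volume {prod_volume!r}")
    return str(CSV_ROOT.joinpath(volume_map[prod_volume], *rest))
```

For example, with the mapping {"disk15": "disk10"}, a disk recorded as C:\ClusterStorage\disk15\VM01\VM01.vhdx would be re-attached from C:\ClusterStorage\disk10\VM01\VM01.vhdx.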

    Now, with 2016 storing the VM configuration in a binary format (.vmcx) instead of readable XML, we have to look for different ways to do a DR and, most importantly, a test DR without interfering with production.

    For safety, we're at least looking at implementing a second VMM for DR, but we can't find a reliable method to register the VMs and their disks, because the CSVs would still have different names. We could make it very complicated by writing a text file into each volume that records its original name and doing the mappings based on that, but I think that is very risky.
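    The marker-file idea could look roughly like this in Python: while each volume is still mounted in production, a small text file (hypothetically named csv-origin.txt) is written to its root with the production volume name; on the DR side, the script reads the marker from every mounted volume and builds the production-to-DR mapping. The file name, function name and layout are assumptions. To limit the risk mentioned above, volumes without a marker (for example the DR cluster's own volumes) are left out, and a production name that appears on two volumes aborts the mapping instead of picking one.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical marker file written into the root of every production CSV.
MARKER_NAME = "csv-origin.txt"


def build_volume_map(csv_root: Path) -> dict[str, str]:
    """Map each production volume name to the folder it is mounted as under csv_root."""
    volume_map: dict[str, str] = {}
    for volume in sorted(csv_root.iterdir()):
        marker = volume / MARKER_NAME
        if not volume.is_dir() or not marker.is_file():
            continue  # not a replicated production volume
        prod_name = marker.read_text(encoding="utf-8").strip()
        if prod_name in volume_map:
            # Two volumes claiming the same origin: stop instead of guessing.
            raise ValueError(
                f"production name {prod_name!r} found on both "
                f"{volume_map[prod_name]!r} and {volume.name!r}"
            )
        volume_map[prod_name] = volume.name
    return volume_map
```

Each entry pairs a production volume name with its DR mount folder, so a disk path recorded against the production volume can then be re-pointed by replacing only that one path component.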

    Any tips on how to do this are very welcome.

    June 19, 2018 7:59

All replies

  • Hi!

    The best option would be to provide a High Availability (HA) solution for the SCVMM infrastructure as well.

    You can build an HA SCVMM infrastructure by, for example, using an SCVMM failover cluster. You would have one active SCVMM server on the primary site and a passive SCVMM server on the DR site.

    The active VMM database would be on the primary site (SQL node 1), with a replica database on the DR site.

    When the primary site is down, you will be able to switch your SCVMM infrastructure over to the DR site.

    More information on deploying a highly available SCVMM management server is available at the link below:
    https://docs.microsoft.com/en-us/system-center/vmm/ha-server?view=sc-vmm-1801

    I have experience with asynchronous replication from the primary site to the DR site.
    We only used one SCVMM server, though, and failover/disaster recovery to the DR site worked fine in testing.

    Basically, we used two identical disk systems (HP 3PAR), one on each site, with two SAN paths from each site to each CSV disk for redundancy. When one site went down, the CSV disk stayed online and was seen through the disk system on the other site.

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com  LinkedIn:   

    June 19, 2018 9:44
  • Hi,

    Thank you for your reply. 

    Having a passive SCVMM would mean that I can't do test failovers without causing issues in production, wouldn't it? We have several customers; each one gets a test failover once a year, and not all at the same time.

    Gabrie

    June 19, 2018 12:53
  • I'm afraid you cannot perform a simulation/"test" that way; it would amount to testing in production.

    June 19, 2018 16:04