Clustered Shared Volumes and Failover

  • Question

  • I have a question regarding CSV (Clustered Shared Volumes).

    We are introducing CSV for our virtual servers; we will set up several Hyper-V VMs on a LUN accessed by two host servers (Windows Server 2012 R2).
    We would like to know for sure how applications (most notably Exchange and Skype for Business) running on the VMs, and their client software running on our users' Windows PCs, could be affected or disrupted should one of the hosts fail over to the other. Our plan is to distribute the VMs across the two hosts to spread the load, but occasionally we might

    1. Temporarily serve all the VMs on one host so that we can shut down and service the other host for maintenance (planned failover).
    2. Or one of the hosts might suffer a hardware malfunction and fail (unplanned failover).

    We would also like to know, for each of the two scenarios above, which manual operations we would have to perform on A) the host servers, B) the VMs, and C) the users' terminals (Windows 8.1 Pro).

    Regards,

    Jon

    Friday, January 15, 2016 12:59 PM

All replies

  • If one node fails, all of its VMs will be restarted on the surviving node, so they will go through a dirty (unclean) start. VMware has Fault Tolerance, but it is currently limited to 4 vCPUs (1 vCPU before version 6). As far as I know, Microsoft has no technology like this (if it does, please share; there may be something new in Windows Server 2016).

    But if you can plan the maintenance window, you can do this without service disruption by using Live Migration (the equivalent of vMotion in VMware). Live Migration copies the VM's currently used memory to the second host and, if that succeeds, the VM is moved without disruption.

    You can do the very same thing when you plan to move the storage (from one LUN to another); this is called Storage Live Migration, and it is also done without disruption.

    Both are now also available without traditional shared storage: storage migration and VM migration can be done together, live, to a different host. Alternatively, you can now use an SMB share as shared storage.

    If you have a license for the System Center products, these operations can be automated using PRO technology, which is similar to VMware DRS. This lets you balance the VMs across the physical hardware for the best utilization according to a plan you define.

    But consider one very important thing. If your VMs are under pressure (high memory and CPU load), Live Migration can fail on a gigabit connection. This is due to the nature of LM, which can use just one link of a teamed interface. So if you have a team of four 1 Gbit NICs, only 1 Gbit will be available for LM. Consider a 10 Gbit network for LM if you expect your VMs to be under stress.

    This was tested on W2012 R2 with 128 GB RAM, running prime95 in one VM with 32 GB allocated and 2 vCPUs.
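
    To illustrate why a busy VM may never finish migrating over a slow link, here is a minimal sketch of a generic iterative pre-copy model: each round sends the memory still outstanding, while the VM keeps dirtying memory during that round. This is a simplification for intuition, not a description of Hyper-V internals. The function name, the dirty rate of 2 Gbit/s, the 10-round limit, and the 0.1 GB stop threshold are all assumptions chosen for the example.

```python
def precopy_rounds(ram_gb, dirty_gbps, link_gbps, max_rounds=10, stop_gb=0.1):
    """Simplified pre-copy model (hypothetical, for intuition only).

    ram_gb     -- memory assigned to the VM, in gigabytes
    dirty_gbps -- assumed rate at which the VM rewrites memory, in Gbit/s
    link_gbps  -- bandwidth available to the migration, in Gbit/s
    Returns (rounds_used, remaining_gb, converged).
    """
    remaining = ram_gb  # gigabytes still to be sent
    for round_no in range(1, max_rounds + 1):
        # Time needed to push the outstanding memory over the link.
        seconds = remaining * 8 / link_gbps
        # Memory dirtied meanwhile must be sent again (capped at total RAM).
        remaining = min(ram_gb, dirty_gbps * seconds / 8)
        if remaining <= stop_gb:
            return round_no, remaining, True
    return max_rounds, remaining, False


if __name__ == "__main__":
    # 32 GB VM under heavy load (assumed 2 Gbit/s dirty rate).
    for link in (1, 10):
        rounds, left, ok = precopy_rounds(32, dirty_gbps=2, link_gbps=link)
        print(f"{link} Gbit/s link: converged={ok}, rounds={rounds}, left={left:.3f} GB")
```

    In this model the migration only converges when the link is faster than the rate at which the VM dirties memory, which matches the advice above to use 10 Gbit for heavily loaded VMs.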

    Friday, January 15, 2016 1:47 PM
  • Thank you so much for the detailed answer.

    We are using Hyper-V exclusively, but I assume the fundamentals of what you said about the live migration and storage migration apply, whether we use Hyper-V or VMware (Correct me if I'm wrong).
    We will have to consider whether our network allows us to opt for live migration or not, so thanks for the heads up on the network capacity issue.

    Regarding the maintenance window, as long as all VMs (we will have up to several on a single LUN) fail over within 10 minutes or so, we are good.
    A typical failover scenario would be, for example:
    - Each of our two host servers is running 3 VMs (i.e. 6 VMs in total).
    - Host server A fails and the 3 VMs on host A are moved to host B.
    Each VM has 8 GB of RAM on average, so I guess the way to estimate the LM time requirement is to calculate how long it takes our network to transfer 8 GB x 3 of data?
    I'd appreciate it if you could tell me whether I'm way off.
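
    The estimate above can be sketched as a quick calculation. This is only a back-of-the-envelope model: it assumes the VMs are moved one after another, that memory is copied once (an idle VM), and that about 80% of the nominal link speed is usable. The `efficiency` value and the function name are assumptions, not measured figures. Note also that, per the earlier reply, this applies to a planned live migration; when a host fails, its VMs are restarted on the other host rather than migrated.

```python
def transfer_seconds(ram_gb, link_gbps, efficiency=0.8):
    """Seconds to copy ram_gb gigabytes over a link_gbps link.

    efficiency is an assumed fraction of nominal bandwidth that is usable.
    """
    gigabits = ram_gb * 8
    return gigabits / (link_gbps * efficiency)


if __name__ == "__main__":
    vm_ram_gb = [8, 8, 8]  # three VMs with 8 GB each, as in the scenario
    for link in (1, 10):
        total = sum(transfer_seconds(gb, link) for gb in vm_ram_gb)
        print(f"{link} Gbit/s: about {total:.0f} s for all three VMs")
```

    With these assumptions, three idle 8 GB VMs need roughly 4 minutes on a single 1 Gbit link and well under a minute on 10 Gbit; VMs that are actively changing memory will take longer.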

    Regards,

    Jon

    Friday, January 22, 2016 7:47 AM