Failover Cluster Simulation Failed Due To Network Problem

  • Question

  • Hi guys,

    Currently I am testing a Failover Cluster with Hyper-V using StarWind Native SAN. I am trying to simulate failover in power, network, and storage. I have succeeded in testing failover for the power supply (clean shutdown) and for storage (disabling the StarWind service). Unfortunately, network failover does not work so well.

    This is how I test the network failover:

    1. I have 2 nodes and 4 cluster networks:

           Cluster Network 1: Heartbeat (172.16.18.0/24)
           Cluster Network 2: Synchronization (172.16.20.0/24)
           Cluster Network 3: Management interface (192.168.1.0/24)
           Cluster Network 4: External Adapter for VM communication (192.168.2.0/24)

    2. I assign only the External adapter to each VM. Live migration works correctly.

    3. If I unplug Cluster Network 3 or 4 from one node, failover does not happen. I cannot connect to the VMs and cannot see them migrate in Failover Cluster Manager. Do I need to assign all the cluster network adapters to the VMs?
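As a sanity check on the layout above, the four subnets can be verified as non-overlapping, and a given interface address can be mapped back to its cluster network. This is only an illustrative Python sketch; the `network_for` helper and the sample addresses are hypothetical, not part of any cluster tooling:

```python
import ipaddress

# The four cluster subnets from the setup above (assumed /24 prefixes).
networks = {
    "Heartbeat": ipaddress.ip_network("172.16.18.0/24"),
    "Synchronization": ipaddress.ip_network("172.16.20.0/24"),
    "Management": ipaddress.ip_network("192.168.1.0/24"),
    "External (VM)": ipaddress.ip_network("192.168.2.0/24"),
}

# Each node interface should land in exactly one subnet; overlapping
# subnets would collapse two cluster network roles into one.
names = list(networks)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not networks[a].overlaps(networks[b]), f"{a} overlaps {b}"

def network_for(ip: str) -> str:
    """Return which cluster network an interface address belongs to."""
    addr = ipaddress.ip_address(ip)
    matches = [name for name, net in networks.items() if addr in net]
    return matches[0] if matches else "unknown"

print(network_for("172.16.18.5"))   # Heartbeat
```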

    Please let me know if you need further information; I will happily provide it. Thanks in advance for your help!

    Regards,

    Ridhuan

    Wednesday, March 28, 2012 2:44 AM


All replies

  • Hi!

    The cluster will only fail over virtual machines if communication to a node is lost. The VMs will not fail over because your cluster is still considered intact through the heartbeat network.

    If you disconnect the heartbeat network and cluster communication is allowed through the management interface, the heartbeat will be established there instead.

    The cluster does not consider a virtual machine's failure to reach the network a cluster failure, only the absence of one of its nodes, in which case failover occurs according to your quorum configuration.
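The rule described here can be summarized in a small sketch (plain Python; the function and its inputs are illustrative and not a cluster API):

```python
def node_considered_failed(paths_ok: list[bool]) -> bool:
    """A node counts as failed only when *no* cluster communication path
    to it remains; a VM losing its own external network does not count."""
    return not any(paths_ok)

# Heartbeat network unplugged, but the management path still carries heartbeat:
print(node_considered_failed([False, True]))   # False -> cluster intact, no failover
# Every cluster communication path to the node lost:
print(node_considered_failed([False, False]))  # True  -> VMs fail over
```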


    By the way, the physical NIC bound to the external virtual network should be dedicated and invisible to the cluster. Since it is detected by your cluster with an IP range, it appears the NIC is shared with the management operating system, which is not a best-practice configuration.
    • Edited by Mike_Andrews Wednesday, March 28, 2012 8:03 PM
    • Proposed as answer by VR38DETTMVP Friday, March 30, 2012 2:16 AM
    Wednesday, March 28, 2012 8:00 PM
  • Hi,

    Thanks for the reply. Does that mean node failover will only happen if all network communication between the 2 nodes is cut off? Meanwhile, for the external virtual network, yes, it is shared with the management operating system. Do I need to set a fixed IP for the physical adapter on both nodes?

    Thursday, March 29, 2012 1:20 AM
  • Yes, the cluster will fail over when a cluster member fails or becomes inaccessible.

    You can uncheck the box "Allow this network adapter to be shared with the management operating system" in the Virtual Network Manager on each node. The adapter should not be configured with an IP address at the host level; in fact, TCP/IP and the other protocols used by the host will be disabled, leaving only the "Microsoft Virtual Switch Protocol" enabled.

    Thursday, March 29, 2012 6:32 AM
  • Hi Mike,

    Thanks for the info. I tested your suggestion and the connection seems to be OK now. But I am curious about one thing: why does one of my cluster networks keep showing as failed? I ran the Validation test for Network Configuration and found the message below:

    =======================================================================================
    Network interfaces MSHV1-SWNSAN1.BullRun.com - Local Area Connection 6 and MSHV2-SWNSAN2.BullRun.com - Local Area Connection 6 are on the same cluster network, yet either address fe80::8dc9:6450:b0de:a603%30 is not reachable from fe80::252e:b00d:7704:4565%34 or the ping latency is greater than the maximum allowed 500 milliseconds.

    Network interfaces MSHV2-SWNSAN2.BullRun.com - Local Area Connection 6 and MSHV1-SWNSAN1.BullRun.com - Local Area Connection 6 are on the same cluster network, yet either address fe80::252e:b00d:7704:4565%34 is not reachable from fe80::8dc9:6450:b0de:a603%30 or the ping latency is greater than the maximum allowed 500 milliseconds.

    =======================================================================================

    Is it because there is a loopback in my network configuration? How can I check that?

    -ridhuan-

    Monday, April 2, 2012 8:29 AM
  • I would guess this is due to IPv6 addresses being automatically generated on different ranges, since you probably have not configured static IPv6 addresses or a DHCP server providing IPv6. The IPv4 addresses you have set manually should be sufficient.
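    The addresses in the validation report are IPv6 link-local, which supports that guess. This can be confirmed with a few lines of Python (a sketch; the zone indices %30 and %34 are dropped because they are only meaningful on the local node):

    ```python
    import ipaddress

    # The two addresses from the validation report, without their zone indices.
    reported = ["fe80::8dc9:6450:b0de:a603", "fe80::252e:b00d:7704:4565"]

    for raw in reported:
        addr = ipaddress.IPv6Address(raw)
        # fe80::/10 addresses are auto-generated per interface and are not
        # routable between subnets, so cross-node reachability tests on them
        # can fail even when the static IPv4 configuration is correct.
        print(raw, "link-local:", addr.is_link_local)
    ```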


    • Edited by Mike_Andrews Monday, April 2, 2012 11:48 AM
    • Marked as answer by Ridhuan Amri Thursday, April 5, 2012 2:26 AM
    Monday, April 2, 2012 11:48 AM