Unable to add a fourth node to Cluster

  • Question

  • I've got three servers, all HP DL180 G6, running Windows Server 2008 R2 SP1 Datacenter Edition, and one HP DL180 G7 server running Windows Server 2008 R2 SP1 Enterprise Edition.

    Initially I created a cluster with three of the servers (two Datacenter + one Enterprise) to share access to a SAN; it configured without issue and is working successfully. I'm attempting to add the fourth server (Datacenter), but it continues to fail.

    The problem is that the working and non-working nodes are all configured exactly the same, i.e.:

    -The same firewall ports are opened.
    -All the servers have 4 NICs: 1 for VM traffic, 1 for management, and 2 teamed for private communication with the SAN. (Note: the Enterprise Edition server doesn't host VM traffic, so that NIC is not active.)
    -All servers are configured with the same network configuration:
        -VM traffic & management NICs use public network IP addresses
        -Teamed NIC uses a 10.0.0.X IP address
    -The problematic node can talk to the SAN via the iSCSI initiator on its 10.X address.
    -The problematic node can ping/tracert the other nodes on the 10.X range (see the sketch after this list).
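
    For reference, here is roughly what those last two checks look like from an elevated PowerShell prompt on the node being added (the addresses below are placeholders, not my actual values):

        # Placeholder addresses; run these on the node that refuses to join.
        # Reach each existing node over the private 10.x network.
        '10.0.0.1','10.0.0.2','10.0.0.3' | ForEach-Object { Test-Connection $_ -Count 2 }
        tracert 10.0.0.1

        # Confirm the iSCSI initiator still has live sessions to the SAN.
        iscsicli SessionList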

    From Failover Cluster Manager the node begins to add to the cluster in a Down state, but I get the messages "Waiting for notification that node X is a fully functional member of the cluster" and "the operation is taking longer than expected".

    At this point it just sits there, and I have to force it to quit. Within the Event Viewer I receive the error:

    "Node X failed to join the cluster because it could not send and receive failure detection network messages with other cluster nodes. Please run the Validate a Configuration wizard to ensure network settings. Also verify the Windows Firewall 'Failover Clusters' rules."

    and then the critical error:
    "Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_MEDIA_WRITE_PROTECTED(c00000a2)'. All I/O will temporarily be queued until a path to the volume is reestablished." The weird thing here is that cluster/SAN communication is fine despite this error.

    I ran the validation against the defective node, but it found the networking to be fine. I've configured the cluster to use only the 10.X cluster network.
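
    For reference, the equivalent check from PowerShell (Role 1 = cluster-only traffic, 3 = cluster and client, 0 = not used by the cluster; the network name below is a placeholder):

        Import-Module FailoverClusters
        Get-ClusterNetwork | Format-Table Name, Role, Address, AddressMask, State

        # Example of restricting a network to internal cluster traffic only.
        (Get-ClusterNetwork "Cluster Network 1").Role = 1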

    Any ideas?

    Monday, July 09, 2012 15:50

Answers

  • I have not tried to create a cluster with one node having a different network configuration than the others.  That would be the first thing I would change.  Generally, whenever I am having problems with a cluster that has VMs running, the first thing I check is to ensure that all networks are defined EXACTLY the same.  In your case, you have disabled one of the NICs on one node.  That's why I asked if you had run validation on the cluster - my guess is that the validation is going to report that as a problem.  And, since the validation report tells you if you have a supported cluster or not, it's always a good idea to run the wizard to ensure compliance.

    Yes, I understand there are no issues with NIC teaming for network usage.  However, for SAN communication, it is recommended to use MPIO.  At a minimum, you would need to check with the vendor of the NIC teaming software to ensure they support your configuration.  I know that MPIO is supported, as that comes from Microsoft.  As to teamed NICs for SAN access, and within a clustered environment, that is entirely up to the vendor of the teaming software.
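
    If you do move to MPIO, a minimal sketch on 2008 R2 looks something like the following; check with your storage vendor first, since they may supply their own DSM, and the device string below assumes the Microsoft iSCSI initiator:

        # Install the MPIO feature (ServerManager module ships with 2008 R2).
        Import-Module ServerManager
        Add-WindowsFeature Multipath-IO

        # After the feature is in place, let the Microsoft DSM claim iSCSI devices (triggers a reboot).
        mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9"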

    And, as I said in my first response, mixing Enterprise and Datacenter is not a best practice.  It should work with no issues.  Often that comes down to a licensing issue when running VMs in a cluster.  It is really easy to end up with more instances of the operating system running on a node licensed for Enterprise if all the other nodes are licensed for Datacenter.


    tim

    • Marked as answer by Vincent Hu Thursday, August 09, 2012 07:54
    Tuesday, July 10, 2012 15:25

All replies

  • First, you should have all networks exactly the same.  You say that you have the one on the Enterprise Edition node inactive.  Does that mean that it is not working?

    Mixing Enterprise and Datacenter is not a best practice, but as you found, it can be a working cluster.  Don't think that is the issue here, but just an FYI.

    It is preferred to use MPIO for dual NICs communicating to the SAN instead of NIC teaming.  In fact, NIC teaming is not supported.

    You say you ran validation on the defective node.  You have to run the wizard against the whole cluster.  What does the validation wizard against the whole cluster tell you?  Did you run the validation for the initial cluster with three nodes?
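
    If you prefer PowerShell to the wizard, the same full-cluster validation can be started like this (substitute your own node names):

        Import-Module FailoverClusters
        Test-Cluster -Node NODE1,NODE2,NODE3,NODE4
        # The HTML validation report is written to the current user's temp folder.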


    tim

    Monday, July 09, 2012 18:37
  • In regards to the networking, the 4 servers have 4 NICs each.

    3 Hyper-V hosts (Datacenter) with:
    1 NIC for management
    1 NIC for VM traffic
    2 NICs for SAN communication

    1 management server (Enterprise) with:
    1 NIC for management
    1 NIC disabled
    2 NICs for SAN communication

    I don't think the issue is related to the Windows edition, as I've got 2 Datacenter + 1 Enterprise working; since both editions are working together, I don't see an issue with adding another Datacenter node.

    In regards to the MPIO/NIC teaming debate, I did some digging this morning and found that in Server 2008 R2 NIC teaming is allowed:

    "In Windows Server 2008 and Windows Server 2008 R2, there are no restrictions that are associated with NIC Teaming and the Failover Clustering feature. In Windows Server 2008, the new Microsoft Failover Cluster Virtual Adapter is compatible with NIC Teaming and allows it to be used on any network interface in a Failover Cluster."
    http://support.microsoft.com/kb/254101

    I did run the validation against the server; the private 10.X network, which is what the SAN communication/cluster is running on, returned no errors.
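
    For completeness, once a validation run against all four nodes comes back clean, the join itself can also be retried from PowerShell instead of the GUI (the names below are placeholders):

        Import-Module FailoverClusters
        Add-ClusterNode -Name NODE4 -Cluster CLUSTER1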




    Monday, July 09, 2012 23:02