Cluster network * is partitioned. Some attached failover cluster nodes cannot communicate with each other over the network...HELP!!!!

    Question

  • OK, a client of mine has a 2-node cluster running 2008 R2.

    There are 2 networks (private & public). Public is obviously the LAN, and the private one is a direct cable (non-crossover) to NIC port 2. No teaming involved, no SCSI networks involved.

    Shared disks, by the way, are done with fiber to an EMC SAN.

    This was running solid back in Nov 2011 when it was set up; then around 12/19/11 they got a few of the partitioned network errors.

    That only lasted a day or so, and the client said nothing changed.

    Then fast forward to 5/12/12 and the errors came back...with a vengeance; they seem to happen every few minutes. Needless to say, NONE of the cluster resources seem to be affected, and both networks are up on both nodes. This has been going on since May, and I have tried everything in my power to fix it.

    The internet has been no help, since I only see posts from people who have teaming or iSCSI involved. Microsoft's pages on errors 1126 & 1129 are of no help whatsoever.

    Even while the errors are happening, I can still ping from each node on both the private & public connections.

    On the private network, I have disabled all protocols except IPv4 and the Client for Microsoft Networks.

    There is no gateway, only the IP & subnet, and all of the options under Advanced are disabled (those that I am able to disable). The HB (heartbeat) NIC on each node has been set to 100 Mbps half duplex.

    I am at a complete loss...the settings above for the HB have not always been in place; I implemented them recently to see if they would fix the issues, and they have not.

    Also, both networks are being reported as partitioned, but I don't know how or why...please, ANY help would be great.

    Wednesday, September 05, 2012 3:57 PM

Answers

  • 96Primera, we have a couple of known networking issues that can cause the symptoms you are describing.  Can you ensure that SP1 is installed on both nodes and that the hotfixes listed below are installed on each node?  This will address all well-known issues and will surely be recommended by CTS if you engage them.

    2545685 Recommended hotfixes and updates for Windows Server 2008 R2 SP1 Failover Clusters
    http://support.microsoft.com/kb/2545685/EN-US


    Dave Guenthner [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights. http://blogs.technet.com/b/davguents_blog

    Thursday, September 06, 2012 11:20 AM
  • All issues have been rectified.

    Applying the IBM firmware & driver updates has fixed the issues.

    • Marked as answer by Steve_Lindsey Tuesday, September 18, 2012 5:33 PM
    Tuesday, September 18, 2012 5:33 PM

All replies

  • I almost never make network changes.  Turning off protocols and resetting values is almost always a way to create more problems.  In actuality, failover clusters prefer to use IPv6.  That doesn't mean that the cluster won't work correctly since you turned it off, but I never like to turn off something I know an application prefers.

    A non-crossover cable between two NICs?  That seems strange.  It seems like the tx/rx would not be recognized correctly.

    As for the other network, can you get it back to the defaults?  I don't have a 2008 cluster anymore (everything is on 2012) so I can't show you what the defaults are.  As I say, I never change them and the clusters have worked fine.  As for the partitioned public network, it could be an intermittent problem with the switch.   The cluster will send communication over each network it is connected to, and if it can't reach a partner on any network, it will show as a partitioned network.  As soon as it can talk again, it will reset it back to active.

    Or, you might be having intermittent drops on your NICs.  Do you have a couple spare NICs to throw in to see if that fixes the issue?


    tim

    Wednesday, September 05, 2012 10:19 PM
  • I almost never make network changes.  Turning off protocols and resetting values is almost always a way to create more problems.  In actuality, failover clusters prefer to use IPv6.  That doesn't mean that the cluster won't work correctly since you turned it off, but I never like to turn off something I know an application prefers.

    A non-crossover cable between two NICs?  That seems strange.  It seems like the tx/rx would not be recognized correctly.

    As for the other network, can you get it back to the defaults?  I don't have a 2008 cluster anymore (everything is on 2012) so I can't show you what the defaults are.  As I say, I never change them and the clusters have worked fine.  As for the partitioned public network, it could be an intermittent problem with the switch.   The cluster will send communication over each network it is connected to, and if it can't reach a partner on any network, it will show as a partitioned network.  As soon as it can talk again, it will reset it back to active.

    Or, you might be having intermittent drops on your NICs.  Do you have a couple spare NICs to throw in to see if that fixes the issue?


    tim

    All problems existed with the default settings as well. 90% of the time the problems are on the heartbeat network.

    The IBM 3650s have auto-switching NIC ports, from what I have read and been told; that's why a straight-through cable from NIC 2 to NIC 2 on the nodes should be fine.

    The NICs being used are the internal/onboard ones. I have done a packet trace on the interfaces, and they show no signs of packet drops.

    Thursday, September 06, 2012 12:46 PM
  • 96Primera, we have a couple of known networking issues that can cause the symptoms you are describing.  Can you ensure that SP1 is installed on both nodes and that the hotfixes listed below are installed on each node?  This will address all well-known issues and will surely be recommended by CTS if you engage them.

    2545685 Recommended hotfixes and updates for Windows Server 2008 R2 SP1 Failover Clusters
    http://support.microsoft.com/kb/2545685/EN-US


    Dave Guenthner [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights. http://blogs.technet.com/b/davguents_blog

    Thank you very much.

    I had the client push out some available IBM updates last night, so I will connect with him today to see if that resolved the issue.

    It seems like every hour or so they get about 20 of these alerts over about 10 minutes, then it stops.

    A new issue is that resources have been failing over. With no physical signs of failure, all of a sudden an IP will fail a health check, and then the resources (they have SQL 2008 R2 clustered & a file server/share) will jump, or the cluster group will jump....it's very weird.

    But I will check the link above for the KBs.

    Thursday, September 06, 2012 1:26 PM
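
One way to confirm the "about 20 alerts in 10 minutes, roughly every hour" pattern described above is to export the timestamps of the 1126/1129 events and group them into bursts. The following is only a minimal sketch in Python, assuming the timestamps have already been parsed into datetime objects. The function name group_bursts, the 5-minute gap threshold, and the sample data are illustrative assumptions, not something from the thread or from any Microsoft tool.

```python
from datetime import datetime, timedelta


def group_bursts(timestamps, max_gap=timedelta(minutes=5)):
    """Group event timestamps into bursts.

    A new burst starts whenever the gap to the previous event
    exceeds max_gap. Returns a list of (first, last, count) tuples.
    """
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)   # still within the current burst
        else:
            bursts.append([ts])     # gap too large: start a new burst
    return [(b[0], b[-1], len(b)) for b in bursts]


# Hypothetical sample data: two bursts of 20 events, one hour apart.
start = datetime(2012, 9, 6, 9, 0)
events = [start + timedelta(seconds=30 * i) for i in range(20)]
events += [start + timedelta(hours=1, seconds=30 * i) for i in range(20)]

for first, last, count in group_bursts(events):
    print(f"{first:%H:%M:%S} to {last:%H:%M:%S}: {count} events")
```

If the bursts line up with a fixed schedule (for example, on the hour), that points toward something periodic on the hosts or the network rather than random NIC drops; the sketch itself cannot tell which.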
  • Can you try moving the NIC 2 connections to the switch?  Personally, I've never been a fan of the cross-over cable.  It limits expansion, if that should ever become a need.  And, given that 90% of your problems are coming from that configuration, it wouldn't hurt to try a different configuration.

    tim

    Thursday, September 06, 2012 2:12 PM
  • A non-crossover cable between two NICs?  That seems strange.  It seems like the tx/rx would not be recognized correctly.

    @Tim,

    For Gigabit NICs a crossover cable is not required; the NIC will negotiate the appropriate connection on its own.

    Tuesday, September 18, 2012 10:33 PM