Node showing as Down in cluster

    Question

  • I will try to make this as simple as possible.  Here is the setup.

    Two-node cluster configured with a Node and Disk Majority quorum. Both servers, ServerA and ServerB, were online and functioning previously. Some time ago there was disk corruption and the cluster ran chkdsk. The disk issue was resolved.

    Since then, one of the nodes, ServerA, shows as Down (red X) in Failover Cluster Manager. ServerA has been rebooted several times, but it is still not recognized as part of the cluster and does not come online.

    Here is what I have checked.

    On ServerA, running the cluster node command shows:

    Node           Node ID Status
    -------------- ------- ---------------------
    ServerA             1 Joining
    ServerB             2 Down

    On ServerB, which is running all applications and services, the cluster node command shows:

    Node           Node ID Status
    -------------- ------- ---------------------
    ServerA             1 Down
    ServerB             2 Up

    I think ServerA is trying to join the cluster but can't lock the quorum disk or see ServerB. Why would this happen, and how can I fix it?

    The Cluster service on both servers is set to Automatic and is running.

    Nothing has changed on the firewall, and I can ping both the public and private IPs from both servers.

    Thanks.

    Tuesday, December 06, 2011 1:36 PM
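A successful ping only shows that ICMP traffic gets through; it does not show that the cluster service on ServerB accepts connections from ServerA. The Python sketch below is a rough extra check, assuming the cluster service listens on TCP port 3343 (the port documented for Failover Clustering on Windows Server 2008 and later; verify it for your environment). The function name `tcp_port_open` and its defaults are hypothetical, not part of any Windows tool.

```python
import socket

# Assumption: the cluster service accepts node-join connections on TCP 3343.
CLUSTER_PORT = 3343


def tcp_port_open(host, port=CLUSTER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run it from ServerA against each of ServerB's addresses, for example `tcp_port_open("192.0.2.10")` with a placeholder address. A `False` result while ping still succeeds could point at a port-level block or at the remote service not listening, rather than at basic IP connectivity.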

Answers

  • Perhaps taking a look at the cluster log from both nodes, starting with node 2, would be helpful. Also, what does the System event log say about the failed attempts to join?

    cluster . log /gen /node:Node2     (C:\Windows\Cluster\Reports\Cluster.log)

    Search the log for the uppercase string ERR.


    Dave Guenthner [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights. http://blogs.technet.com/b/davguents_blog
    Wednesday, December 07, 2011 11:29 AM
  • It sounds like NodeA is having network communication issues with NodeB. NodeA should attempt to join the cluster when you start the service, but this is failing for some reason. As Dave suggests, look in the cluster log for more hints, though you'll likely want to examine NodeA's cluster.log for the details. Hope this helps.
    Visit my blog about multi-site clustering - http://msmvps.com/blogs/jtoner
    Wednesday, December 07, 2011 2:45 PM
    Moderator
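Dave's suggestion to search the generated log for uppercase ERR can be scripted. This Python sketch assumes the log is plain text in which error entries carry ERR as a separate word; the function names are hypothetical, and the `encoding` default is an assumption to change if your Cluster.log is not UTF-8.

```python
import re
from pathlib import Path

# Error entries are assumed to carry the uppercase token "ERR" as a separate
# word. The match is case-sensitive and word-bounded, so "ERROR" or "error"
# in message text is not counted.
ERR_PATTERN = re.compile(r"\bERR\b")


def find_err_lines(lines):
    """Yield (line_number, text) for every line containing the token ERR."""
    for number, line in enumerate(lines, start=1):
        if ERR_PATTERN.search(line):
            yield number, line.rstrip("\r\n")


def scan_cluster_log(path, encoding="utf-8"):
    """Return a list of (line_number, text) ERR entries from a log file.

    Undecodable bytes are replaced rather than raising. If the file is
    actually UTF-16, pass encoding="utf-16" or no lines will match.
    """
    with Path(path).open(encoding=encoding, errors="replace") as log:
        return list(find_err_lines(log))
```

For example, `scan_cluster_log(r"C:\path\to\Cluster.log")` (placeholder path) after generating the log on each node; comparing the ERR entries around the time ServerA tries to join is probably the most useful part.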

All replies

  • Hi,

    In situations like this I normally run the cluster validation tool against the existing cluster, as it will usually provide good information on what is not configured correctly.

    http://technet.microsoft.com/en-us/library/cc772450.aspx

    When you mention "The disk issue was resolved". How exactly was that resolved?


    Sean Massey | Consultant, iUNITE

    Feel free to contact me through My Blog or Twitter.
    Please click the Mark as Answer button if a post solves your problem!

    Tuesday, December 06, 2011 11:18 PM
  • The cluster ran chkdsk and reported no errors. Also, the disk is showing online on ServerB.

    I tried to run cluster validation on ServerB, but because the disks are in use, it skips some checks.

    Do I need to stop the cluster service on ServerB to validate the cluster? As these are live servers, that may not be possible.

    Is there any way to add the faulty server back to the cluster?

    Wednesday, December 07, 2011 10:55 AM
  • Hi, which disk was presenting issues? The quorum disk?


    Regards, Samir Farhat Infrastructure Consultant
    Wednesday, December 07, 2011 11:01 AM
  • I had the same issue.

    We have a DAG spanning two sites. The primary site has two member servers and the DR site has one member server.

    We rebooted one member server in the primary site for a software update. After the reboot, the database copy failed and the error below occurred.

    Test-ReplicationHealth -Server server
    Server          Check                      Result     Error                                                           
    ------          -----                      ------     -----                                                           
    MKE01BL09407652 ClusterService             Passed                                                                     
    MKE01BL09407652 ReplayService              Passed                                                                     
    MKE01BL09407652 ActiveManager              *FAILED*   Active Manager is in an unknown state on server 'MKE01BL094076...
    MKE01BL09407652 TasksRpcListener           Passed                                                                     
    MKE01BL09407652 TcpListener                Passed                                                                     
    MKE01BL09407652 DagMembersUp               Passed     

    The full result of the failed test:
    Result           : *FAILED*
    Error            : Active Manager is in an unknown state on server 'MKE01BL09407652'. Basic database administrative ope

    Error: The NetworkManager has not yet been initialized. Check the event logs to determine the cause.

    The interfaces show the status "unavailable" in the DAG window in EMC.

    Solution:

    I restarted the Cluster service on the DR site server. Now everything works fine.


    kesav

    Thursday, September 12, 2013 8:05 AM
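When Test-ReplicationHealth output has been saved as text, as in the reply above, a short Python sketch can pull out the failed checks. It assumes the default table layout shown there, where Server and Check values contain no spaces and the Error column is the rest of the row; `failed_checks` is a hypothetical helper, and parsing formatted console text is fragile, so treat it as a convenience for saved output only.

```python
def failed_checks(table_text):
    """Return (server, check, error) for each row whose Result is *FAILED*.

    Assumes the default Test-ReplicationHealth table layout: Server and
    Check values contain no spaces, and Error (if any) is the rest of the row.
    """
    failures = []
    for raw in table_text.splitlines():
        fields = raw.rstrip().split(maxsplit=3)
        # Skip blank lines, the header row and the dashed underline row.
        if len(fields) < 3 or fields[0] == "Server" or set(fields[0]) == {"-"}:
            continue
        server, check, result = fields[:3]
        if result == "*FAILED*":
            error = fields[3] if len(fields) == 4 else ""
            failures.append((server, check, error))
    return failures
```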