Cluster Node Failover Issues

    Question

  • Hello all, I am new to clustering. I have an active/active cluster. Node2 failed, and now both are running on Node1. I need to find out why Node2 failed. What issues can cause a failover? Where should I look for logs to analyze the problem? How can I check that the server is fine to undergo failback? Thanks, Sujit
    Thursday, July 09, 2009 6:30 AM

Answers

  • Open the cluster log and go to the point of the failure. Remember that cluster log timestamps are in GMT, so convert them to your time zone before working out what caused the cluster to fail over. Compare the timestamps in the System event log with those in the cluster log; I am sure you will find the root cause (RC) of the failover.
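To make the GMT-to-local-time step concrete, here is a minimal Python sketch. It assumes each line of the Windows Server 2003 cluster log (by default %SystemRoot%\Cluster\cluster.log) starts with a PID.TID::YYYY/MM/DD-HH:MM:SS.mmm prefix; the function name cluster_line_to_local and the sample line are hypothetical, and the fixed offset you pass in does not account for daylight saving time.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed prefix of a Windows Server 2003 cluster.log line:
# "<process id>.<thread id>::YYYY/MM/DD-HH:MM:SS.mmm", with the time in GMT.
CLUSTER_LOG_STAMP = re.compile(
    r"^[0-9a-fA-F]+\.[0-9a-fA-F]+::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})"
)


def cluster_line_to_local(line: str, utc_offset_hours: float) -> Optional[datetime]:
    """Convert a cluster log line's GMT timestamp to a fixed local UTC offset.

    Returns None when the line does not start with the assumed prefix.
    """
    match = CLUSTER_LOG_STAMP.match(line)
    if match is None:
        return None
    # Parse the stamp and mark it explicitly as UTC (GMT) before converting.
    stamp = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f")
    stamp = stamp.replace(tzinfo=timezone.utc)
    # A fixed offset ignores daylight saving time; adjust if your zone uses it.
    return stamp.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Hypothetical sample line, converted for a UTC+05:30 time zone.
sample = "00000848.00000c2c::2009/07/09-01:00:12.345 sample message"
print(cluster_line_to_local(sample, 5.5))  # prints 2009-07-09 06:30:12.345000+05:30
```

Once the failure time is expressed in local time, it can be lined up against the System event log, which Event Viewer displays in local time.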

    A few important points to check:

    Networking -- Check the cluster network priority and the network binding order; PSS should be disabled in the registry.
    Storage -- Install the latest version of the Storport driver (storport.sys); see Microsoft KB 957910.

    Let me know if you need any help.

    Regards,
    Aresh

    • Marked as answer by tsujit Monday, July 20, 2009 5:55 AM
    Monday, July 13, 2009 8:51 AM
  • Hi Aresh, Thanks! Node2 is now in the Up state in the cluster, but how can I tell that Node2 is healthy and that a failback can happen? How can I check that Node2 is ready for failback? I checked the server logs, and they show that the quorum disk got full after the failover. But when I look at the quorum disk now, there is plenty of free space. How should I troubleshoot this situation? Thanks! Sujit
    • Marked as answer by tsujit Monday, July 20, 2009 5:55 AM
    Wednesday, July 15, 2009 4:13 AM
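Because the quorum disk reportedly filled up, a quick free-space check on the quorum volume may be worth scripting. Below is a minimal Python sketch using the standard library's shutil.disk_usage; the function name and the 100 MB default threshold are assumptions, not values from this thread. A single reading only shows the current state; it cannot tell you how full the disk was at the time of the failover.

```python
import shutil
from typing import Tuple


def quorum_free_space(path: str, min_free_mb: int = 100) -> Tuple[int, bool]:
    """Return (free bytes, meets threshold) for the volume that holds `path`.

    The 100 MB default is an arbitrary example threshold, not a Microsoft figure.
    """
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage.free, usage.free >= min_free_mb * 1024 * 1024
```

On a node, a call such as quorum_free_space("Q:\\") would report the free space on the quorum volume; the drive letter Q: is a hypothetical example, so substitute the letter your quorum disk actually uses. Running it periodically (for example from a scheduled task) would show whether free space drops sharply over time.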

All replies

  • On Windows Server 2003, look in the System event log on either node; event log messages are replicated between the nodes. Events from the source "ClusSvc" are the cluster messages. Look for the first resource that failed, which is logged as event ID 1069.
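The search for the first 1069 event on Windows Server 2003 can be sketched in Python over an exported copy of the System log. This sketch assumes the log was saved from Event Viewer as a comma-delimited file with a header row containing Date, Time, Source and EventID columns in US date format; that layout and the function name are assumptions, so adjust them to match your actual export.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Optional


def first_resource_failure(csv_text: str, source: str = "ClusSvc") -> Optional[Dict[str, str]]:
    """Return the earliest event ID 1069 row from `source`, or None.

    Assumes a header row with Date, Time, Source and EventID columns and
    US-style values such as "7/9/2009" and "6:04:55 AM" (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    failures = [
        row for row in reader
        if row["Source"] == source and row["EventID"].strip() == "1069"
    ]
    if not failures:
        return None

    def when(row: Dict[str, str]) -> datetime:
        return datetime.strptime(f"{row['Date']} {row['Time']}", "%m/%d/%Y %I:%M:%S %p")

    # Exports may list the newest event first, so pick the earliest by
    # timestamp rather than trusting the order of rows in the file.
    return min(failures, key=when)
```

Sorting by timestamp rather than file order matters because the goal is the first failing resource; later 1069 events are often just consequences of that first failure.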

    On Windows Server 2008, the System event log is not replicated, so you must look in the System event log of the node where the problem occurred. Look for events from the source "Microsoft-Windows-FailoverClustering", and again look for the first resource that failed, logged as event ID 1069.
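On Windows Server 2008 the same search can be run on the affected node with wevtutil and an Event Log XPath filter. The Python sketch below only builds the command line, and runs it when executed on Windows; the function name and the start time are example values (use a time shortly before the failure). With /rd:false and /c:1, wevtutil returns the oldest matching event, which is the first resource failure after that time.

```python
import subprocess
import sys
from typing import List


def build_failover_query(since_utc: str, count: int = 1) -> List[str]:
    """Build a wevtutil command listing the oldest 1069 events since `since_utc`."""
    xpath = (
        "*[System[Provider[@Name='Microsoft-Windows-FailoverClustering']"
        " and (EventID=1069)"
        f" and TimeCreated[@SystemTime>='{since_utc}']]]"
    )
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{count}",  # stop after this many events
        "/rd:false",    # oldest first, so the first hit is the first failure
        "/f:text",      # human-readable output
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Run on the node where the failure happened; the log is not replicated.
    cmd = build_failover_query("2009-07-09T00:00:00.000Z")
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```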

    Rgds,
    Edwin.
    Thursday, July 09, 2009 9:57 AM
    Moderator
  • Hi Edwin, I found these events from when Node2 failed (Source: ClusSvc, Event ID: 1069, Computer: Node2):
    1. Cluster resource 'SQL DB1' in Resource Group 'SQLGRP1' failed.
    2. Cluster resource 'Master & Temp1' in Resource Group 'SQLGRP1' failed.
    3. Cluster resource 'Backup Exec Device and Media Service' in Resource Group 'Backup Exec' failed.
    But how do I resolve these issues and make failback to Node2 possible? Node2 needs to be error-free, otherwise it will not fail back. Thanks! Sujit
    • Edited by tsujit Friday, July 10, 2009 4:19 AM
    Friday, July 10, 2009 4:18 AM
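When several 1069 events are logged, as in the list above, it can help to see which resource groups were affected and which resource failed first in each. Here is a minimal Python sketch that parses the message wording quoted in this thread; it assumes the messages are passed in chronological order, and the function name is hypothetical.

```python
import re
from typing import Dict, List

# Matches the 1069 message wording quoted in this thread.
FAILED_RESOURCE = re.compile(
    r"Cluster resource '([^']+)' in Resource Group '([^']+)' failed"
)


def failed_resources_by_group(messages: List[str]) -> Dict[str, List[str]]:
    """Group failed resource names by resource group, keeping first-seen order.

    With chronological input, the first name in each list is the earliest
    failure recorded for that group.
    """
    groups: Dict[str, List[str]] = {}
    for message in messages:
        match = FAILED_RESOURCE.search(message)
        if match:
            resource, group = match.groups()
            groups.setdefault(group, []).append(resource)
    return groups
```

Applied to the three events above, this yields two groups, SQLGRP1 (SQL DB1 first, then Master & Temp1) and Backup Exec, which makes it easier to decide which resource's own logs to examine first.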
  • Now I am getting many errors in Event Viewer. Most of them are:
    Source: Userenv, Event ID: 1095. Description: Windows could not log all the RSOP (Resultant Set of Policy) Data. Group Policy processing will continue but the RSOP data might not be accurate.
    Source: Userenv, Event ID: 1030. Description: Windows cannot query for the list of Group Policy objects. Check the event log for possible messages previously logged by the policy engine that describes the reason for this.
    Source: Userenv, Event ID: 1006. Description: Windows cannot bind to xyz.com domain. (No Memory). Group Policy processing aborted.
    Also, Node2 in the cluster has stopped. How can I resolve these issues? Please advise, otherwise Node1 might also fail, bringing everything down. Thanks! Sujit
    • Proposed as answer by Aresh Sarkari Monday, July 13, 2009 8:44 AM
    Monday, July 13, 2009 6:09 AM
  • Time to open a support case with us.
    Chuck Timon, Senior Support Escalation Engineer (SEE), Microsoft Corporation
    Monday, July 13, 2009 5:49 PM
    Moderator