Cluster service fails many times

    Question

  • Problem: the Cluster service fails many times

    Action:
    Checked the system event logs of the MBX1 and MBX2 servers respectively.

    System log of MBX1:

    2/25/2012 1:35      Service Control Manager      7036      None      The DPMRA service entered the stopped state.
    2/25/2012 1:30      Service Control Manager      7036      None      The WinHTTP Web Proxy Auto-Discovery Service service entered the running state.
    2/25/2012 1:30      Service Control Manager      7036      None      The DPMRA service entered the running state.
    2/25/2012 1:27      Service Control Manager      7036      None      The WinHTTP Web Proxy Auto-Discovery Service service entered the stopped state.
    2/25/2012 1:25      Microsoft-Windows-Iphlpsvc     4200      None      Isatap interface isatap.{E187D6F8-2C88-4C34-B7F3-DA16A279E203} with address fe80::5efe:169.254.1.195 has been brought up.
    2/25/2012 1:25      Microsoft-Windows-Time-Service      35      None      The time service is now synchronizing the system time with the time source adddomaincontroller.mydomaindc.local(ntp.d|0.0.0.0:123->x.x.x.x:123).
    2/25/2012 1:25      Microsoft-Windows-Iphlpsvc     4201      None      Isatap interface isatap.{E187D6F8-2C88-4C34-B7F3-DA16A279E203} is no longer active.
    2/25/2012 1:23      Microsoft-Windows-Iphlpsvc     4200      None      Isatap interface isatap.{E187D6F8-2C88-4C34-B7F3-DA16A279E203} with address fe80::5efe:169.254.1.195 has been brought up.
    2/25/2012 1:23      Microsoft-Windows-Iphlpsvc     4201      None      Isatap interface isatap.{E187D6F8-2C88-4C34-B7F3-DA16A279E203} is no longer active.
    2/25/2012 1:23      Microsoft-Windows-FailoverClustering     1135      Node Mgr      Cluster node 'mbx2' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
    2/25/2012 1:15      Service Control Manager      7036      None      The Windows Modules Installer service entered the stopped state.
    2/25/2012 1:15      Service Control Manager      7040      None      The start type of the Windows Modules Installer service was changed from auto start to demand start.


    System log of MBX2:

    2/25/2012 1:26      Microsoft-Windows-Iphlpsvc     4200      None      Isatap interface isatap.{D4E20B45-B4F1-4414-BED1-1BA8F1B0E72D} with address fe80::5efe:169.254.2.79 has been brought up.
    2/25/2012 1:26      Service Control Manager      7036      None      The Cluster Service service entered the running state.
    2/25/2012 1:26      Microsoft-Windows-Iphlpsvc     4201      None      Isatap interface isatap.{D4E20B45-B4F1-4414-BED1-1BA8F1B0E72D} is no longer active.
    2/25/2012 1:26      Service Control Manager      7031      None      The Cluster Service service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 60000 milliseconds: Restart the service.
    2/25/2012 1:26      Service Control Manager      7024      None      The Cluster Service service terminated with service-specific error A quorum of cluster nodes was not present to form a cluster..
    2/25/2012 1:26      Service Control Manager      7036      None      The Cluster Service service entered the stopped state.
    2/25/2012 1:26      Microsoft-Windows-FailoverClustering     1177      Quorum Manager      "The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
    Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges."
    2/25/2012 1:25      Microsoft-Windows-FailoverClustering     1069      Resource Control Manager      Cluster resource 'File Share Witness (\\hc1.mydomaindc.local\DAG1.mydomaindc.local)' in clustered service or application 'Cluster Group' failed.
    2/25/2012 1:25      Microsoft-Windows-FailoverClustering     1564      File Share Witness Resource      File share witness resource 'File Share Witness (\\hc1.mydomaindc.local\DAG1.mydomaindc.local)' failed to arbitrate for the file share '\\hc1.mydomaindc.local\DAG1.mydomaindc.local'. Please ensure that file share '\\hc1.mydomaindc.local\DAG1.mydomaindc.local'exists and is accessible by the cluster.
    2/25/2012 1:25      Service Control Manager      7036      None      The Application Experience service entered the stopped state.
    2/25/2012 1:24      Microsoft-Windows-Time-Service      35      None      The time service is now synchronizing the system time with the time source adddomaincontroller.mydomaindc.local(ntp.d|0.0.0.0:123->x.x.x.x:123).
    2/25/2012 1:24      Microsoft-Windows-FailoverClustering     1135      Node Mgr      Cluster node 'MBX01' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
    2/25/2012 1:24      Microsoft-Windows-Kernel-General      1      None      The system time has changed to ‎2012‎-‎02‎-‎24T23:24:50.552000000Z from ‎2012‎-‎02‎-‎24T23:23:38.649192500Z.
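    The last Kernel-General entry in the MBX2 log records a step in the system clock. A minimal Python sketch for measuring that step from the two timestamps in the event text follows. The helper name parse_event_time is hypothetical, and the nanosecond digits are truncated to microseconds because Python's datetime stores nothing finer.

```python
from datetime import datetime, timezone


def parse_event_time(text: str) -> datetime:
    """Parse an event-log UTC timestamp such as 2012-02-24T23:24:50.552000000Z."""
    # Event Viewer text can carry invisible left-to-right marks (U+200E); drop them.
    cleaned = text.replace("\u200e", "").rstrip("Z")
    date_part, _, fraction = cleaned.partition(".")
    # datetime keeps microseconds only, so truncate or pad the fraction to 6 digits.
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    base = datetime.strptime(date_part, "%Y-%m-%dT%H:%M:%S")
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


# Both values are copied from the Kernel-General event 1 text in the MBX2 log.
old = parse_event_time("2012-02-24T23:23:38.649192500Z")
new = parse_event_time("2012-02-24T23:24:50.552000000Z")
step_seconds = (new - old).total_seconds()
print(f"Clock stepped forward by {step_seconds:.3f} s")
```

    A forward step of roughly 72 seconds is logged in the same minute as the first 1135 event on MBX2, so time synchronization on the virtual machines may be worth checking alongside the network. This is only an observation from the log, not a confirmed cause.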

    So far I don't know what causes the Cluster service to go down, or why the ISATAP interface went down.
    Which logs should I enable, and what else can I do to diagnose this problem?
    Kindly advise.
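    One way to start the diagnosis is to line up the cluster-related events from both nodes in a single timeline. The sketch below does that in Python for the FailoverClustering, Kernel-General and Service Control Manager entries quoted above. The summaries are shortened by hand, and merged_timeline is a hypothetical helper name.

```python
from datetime import datetime

# (timestamp, node, event ID, short summary), in the order the logs list them
# (newest first within each node). Values are copied from the two system logs.
EVENTS = [
    ("2/25/2012 1:23", "MBX1", 1135, "node 'mbx2' removed from cluster membership"),
    ("2/25/2012 1:26", "MBX2", 7024, "Cluster Service terminated: no quorum of nodes"),
    ("2/25/2012 1:26", "MBX2", 1177, "Cluster service shutting down, quorum lost"),
    ("2/25/2012 1:25", "MBX2", 1069, "File Share Witness resource failed"),
    ("2/25/2012 1:25", "MBX2", 1564, "File Share Witness failed to arbitrate"),
    ("2/25/2012 1:24", "MBX2", 1135, "node 'MBX01' removed from cluster membership"),
    ("2/25/2012 1:24", "MBX2", 1, "system time changed"),
]


def merged_timeline(events):
    """Return the events oldest first.

    The logs are newest first, so the input is reversed before sorting;
    sorted() is stable, which keeps the logged order within the same minute.
    """
    return sorted(
        reversed(events),
        key=lambda e: datetime.strptime(e[0], "%m/%d/%Y %H:%M"),
    )


timeline = merged_timeline(EVENTS)
for when, node, event_id, summary in timeline:
    print(f"{when}  {node:<5} {event_id:>5}  {summary}")
```

    The timestamps have only minute resolution and come from two different clocks, so the ordering between nodes is approximate.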


    Our topology:

    2 clustered mailbox servers with Exchange 2010 Service Pack 1
    2 network load balanced Hub/CAS servers with Exchange 2010 Service Pack 1

    Sunday, February 26, 2012 6:19 PM

All replies

  • Hi,

    It looks like there is a problem with your File Share Witness. Do you have a witness configured within your DAG? If so, do you see any files within this share? Please also post a screenshot of the permissions on this share (share and NTFS permissions).

    Regards,

    Bart Timmermans


    KPN Consulting - Technical Consultant www.bart-timmermans.nl

    Sunday, February 26, 2012 8:57 PM
  • The witness is configured on my Hub/CAS server number 1. My DAGFileShareWitnesses folder has a subfolder named DAG1.mydomain.local, which contains a folder b54708c3-e71b-4a4c-aae7-20ad1a9cdb17 holding 2 files, Witness.log and VerifyShareWriteAccess.


    NTFS permissions:

    c:\DAGFileShareWitnesses NT AUTHORITY\SYSTEM:(OI)(CI)(ID)F
                             BUILTIN\Administrators:(OI)(CI)(ID)F
                             BUILTIN\Users:(OI)(CI)(ID)R
                             BUILTIN\Users:(CI)(ID)(special access:)
                                                   FILE_APPEND_DATA

                             BUILTIN\Users:(CI)(ID)(special access:)
                                                   FILE_WRITE_DATA

                             CREATOR OWNER:(OI)(CI)(IO)(ID)F
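    To compare a listing like this against a reference setup, it can help to reduce it to the principals that hold Full Control on the folder itself. The Python sketch below parses the cacls-style text above. The function name and the regular expression are illustrative assumptions, and it covers only NTFS entries on the parent folder, not the share permissions or the DAG subfolder.

```python
import re

# The cacls-style listing from the post, copied as-is (raw string keeps backslashes).
ACL_TEXT = r"""
c:\DAGFileShareWitnesses NT AUTHORITY\SYSTEM:(OI)(CI)(ID)F
                         BUILTIN\Administrators:(OI)(CI)(ID)F
                         BUILTIN\Users:(OI)(CI)(ID)R
                         BUILTIN\Users:(CI)(ID)(special access:)
                                               FILE_APPEND_DATA

                         BUILTIN\Users:(CI)(ID)(special access:)
                                               FILE_WRITE_DATA

                         CREATOR OWNER:(OI)(CI)(IO)(ID)F
"""

# principal, then one or more (FLAG) groups, then an optional right letter (F, R, ...).
ACE = re.compile(r"(?P<who>.+?):(?P<flags>(?:\([^)]*\))+)(?P<right>\S*)")


def full_control_principals(acl_text: str, path: str) -> list[str]:
    """List principals with Full Control (F) that applies to the folder itself."""
    found = []
    for line in acl_text.splitlines():
        line = line.strip()
        # The first entry is prefixed with the folder path; strip it off.
        if line.lower().startswith(path.lower()):
            line = line[len(path):].strip()
        match = ACE.fullmatch(line)
        # Skip inherit-only (IO) entries: they do not apply to the folder itself.
        if match and match["right"] == "F" and "(IO)" not in match["flags"]:
            found.append(match["who"])
    return found


principals = full_control_principals(ACL_TEXT, r"c:\DAGFileShareWitnesses")
print(principals)
```

    Exchange Trusted Subsystem does not appear by name in this listing, so on this folder it would get Full Control only through membership of BUILTIN\Administrators.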

    share properties:

    Note that Exchange runs on VMware virtual machines.


    • Edited by om zeyad Monday, February 27, 2012 8:39 AM
    Monday, February 27, 2012 8:27 AM
  • Maybe this blog post can help you?

    http://www.thecabal.org/2010/08/manually-creating-a-dag-fsw-for-exchange-2010/

    Compare it with your setup


    Jonas Andersson | Microsoft Community Contributor Award 2011 | MCITP: EMA 2007/2010 | Blog: http://www.testlabs.se/blog | Follow me on twitter: jonand82

    Monday, February 27, 2012 9:11 AM
  • Hi,

    Can you check the following: is the Exchange Trusted Subsystem group a member of the local Administrators group on the server hosting your file share witness?

    Regards,

    Bart



    Monday, February 27, 2012 9:22 AM
  • Yes Bart, Exchange Trusted Subsystem is a member of the local Administrators group.

    Monday, February 27, 2012 12:53 PM
  • Thanks Jonas, but I already have a file share witness.
    Monday, February 27, 2012 12:56 PM
  • I think more time needs to be spent looking at the logs to see if a root cause analysis (RCA) can be done. I have seen many cases where the service just fails, and most of the time it has been down to a network issue or to Exchange running on Windows Server 2008 with some OS-level hotfixes missing.

    Have you applied this hotfix? http://support.microsoft.com/kb/2550886

    http://blogs.technet.com/b/exchange/archive/2011/11/20/recommended-windows-hotfix-for-database-availability-groups-running-windows-server-2008-r2.aspx

    http://social.technet.microsoft.com/wiki/contents/articles/2008.list-of-cluster-hotfixes-for-windows-server-2008-r2.aspx
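    A quick way to answer the hotfix question on each node is to search the list of installed updates for the KB number. The Python sketch below assumes you have saved the text output of `wmic qfe get HotFixID` from a node; the sample string is a made-up placeholder, not output from this environment, and hotfix_installed is a hypothetical helper name.

```python
def hotfix_installed(qfe_output: str, kb_id: str) -> bool:
    """Return True if kb_id appears as a whole token in the hotfix listing."""
    # Compare whole tokens so that "KB255088" does not match "KB2550886".
    installed = {token.upper() for token in qfe_output.split()}
    return kb_id.upper() in installed


# Placeholder listing for illustration only; replace it with the text captured
# on MBX1 and MBX2.
SAMPLE_OUTPUT = "HotFixID\nKB0000001\nKB2550886\n"

print(hotfix_installed(SAMPLE_OUTPUT, "KB2550886"))
```

    Running the same check on both mailbox servers shows whether the nodes are at the same patch level.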

    Sukh

    Monday, February 27, 2012 2:39 PM
  • "thanks jonas but I already have a file share witness"

    What I meant was that you should compare the permissions from the blog post with the ones in your environment.



    Monday, February 27, 2012 3:46 PM
  • Hello,

    I see this message: "The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk." Please follow this document to check the network configuration and the quorum configuration:

    Event ID 1177 — Quorum and Connectivity Needed for Quorum
    http://technet.microsoft.com/en-us/library/cc773498(v=ws.10).aspx
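    The quorum loss in event 1177 can be read as simple vote arithmetic. The Python sketch below assumes the DAG uses the Node and File Share Majority model, with one vote each for MBX1, MBX2 and the file share witness; the thread does not state the quorum model explicitly, so treat that as an assumption.

```python
def has_quorum(votes_reachable: int, total_votes: int) -> bool:
    """A partition keeps quorum only with a strict majority of all votes."""
    return votes_reachable > total_votes // 2


TOTAL_VOTES = 3  # assumed: MBX1 + MBX2 + file share witness

# MBX2 at 1:25-1:26: it still counts its own vote, but it has lost MBX01
# (event 1135) and failed to arbitrate for the witness share (event 1564).
mbx2_votes = 1
print(has_quorum(mbx2_votes, TOTAL_VOTES))

# Had MBX2 still reached the witness, it would have held two of three votes.
print(has_quorum(2, TOTAL_VOTES))
```

    Under that assumption, a node that loses both its partner and the witness at the same time cannot keep the cluster running, which matches the sequence in the MBX2 log.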

    Thanks,

    Evan


    Evan Liu

    TechNet Community Support

    Tuesday, February 28, 2012 9:00 AM
  • Thanks Jonas, I'll do that.
    Tuesday, February 28, 2012 4:54 PM
  • Is there any update on the issue?



    Monday, March 05, 2012 1:44 PM
  • I've seen similar issues on SQL and Exchange 2010 clusters where people resolved this by disabling the IP Helper service. I'm posting this in case someone else runs into this issue too.
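
    For anyone who wants to try that workaround, a minimal Python sketch of the steps follows. It assumes the IP Helper service has the service name iphlpsvc and uses the built-in sc command; the function names are hypothetical, and the script only prints the commands unless dry_run is set to False in an elevated prompt on the node.

```python
import subprocess

SERVICE = "iphlpsvc"  # service name of the IP Helper service


def ip_helper_disable_commands() -> list[list[str]]:
    """Build the sc commands that disable and then stop the service."""
    return [
        # sc.exe expects "start=" and its value as separate arguments.
        ["sc", "config", SERVICE, "start=", "disabled"],
        ["sc", "stop", SERVICE],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            result = subprocess.run(cmd)
            print(" ".join(cmd), "-> exit code", result.returncode)


commands = ip_helper_disable_commands()
run(commands)
```

    IP Helper provides the ISATAP tunnel interfaces seen in events 4200 and 4201, so disabling it should stop those interfaces from cycling. Whether that addresses the quorum loss in this thread is not confirmed.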


    Tuesday, January 22, 2013 7:27 AM