Exchange 2010 DAG and FailoverClustering 1135 & 1177
-
Wednesday, March 14, 2012 2:52 PM
Hi all.
We have the following Exchange 2010 configuration, using a DAG:
SITE A - in the US
(2) CAS servers
(2) Mailbox servers
SITE B - in Europe
(2) CAS servers
(2) Mailbox servers
This morning, all of the mailbox databases on the second node in SITE A failed over to node 1 in SITE A. SITE B in Europe was unaffected.
We are getting FailoverClustering events in the System log on nodes in both SITE A and SITE B. Node B in SITE A (which currently has no databases mounted on it) also logged event 1177:
FailoverClustering 1135 (appearing on ALL servers)
Cluster node '___MAIL01' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
FailoverClustering 1177 (appearing on Node B in SITE A)
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
The DAG hotfix (http://blogs.technet.com/b/exchange/archive/2011/11/20/recommended-windows-hotfix-for-database-availability-groups-running-windows-server-2008-r2.aspx) was installed about a month ago, because we were seeing these events then. It has happened twice already this week.
Any ideas on where to start, or what the possible cause might be?
Brandon Carder
All Replies
-
Wednesday, March 14, 2012 6:27 PM
Hello Brandon,
This issue will occur if the cluster service crashes intermittently on the active node, or if the server loses its connection to the witness server.
A few steps you can try: add the Exchange Trusted Subsystem group to the witness directory and give it full permissions. You can also add the administrator account and give it full permissions.
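To see why losing contact with other nodes or with the witness can take a node down, it helps to count votes. The sketch below is a minimal Python illustration, assuming static vote counting as in Windows Server 2008 R2. A DAG with an even number of members (four here) uses Node and File Share Majority: each member and the file share witness hold one vote, and a partition needs a strict majority of all votes to keep running. The helper names are hypothetical, and the thread does not say where the witness lives, so the scenarios are illustrative only.

```python
# Minimal sketch of majority-quorum vote counting for a DAG with an even
# number of members (Node and File Share Majority). Assumption: static votes,
# one per DAG member plus one for the file share witness.

def total_votes(member_count: int, has_witness: bool) -> int:
    """Total configured votes: one per member, plus one for the witness."""
    return member_count + (1 if has_witness else 0)


def votes_needed(total: int) -> int:
    """A strict majority of all configured votes is required for quorum."""
    return total // 2 + 1


def keeps_quorum(reachable_members: int, sees_witness: bool,
                 member_count: int, has_witness: bool = True) -> bool:
    """True if a partition that can reach `reachable_members` nodes
    (counting itself), and optionally the witness, still has a majority."""
    votes = reachable_members + (1 if (has_witness and sees_witness) else 0)
    return votes >= votes_needed(total_votes(member_count, has_witness))


# The DAG in this thread: 2 Mailbox servers in SITE A + 2 in SITE B.
MEMBERS = 4

# A node isolated from every peer and the witness: 1 of 5 votes.
print(keeps_quorum(1, False, MEMBERS))  # False
# Both SITE A nodes plus a reachable witness: 3 of 5 votes.
print(keeps_quorum(2, True, MEMBERS))   # True
# Both SITE A nodes, but no witness and no WAN link: 2 of 5 votes.
print(keeps_quorum(2, False, MEMBERS))  # False
```

With five votes in total, a node that can see only itself holds one vote against the three required, which is consistent with the Cluster service stopping and logging event 1177.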
Regards,
Deepak
-
Wednesday, March 14, 2012 6:58 PM
Why does it need those permissions? To keep the connection? I'm somewhat confused; can you please elaborate?
Thanks in advance, Deepak!
Brandon Carder
-
Wednesday, March 14, 2012 6:58 PM
Check out:
Decreasing Exchange 2010 DAG Failover Sensitivity by Increasing Cluster Timeout Values
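The timeout values that article deals with are the cluster heartbeat settings (SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold). As a rough Python sketch, assuming the detection window is approximately the heartbeat interval multiplied by the number of heartbeats that may be missed, the arithmetic looks like this. The relaxed values below are hypothetical and are not taken from the article; use the values it recommends.

```python
# Rough sketch: how long a node can go silent before its peers treat it as
# down. Assumption: window is approximately delay * threshold; real detection
# timing also depends on when the last heartbeat landed and on regroup logic.

from dataclasses import dataclass


@dataclass
class HeartbeatSettings:
    delay_ms: int   # interval between heartbeats (e.g. SameSubnetDelay)
    threshold: int  # heartbeats that may be missed (e.g. SameSubnetThreshold)

    def tolerance_seconds(self) -> float:
        """Approximate outage the link can absorb before the peer is removed."""
        return self.delay_ms * self.threshold / 1000


# Windows Server 2008 R2 defaults: 1000 ms delay and a threshold of 5,
# for both the same-subnet and the cross-subnet properties.
default = HeartbeatSettings(delay_ms=1000, threshold=5)

# Hypothetical relaxed cross-subnet values for a WAN-stretched DAG.
relaxed_cross_subnet = HeartbeatSettings(delay_ms=4000, threshold=10)

print(default.tolerance_seconds())               # 5.0
print(relaxed_cross_subnet.tolerance_seconds())  # 40.0
```

With the defaults, roughly five seconds of lost heartbeats is enough to get a node removed, which is why a brief network blip can produce 1135 events on every server.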
So SiteB did not have any issues?
You may also want to generate the cluster logs, or at least keep them in case you need to open a case with Microsoft:
cluster log /g
- Marked As Answer by Gavin-Zhang Wednesday, April 04, 2012 5:34 AM
-
Wednesday, March 14, 2012 6:59 PM
A few things you can try here: http://social.technet.microsoft.com/Forums/hu-HU/exchange2010/thread/d2cf133d-052e-429f-8fad-e669a47e4192
Sukh
- Marked As Answer by Gavin-Zhang Wednesday, April 04, 2012 5:35 AM
-
Friday, March 16, 2012 9:11 AM
Hi Brandon,
Any update on your issue? There are many possible causes, and the replies above contain some good information, so please check them.
In my opinion, it is most likely caused by a network issue, such as network latency, a NIC problem, and so on.
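One way to check whether the network is to blame is to line up the 1135 events from every server. The Python sketch below uses hypothetical server names and hand-typed timestamps rather than real log data. It groups events that fall within a short window into one incident and counts how many peers reported each node as removed. If every other node removed the same server at the same moment, that would suggest looking first at that server's own network path (NIC, cabling, switch port) rather than the WAN link. The 30-second window is an assumption and also relies on the servers' clocks being in sync.

```python
from datetime import datetime, timedelta

# Hypothetical 1135 records: (reporting server, time logged, node reported
# removed). Real data would come from each server's System log.
EVENTS = [
    ("SITEA-MBX1", datetime(2012, 3, 14, 8, 0, 5), "SITEA-MBX2"),
    ("SITEB-MBX1", datetime(2012, 3, 14, 8, 0, 6), "SITEA-MBX2"),
    ("SITEB-MBX2", datetime(2012, 3, 14, 8, 0, 6), "SITEA-MBX2"),
    ("SITEA-MBX2", datetime(2012, 3, 14, 8, 0, 5), "SITEA-MBX1"),
]


def group_incidents(events, window=timedelta(seconds=30)):
    """Group time-sorted events into incidents: an event joins the current
    incident if it falls within `window` of that incident's first event."""
    incidents = []
    for server, when, removed in sorted(events, key=lambda e: e[1]):
        if incidents and when - incidents[-1]["start"] <= window:
            incidents[-1]["reports"].append((server, removed))
        else:
            incidents.append({"start": when, "reports": [(server, removed)]})
    return incidents


def removed_by_peers(incident):
    """Map each node reported as removed to the set of servers reporting it."""
    seen = {}
    for server, removed in incident["reports"]:
        seen.setdefault(removed, set()).add(server)
    return seen


incidents = group_incidents(EVENTS)
for node, reporters in sorted(removed_by_peers(incidents[0]).items()):
    print(node, len(reporters))
# SITEA-MBX1 1
# SITEA-MBX2 3
```

In this made-up incident, three peers removed SITEA-MBX2 within a second of each other while only SITEA-MBX2 removed anyone else, the pattern you would expect if that one node lost its network connectivity.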
Please also refer to the KBs below and apply the updates:
A transient communication failure causes a Windows Server 2008 R2 failover cluster to stop working
http://support.microsoft.com/kb/2550886
Cluster service still uses the default time-out value after you configure the regroup time-out setting in Windows Server 2008 R2
http://support.microsoft.com/default.aspx?scid=kb;en-US;2549448
Windows Server 2008 R2 failover cluster loses quorum when an asymmetric communication failure occurs
http://support.microsoft.com/default.aspx?scid=kb;en-US;2552040
Regards!
Gavin
TechNet Community Support
- Edited by Gavin-Zhang Friday, March 16, 2012 9:19 AM
- Edited by Gavin-Zhang Friday, March 16, 2012 9:27 AM
- Marked As Answer by Gavin-Zhang Wednesday, April 04, 2012 5:36 AM

