Cluster network failure
-
Thursday, June 07, 2012 2:38 AM
Hello Folks,
I am having a strange problem with a 3-node failover cluster. Every two days or so, both my "Cluster" and "Public" networks go down. The following is a sample of the two messages registered on each server:
"Cluster network interface 'NODE4 - Local Area Connection 7' for cluster node 'NODE4' on network 'Public' failed."
"Cluster network interface 'NODE4 - Local Area Connection 6' for cluster node 'NODE4' on network 'Cluster' failed."
As I mentioned, the event viewer shows this for all three servers at the same time. I've checked my managed switches: they don't show a loss of connectivity, and there are no significant errors in the log to indicate anything is wrong, with one possible exception. Present at every failure on each server is a time-service informational message, Event Id 37 - "The time provider NtpClient is currently receiving valid time data from..."
The whole failure lasts less than a minute and, of course, once the networks go down I start getting CSV Event Id 5120 messages about a bad network and the queuing of traffic.
I am stumped - any thoughts?
Thank you,
Scott
- Edited by Scott S Sikora Thursday, June 07, 2012 2:44 AM
All Replies
-
Thursday, June 07, 2012 6:17 AM (Moderator)
Hi,
Are you running the latest drivers on the nodes and the latest IOS on the switches?
How are your NICs configured: auto/auto? Does that match the switch? If a NIC is set to 1 Gb full duplex, the switch port should have the same setting.
Greetings, Robert Smit [MVP] http://robertsmit.wordpress.com/ “Please click "Vote As Helpful" if it is helpful for you and Proposed As Answer”
-
Thursday, June 07, 2012 1:11 PM
Robert,
Thanks for the suggestions. I do have the latest network drivers, and the switch OS is up to date. I will have to verify all the settings on the NICs and get back to you on that. The weird thing, to me at least, is that all the nodes go down at once. If there were setting problems on the NICs, I would expect individual nodes to go up and down randomly now and then, but all my nodes run fine for a couple of days and then all go down together.
-Scott
-
Thursday, June 07, 2012 1:20 PM
Anything in your environment scheduled to run at the same time you are having your problems? Notorious for causing weird problems are antivirus solutions that are not set up to properly exclude certain components.
timcerl
-
Thursday, June 07, 2012 1:26 PM
Tim,
I don't see anything yet but it seems like it might be related to something like that. Below are the last three instances of failure:
06-06 @ 5:19pm
06-04 @ 4:10pm
06-01 @ 3:53pm
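For anyone who wants to check whether those times line up with a scheduled job, here is a minimal sketch that computes the gap between consecutive failures. The year 2012 is assumed from the post dates, and the helper name `gaps` is purely illustrative.

```python
from datetime import datetime

# Failure times listed above; the year 2012 is an assumption taken from the thread dates.
failures = [
    datetime(2012, 6, 6, 17, 19),
    datetime(2012, 6, 4, 16, 10),
    datetime(2012, 6, 1, 15, 53),
]

def gaps(times):
    """Return the time deltas between consecutive events, oldest first."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

intervals = gaps(failures)
for gap in intervals:
    # Print each gap both as a timedelta and in hours for easier comparison.
    print(gap, f"({gap.total_seconds() / 3600:.2f} h)")
```

The two gaps are roughly 72 hours and 49 hours, so the failures do not follow a strict fixed interval, although a job that starts at varying times cannot be ruled out from three data points.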
Regards,
Scott
-
Thursday, June 07, 2012 1:46 PM (Moderator)
Hi,
I have seen this before, and it could be related to network switching.
Also check your patch management and, if you have SCCM, check whether any running deployments could be causing trouble on these servers.
Greetings, Robert Smit [MVP]
- Marked As Answer by Vincent Hu (Moderator) Monday, July 09, 2012 3:26 AM

