2 Hyper-V nodes crash
-
June 16, 2012, 3:48 PM
Hi,
We have 5 Hyper-V nodes. Suddenly all VMs on 2 nodes (n1 and n2) crashed. They were rebooted and moved to other nodes. I want to investigate what happened. On n1 and n2 I only see that the nodes were not able to access 'C:\ClusterStorage\Volume2\VMs\...' and 'C:\ClusterStorage\Volume3\VMs\...'. From this it seems my Cluster Shared Volumes failed. But on the other nodes I did not find these errors; they worked fine. Those nodes just reported:
Cluster node 'n1' and 'n2' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
It seems that the storage is fine, so it could be a problem with the SAN switch, or maybe these events caused the problem on n1 and n2:
Event 1030. The processing of Group Policy failed. Windows attempted to retrieve new Group Policy settings for this user or computer. Look in the details tab for error code and description. Windows will automatically retry this operation at the next refresh cycle. Computers joined to the domain must have proper name resolution and network connectivity to a domain controller for discovery of new Group Policy objects and settings. An event will be logged when Group Policy is successful.
Event ID 4. The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server dc2$. The target name used was cifs/dc2.domain.local. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (DOMAIN.LOCAL) is different from the client domain (DOMAIN.LOCAL), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
thanks,
n
All replies
-
June 16, 2012, 4:29 PM
Please answer the following questions for further analysis.
1. Can the Cluster service start on nodes 1 and 2?
2. What is the connection to the SAN (FC or iSCSI)?
3. Did you manually set an SPN to register the virtual name in AD?
4. Did you ever destroy the cluster and recreate it with the same name, or evict a node?
5. Can all nodes ping the AD and the gateway when the problem occurs?
___________________________________________________ Naruphon blog: http://www.vm360degree.com
-
June 16, 2012, 5:55 PM
Yes, everything works now. My mistake was that I migrated a DC from the old server (sr2) to the new one (dc2) by uninstalling the DC role from the 2003 server and then installing it on a new 2008 server, but I forgot to delete the old DC computer account from AD. That was the cause of events 4 and 1030. But that was 1 month ago, and all this time I was getting these errors and just did not notice them. Today, 2 Hyper-V servers that are registered to this new DC suddenly crashed. I want to know whether events 4 and 1030 were the cause.
The SAN connection is FC. No errors on the EVA. I did not touch the SPN. I never touched the cluster or evicted nodes. I just noticed what had happened, and then everything resolved by itself. But I also found these errors:
Event 1014. Name resolution for the name www.msftncsi.com timed out after none of the configured DNS servers responded.
Event 4201. Isatap interface isatap.{C08A07F9-E87E-483C-9C39-55235A5AA4D8} is no longer active.
Event 1014. Name resolution for the name 10.10.10.in-addr.arpa timed out after none of the configured DNS servers responded.
Event 1014. Name resolution for the name domain.local timed out after none of the configured DNS servers responded.
Event 1167. Cluster Agent: The cluster resource Virtual Machine VM1 has become degraded. [SNMP TRAP: 15005 in CPQCLUS.MIB]
Event 5719. This computer was not able to set up a secure session with a domain controller in domain DOMAIN due to the following: The remote procedure call failed. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator. ADDITIONAL INFO. If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.
Event 7036. The Cluster Service service entered the stopped state.
Event 1038. Ownership of cluster disk 'Cluster Disk 7' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.
Event 7024. The Cluster Service service terminated with service-specific error A quorum of cluster nodes was not present to form a cluster.
Event 1014. Name resolution for the name _ldap._tcp.Default-First-Site-Name._sites.domain.local timed out after none of the configured DNS servers responded.
Event 1177. The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
Event 7024. The Cluster Service service terminated with service-specific error A quorum of cluster nodes was not present to form a cluster.
Event 131. NtpClient was unable to set a domain peer to use as a time source because of DNS resolution error on ''. NtpClient will try again in 3473457 minutes and double the reattempt interval thereafter. The error was: The requested name is valid, but no data of the requested type was found. (0x80072AFC).
And after some time:
Event 7036. The Cluster Service service entered the running state.
Event 37. The time provider NtpClient is currently receiving valid time data from dc2 (ntp.d|0.0.0.0:123->10.10.10.251:123).
Event 4200. Isatap interface isatap.{C08A07F9-E87E-483C-9C39-55235A5AA4D8} with address fe80::5efe:169.254.5.52 has been brought up.
Could all of this have been caused by events 4 and 1030, or should I check my network with the network guys?
thanks
- Edited by natip, June 16, 2012, 6:02 PM
-
June 16, 2012, 7:41 PM
It seems that the Cluster service stopped suddenly. Could events 1030 and 4 be the cause? Or could it be because quorum was lost? How can I be sure where the problem is?
thanks
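
On the quorum question, a rough sketch of the vote arithmetic may help. It assumes a plain node-majority quorum model where each of the 5 nodes has one vote and no witness disk votes; the thread does not state which quorum model is configured, so treat that as an assumption. The function name `has_quorum` is made up for illustration. Under that assumption, two nodes that lose contact with the other three cannot hold a majority, which would match the Cluster service stopping on n1 and n2 (events 1177 and 7024) while the other three nodes kept running.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Return True if a partition holds a strict majority of all votes.

    Assumes a simple node-majority model: one vote per node, no witness.
    """
    return votes_present > total_votes // 2


TOTAL_NODES = 5                       # from the thread: five Hyper-V nodes
isolated = 2                          # n1 and n2
remaining = TOTAL_NODES - isolated    # the three nodes that kept working

print("isolated side keeps quorum: ", has_quorum(isolated, TOTAL_NODES))
print("remaining side keeps quorum:", has_quorum(remaining, TOTAL_NODES))
```

This only shows that a loss of communication between the two groups is enough to explain the quorum events. It does not say what caused the loss of communication; the DNS timeouts (event 1014) and the failed secure session to the domain controller (event 5719) around the same time suggest checking the network path as well.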

