cluster fails to reset CNO password in AD

  • Question

  • We have a WS2012 Hyper-V cluster. The cluster has the DNS name hvcluster.domain.local, a CNO in AD called hvcluster$, and 2 nodes called node1.domain.local (computer account node1$) and node2.domain.local (computer account node2$).

    The cluster CNO is in a failed state. As a consequence, its dynamic DNS record is missing and Live Migration doesn't work. The primary problem is that when I use the Repair option on the CNO, the repair will fail with the following error:

    "There was an error repairing the active directory object for 'Cluster Name'. Details: There was an error resetting the active directory password for 'Cluster name'. Error code: 0x80005000"

    This isn't a new cluster, it's been running for about 2 years now, but this problem manifested recently. I'm aware of the AD requirements for the cluster and for testing purposes I've additionally granted Full Access on the hvcluster computer account to the cluster computer account itself and to both cluster nodes' computer objects (through a group that both nodes are members of).

    The account I used for the Repair action (and all other actions) is a member of the Domain Admins group.

    Since that didn't help, I've checked that the Authenticated Users group is a member of the local "Users" group on the cluster nodes. Additionally, I've tried modifying local group policy per http://blogs.technet.com/b/askcore/archive/2013/04/04/new-network-name-resource-fails-to-come-online.aspx. That didn't help either.

    I've also checked that http://support.microsoft.com/kb/2838043 is installed on both cluster nodes.

    From the cluster log (excerpt):

    000014a8.00001014::2015/03/03-12:52:32.368 INFO  [RES] Network Name <Cluster Name>: AccountAD: OU name for VCO is OU=Hyper-V,DC=domain,DC=local
    000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name:  [NN] Setting crypto access members for decrypt. New container = false.
    000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] Priming local KDC cache to \\DC01.domain.local for domain domain.local
    000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] PopulateKerbKDCLookupCache - DC flags 0
    000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] LsaCallAuthenticationPackage success with a request of size 100, result size 0 (status: 0, subStatus: 0)
    000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] Priming local KDC cache to \\DC01.domain.local for domain label domain
    000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] LsaCallAuthenticationPackage success with a request of size 78, result size 0 (status: 0, subStatus: 0)
    000014a8.0000227c::2015/03/03-12:52:32.399 INFO  [RES] Network Name <Cluster Name>: Getting Read/Write private properties
    000014a8.00001014::2015/03/03-12:52:32.414 WARN  [RES] Network Name: [NNLIB] LogonUserEx fails for user HVCLUSTER$: 1326 (useSecondaryPassword: 0)
    000014a8.0000227c::2015/03/03-12:52:32.430 INFO  [RES] Network Name <Cluster Name>: Getting Read only private properties
    000014a8.00001014::2015/03/03-12:52:32.446 WARN  [RES] Network Name: [NNLIB] LogonUserEx fails for user HVCLUSTER$: 1326 (useSecondaryPassword: 1)
    000014a8.00001014::2015/03/03-12:52:32.446 INFO  [RES] Network Name: [NNLIB] Logon failed for user HVCLUSTER$ (Error 1326), DC \\DC01.domain.local, domain domain.local
    000014a8.00001014::2015/03/03-12:52:32.446 ERR   [RES] Network Name:  [NN] GetToken - Logging on as the CNO failed with error 1326
    000014a8.00001014::2015/03/03-12:52:32.446 INFO  [RES] Network Name <Cluster Name>: AccountAD: End of Slow Operation, state: Initializing/Writing, prevWorkState: Writing
    000014a8.00001014::2015/03/03-12:52:32.446 WARN  [RES] Network Name <Cluster Name>: AccountAD: Slow operation has exception ERROR_INVALID_HANDLE(6)' because of '::ImpersonateLoggedOnUser( GetToken() )'
    000014a8.0000227c::2015/03/03-12:52:32.446 INFO  [RES] Network Name: Agent: OnInitializeReply, Failure on (6b0ee668-0731-4252-b066-dd657fd23f25,AccountAD): 6
    000014a8.0000227c::2015/03/03-12:52:32.446 INFO  [RES] Network Name <Cluster Name>: Configuration: InitializeReplyCreation of NetName (type Singleton), result: 6, IsCanceled: false
    00001fdc.000018ac::2015/03/03-12:52:32.446 INFO  [GEM] Sending 1 messages as a batched GEM message
    000014a8.0000227c::2015/03/03-12:52:32.446 INFO  [RES] Network Name <Cluster Name>: Configuration: Setting 'StatusKerberos' in clusdb returned status 0
    000014a8.0000227c::2015/03/03-12:52:32.446 INFO  [RES] Network Name <Cluster Name>: Configuration: Deleting ResourceData, CreatingDC, ObjectGUID for a newly created netname from cluster database
    00001fdc.000018ac::2015/03/03-12:52:32.446 INFO  [GEM] Sending 1 messages as a batched GEM message
    000014a8.000021c4::2015/03/03-12:52:32.461 INFO  [RES] Network Name <Cluster Name>: Getting Read/Write private properties
    00001fdc.000018ac::2015/03/03-12:52:32.461 INFO  [GEM] Sending 1 messages as a batched GEM message
    000014a8.0000227c::2015/03/03-12:52:32.477 INFO  [RES] Network Name: Agent: OnInitializeReply, Failure on (6b0ee668-0731-4252-b066-dd657fd23f25,Configuration): 6
    000014a8.0000227c::2015/03/03-12:52:32.477 INFO  [RES] Network Name <Cluster Name>: SyncReplyHandler Configuration, result: 6
    000014a8.00001568::2015/03/03-12:52:32.477 INFO  [RES] Network Name <Cluster Name>: PerformOnline - Initialization of Configuration module finished with result: 6
    000014a8.00001568::2015/03/03-12:52:32.477 ERR   [RES] Network Name <Cluster Name>: Online thread Failed: ERROR_SUCCESS(0)' because of 'Initializing netname configuration for Cluster Name failed with error 6.'
    000014a8.00001568::2015/03/03-12:52:32.477 INFO  [RES] Network Name <Cluster Name>: All resources offline. Cleaning up.
    000014a8.00001568::2015/03/03-12:52:32.477 ERR   [RHS] Online for resource Cluster Name failed.
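For readers working through a similar excerpt, the WARN and ERR lines can be pulled out and annotated with a short script. This is only a sketch: the line format is inferred from the excerpt above, the function name `triage` is hypothetical, and the number matching is a heuristic that labels only the two Win32 codes seen here (1326 is ERROR_LOGON_FAILURE, 6 is ERROR_INVALID_HANDLE).

```python
import re

# Win32 error codes that appear in the excerpt above.
WIN32_ERRORS = {
    6: "ERROR_INVALID_HANDLE",
    1326: "ERROR_LOGON_FAILURE",
}

# Assumed line layout: <pid>.<tid>::<timestamp> <LEVEL> <message>
LINE_RE = re.compile(
    r"^\s*[0-9a-f]+\.[0-9a-f]+::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+(?P<msg>.*)$"
)


def triage(log_text):
    """Return (timestamp, level, known error names, message) for WARN/ERR lines."""
    findings = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match or match.group("level") == "INFO":
            continue
        names = []
        # Heuristic: any standalone number that is a known code gets labeled.
        for token in re.findall(r"\b\d+\b", match.group("msg")):
            name = WIN32_ERRORS.get(int(token))
            if name and name not in names:
                names.append(name)
        findings.append(
            (match.group("ts"), match.group("level"), names, match.group("msg"))
        )
    return findings


# Four lines copied from the excerpt above.
SAMPLE = """
000014a8.0000227c::2015/03/03-12:52:32.399 INFO  [RES] Network Name <Cluster Name>: Getting Read/Write private properties
000014a8.00001014::2015/03/03-12:52:32.414 WARN  [RES] Network Name: [NNLIB] LogonUserEx fails for user HVCLUSTER$: 1326 (useSecondaryPassword: 0)
000014a8.00001014::2015/03/03-12:52:32.446 WARN  [RES] Network Name <Cluster Name>: AccountAD: Slow operation has exception ERROR_INVALID_HANDLE(6)' because of '::ImpersonateLoggedOnUser( GetToken() )'
000014a8.00001568::2015/03/03-12:52:32.477 ERR   [RHS] Online for resource Cluster Name failed.
"""

for ts, level, names, msg in triage(SAMPLE):
    print(ts, level, ", ".join(names) or "-", msg[:60])
```

Read this way, the sequence is that logging on as HVCLUSTER$ fails with a logon failure (1326) for both the primary and secondary password, so no token is obtained and the later impersonation call reports an invalid handle (6).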

    Any ideas? Btw. I've been through many articles like: https://support.microsoft.com/kb/2838043/, https://social.technet.microsoft.com/forums/windowsserver/en-us/2ad0afaf-8d86-4f16-b748-49bf9ac447a3/ws2012-cluster-network-dns-issues, http://blogs.technet.com/b/askcore/archive/2013/04/04/new-network-name-resource-fails-to-come-online.aspx, http://blogs.technet.com/b/askcore/archive/2012/09/25/cno-blog-series-increasing-awareness-around-the-cluster-name-object-cno.aspx etc.

    Tuesday, March 3, 2015 1:30 PM

All replies

  • Hi MarkosP,

    Error 0x80005000 often occurs when the CNO is corrupt in AD, or when the CNO and the VCOs are not in the same OU. Please open Active Directory Users and Computers and check whether the CNO and the VCOs are in the same OU. If they are not, move them to the default Computers container and give the CNO full permissions on that container.
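As background on the code itself: 0x80005000 is an HRESULT, and in the ADSI headers that value is named E_ADS_BAD_PATHNAME (an invalid directory pathname was passed). The sketch below splits an HRESULT into its standard fields; the helper names are hypothetical, and the lookup table holds only the one name mentioned here. It also shows that the Win32 error 1326 from the cluster log would appear as 0x8007052E when wrapped in an HRESULT.

```python
# Names for HRESULT values relevant to this thread (deliberately tiny table).
KNOWN_HRESULTS = {0x80005000: "E_ADS_BAD_PATHNAME"}

FACILITY_WIN32 = 7  # facility used when a Win32 error is wrapped in an HRESULT


def decode_hresult(hr):
    """Split a 32-bit HRESULT into severity, facility and code fields."""
    hr &= 0xFFFFFFFF
    return {
        "failure": bool(hr >> 31),        # top bit set means failure
        "facility": (hr >> 16) & 0x7FF,   # 11-bit facility field
        "code": hr & 0xFFFF,              # low 16 bits
        "name": KNOWN_HRESULTS.get(hr),   # None if not in the table above
    }


def win32_code(hr):
    """Return the wrapped Win32 error code, or None if hr does not wrap one."""
    fields = decode_hresult(hr)
    if fields["failure"] and fields["facility"] == FACILITY_WIN32:
        return fields["code"]
    return None


print(decode_hresult(0x80005000))
print(win32_code(0x8007052E))
```

The name alone does not identify the root cause; it only indicates that the failure surfaced in the directory (ADSI) layer rather than as a plain Win32 error.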

    More information:

    Recovering a Deleted Cluster Name Object (CNO) in a Windows Server 2008 Failover Cluster

    http://blogs.technet.com/b/askcore/archive/2009/04/27/recovering-a-deleted-cluster-name-object-cno-in-a-windows-server-2008-failover-cluster.aspx

    I’m glad to be of help to you!


    Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Support, contact tnmff@microsoft.com

    Wednesday, March 4, 2015 5:47 AM
  • Hi Alex. I've moved the cluster CNO object to the default Computers container in AD, gave the CNO full access (with full inheritance) on the container, but that didn't help either. I got the same error when I tried the Repair action or when I moved core cluster resources from node to node.
    Wednesday, March 4, 2015 7:50 AM
  • I'm not so sure you can just delete the CNO for an EXISTING cluster and create a new one (with different SID/objectGUID). Are you sure about that?

    Why else would there be articles about recovering deleted CNOs, like http://blogs.technet.com/b/askcore/archive/2009/04/27/recovering-a-deleted-cluster-name-object-cno-in-a-windows-server-2008-failover-cluster.aspx ?
    • Edited by MarkosP Wednesday, March 4, 2015 11:09 AM
    Wednesday, March 4, 2015 11:08 AM
  • I'm sorry, don't remove it; that would be a huge mistake.

    Regards, Samir Farhat Infrastructure Consultant


    Wednesday, March 4, 2015 11:17 AM
  • You had been using the cluster for 2 years and the issue only started recently?

    Has anything changed in your Active Directory environment (upgrades, updates, Group Policy...)?


    Regards, Samir Farhat Infrastructure Consultant

    Wednesday, March 4, 2015 11:24 AM
  • I have run into issues like this many times. With Windows Server 2012, things changed, and the Active Directory configuration has to meet the failover cluster requirements.

    Usually the cause is one of:

    - Permissions on the CNO object

    - Group policies

    Have you tried this: place the CNO and the nodes in a new OU, block inheritance, run a GPO update, and retest?

    http://blogs.technet.com/b/askcore/archive/2012/03/27/why-is-the-cno-in-a-failed-state.aspx


    Regards, Samir Farhat Infrastructure Consultant

    • Proposed as answer by PaulRAYCAL Monday, March 9, 2015 12:06 AM
    • Unproposed as answer by MarkosP Tuesday, March 10, 2015 1:52 PM
    Wednesday, March 4, 2015 11:32 AM
  • I've tried this:

     - created new OU in the AD

     - granted Full Access permissions on this OU (with full inheritance) to the CNO and cluster nodes (computer accounts)

     - moved the CNO and nodes computer accounts to this OU

     - blocked GPO inheritance on this OU

     - ran gpupdate /force on both nodes

    Then I re-ran the Repair action and also tried to move the core cluster resources from node to node, still getting the same error.

    There was a problem with CAU update of the cluster approx. 2 weeks ago and I had to go through several reboot cycles to get the cluster working properly and that's when I noticed the problem with the CNO being in a failed state.

    Wednesday, March 4, 2015 12:59 PM
  • So this problem appeared after issues with Windows updates?

    Are all the cluster nodes affected by this update issue?

    Which events are logged in the cluster console? Only this one?

    "There was an error repairing the active directory object for 'Cluster Name'. Details: There was an error resetting the active directory password for 'Cluster name'. Error code: 0x80005000"


    Regards, Samir Farhat Infrastructure Consultant

    Wednesday, March 4, 2015 1:06 PM
  • Not sure if the problem appeared after the failed CAU update or before.

    Actually, the Cluster Events view in FCM doesn't contain the error I described above. It can only be seen when the "Repair" action on the CNO fails and you view the "Information details" of that event. I found the event in the FailoverClustering-Manager/Admin log; it seems the Cluster Events view in FCM doesn't include events from this log. The Cluster Events view only shows the generic problem with the CNO going online:

     - Cluster resource 'Cluster Name' of type 'Network Name' in clustered role 'Cluster Group' failed. (eventid 1069)

     - The Cluster service failed to bring clustered service or application 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application. (eventid 1205)

     - followed by (not surprisingly) Clustered role 'Cluster Group' has exceeded its failover threshold.  It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.  (eventid 1254)

    There are also 2 other problems logged, both regarding DNS registration - once for the CAU VCO (hvcluscau) and once for a FileServer cluster group VCO (FS01):

     - Cluster network name resource 'hvcluscau' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid. (eventid 1196)

     - Cluster network name resource 'FS01' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid. (eventid 1196)

    Records for both (hvcluscau and fs01) are actually in DNS. Before you ask, DNS itself is working fine and there are no errors: regular domain member servers can register and update records just fine.

    Wednesday, March 4, 2015 1:21 PM
  • Hi MarkosP,

    Please install the following updates, then monitor the issue again.

    Recommended hotfixes and updates for Windows Server 2012-based failover clusters

    http://support.microsoft.com/kb/2784261


    Monday, March 9, 2015 8:09 AM
  • Hi Alex.

    I've installed missing updates from that list (some were already installed, 1 was not applicable) on both nodes, however the issue didn't go away and I still get the same error.

    Monday, March 9, 2015 10:23 PM
  • Hi MarkosP,

    Which update "was not applicable"? We need to narrow down the problem area, so please tell us which update you could not install and the related error information. Most of the time an update cannot be installed because its prerequisites are not met; you can look up the update's requirements online and fix that.

    If you cannot install any updates, you can also reset the Windows Update components.

    How do I reset Windows Update components?

    http://support.microsoft.com/kb/971058

    Best Regards,



    • Proposed as answer by Alex Lv Tuesday, March 17, 2015 3:32 AM
    • Unproposed as answer by MarkosP Tuesday, March 17, 2015 5:22 AM
    Thursday, March 12, 2015 1:22 AM
  • KB976424 reported error "Installer encountered an error: 0x80096002. The certificate for the signer of the message is invalid or not found". I've tried redownloading this KB, but got the same error.

    KB2913695 reported "The update was not applicable to your computer"

    The following KBs were installed: KB2878635-v3, KB2894464, KB2916993, KB2929869-v2, KB3004098-v2

    Tuesday, March 17, 2015 5:21 AM
  • Hi Mark, did you ever get an answer or resolve the issue?

    If so, can you please post the solution here?

    Thanks in advance.

    Wednesday, February 24, 2016 1:14 AM
  • Hi. Unfortunately no. The thread apparently got ignored after my last reply. We have fairly custom and tight AD security in place, and from my experience, if you can, give the nodes and the cluster CNO full permissions over the OU where the objects are located. I have still had problems even with that configuration, since security is tightened on the parent OUs too. But as described above in this thread, even moving the objects to the default Computers container didn't really help.

    I have since deployed some more (WS2012R2) clusters and I haven't encountered the problem with those so far.

    As for the WS2012 cluster that was experiencing the problem, it's still in production and spamming the eventlog with that error. Otherwise it is working fine. Meaning, the CNO is still down (in a Failed state), but I can connect and manage the cluster using the DNS name (even remotely) fine despite that. We haven't opened a support ticket with MS about this.


    • Edited by MarkosP Thursday, February 25, 2016 7:30 AM
    Thursday, February 25, 2016 7:29 AM
  • Thanks for the update.  Too bad the thread got ignored.

    Regards,

    Friday, February 26, 2016 2:31 AM
  • For starters, NEVER delete the CNO as a troubleshooting step. That is the single worst thing you could ever do to a cluster, and you will end up rebuilding the cluster.

    The issue here is that your overzealous domain admins have locked things down to the point that things are starting to break. They have restricted the Reset Password permission on the Cluster Name Object (CNO) for that computer object. Look at the OU's permissions and make sure Reset Password is granted.

    To run Repair, your domain user account must have Reset Password. Also, the cluster manages the passwords for its computer objects. The CNO must have Reset Password on itself to rotate its password per the domain policy (every 30 days by default), and the CNO needs to have Reset Password on all the VCOs.
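One way to see whether rotation has stalled is to read the pwdLastSet attribute of the CNO's computer object and compare its age with the rotation interval. AD stores that attribute as a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC). The sketch below only does the date arithmetic; the function names are hypothetical, the 30-day default is taken from the post above, and how you obtain the raw attribute value is left open.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a FILETIME tick count (e.g. a pwdLastSet value) to UTC."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def password_age_days(pwd_last_set, now=None):
    """Whole days elapsed since the password was last set."""
    now = now or datetime.now(timezone.utc)
    return (now - filetime_to_datetime(pwd_last_set)).days


def rotation_overdue(pwd_last_set, max_age_days=30, now=None):
    """True if the password is older than the assumed rotation interval."""
    return password_age_days(pwd_last_set, now) > max_age_days


# Illustrative value only: this tick count corresponds to 1970-01-01 UTC.
example = 116444736000000000
print(filetime_to_datetime(example))
```

A password that is far older than the interval would suggest the cluster has been unable to rotate it, but it would not by itself explain why.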

    See this doc for more details
    https://technet.microsoft.com/en-us/library/cc731002%28v=ws.10%29.aspx?f=255&MSPPError=-2147217396

    Thanks!
    Elden

    Saturday, February 27, 2016 5:28 PM
  • Hello Elden, thanks for chiming in, I appreciate it. However, I have to disagree: the issue is not an overzealously locked-down environment (although it is locked down, just not in the sense you're describing). As said above in the thread, the CNO has FULL permissions on itself, on the OU where the CNO and VCOs are located, and on all the VCOs, and it still doesn't work even with those permissions.

    Anyway, the issue is not important to us anymore, as the problematic cluster will be phased out soon and we don't have those issues on 2012R2 clusters (living on the same AD and OUs as the 2012 cluster).

    Thursday, March 3, 2016 1:24 PM