WSFC broken, please help diagnose

    Question

  • I have a 2016 WSFC with the file server role: two nodes in the cluster with shared storage. We lost power to Node2, which died; when bringing it back up it won't join the cluster (it shows 'Down' in Failover Cluster Manager). If I shut down the entire cluster completely and start it on Node2 first, Node2 runs the cluster fine, but Node1 now won't join the cluster (it shows 'Down').

    As far as I can tell all connectivity is fine. I've turned off Windows Firewall, the network between the two servers is working, and there are no firewalls between the two nodes. Other clusters are running on the same infrastructure.

    The only hint in Failover Cluster Manager is that the network connection for Node2 shows as offline (the network is up and working, has the allow-traffic and management options ticked, and I can ping, RDP, etc.).

    When I shut down and then restart the entire cluster with Node2 first, the roles are reversed: Node1 now shows its network as offline. The information details and critical events for the network have no entries.

    Critical events for Node2 itself, when in the Down state, show: Error 1653 Cluster node 'Node2' failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls. However, I'm not convinced this is actually the issue because of the error messages below.

    The failover clustering log is as follows:

    00000774.00001c4c::2018/05/15-16:48:50.659 INFO  [Schannel] Server: Negotiation is done, protocol: 10, security level: Sign
    00000774.00001c4c::2018/05/15-16:48:50.663 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 161
    00000774.00001c4c::2018/05/15-16:48:50.712 DBG   [Schannel] Server: ASC, sec: 90312, buf: 2059
    00000774.00001c4c::2018/05/15-16:48:50.728 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 1992
    00000774.00001c4c::2018/05/15-16:48:50.730 DBG   [Schannel] Server: ASC, sec: 0, buf: 51
    00000774.00001c4c::2018/05/15-16:48:50.730 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Synchronize, buf: 0
    00000774.00001c4c::2018/05/15-16:48:50.730 INFO  [Schannel] Server: Security context exchanged for cluster
    00000774.00001c4c::2018/05/15-16:48:50.735 DBG   [Schannel] Client: ISC, sec: 90312, buf: 178
    00000774.00001c4c::2018/05/15-16:48:50.736 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 60
    00000774.00001c4c::2018/05/15-16:48:50.736 DBG   [Schannel] Client: ISC, sec: 90312, buf: 210
    00000774.00001c4c::2018/05/15-16:48:50.749 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 2133
    00000774.00001c4c::2018/05/15-16:48:50.752 DBG   [Schannel] Client: ISC, sec: 90364, buf: 58
    00000774.00001c4c::2018/05/15-16:48:50.753 DBG   [Schannel] Client: ISC, sec: 90364, buf: 14
    00000774.00001c4c::2018/05/15-16:48:50.753 DBG   [Schannel] Client: ISC, sec: 90312, buf: 61
    00000774.00001c4c::2018/05/15-16:48:50.754 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 75
    00000774.00001c4c::2018/05/15-16:48:50.754 DBG   [Schannel] Client: ISC, sec: 0, buf: 0
    00000774.00001c4c::2018/05/15-16:48:50.754 INFO  [Schannel] Client: Security context exchanged for netft
    00000774.00001c4c::2018/05/15-16:48:50.756 WARN  [ClRtl] Cannot open crypto container (error 2148073494). Giving up.
    00000774.00001c4c::2018/05/15-16:48:50.756 ERR   mscs_security::SchannelSecurityContext::AuthenticateAndAuthorize: (-2146893802)' because of 'ClRtlRetrieveServiceSecret(&secretBLOB)'
    00000774.00001c4c::2018/05/15-16:48:50.756 WARN  mscs::ListenerWorker::operator (): HrError(0x80090016)' because of '[SV] Schannel Authentication or Authorization Failed'
    00000774.00001c4c::2018/05/15-16:48:50.756 DBG   [CHANNEL 172.23.1.15:~56287~] Close().

    specifically:

    Server: Negotiation is done (i.e. they talked to each other?)
    [ClRtl] Cannot open crypto container (error 2148073494). Giving up.
    mscs_security::SchannelSecurityContext::AuthenticateAndAuthorize: (-2146893802)' because of 'ClRtlRetrieveServiceSecret(&secretBLOB)'
    mscs::ListenerWorker::operator (): HrError(0x80090016)' because of '[SV] Schannel Authentication or Authorization Failed'
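
    The three error values in these lines look different but can be checked against each other. A minimal Python sketch (not part of any cluster tooling; the helper name to_hresult is made up for illustration) that normalizes the decimal and signed forms to the usual 8-digit hex notation:

```python
def to_hresult(value: int) -> str:
    """Format a signed or unsigned 32-bit error code as an 8-digit hex HRESULT."""
    # Masking with 0xFFFFFFFF maps a negative 32-bit value to its unsigned form.
    return f"0x{value & 0xFFFFFFFF:08X}"


# The three values as they appear in the cluster log lines.
codes = [2148073494, -2146893802, 0x80090016]

for code in codes:
    print(code, "->", to_hresult(code))
```

    All three map to 0x80090016, which winerror.h names NTE_BAD_KEYSET ("Keyset does not exist"). That is the same text reported when adding Node3 to the cluster fails.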

    I can't find many, if any, articles dealing with these messages. The only ones I can find say to make sure permissions are correct on %SystemRoot%\Users\All Users\Microsoft\Crypto\RSA\MachineKeys

    I did have to change some of the permissions on these files but still couldn't join the cluster. Other than that I'm struggling to find any actual issues (SMB access from Node1 to Node2 appears to be fine, SMB access from Node2 to Node1 appears to be fine, DNS appears to be working fine, and the file share witness seems to be fine).

    Finally, the cluster validation report shows these as the only errors with the cluster:

    Validate disk Arbitration: Failed to release SCSI reservation on Test Disk 0 from node Node2.domain: Element not found.

    Validate CSV Settings: Failed to validate Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT). The connection was attempted with the Cluster Shared Volumes test user account, from node Node1.domain to the share on node Node2.domain. The network path was not found.

    Validate CSV Settings: Failed to validate Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT). The connection was attempted with the Cluster Shared Volumes test user account, from node Node2.domain to the share on node Node1.domain. The network path was not found.

    Other errors from the event logs:

    ID5398 Cluster failed to start. The latest copy of cluster configuration data was not available within the set of nodes attempting to start the cluster. Changes to the cluster occurred while the set of nodes were not in membership and as a result were not able to receive configuration data updates. . Votes required to start cluster: 2 Votes available: 1 Nodes with votes: Node1 Node2  Guidance: Attempt to start the cluster service on all nodes in the cluster so that nodes with the latest copy of the cluster configuration data can first form the cluster. The cluster will be able to start and the nodes will automatically obtain the updated cluster configuration data. If there are no nodes available with the latest copy of the cluster configuration data, run the 'Start-ClusterNode -FQ' Windows PowerShell cmdlet. Using the ForceQuorum (FQ) parameter will start the cluster service and mark this node's copy of the cluster configuration data to be authoritative.  Forcing quorum on a node with an outdated copy of the cluster database may result in cluster configuration changes that occurred while the node was not participating in the cluster to be lost.

    ID4350 Cluster API call failed with error code: 0x80070046. Cluster API function: ClusterResourceTypeOpenEnum Arguments: hCluster: 4a398760 lpszResourceTypeName: Distributed Transaction Coordinator lpcchNodeName: 2
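
    The 0x80070046 in event 4350 can be broken into its parts to see which Win32 error it wraps. A hedged Python sketch, assuming the standard HRESULT layout (bit 31 is the failure flag, the next used field is an 11-bit facility, and the low 16 bits are the code); split_hresult is a hypothetical helper name:

```python
def split_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility and code."""
    hr &= 0xFFFFFFFF  # accept signed values too
    return {
        "failure": bool(hr >> 31),       # severity bit
        "facility": (hr >> 16) & 0x7FF,  # 11-bit facility field
        "code": hr & 0xFFFF,             # low 16 bits
    }


print(split_hresult(0x80070046))
print(split_hresult(0x80090016))
```

    For 0x80070046 this gives facility 7 (FACILITY_WIN32) and code 70, which is the Win32 error ERROR_SHARING_PAUSED ("The remote server has been paused or is in the process of being started"). For the 0x80090016 from the cluster log it gives facility 9 (FACILITY_SSPI) and code 22.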

    Lastly, I built another server, Node3, to see if I could join it to the cluster, but this fails:

    * The server 'Node3.domain' could not be added to the cluster. An error occurred while adding node 'Node3.domain' to cluster 'CLUS1'. Keyset does not exist

    I've followed the steps here with no joy: http://chrishayward.co.uk/2015/07/02/windows-server-2012-r2-add-cluster-node-cluster-service-keyset-does-not-exist/



    Thursday, 17 May 2018 10:30

All replies

  • Hi,
    Based on the complexity of this specific situation, we need to do more research. If we have any updates or thoughts about this issue, we will post them as soon as possible. Your understanding is appreciated. If you gather further information in the meantime, please post it on the forum, which will help us understand and analyze this issue comprehensively.
    Sorry for the inconvenience and thank you for your understanding and patience.
    Best Regards,

    Frank

    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com

    Sunday, 20 May 2018 09:09
  • Hi,

    1. Did you restart after correcting the permissions on the C:\ProgramData\Microsoft\Crypto\RSA folder?

    2. Please uninstall any antivirus software that could be blocking things.

    3. Please check whether the system keeps rewriting permissions on the C:\ProgramData\Microsoft\Crypto\RSA folder.

    4. Please check whether the machine key has the same GUID in the registry and in the RSA folder.

    Checked the Registry for machine key value (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography)



    Checked the machine key folder (%ALLUSERSPROFILE%\Microsoft\Crypto\RSA\MachineKeys) and found the key has a different GUID than in registry.



    Refer to the link below to configure the permissions.

    https://www.techielass.com/2016/10/windows-server-2012-r2-cluster-node-issues.html

    We recommend using a disk quorum in the cluster, because it can receive and save the cluster configuration data updates.

       
    Best Regards,
    Frank


    Tuesday, 22 May 2018 02:08
  • Hi,
    Just checking in to see if the information provided was helpful. Please let us know if you would like further assistance.

    Best Regards,

    Frank


    Friday, 25 May 2018 08:40
  • Hi,

    Was your issue resolved? 

    If you resolved it using our solution, please "mark it as answer" to help other community members find the helpful reply quickly.
    If you resolve it using your own solution, please share your experience and solution here. It will be very beneficial for other community members who have similar questions.
    If not, please reply and tell us the current situation so that we can provide further help.


    Best Regards,
    Frank


    Monday, 28 May 2018 08:57
  • Apologies, I've been away from work for a week. I'll run through the steps in the next few days and let you know.

    Tuesday, 29 May 2018 10:22
  • 1. Did you restart after correcting the permissions on the C:\ProgramData\Microsoft\Crypto\RSA folder?

    I restarted the cluster last night for the first time, using the Shut Down Cluster option and then rebooting the computer. Unfortunately I'm still unable to add a node to the cluster.

    2. Please uninstall any antivirus software that could be blocking things.

    Antivirus has been uninstalled and Windows Firewall turned off, with no change in behaviour.

    3. Please check whether the system keeps rewriting permissions on the C:\ProgramData\Microsoft\Crypto\RSA folder.

    The permissions are still as I set them manually.

    4. Please check whether the machine key has the same GUID in the registry and in the RSA folder.

    Checked the Registry for machine key value (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography)

    Checked the machine key folder (%ALLUSERSPROFILE%\Microsoft\Crypto\RSA\MachineKeys) and found the key has a different GUID than in registry.

    Can you please give me more information on how to perform step 4 above, unless what follows is accurate?

    In the registry under Cryptography I see "MachineGuid", which has the value:
    127ffa054-****-****-****-************ (stars to hide the actual value)

    In the "MachineKeys" folder, one of the key files is named:
    f686****************************_127ffa054-****-****-****-************

    The second half of the file name matches the MachineGuid in the registry.
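
    That comparison can also be scripted. A minimal Python sketch, assuming (as observed above) that a machine key file is named <prefix>_<MachineGuid>; the function name and the sample values are made up for illustration and are not the real values from this server:

```python
def key_matches_machine_guid(file_name: str, machine_guid: str) -> bool:
    """Return True if the part after the first underscore equals the GUID."""
    _prefix, sep, suffix = file_name.partition("_")
    # No underscore means the name does not follow the assumed pattern.
    return bool(sep) and suffix.lower() == machine_guid.lower()


# Hypothetical placeholder values, for illustration only.
guid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
print(key_matches_machine_guid("0123abcd_" + guid, guid))
print(key_matches_machine_guid("0123abcd_ffffffff-0000-0000-0000-000000000000", guid))
```

    A key file whose suffix does not match the registry value would be the mismatch that step 4 asks about.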



    Refer to the link below to configure the permissions.

    https://www.techielass.com/2016/10/windows-server-2012-r2-cluster-node-issues.html

    I have used this code to set the permissions and then rebooted the cluster, with no change in behaviour.

    We recommend using a disk quorum in the cluster, because it can receive and save the cluster configuration data updates.

    I have added a new disk witness, with no change in behaviour.

    I'd appreciate any further steps you can offer. Thanks.

    Wednesday, 30 May 2018 10:48