Storage Spaces Direct Issues with May 17th Update Rollup

    Question

  • Hi,

    We have had major issues updating our hyperconverged S2D cluster with the May 17th 2018 Update Rollup.
    The issue occurred while rebooting each cluster node as the node was shutting down to reboot.

    Our cluster pool looked to have partially failed and some virtual machines crashed, failed over and restarted each time a node was rebooted to apply the update rollup.

    Firstly, some background. This is a 4-node cluster of fully validated Dell R730XD servers. All cluster validation tests pass, including 'Verify Node & Disk Configuration' for an SES-supported configuration. We have also verified and validated our network configuration and switches with Dell.
    We ensured no storage jobs were running and that all virtual and physical disks were healthy. File share witness was online and available during the patching.
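
    A pre-patch check of this kind can be sketched in PowerShell as follows. This is only an illustration, assuming it is run on a cluster node with the Storage module available; the health criteria used here are an assumption, not necessarily the exact checks performed in this case.

    ```powershell
    # Sketch: confirm no storage jobs are running and all disks report healthy.
    # Assumption: run on an S2D cluster node with the Storage module loaded.
    $activeJobs = Get-StorageJob | Where-Object { $_.JobState -ne 'Completed' }

    $badVirtualDisks = Get-VirtualDisk | Where-Object {
        $_.HealthStatus -ne 'Healthy' -or $_.OperationalStatus -ne 'OK'
    }

    $badPhysicalDisks = Get-PhysicalDisk | Where-Object { $_.HealthStatus -ne 'Healthy' }

    if ($activeJobs -or $badVirtualDisks -or $badPhysicalDisks) {
        Write-Warning 'Storage is not in a clean state; do not pause a node yet.'
        $activeJobs, $badVirtualDisks, $badPhysicalDisks | Format-Table -AutoSize
    }
    else {
        Write-Output 'No active storage jobs; all virtual and physical disks are healthy.'
    }
    ```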

    We paused one node, applied the update rollup, and after a successful installation clicked to reboot the node. As the node was shutting down we got the following events:

    Event ID: 1289: Source: Microsoft-Windows-FailoverClustering.

    The Cluster Service was unable to access network adapter "Microsoft Failover Cluster Virtual Miniport". Verify that other network adapters are functioning properly and check the device manager for errors associated with adapter "Microsoft Failover Cluster Virtual Miniport". If the configuration for adapter "Microsoft Virtual Miniport" has been changed, it may become necessary to reinstall the failover clustering feature on this computer.
    *******************************************************

    Event ID: 5395: Source: Microsoft-Windows-FailoverClustering.

    Cluster is moving the group for storage pool 'Cluster Pool 1' because current node 'HYPER2' does not have optimal connectivity to the storage pool physical disks.
    ***************************

    I noted that event ID 5395 never referred to the node that was being patched or rebooted; it was always another node in the cluster.
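
    To see which node logged each event, both event IDs can be collected from every node in one pass. A minimal sketch, assuming these events are written to the System log under the Microsoft-Windows-FailoverClustering provider and that remote event log access is permitted; the 24-hour window is an arbitrary choice for illustration.

    ```powershell
    # Sketch: gather events 1289 and 5395 from all cluster nodes, ordered by time.
    # Assumptions: System log, FailoverClustering provider, last 24 hours.
    $filter = @{
        LogName      = 'System'
        ProviderName = 'Microsoft-Windows-FailoverClustering'
        Id           = 1289, 5395
        StartTime    = (Get-Date).AddDays(-1)
    }

    Get-ClusterNode | ForEach-Object {
        # SilentlyContinue: a node with no matching events would otherwise raise an error
        Get-WinEvent -ComputerName $_.Name -FilterHashtable $filter -ErrorAction SilentlyContinue
    } |
        Sort-Object TimeCreated |
        Select-Object TimeCreated, MachineName, Id, Message |
        Format-List
    ```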

    After the reboot and the node joined back into the cluster the repair jobs ran and completed successfully. When we carried out the same procedure on the other nodes the same issue occurred.

    Has anyone else experienced these issues? We are tearing our hair out as Dell cannot find any issues and our customer has completely lost confidence in Storage Spaces Direct due to its constant instability.

    Thanks,

     


    Microsoft Partner


    • Edited by rEMOTE_eVENT Wednesday, 11 July 2018 11:01 spelling
    Wednesday, 11 July 2018 10:57

All replies

  • Hi,

    As I understand it, the VMs crashed when the node that had installed the update restarted.

    Please follow the link below to put the node into maintenance mode, then check whether the issue still occurs.

    https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/maintain-servers

    Best Regards,
    Frank


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com

    Thursday, 12 July 2018 05:05
  • We already follow that procedure exactly, and always have, but the issue still occurred.

    Microsoft Partner

    Thursday, 12 July 2018 09:47
  • Hi,

    If it still occurs, I'm afraid you might need to contact Microsoft Customer Support Services (CSS) so that a dedicated Support Professional can help you with this issue.

    To obtain the phone numbers for a specific technology request, please refer to the website listed below:

    https://www.microsoft.com/en-us/worldwide.aspx


    Appreciate your support and understanding.

    Best Regards,

    Frank



    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com


    Friday, 13 July 2018 08:16
  • Did you ever get an answer on this? We have 2 clusters, both 4-node Windows Server 2016 S2D, that had this happen this morning.

    Thanks,

    Brian


    • Edited by BHNAZ Tuesday, 28 August 2018 16:56
    Tuesday, 28 August 2018 16:55
  • Unfortunately we didn't. Which update rollup did you apply? Was it the May 17th rollup?

    Microsoft Partner

    Tuesday, 28 August 2018 17:00
  • With us it happened as we placed the node into maintenance mode, before the updates were applied. The servers were updated in March and then again in June. Does this still happen to you or was it a one-off incident?
    Wednesday, 29 August 2018 03:02
  • Hello there,

    It seems you have the same issue as others. See here: https://social.technet.microsoft.com/Forums/en-US/0a44b0de-d082-44df-9bd2-bb565d732ef3/s2d-io-timeout-when-rebooting-node?forum=winserverClustering

    It also appears to be more or less confirmed that the May cumulative update introduced this issue...

    It is recommended to use:

    Get-StorageFaultDomain -type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "<NodeName>"} | Enable-StorageMaintenanceMode

    Once the node is back online, disable Storage Maintenance Mode with this syntax:

    Get-StorageFaultDomain -type StorageScaleUnit | Where-Object {$_.FriendlyName -eq "<NodeName>"} | Disable-StorageMaintenanceMode

    Sadly it does not work for me. I would appreciate your feedback on whether it was helpful for you, so I can look for other causes...
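
    For reference, the two commands above can be combined with the usual drain and resume steps into a single per-node sequence. The following is a rough sketch rather than a verified fix: it assumes it is run from a different cluster node than the one being patched, that `<NodeName>` is replaced with the real node name, and that the update itself is installed by other means before the restart step.

    ```powershell
    # Sketch of a per-node maintenance sequence. Run from ANOTHER cluster node.
    $node = '<NodeName>'   # placeholder: replace with the node being patched

    # 1. Pause the node and drain its roles to the other nodes.
    Suspend-ClusterNode -Name $node -Drain -Wait

    # 2. Put the node's disks into storage maintenance mode.
    Get-StorageFaultDomain -Type StorageScaleUnit |
        Where-Object { $_.FriendlyName -eq $node } |
        Enable-StorageMaintenanceMode

    # 3. Install the update on the node (not shown), then restart it and wait.
    Restart-Computer -ComputerName $node -Wait -For PowerShell -Force

    # 4. Take the disks out of storage maintenance mode.
    Get-StorageFaultDomain -Type StorageScaleUnit |
        Where-Object { $_.FriendlyName -eq $node } |
        Disable-StorageMaintenanceMode

    # 5. Resume the node and fail its roles back.
    Resume-ClusterNode -Name $node -Failback Immediate

    # 6. Wait for the repair jobs to finish before touching the next node.
    while (Get-StorageJob | Where-Object { $_.JobState -ne 'Completed' }) {
        Start-Sleep -Seconds 30
    }
    ```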

    Thursday, 30 August 2018 06:58