Hyper-V over SMB3 problem

  • Question

  • Hello.

    I have a problem with my Hyper-V cluster.
    It is a simple two-node failover cluster with the Hyper-V role. It uses a SOFS share for VM storage.

    The SOFS is run by a second failover cluster dedicated solely to this role. This storage cluster consists of two nodes and shared iSCSI storage; the disks are added as CSVs, and the SOFS shares reside on them.

    All Hyper-V and SOFS cluster nodes have dedicated 2x10G interfaces, so SMB3 multichannel is in place.
    - SMBv1 removed
    - NETBIOS disabled
    - TCP timestamps enabled: "netsh int tcp set global timestamps=enabled"
    - TcpAckFrequency and TcpNoDelay set to 1 (REG_DWORD) in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<SAN interface GUID>
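
    A minimal sketch for confirming these settings on a node, assuming the in-box cmdlets of Windows Server 2016. It only reads state, and it loops over every interface key instead of one specific SAN interface GUID. The FS-SMB1 feature name is how SMB1 is listed on Server 2016; treat it as an assumption on other versions.

```powershell
# SMB1: install state should not be "Installed" once the feature is removed.
Get-WindowsFeature -Name FS-SMB1

# TCP timestamps: check the timestamps line in the global TCP parameters.
netsh int tcp show global

# TcpAckFrequency / TcpNoDelay per interface key (blank = value not set).
$base = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces'
Get-ChildItem $base | ForEach-Object {
    Get-ItemProperty -Path $_.PSPath |
        Select-Object PSChildName, TcpAckFrequency, TcpNoDelay
}

# Multichannel: both 10G interfaces should show up for each SOFS connection.
Get-SmbMultichannelConnection
```

    Get-SmbMultichannelConnection returns nothing until the node has an active SMB connection to the share.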

    Approximately every two weeks all VMs hang due to losing connection to SOFS share.

    Symptoms:
    - UNC address \\SOFS.INSIDE.LOCAL cannot be accessed from Hyper-V cluster nodes with error "The remote procedure call failed and did not execute." https://i.imgur.com/ye69RKt.png
    - SOFS share can be accessed by UNC address \\SOFS from Hyper-V cluster nodes
    - SOFS share can be accessed directly by \\SOFS.INSIDE.LOCAL\SHARENAME from Hyper-V cluster nodes
    - SOFS share can be accessed from any other servers by \\SOFS.INSIDE.LOCAL or \\SOFS
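
    A rough way to reproduce these checks from a Hyper-V node without Explorer. It assumes that opening the server root triggers a share enumeration, which "net view" also performs, while a full share path does not need it. SHARENAME is the same placeholder as in the list above.

```powershell
# Share enumeration against both name forms; a non-zero exit code means failure.
foreach ($name in 'SOFS', 'SOFS.INSIDE.LOCAL') {
    net view "\\$name" 2>&1 | Out-Null
    '{0,-20} share enumeration exit code: {1}' -f $name, $LASTEXITCODE
}

# Direct access to the share itself, bypassing enumeration.
Test-Path '\\SOFS.INSIDE.LOCAL\SHARENAME'

# Existing SMB sessions from this node, with the negotiated dialect.
Get-SmbConnection | Format-Table ServerName, ShareName, Dialect, NumOpens
```

    Running the same lines on a server that is not affected gives a baseline to compare against.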

    Known workaround: reboot the Hyper-V cluster nodes, or even just one of the two. Rebooting the SOFS cluster nodes doesn't help.

    OS: Windows Server 2016 everywhere, 2018-06 updates

    Of course I can go back to connecting the iSCSI storage directly to the Hyper-V cluster, but in my case the dedicated SOFS storage cluster was put in place to simplify the setup of Hyper-V and (in future) SQL cluster nodes. This way I won't need to update the storage array software on all cluster nodes (~20 nodes in future) when a new version comes out, and all array-to-host relationships are limited to two nodes and the array, which simplifies troubleshooting.

    I believe the problem is somewhere in the SMB client-server interaction.

    I've already tried "Set-SmbClientConfiguration -MaxCmds 32768" on the Hyper-V nodes and "Set-SmbServerConfiguration -MaxThreadsPerQueue 64 -AsynchronousCredits 8192" on the SOFS nodes, but it didn't help. All other SMB settings are at their defaults.
    From my point of view this setup looks pretty simple: Hyper-V running VMs with storage over SMB, without anything unusual or special.
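
    For reference, the values currently in effect can be read back with the Get- counterparts of those cmdlets. A short read-only sketch:

```powershell
# On a Hyper-V node (SMB client side).
Get-SmbClientConfiguration | Select-Object MaxCmds

# On a SOFS node (SMB server side).
Get-SmbServerConfiguration |
    Select-Object MaxThreadsPerQueue, AsynchronousCredits

# Full dump of the client settings, to compare against an untouched server.
Get-SmbClientConfiguration | Format-List
```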

    Captured problem with procmon https://i.imgur.com/ewDDpL9.png
    Captured problem with network monitor: https://i.imgur.com/gbVvrZm.png (with filter ProtocolName == "SMB2")
    In this sample, 10.10.10.101 is SOFS node #1 SAN interface 0 and 10.10.10.155 is HV node #5 SAN interface 0.

    It looks like a problem in the RPC over SMB communication via the Server Service Remote Protocol (https://msdn.microsoft.com/en-us/library/dd303117.aspx), but I have no idea what the problem is.

    According to this blog post (https://blogs.technet.microsoft.com/josebda/2013/10/30/automatic-smb-scale-out-rebalancing-in-windows-server-2012-r2/), the Hyper-V servers' access to the SOFS share should be considered symmetric, because both SOFS nodes are connected to the SAN identically via iSCSI. However, I see a lot of 30814 events logged at 1-second intervals, the first stating that the share type is asymmetric (https://i.imgur.com/LJ425BN.png) and the second stating that it is symmetric (https://i.imgur.com/MnfxtDQ.png).
    I can't find any documentation (except that blog post) about this behavior or about how SOFS determines the type of a share (symmetric/asymmetric).

    Also, in the SMB Witness client event log I can see a lot of "Witness registration has completed." and "Witness Client received a share move request" events.
    These events look related, but I can't investigate this SMB interaction any further.
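
    A sketch for quantifying how often event 30814 fires, assuming the relevant logs match the name pattern Microsoft-Windows-SMB* (the pattern is an assumption; the exact log holding 30814 is not stated here). It counts events per minute over the last 24 hours on a Hyper-V node.

```powershell
# Discover the SMB-related event logs on this node that contain records.
$logs = Get-WinEvent -ListLog 'Microsoft-Windows-SMB*' -ErrorAction SilentlyContinue |
    Where-Object { $_.RecordCount -gt 0 }

if ($logs) {
    # Event 30814 from the last 24 hours, grouped by minute.
    $filter = @{
        LogName   = $logs.LogName
        Id        = 30814
        StartTime = (Get-Date).AddDays(-1)
    }
    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Group-Object { $_.TimeCreated.ToString('yyyy-MM-dd HH:mm') } |
        Sort-Object Name |
        Format-Table Name, Count
}

# On a SOFS node: which clients are registered with which witness node.
Get-SmbWitnessClient
```

    Lining up the per-minute counts with the time of a hang would show whether the symmetric/asymmetric flapping is constant or only starts shortly before the failure.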

    Yes, we have a support case open (118072618661320), but I haven't received any response for more than two weeks now.

    • Edited by Al.Kochm Wednesday, August 29, 2018 7:40 AM
    Monday, August 27, 2018 11:33 PM

Answers

  • We are dropping SOFS for Hyper-V and SQL and reverting to a File Server in a Failover Cluster.

    We do not recommend SOFS to anyone without a Premium Support contract, because Pro Support cannot solve this kind of problem in time (we have wasted two months).

    User voice ref is here .

    • Marked as answer by Al.Kochm Monday, September 10, 2018 7:36 AM
    Monday, September 10, 2018 7:36 AM

All replies

  • Hi,

    As far as I know, when the SMB client initially connects to a file server cluster node, it notifies the SMB Witness client, which runs on the same computer. The SMB Witness client obtains a list of cluster members from the SMB Witness service running on the file server cluster node. It then picks a different cluster member and issues a registration request to the SMB Witness service on that member.

    In your scenario, however, the SMB Witness client logs a successful registration event. You could also check this earlier thread for more clues:

    https://social.technet.microsoft.com/Forums/forefront/en-US/f6f8310e-a1d6-479a-9ace-3994ac6488c7/smb-witness-client-error-sofs-storage-spaces

    For a more detailed analysis, you may still need to consult CSS. You could also contact your Microsoft support engineer for further suggestions.

    Other members of our forum are also welcome to share more ideas.

    Best Regards,

    Mary


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Tuesday, August 28, 2018 4:13 AM
    Moderator
  • Thanks. I would love to consult CSS; I have a case open, but they simply don't answer.
    Tuesday, August 28, 2018 9:44 AM
  • Hi,

    Thanks for your feedback.

    If you get any updates, please feel free to share them with us.

    Best Regards,

    Mary



    Wednesday, August 29, 2018 1:16 AM
    Moderator
  • The only update is that support cannot help us.

    The latest recommendation is to stop using the SOFS share for the cluster witness and see what happens. There was also a recommendation to restart the SMB Witness service, but it didn't help.

    There is absolutely ZERO documentation about the symmetric/asymmetric behavior of SOFS shares except this article, and it is the only documentation that support has. The support engineer says that Microsoft won't reveal any more documentation about this, so we will never know why our SOFS share keeps switching from symmetric to asymmetric.

    This support experience is painful and ridiculous.

    Sunday, September 9, 2018 8:57 AM
  • Hi,

    Thanks for sharing the results for others to reference. I'm sorry that you didn't get more information.

    You could consider posting your ideas on User Voice for Windows Server so that more Microsoft engineers can see them.

    https://windowsserver.uservoice.com/forums/295047-general-feedback

    Best Regards,

    Mary



    Monday, September 10, 2018 1:01 AM
    Moderator