Failover Clustering - Hyper-V - WS 2019 Issues Migrating VMs (Flickering States)

  • Question

  • Hello,

    We are currently seeing the following issue when migrating VMs from Windows Server 2016 or Windows Server 2019 clusters to a newly built Windows Server 2019 cluster. We build out these clusters as usual and have been running/migrating from cluster to cluster on 2016 without any issues (3000+ VMs). The new 2019 clusters validate with 100% success on every single test. We are using Mellanox RDMA cards in 4-node clusters, both HCI and non-HCI. (Validation is run with the standard cmdlet; see the sketch below, where the cluster name is a placeholder.)
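        # Full validation pass against the new cluster (name is a placeholder)
        Test-Cluster -Cluster "NEW19-CLUSTER"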

    When VMs are moved from a 2016 cluster to a 2019 cluster, or from one 2019 cluster to another, some VMs start flickering between states in Failover Cluster Manager. They cycle through Resuming/Pausing/Paused/Starting and the Failover Cluster Manager (FOC) console crashes. However, in Hyper-V Manager on the owner node, the same VMs show up fine as Running. We have tried migrating these VMs via VMM, and by un-clustering them and bringing them over via Move in Hyper-V Manager, but once they are back in FOC the flickering starts.

    In Event Viewer, I keep seeing the following events:

    'XXX-HCTXXX-01': Virtual hard disk resiliency successfully recovered drive '\\XXX-XX-XXXX-02.contoso.com\CSV03\Hyper-V\Virtual Disks\XXX-HCTXXX-01.vhdx'. Current status: No Errors.

    'XXX-HCTXXX-01': Virtual hard disk '\\XXX-XX-XXXX-02.contoso.com\CSV03\Hyper-V\Virtual Disks\XXX-HCTXXX-01.vhdx' received a resiliency status notification. Current status: Disconnected. (Virtual machine ID DXXXXX-XXX-XXXX-XXXX-XXXXXXXXXXXX)

    'XXX-HCTXXX-01': Virtual hard disk '\\XXX-XX-XXXX-02.contoso.com\CSV03\Hyper-V\Virtual Disks\XXX-HCTXXX-01.vhdx' has detected a recoverable error. Current status: Disconnected. (Virtual machine ID XXXXX-XXX-XXXX-XXXX-XXXXXXXXXXXX)
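    For anyone trying to reproduce the search, this is roughly how I pull those events from a node; a sketch, assuming the messages land in the Hyper-V Worker admin channel (node name is a placeholder):

        # Pull recent Hyper-V worker events mentioning VHD resiliency
        Get-WinEvent -ComputerName "NODE01" -FilterHashtable @{
            LogName   = 'Microsoft-Windows-Hyper-V-Worker-Admin'
            StartTime = (Get-Date).AddHours(-1)
        } | Where-Object { $_.Message -match 'resiliency' } |
            Select-Object TimeCreated, Id, Message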

    I checked the storage permissions on the path where the VHDX resides, from the node to the SOFS cluster: no issues. The old cluster has access and the new one does too. The node now hosting the flickering VM has Full Control, and the old host node does too. Roughly, the checks looked like the sketch below (all names are placeholders).
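        # SMB share ACL on the SOFS node, and NTFS ACL on the VHDX folder (names are placeholders)
        Get-SmbShareAccess -Name "CSV03" -CimSession "XXX-XX-XXXX-02"
        Get-Acl "\\XXX-XX-XXXX-02.contoso.com\CSV03\Hyper-V\Virtual Disks" |
            Format-List Owner, AccessToString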

    It does not matter whether the storage is moved along with the VM or not; the problem still arises.

    Is there something happening in Server 2019 that we are not aware of that results in such behavior? The fix has essentially been to drain all roles except the faulty one, stop the cluster service, reboot the node, and bring the VM back; it then stops behaving this way. For reference, the recovery steps correspond roughly to the sketch below (node and role names are placeholders).
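        # Move every role except the faulty VM off the node (names are placeholders)
        Get-ClusterGroup |
            Where-Object { $_.OwnerNode.Name -eq "NODE01" -and $_.Name -ne "XXX-HCTXXX-01" } |
            Move-ClusterGroup -Node "NODE02"
        Stop-ClusterNode -Name "NODE01"                   # stop the cluster service on the node
        Restart-Computer -ComputerName "NODE01" -Force    # reboot it
        Start-ClusterNode -Name "NODE01"                  # rejoin the cluster
        Start-ClusterGroup -Name "XXX-HCTXXX-01"          # bring the VM role back online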

    Thanks for your help!



    • Edited by bond101 Thursday, October 17, 2019 6:17 PM
    Thursday, October 17, 2019 4:49 PM

All replies

  • Hi,

    Thanks for posting in our forum!

    >>Is there something happening in Server 2019 that we are not aware of that results in such behavior?

    I haven't found any information about this yet.

    As I understand it, this issue only occurs when the VM is joined to a cluster and live migrated from a 2016/2019 cluster to the new 2019 cluster. If I have misunderstood, please let me know.

    For the current situation, the only way to see why the VM changes state is from the cluster log. But I want to explain that, at the support level of this forum, we do not provide log analysis. To be honest, this is a bit difficult for us, but from my personal point of view, if you decide to upload the cluster logs, I will try my best to help you analyze them, though I cannot guarantee there will be results.

    You can try to collect the cluster log yourself and see if there is any error information.

    How to get the cluster log:

    https://docs.microsoft.com/en-us/powershell/module/failoverclusters/get-clusterlog?view=win10-ps
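
    For example, a minimal sketch (the destination folder is just an example):

        # Dump the cluster log from every node into one folder, in local time,
        # covering the last 60 minutes
        Get-ClusterLog -Destination "C:\Temp\ClusterLogs" -UseLocalTime -TimeSpan 60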

    In addition, if your problem is urgent, I suggest you open a support case with Microsoft.

    Your understanding is really appreciated!

    Best Regards,

    Daniel


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Friday, October 18, 2019 7:20 AM
    Moderator
  • Hi,

    I am writing here to confirm current situation.

    If the above suggestions are helpful to you, please be kind enough to "mark them as answers" to help more people.

    Regards,
    Daniel

    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Tuesday, October 22, 2019 7:26 AM
    Moderator
  • Hello,

    Thank you for the response. We have determined that if the VM is turned off before being migrated, it does not flicker. So the process has been: uncluster the role, turn the VM off in Hyper-V, move it to the new cluster, start it up, and cluster the role in FOC. It does not flicker. Any ideas? I can upload logs. A rough sketch of that workflow is below (VM, host, cluster, and path names are placeholders; run from the source host).
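        # Uncluster the role; -RemoveResources deletes the cluster resources, not the VM itself
        Remove-ClusterGroup -Name "VM01" -RemoveResources -Force
        Stop-VM -Name "VM01"                                   # turn the VM off in Hyper-V
        Move-VM -Name "VM01" -DestinationHost "NEW-NODE01" `
            -IncludeStorage -DestinationStoragePath "\\SOFS\CSV03\Hyper-V\VM01"
        Start-VM -Name "VM01" -ComputerName "NEW-NODE01"       # start it on the new host
        Add-ClusterVirtualMachineRole -VirtualMachine "VM01" -Cluster "NEW19-CLUSTER"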

    Tuesday, October 22, 2019 4:06 PM
  • Hi,

    Sorry for the delayed reply.

    For this situation, I would suggest you open a support case with Microsoft.

    Thanks for your understanding. If you have any questions, please feel free to let me know.

    Best Regards,

    Daniel


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Friday, October 25, 2019 7:05 AM
    Moderator