Hyper-V Failover Cluster Issue - Copy of VM being left on source server after Drain

  • Question

  • Summary - When draining a Server 2016 Hyper-V cluster node, certain VMs are randomly left on the source node even though they are properly moved to the destination node and that move is reflected in Failover Cluster Manager.

    Long version:

    Greetings! We have run into a rather unique (it appears) situation with our Hyper-V 2016 Failover Cluster. Whenever we do maintenance, we follow the standard process of draining the node, which moves all of the storage and VM resources over to the other functional nodes. While the Drain process appears to complete as far as the Cluster is concerned (no errors or warnings generated), we noticed that randomly certain VMs were being left on the drained node, but they also existed at the destination node where the cluster now says they should be located. It only affects VMs that are in an Offline state. This issue is particularly bad on our Replica/Testing cluster, where MOST of the VMs are not running (because they are replicas), but it has also occurred on our Primary cluster with the few VMs that are not running there. The biggest problem this causes is that, because the VMs exist on both nodes at the same time, both copies can attempt to replicate and usually end up breaking replication, requiring a resync of the entire VM to the replica.

    The issue occurs with VMs both with and without replication enabled. When our cluster nodes were still running Server 2012 R2, this issue never occurred. We migrated to 2016 late last year, and the issue started occurring periodically during maintenance windows.
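
    To confirm which VMs were left behind after a drain, one option is to compare the owner node the cluster reports for each VM against what each node's Hyper-V inventory actually lists. The Python sketch below is only an illustration under stated assumptions: the two input dictionaries are hypothetical and would have to be collected separately (for example from `Get-ClusterGroup` and from `Get-VM` run against each node), and `find_stale_copies` is an illustrative name, not part of any Hyper-V tooling.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input shapes (assumptions, not taken from the original post):
#   cluster_owner:  VM name -> node the cluster reports as the current owner
#   node_inventory: node name -> set of VM names registered on that node


def find_stale_copies(
    cluster_owner: Dict[str, str],
    node_inventory: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Return (vm, owner_node, stale_node) for every VM that is registered
    on a node other than the one the cluster reports as its owner."""
    stale = []
    for node, vms in node_inventory.items():
        for vm in vms:
            owner = cluster_owner.get(vm)
            # VMs the cluster does not know about (non-clustered) are skipped.
            if owner is not None and owner != node:
                stale.append((vm, owner, node))
    return sorted(stale)


if __name__ == "__main__":
    # Made-up data: node1 was drained but still holds a copy of vm-b.
    owners = {"vm-a": "node2", "vm-b": "node2", "vm-c": "node3"}
    inventory = {
        "node1": {"vm-b"},
        "node2": {"vm-a", "vm-b"},
        "node3": {"vm-c"},
    }
    for vm, owner, stale_node in find_stale_copies(owners, inventory):
        print(f"{vm}: cluster owner is {owner}, stale copy on {stale_node}")
```

    Running a comparison like this right after each drain would give a list of duplicated VMs to clean up before replication runs on both copies.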

    At this point, I am just curious whether anyone else has ever run across this or has any thoughts. Other than this issue, the nodes and VMs run fine.

    Thanks for your time!

    Thursday, May 28, 2020 10:58 PM

All replies

  • Hi jdobi,

    > While the Drain process appears to complete as far as the Cluster is concerned (no errors or warnings generated), we noticed that randomly certain VMs were being left on the drained node, but they also existed at the destination node where the cluster now says they should be located.

    Please check whether my understanding is correct: after the node is drained, VMs that were moved to another node still appear on the original node as well.

    If I misunderstood, please feel free to let me know.

    1. Please check whether the Windows Server 2016 nodes are fully updated. If not, please install the latest Windows updates on all cluster nodes.

    2. How did you upgrade the cluster from Server 2012 R2 to Server 2016? Did you use a Cluster OS Rolling Upgrade? If so, did you raise the cluster functional level after the migration, and did you upgrade the VM configuration versions?

    https://docs.microsoft.com/en-us/windows-server/failover-clustering/cluster-operating-system-rolling-upgrade

    3. In addition, you could run a cluster validation report to check whether any errors are listed.

    Thanks for your time!

    Best Regards,

    Anne


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Friday, May 29, 2020 8:49 AM
    Moderator
  • Yes, your understanding of the situation is correct. All nodes are updated; in fact, it was while draining nodes to install the updates that we noticed the issue. Yes, we did a rolling upgrade, and yes, the cluster functional level was updated after the migration. Some of the VMs are still running version 5. HOWEVER, it should be noted that I realized after your post that ONLY version 8 VMs are being affected. Only the VMs we have upgraded from version 5 to 8 seem to have the problem. I'm not sure yet whether it affects newly created VMs that started on version 8; I'll keep an eye out the next time we do maintenance.

    I ran a Cluster Validation on our Replica Cluster and it shows no issues other than the 'warnings' of the VMs not being online, which is expected since most of the VMs are replicas.

    Thanks for your time!

    Tuesday, June 2, 2020 8:02 PM
  • Hi jdobi,

    > HOWEVER, it should be noted that I realized after your post that ONLY version 8 VMs are being affected.

    Although it is strange, it is a good finding. Please check whether newly created version 8 VMs also have the same issue.
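
    One way to keep track of this across maintenance windows is to tally the left-behind VMs by configuration version and by whether they were upgraded or created at that version. The Python sketch below is a rough illustration only: the inventory dictionary, the "upgraded"/"native" labels, and the function name `tally_by_group` are all hypothetical, and the sample data is made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical grouping key: (configuration version, origin), where origin is
# "upgraded" (raised from version 5) or "native" (created at that version).
Group = Tuple[str, str]


def tally_by_group(
    vm_groups: Dict[str, Group],
    affected: Iterable[str],
) -> Dict[Group, Tuple[int, int]]:
    """Map each group to (affected_count, total_count)."""
    totals = Counter(vm_groups.values())
    # Affected names that are missing from the inventory are ignored.
    hits = Counter(vm_groups[vm] for vm in affected if vm in vm_groups)
    return {group: (hits[group], total) for group, total in totals.items()}


if __name__ == "__main__":
    # Made-up inventory and drain results, for illustration only.
    inventory = {
        "vm-a": ("5.0", "native"),
        "vm-b": ("8.0", "upgraded"),
        "vm-c": ("8.0", "upgraded"),
        "vm-d": ("8.0", "native"),
    }
    left_behind = ["vm-b"]
    tally = tally_by_group(inventory, left_behind)
    for group, (hit, total) in sorted(tally.items()):
        print(group, f"{hit}/{total} affected")
```

    If the "native" version 8 group stays at zero over several drains while the "upgraded" group does not, that would point toward the configuration upgrade rather than version 8 itself.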

    Thanks for your time!

    Best Regards,

    Anne



    Friday, June 5, 2020 3:00 AM
    Moderator