Hyper-V 2019 2 node cluster - intermittent host / cluster issue when restarting a VM

  • Question

  • Hi all,

    Hoping someone may have ideas on how to resolve this one! We have a 2-node cluster (HPE ProLiant DL360 Gen10) built on Windows Server 2019 Standard (GUI), with iSCSI to Nimble storage. All validation tests pass with flying colors!

    During testing, i.e. with very little load, we had one instance of a 2019 VM (VM version 9) getting stuck in a 'Stopping-Critical' state when rebooting. From that point the host went 'AWOL' for cluster functions: every cluster operation failed until the host was rebooted (draining would never finish, and any other live migration timed out and failed). We found that we had a switch down, so we put it down to that (even though the redundant paths were OK).

    Since then it had been working beautifully, as expected; I had done plenty of migrations and host drains without issue, until yesterday. The client decided to patch a Windows 7 VM; it rebooted and got stuck in the same fashion, Stopping-Critical. I have tried the trick of killing the VM process, but this does not work in this scenario. There is nothing we can do except cold boot the server, forcing failover to the other host. Obviously a bit of an issue, as the 30 servers running on there then restart on the other host, interrupting file / print services etc.

    I've read about disabling VMQ but I really don't want to go disabling the advanced performance features unless we absolutely have to.  I am using Dynamic memory on most of the machines, otherwise pretty standard configurations.

    No resource issues: we are using under half the memory on both hosts.

    Any ideas?

    Thanks,
    Simon


    • Edited by S1m0nB Thursday, August 15, 2019 9:46 PM
    Thursday, August 15, 2019 9:25 PM

All replies

  • Hi,

    Thanks for your question.

    Are there any errors listed under "Show critical events" for this clustered VM?

    Please also check whether the VM and the host nodes need updates.

    Meanwhile, here is a thread describing a situation similar to yours (https://www.experts-exchange.com/questions/28647555/Can't-get-Hyper-V-Windows-2012-STD-VM-out-of-STOPPING-CRITICAL-State.html).

    Please Note: Since the web site is not hosted by Microsoft, the link may change without notice. Microsoft does not guarantee the accuracy of this information.

    The solutions mentioned there are: change the network settings while the machine is running, temporarily change the Broadcom virtual switch to private or local, or boot into safe mode and rename the .vhdx.

    Once the VM's state changes to RUNNING (from STOPPING-CRITICAL), you can disconnect its NIC from the host's NIC and shut the machine down successfully. The poster in that thread was then able to install a legacy network adapter while the VM was shut down.

    In addition, another possible cause of your current condition: Stopping-Critical most commonly comes from storage, for example when a dynamic disk tries to expand and there is not enough space left to do so safely.
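
    As a rough illustration of the storage check described above, here is a minimal Python sketch that compares the free space on a volume with how much a dynamic disk could still grow. The function names, the example path and the sizes are hypothetical and not taken from this thread; the disk's current and maximum sizes are assumed to be known already (for example from the VM's disk settings).

```python
import shutil


def expansion_headroom(max_size_bytes: int, current_size_bytes: int, free_bytes: int) -> int:
    """Return the free bytes left if the disk grew to its maximum size.

    A negative result means the volume cannot hold the fully expanded disk.
    """
    # How much the dynamic disk could still grow (never negative).
    remaining_growth = max(max_size_bytes - current_size_bytes, 0)
    return free_bytes - remaining_growth


def volume_free_bytes(path: str) -> int:
    """Return the free space, in bytes, of the volume that contains `path`."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical values: a 200 GiB dynamic disk currently using 120 GiB.
    # "." is a placeholder; point it at the volume that holds the virtual disk.
    headroom = expansion_headroom(200 * GIB, 120 * GIB, volume_free_bytes("."))
    if headroom < 0:
        print(f"Short by {-headroom / GIB:.1f} GiB if the disk fully expands")
    else:
        print(f"{headroom / GIB:.1f} GiB would remain after full expansion")
```

    This only covers a single disk; with several dynamic disks on the same volume, their remaining growth would need to be added together before comparing against the free space.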

    Another reference for you:

    https://social.technet.microsoft.com/Forums/en-US/b5fbbd7b-eae0-461b-84f8-53844e18673c/hyperv-usability-stoppingcritical?forum=winserverhyperv

    I hope the above information helps. If you have any questions or concerns, please feel free to let me know.

    Best regards,

    Michael


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com

    Friday, August 16, 2019 8:02 AM
    Moderator
  • Hi,

    Just checking in to see if the information provided was helpful. Please let us know if you would like further assistance.

    Best Regards,

    Michael


    Monday, August 19, 2019 9:28 AM
    Moderator
  • Hi, thanks for the suggested workarounds. I can't try these solutions until it happens again, but I'll come back and post if / when it does.

    I have installed the August cumulative update on both servers, and changed the load-balancing mode of the main vSwitch team from Dynamic to Hyper-V Port. So far, shutting down or rebooting VMs has not caused any problems.

    Cheers,
    Simon

    Monday, August 19, 2019 2:28 PM
  • Hi,

    How are things going? Has your issue been resolved?

    Please feel free to let me know if you need further assistance.

    Best regards,

    Michael


    Tuesday, August 27, 2019 7:14 AM
    Moderator
  • Hi, still no problems since making the changes described above.

    Thanks,

    Simon

    Wednesday, August 28, 2019 11:53 PM
  • Hi,

    I'm glad your issue was successfully resolved!

    Meanwhile, if you found any reply helpful, could you please mark it as an answer so that other community members can find it quickly? Your contribution is highly appreciated.

    Thanks for your support and understanding.

    Have a nice day!

    Best regards,

    Michael


    Thursday, August 29, 2019 8:10 AM
    Moderator