VM periodically fails with NMI Hardware Failure

    Question

  • I have a simple Windows Server 2016 Standard host running Hyper-V.  No RAID, just a two-disk Core i7 box with 16GB of memory.

    Virtual machines are Windows 10 Professional v1607 installed as a Gen2 (UEFI).

    A couple of weeks ago, my Win10Pro UEFI VMs started randomly blue screening with NMI Hardware Failure.  There is nothing in the server logs and nothing I've been able to dig up so far from the VM.  It might blue screen in 5 minutes or 5 hours or 5 days.  This even happened on a VM built from a clean install.

    I thought it might be the host hardware, CPU or memory, so I moved the VM to another physical server, but the problem came with it, again only intermittently.

    Then it stopped happening on both servers for 5 days; today it happened 4 times in 30 minutes on a VM I was building up.  Checkpoints for the win!

    All VMs are worked on in audit mode.

    Help?

    Tuesday, January 31, 2017 11:07 PM

All replies

  • Hi Ericahalfbee,

    Random issues are difficult to reproduce and analyze.

    What is running inside the VM?

    Install updates on the VM and update the firmware of the host.

    Are there any related Hyper-V events?

    Best Regards,

    Leo


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Wednesday, February 01, 2017 3:09 AM
    Moderator
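
One way to check for related Hyper-V events is to query the Hyper-V event channels on the host directly rather than only the Application and System logs. The following is a minimal sketch, not a definitive procedure: it builds `wevtutil qe` command lines for recent critical, error, and warning events. The channel names listed and the helper `build_query` are assumptions for illustration; adjust them to whatever channels actually exist on your host.

```python
import platform
import subprocess

# Assumption: these Hyper-V channels are present on the host being checked.
HYPERV_CHANNELS = [
    "Microsoft-Windows-Hyper-V-Worker-Admin",
    "Microsoft-Windows-Hyper-V-VMMS-Admin",
]


def build_query(channel, hours=24, max_events=50):
    """Build a wevtutil command for recent critical/error/warning events.

    build_query is a hypothetical helper; only the wevtutil options it
    emits (/q, /c, /rd, /f) are real.
    """
    window_ms = hours * 60 * 60 * 1000
    # Level 1 = Critical, 2 = Error, 3 = Warning.
    xpath = (
        "*[System[(Level=1 or Level=2 or Level=3) and "
        f"TimeCreated[timediff(@SystemTime) <= {window_ms}]]]"
    )
    return [
        "wevtutil", "qe", channel,
        f"/q:{xpath}",
        f"/c:{max_events}",  # cap the number of events returned
        "/rd:true",          # newest events first
        "/f:text",           # human-readable output
    ]


if __name__ == "__main__":
    for channel in HYPERV_CHANNELS:
        cmd = build_query(channel)
        if platform.system() == "Windows":
            # Run on the Hyper-V host itself, from an elevated prompt.
            result = subprocess.run(cmd, capture_output=True, text=True)
            print(f"--- {channel} ---")
            print(result.stdout or result.stderr)
        else:
            # Elsewhere, just show the command that would be run.
            print(" ".join(cmd))
```

Running the same query shortly after a blue screen, with a small `hours` value, narrows the output to events around the crash.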
  • Nothing at all in the host server application or system logs on either server pertaining to Hyper-V or anything at all.

    Both servers are patched and latest BIOS.

    How can a VM experience an NMI hardware failure without the host server indicating anything?

    Wednesday, February 01, 2017 7:56 PM
  • Hi Ericahalfbee,

    >>How can a VM experience an NMI hardware failure without the host server indicating anything?

    It seems strange to me as well.

    All the information I could find about this error relates to physical devices.

    In addition, check the following link to see whether you are running a certified device:

    https://www.windowsservercatalog.com/

    Best Regards,

    Leo


    Friday, February 03, 2017 6:44 AM
    Moderator
  • I've been using these boxes for 4 years now.  With Server 2008 (Core and GUI), 2012 and now 2016.  This is the first problem.
    Friday, February 03, 2017 4:46 PM
  • >>I've been using these boxes for 4 years now.  With Server 2008 (Core and GUI), 2012 and now 2016.

    That is exactly why you should follow Leo's suggestion and check whether your hardware is supported on Windows Server 2016.  It is quite common for a system vendor not to support older systems on the latest release of the operating system.  (FYI, it is the responsibility of the system vendor to certify their systems; Microsoft simply provides a centralized place for people to validate that systems are certified.)

    That does not mean an uncertified system absolutely will not work.  It just means it is an unsupported configuration.  But vendors often decline to certify because of issues they found in their own testing, having deemed it not cost effective to engineer a fix for the older hardware.

    At a minimum, make sure you have downloaded and installed the latest BIOS, firmware, chipset drivers, and device drivers for your hardware to see if that changes anything.


    . : | : . : | : . tim

    • Marked as answer by Ericahalfbee Tuesday, February 07, 2017 7:31 PM
    Friday, February 03, 2017 7:46 PM
  • That's as good a reason for an upgrade as any I guess. 8)

    Thank you for your input.

    Tuesday, February 07, 2017 7:31 PM
  • ... but on a whim, I've also been either aborting the UEFI network boot or rearranging the boot order so the VM does not try to network boot before the hard drive.  No more crashes so far; for whatever good that is.
    Tuesday, February 07, 2017 8:45 PM
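
The boot-order workaround above amounts to moving the network entry behind every other boot device while leaving the rest in place. Below is a minimal sketch of that reordering only. It does not touch a real VM: on a Hyper-V host the change would normally be made in the Gen2 VM's firmware settings or with the `Get-VMFirmware` / `Set-VMFirmware` cmdlets. The dictionary layout, the device names, and the helper `network_last` are assumptions for illustration.

```python
def network_last(boot_order):
    """Return a new boot order with Network entries moved to the end.

    Hypothetical helper: the relative order of all other entries, and of
    the Network entries among themselves, is preserved.
    """
    others = [e for e in boot_order if e["BootType"] != "Network"]
    network = [e for e in boot_order if e["BootType"] == "Network"]
    return others + network


# Assumed example: a Gen2 VM that tries the network adapter first.
current = [
    {"BootType": "Network", "Device": "Network Adapter"},
    {"BootType": "Drive", "Device": "Hard Drive"},
    {"BootType": "File", "Device": "bootmgfw.efi"},
]

if __name__ == "__main__":
    for position, entry in enumerate(network_last(current), start=1):
        print(position, entry["BootType"], entry["Device"])
```

Whether this actually addresses the NMI crashes is not established in the thread; it only reproduces the change that coincided with the crashes stopping.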