Non-uniform memory (NUMA) warning

  • Question

  • I am getting a non-uniform memory access (NUMA) warning in the event logs from Hyper-V-Worker if I set memory above 31334 MB (a hair less than 32 GB).

    The hardware is a Dell R510 server with two X5670 six-core processors with HT disabled. There are eight 8 GB memory modules installed for a total of 64 GB of memory. Hyper-V's hardware topology settings tell me the memory limit is 31334, which seems odd; it's not a multiple of 1024. If I set memory above 31334, Hyper-V's hardware topology settings tell me there are 3 sockets. These processors are triple-channel, so I suspect maybe it's the extra two memory modules, but I'm not sure.
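    For what it's worth, the raw arithmetic behind the question looks like this (a sketch; the 31334 MB figure is just the limit the Hyper-V UI reported, and reading the difference as a per-node reservation is a guess):

    ```python
    # Expected even split of physical memory across the two NUMA nodes
    total_mb = 64 * 1024           # eight 8 GB DIMMs
    nodes = 2                      # one NUMA node per X5670 socket
    per_node_mb = total_mb // nodes
    print(per_node_mb)             # 32768

    # Hyper-V reported a per-node limit of 31334 MB, so roughly this
    # much per node appears to be held back from guests:
    print(per_node_mb - 31334)     # 1434
    ```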

    The OS is Windows Server 2016 Standard (1607) in Core mode with only the Hyper-V role enabled and nothing else. There is only one VM, running Server 2016 Standard (1607). Updates are current as of this moment.

    I would expect that with this hardware I should be able to give a single VM 12 cores and 48 GB of memory across 2 NUMA nodes and 2 sockets. Is there a reason Windows Server 2016 isn't detecting the hardware correctly, do I need to lose the extra two 8 GB memory modules, or is this actually by design for this particular hardware platform?

    Friday, November 30, 2018 3:48 PM


All replies

  • You seem to have something wrong with your system.

    Be it an issue with hardware or licensing: Server 2016 Standard supports up to 24 TB of memory and doesn't have a processor limitation, whereas Server 2016 Essentials does.

    That said,

    It's also possibly a hardware issue. I don't believe that the Dell R510 supports Windows Server 2016. That said, it works, but I don't think it's supported, and there are a variety of driver-related issues with running an R510 on Server 2016. It is possible that the issue you are seeing is a result of the BIOS not working correctly with Server 2016, or that there is some other hardware-related limitation with Server 2016.

    If I were you, rather than eating the cost of the hardware, I'd just run 2012 R2 headless on it.

    https://www.dell.com/support/home/us/en/04/drivers/supportedos/poweredge-r510


    Rob

    Friday, November 30, 2018 4:37 PM
  • Could you post output from

    Get-VMHostNumaNode

    and

    Get-VMProcessor -VMName 'yourvm' | fl *

    please?


    Eric Siron
    Altaro Hyper-V Blog
    I am an independent contributor, not an Altaro employee. I accept all responsibility for the content of my posts. You accept all responsibility for any actions that you take based on the content of my posts.

    Friday, November 30, 2018 7:02 PM
  • Get-VMHostNumaNode results:

    NodeId                 : 0
    ProcessorsAvailability : {97, 93, 98, 98...}
    MemoryAvailable        : 14532
    MemoryTotal            : 32768
    CimSession             : CimSession: .
    ComputerName           : XXXXXXXXXXXXXXX
    IsDeleted              : False

    NodeId                 : 1
    ProcessorsAvailability : {96, 97, 97, 97...}
    MemoryAvailable        : 14527
    MemoryTotal            : 32755
    CimSession             : CimSession: .
    ComputerName           : XXXXXXXXXXXXXXX
    IsDeleted              : False

    Get-VMProcessor results:
    VMCheckpointId                               : 00000000-0000-0000-0000-000000000000
    VMCheckpointName                             :
    ResourcePoolName                             : Primordial
    Count                                        : 12
    CompatibilityForMigrationEnabled             : False
    CompatibilityForOlderOperatingSystemsEnabled : False
    HwThreadCountPerCore                         : 1
    ExposeVirtualizationExtensions               : False
    Maximum                                      : 100
    Reserve                                      : 0
    RelativeWeight                               : 100
    MaximumCountPerNumaNode                      : 6
    MaximumCountPerNumaSocket                    : 1
    EnableHostResourceProtection                 : False
    OperationalStatus                            : {Ok, HostResourceProtectionDisabled}
    StatusDescription                            : {OK, Host resource protection is disabled.}
    Name                                         : Processor
    Id                                           : Microsoft:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    VMId                                         : XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    VMName                                       : XXXXXXXXXXXXXXXXXXX
    VMSnapshotId                                 : 00000000-0000-0000-0000-000000000000
    VMSnapshotName                               :
    CimSession                                   : CimSession: .
    ComputerName                                 : XXXXXXXXXXXXXXXXXXXX
    IsDeleted                                    : False

    Saturday, December 1, 2018 4:51 AM
  • I agree with Rob's observation.  Though some things might work, the lack of NUMA support under Hyper-V may have been one of the reasons why Dell chose not to support that system for Windows Server 2016.  You might want to see if you can get some information from Dell about their viewpoint on this.

    tim

    Saturday, December 1, 2018 3:09 PM
  • NUMA is a feature of the processor's built-in memory controller so I have my doubts about it being a matter of a supported chassis. If Windows/Hyper-V was unable to work with the NUMA feature of this CPU, then it would not be able to access memory at all without disabling NUMA. If by some miracle it did, Get-VMHostNumaNode would just show one big fat node with everything in it. It does appear that it detects the hardware properly.

    The per-node memory limit calculation is not published, but Hyper-V does run a formula to arrive at the number that you see, and I would not expect it to fall neatly on a megabyte boundary. I would guess that the difference exists because you will never be allowed to assign all of the memory in a NUMA node under any circumstances. This is physical memory we're talking about now, not a logical address space backed by virtual memory.

    I'm not going to be able to give you a definitive answer. I have done some work with NUMA but always in a shared resources configuration where I did not drive any one VM to any limit (you are assigning all pCores to one VM). I have worked with NUMA on hardware even older than yours, but not with 2016. Nothing changed in NUMA that I'm aware of, and I don't know of any changes to the way that Windows or Hyper-V uses CPUs that would impact this, assuming that you stayed with the default settings. It doesn't sit right, but I have to leave open the door to the possibility of an update to the tech in 2016 that causes this behavior on this hardware. But, that doesn't mean it won't work. It just means that maybe optimal configuration on this hardware is not what you wanted.

    Everything in your report looks fine. Both of your commands display exactly what I would expect. I would also think that you could assign 12 vCPU and nearly all of your 64GB of memory to a VM without it jumping to the third virtual node.

    I can think of a few things:

    • Did you configure CPU groups?
    • Reduce the number of assigned vCPU to 10 and see if that changes it. Try 6 after that. I'm not asking you to leave it, I just want to know if there's a CPU assignment where it reverts to the correct number of NUMA nodes.
    • Temporarily enable HT and see what changes. Specifically, can you assign all 24 logical processors and the desired amount of memory without it jumping to the 3rd node? That's what I see on my HT-enabled boxes.
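
    A quick way to run the vCPU-count experiment above from an elevated PowerShell prompt on the host (a sketch; 'yourvm' is a placeholder name, and these are the standard Hyper-V module cmdlets):

    # The VM must be off to change its processor count.
    Stop-VM -Name 'yourvm'
    Set-VMProcessor -VMName 'yourvm' -Count 10
    Start-VM -Name 'yourvm'

    # Then check how the virtual NUMA topology came out.
    Get-VM -Name 'yourvm' |
        Select-Object NumaAligned, NumaNodesCount, NumaSocketCount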



    Eric Siron
    Altaro Hyper-V Blog
    I am an independent contributor, not an Altaro employee. I accept all responsibility for the content of my posts. You accept all responsibility for any actions that you take based on the content of my posts.

    Saturday, December 1, 2018 6:54 PM
  • Interesting.  This is why I disabled HT:

    The hypervisor did not enable mitigations for CVE-2018-3646 for virtual machines because HyperThreading is enabled and the hypervisor core scheduler is not enabled. To enable mitigations for CVE-2018-3646 for virtual machines, enable the core scheduler by running "bcdedit /set hypervisorschedulertype core" from an elevated command prompt and reboot.

    When I enable Hyper-Threading, all memory is available and the NUMA issue disappears, but that warning appears.

    What is really interesting is that the link documenting how to "fully understand" what the scheduler type does no longer exists: https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/manage/understanding-hyper-v-scheduler-type-selection

    This is what drove me to simply disable HT altogether.
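
    For reference, the warning's own remediation plus a way to read back which scheduler the hypervisor actually booted with (a sketch; on my systems the scheduler type is logged at boot as event ID 2 in the Hyper-V-Hypervisor operational log, but treat that event detail as an assumption):

    # From the warning's own instructions; takes effect after a reboot.
    bcdedit /set hypervisorschedulertype core

    # After the reboot, read the scheduler type the hypervisor started with.
    Get-WinEvent -FilterHashtable @{
        LogName = 'Microsoft-Windows-Hyper-V-Hypervisor/Operational'
        Id      = 2
    } -MaxEvents 1 | Select-Object -ExpandProperty Message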

    • Edited by GettnBetter Thursday, December 6, 2018 4:09 AM
    Thursday, December 6, 2018 3:54 AM
  • The scheduler doc is here: https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/manage/manage-hyper-v-scheduler-types

    If you have another VM, then that might explain it. Or not. I haven't gotten a chance to really wring out the new schedulers. But it would make sense to me that it would not be able to guarantee access to every core so the NUMA nodes would need to be restricted accordingly. But, my non-HT boxes are single NUMA node only so I can't do a full comparison. I don't think that I'll have the opportunity to disable HT on my HT boxes for real apples-to-apples testing in the near future.

    Instead of enabling HT, what about dropping the assigned vCPU count? If you only have to drop it by a couple, that might be a better compromise.

    Another thing: NUMA misconfigurations are going to cause occasional sub-microsecond latencies. Would that hurt your workload? Does it perform a lot of memory operations? Basically, I'm wondering how much effort this problem justifies.


    Eric Siron
    Altaro Hyper-V Blog
    I am an independent contributor, not an Altaro employee. I accept all responsibility for the content of my posts. You accept all responsibility for any actions that you take based on the content of my posts.

    Thursday, December 6, 2018 5:48 PM
  • Hi,

    Just want to confirm the current situation.

    Please feel free to let us know if you need further assistance.

    Best regards,

    Michael


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com



    Friday, December 7, 2018 6:11 AM
  • I re-enabled Hyper-Threading, and Server 2016 is no longer confused; it correctly reports 2 NUMA nodes, as would be expected.
    • Marked as answer by GettnBetter Wednesday, December 26, 2018 5:09 PM
    Wednesday, December 26, 2018 5:09 PM