Windows 2012 R2 Hyper-V Cluster NIC Teaming QLogic 10Gb NIC Virtual Switch BSOD

  • Question

  • We are running a three-node Windows Server 2012 R2 Hyper-V cluster on HPE DL380 G7 servers. To get more performance out of it, we added two HPE NC523SFP dual-port 10Gb NICs to each host. The iSCSI traffic worked like a charm.

    So we brought in a consultant to use the two remaining free 10Gb ports to improve LAN and cluster network speed.

    The HP switch stack and ports are configured for LACP. With Host1 paused, he removed the 1Gb NICs from the LAN team and replaced them with the 10Gb NICs. He made the virtual switch converged so it would carry VM traffic, management traffic and cluster traffic. Everything seemed to go smoothly, with no errors and no restart needed. After resuming, the VMs were live migrated back without problems.
    He then started the same procedure on Host2. After resuming it and live migrating the VMs, Host1 suddenly crashed with a BSOD, stop code 0x133 (DPC_WATCHDOG_VIOLATION). The driver is the latest HPE November 2015 driver for this QLogic NIC.

    He then removed the VLAN tagging and the converged virtual switch configuration, so the team would only carry VM LAN traffic, and moved management, cluster and live migration traffic back to their own 1Gb NICs. This time it remained stable, so Host3 was converted as well. Live migrating, pausing and resuming, and the nightly backup of the VMs all went without problems.

    Yesterday I installed the July update. After all three hosts had been updated successfully, Host3 started to BSOD under normal VM workload. I paused it to investigate further, and last night Host1 and Host2 also went BSOD during the Veeam VM backup.

    I have read that there are VMQ issues with these older NICs in NIC teams, caused by overlapping processor sets. According to the consultant, that only applies to switch-independent teaming. We use LACP, and the load balancing mode is set to Address Hash instead of Dynamic.

    For now I have disabled VMQ on the NIC team and on the team member NICs.
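    The consultant's distinction matches, as far as I understand it, Microsoft's NIC Teaming guidance for 2012 R2. Only a switch-independent team using Hyper-V Port or Dynamic distribution runs VMQ in Sum-of-Queues mode, where member processor sets should not overlap. Every other combination, including LACP with Address Hash, runs in Min-Queues mode, where member processor sets should be identical. The sketch below models that rule. The helper names are hypothetical, not a Windows API. The string values (SwitchIndependent, Lacp, TransportPorts, HyperVPort, Dynamic) are assumed to mirror the TeamingMode and LoadBalancingAlgorithm values that Get-NetLbfoTeam reports.

```python
from typing import List, Set

# Hypothetical model of the 2012 R2 NIC Teaming VMQ rule (not a real API).
# Values are assumed to mirror Get-NetLbfoTeam's TeamingMode/LoadBalancingAlgorithm.
SUM_OF_QUEUES_ALGORITHMS = {"HyperVPort", "Dynamic"}


def vmq_team_mode(teaming_mode: str, load_balancing: str) -> str:
    """Return 'Sum-of-Queues' or 'Min-Queues' for a team configuration."""
    # Only switch-independent teams with Hyper-V Port or Dynamic distribution
    # expose the sum of their members' queues.
    if teaming_mode == "SwitchIndependent" and load_balancing in SUM_OF_QUEUES_ALGORITHMS:
        return "Sum-of-Queues"
    # Static, LACP, or any address-hash algorithm: Min-Queues.
    return "Min-Queues"


def processor_sets_ok(mode: str, member_sets: List[Set[int]]) -> bool:
    """Check the processor-set guidance for the given VMQ team mode."""
    if mode == "Sum-of-Queues":
        # Members should not share any processor.
        seen: Set[int] = set()
        for cpus in member_sets:
            if seen & cpus:
                return False
            seen |= cpus
        return True
    # Min-Queues: every member should use the same processor set.
    return all(cpus == member_sets[0] for cpus in member_sets)


# The thread's setup: LACP team with Address Hash (TransportPorts).
mode = vmq_team_mode("Lacp", "TransportPorts")
print(mode)  # Min-Queues
```

    In this model an LACP team never lands in Sum-of-Queues mode, so the "non-overlapping processors" advice would not be the relevant check for it.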

    Are there known VMQ issues with these older NICs on Windows Server 2012 R2 Hyper-V hosts running Windows Server 2008 R2 VMs, also when using LACP?

    TIA,

    Fred 

    Thursday, July 16, 2020 10:39 AM

All replies

  • You need to check with HPE. Only certain models of the DL380 G7 are certified to run 2012 R2, and the NC523SFP does not appear to be certified for 2012 R2. This is according to windowsservercatalog.com, where hardware components are listed as certified for various versions of Windows Server; the entries are submitted by the hardware vendor. HPE needs to state whether or not you are running a supported configuration and whether or not they support running the teaming software on their older NICs.

    tim

    Thursday, July 16, 2020 12:15 PM
  • HPE provides firmware and drivers for Windows Server 2012 R2 for both the DL380 G7 and the NC523SFP, and the driver was updated through Windows Update.

    This problem has been reported before:
    http://www.afinn.net/hyper-v/hyper-v-and-qlogic-equals-dpc_watchdog_violation-bsod/

    I cannot find the KB article he refers to. I did find others covering other servers and NICs, which say it has been solved in Windows Server 2016 with SET (Switch Embedded Teaming).

    So far, disabling VMQ on all three nodes has caused no performance loss under normal daytime workloads, and I have performed multiple live migrations in production without any BSOD.

    Re-enabling it with an explicit processor assignment might be a solution. With hyper-threading enabled, VMQ only uses even-numbered logical processors, so each adapter gets every second processor starting at its base:
    Set-NetAdapterVmq -Name "Ethernet1" -BaseProcessorNumber 4 -MaxProcessors 8
    # VMQ would use processors 4,6,8,10,12,14,16,18
    Set-NetAdapterVmq -Name "Ethernet2" -BaseProcessorNumber 20 -MaxProcessors 8
    # VMQ would use processors 20,22,24,26,28,30,32,34
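    The processor lists in the two commands above follow from simple arithmetic. Here is an illustrative model, assuming hyper-threading is on and VMQ spreads each adapter over MaxProcessors even-numbered logical processors starting at BaseProcessorNumber. It is not how Windows actually places queues. The functions and the `logical_processors` argument are hypothetical. That argument drops processor numbers a host does not have, so check the real count on each node. One caveat: as I read Microsoft's 2012 R2 NIC Teaming guidance, non-overlapping sets are the recommendation for Sum-of-Queues teams (switch-independent with Hyper-V Port or Dynamic). For an LACP team it recommends identical sets on all members.

```python
from typing import List, Optional, Set


def vmq_processors(base: int, max_processors: int,
                   hyperthreading: bool = True,
                   logical_processors: Optional[int] = None) -> List[int]:
    """Logical processors an adapter's VMQ queues would be spread over.

    Assumption: with hyper-threading, only even-numbered logical processors
    are used, so the list steps by 2 from `base`. `logical_processors` is a
    hypothetical knob that drops numbers beyond the host's processor count.
    """
    step = 2 if hyperthreading else 1
    cpus = [base + step * i for i in range(max_processors)]
    if logical_processors is not None:
        cpus = [cpu for cpu in cpus if cpu < logical_processors]
    return cpus


def overlap(a: List[int], b: List[int]) -> Set[int]:
    """Processors shared by two adapters' VMQ sets (empty set = no overlap)."""
    return set(a) & set(b)


# The two commands above, modelled:
ethernet1 = vmq_processors(base=4, max_processors=8)
ethernet2 = vmq_processors(base=20, max_processors=8)
print(ethernet1)                      # [4, 6, 8, 10, 12, 14, 16, 18]
print(ethernet2)                      # [20, 22, 24, 26, 28, 30, 32, 34]
print(overlap(ethernet1, ethernet2))  # set()
```

    Passing a processor count, e.g. `vmq_processors(20, 8, logical_processors=24)`, returns `[20, 22]`. That shows how a base number chosen for a larger machine can silently leave an adapter with fewer processors than intended.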

    Before going down that route, I would like more information on how VMQ works and its pros and cons.

    Pointers appreciated.

    TIA,

    Fred

    Thursday, July 16, 2020 5:08 PM
  • Disabling VMQ has resolved the problem. The Veeam backup completed normally, and the servers had no further errors or BSODs. VMQ is definitely the cause of the BSOD with stop code 0x133.

    Regards,

    Fred

    Friday, July 17, 2020 5:32 AM
  • We are using this version:

    HPE QLogic P3P Multifunction Driver for Windows Server 2012 R2

    Type: Driver - Network
    Version: 5.3.32.1130 (24 Oct 2016)
    Operating System(s): Microsoft Windows Server 2012 R2


    This product addresses an issue where the Virtual Machine Queue (VMQ) causes 100% CPU usage on a single core.

    https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX-11d772bdb217487996e3ec867e#tab4

    Friday, July 17, 2020 5:43 AM
  • Hi Fred B._

    Thanks for sharing this information with us. You may mark your reply as the answer to close this thread.

    Best Regards,

    Anne


    Thursday, July 23, 2020 7:23 AM