Hyper-V 2019 Ignoring Live Migration "Performance Option" Setting | Bug?

  • General discussion

  • Hi all,

    I've spent days at this point trying to diagnose this one, so I'm opening it up for discussion/a bug report.

    Windows Server 2019 / Hyper-V Server 2019 is ignoring the "Performance options" selection on hosts and using SMB regardless of any other administrative choice. This is using Kerberos authentication.

    Two hosts, non-clustered, local storage, running Hyper-V Server 2019 (I've installed Windows Server 2019 Standard as well, with the same behavior).

    I originally assumed that I must have a config problem. I blew two hosts away: clean installs of 2019, no configuration scripts, just the LBFO network config and a converged fabric network design. Same problem.

    Next I assumed it had to be networking. So I dropped the teaming, the config scripts, the VLAN configs and the LAGs. One physical NIC with access to the domain controllers for non-migration traffic, and another single physical NIC - on a crossover cable - for live migration. Same problem.

    After this, I tried clean installs with all group policy blocked from applying on the hypervisors. I've tried clean installs with Microsoft's in-box Intel NIC drivers and with Intel's v23 and v24 release NIC drivers. Same problem. I've tried the June 2019 re-release of Hyper-V Server 2019 and even a direct-from-DVD install (no language packs etc.), so a 100% vanilla source.

    Here is the problem in its simplest form (the two physical 1GbE NIC setup):

    Live Migration: Kerberos & Compression
    A test VM with a VHDX containing a couple of GB of junk data to push back and forth
    All configs are verified
    Windows Firewall modes are domain/private - and it makes no difference if Windows Firewall is off
    Windows Defender is uninstalled and no other software (let alone security software) is installed, period. These are fresh installs
    The ONLY live migration network in the Incoming Live Migration networks list is 10.0.1.1/32 on HV1 and 10.0.1.2/32 on HV2

    Migration LAN: 10.0.1.0/24 (point to point via a crossover cable)
    All other traffic: 192.168.1.0/24 (via a switch configured as a flat LAN, i.e. it's in dumb mode)
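
    For reproducibility, the configuration above can be applied with something like the following (a sketch as run on HV2; the /32 subnet is from this lab, so swap 10.0.1.2 for 10.0.1.1 on HV1):

    ```powershell
    # Enable live migration with Kerberos authentication and the
    # "Compression" performance option, and refuse "any network".
    Enable-VMMigration
    Set-VMHost -VirtualMachineMigrationAuthenticationType Kerberos `
               -VirtualMachineMigrationPerformanceOption Compression `
               -UseAnyNetworkForMigration $false

    # Pin incoming migrations to the crossover link only.
    Get-VMMigrationNetwork | ForEach-Object { Remove-VMMigrationNetwork -Subnet $_.Subnet }
    Add-VMMigrationNetwork 10.0.1.2/32
    ```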


    VMMS listens ON THE CORRECT LAN

    netstat -an | findstr 6600
      TCP    10.0.1.2:6600       0.0.0.0:0              LISTENING

    When performing an online/offline migration, VMMS connects correctly over the >correct< LAN

    netstat -an | findstr 6600
      TCP    10.0.1.2:6600       0.0.0.0:0              LISTENING
      TCP    10.0.1.2:54397      10.23.103.1:6600       ESTABLISHED

    All fine!

    Using a packet capture on the 10.0.1.0/24 migration LAN, there is plenty of chatter to/from TCP 6600. You can see the VMCX configuration state being transmitted in XML over TCP 6600, with lots of successful back-and-forth activity for 0.35 seconds. Then traffic on TCP 6600 stops.

    Traffic then starts up on the non-migration network - the 192.168.1.0 network, which is NOT in the Migration networks list. A large block transfer occurs; packet monitoring this connection shows an SMB transfer in progress. This block transfer is, of course, the VHDX file.

    As soon as the block transfer completes on the 192.168.1.0 network (~16 seconds), traffic picks up again over TCP 6600 on the 10.0.1.0 network for about 0.5 seconds and the Live Migration completes.
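
    As a cross-check (a diagnostic sketch, not output from the original capture), which interface SMB actually picked can be confirmed on the source host while the block transfer is in flight:

    ```powershell
    # Run on the source host while the VHDX block transfer is running.
    # Shows the client/server IP pair each SMB connection is using;
    # in the failing case the 192.168.1.x addresses appear here.
    Get-SmbMultichannelConnection |
        Select-Object ServerName, ClientIpAddress, ServerIpAddress

    Get-SmbConnection | Select-Object ServerName, ShareName, Dialect
    ```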

    The only way that I can get the hosts to transfer over the 10.0.1.0 network is to add their respective FQDN entries to the local server Hosts files.
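
    For reference, that workaround amounts to entries like these in %SystemRoot%\System32\drivers\etc\hosts (the FQDNs below are placeholders; use the hosts' real domain names):

    ```
    # On HV1 - pin the peer's FQDN to the migration LAN
    10.0.1.2    hv2.example.local

    # On HV2
    10.0.1.1    hv1.example.local
    ```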

    Re-doing the transfer now uses the correct 10.0.1.0 network. You can clearly see the VMCX transfer over TCP 6600; then an SMB 2.0 session is established - using the value from the hosts file - between source and destination over 10.0.1.0. An SMB transfer of the VHDX occurs on the forced 10.0.1.0 network before the process is finally concluded via traffic on TCP 6600 (still on the 10.0.1.0 network) and the transfer completes successfully.

    Without the hosts file entries, Hyper-V seems to be using NetBIOS to find the migration target; it can't, so it falls back to whatever network it can find an SMB connection on. However, I say again: the 192.168.1.0 network is not in the Live Migration networks list - Hyper-V should be failing the transfer, not unilaterally deciding to "use any available network for live migration". PowerShell on both hosts confirms that this is correctly configured:

    get-vmhost | fl *

    ....
    MaximumStorageMigrations                  : 2
    MaximumVirtualMachineMigrations           : 2
    UseAnyNetworkForMigration                 : False
    VirtualMachineMigrationAuthenticationType : Kerberos
    VirtualMachineMigrationEnabled            : True
    VirtualMachineMigrationPerformanceOption  : Compression
    ...

    Get-VMMigrationNetwork

    Subnet       : 10.0.1.2/32
    Priority     : 0
    CimSession   : CimSession: .
    ComputerName : HV2
    IsDeleted    : False

    Something is causing it to ignore the Compression setting, but only for VHDX transfers; other VM data is being sent correctly over TCP 6600. As the 10.0.1.0 network isn't registered in DNS, Hyper-V isn't "aware" that it can reach the destination host over that link. Of course, in this test I do not want it to use SMB to perform this transfer, so it should not be using SMB in the first place. What I want is for migration traffic to occur over a private 9K jumbo frame network - as I've always used - and not bother the 1.5K-frame management network.
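
    Incidentally, for anyone reproducing the jumbo frame side of this setup, the migration NIC's setting can be checked and set like so (a sketch; "Migration" is a placeholder adapter alias, and the accepted values vary by driver):

    ```powershell
    # Inspect the current jumbo packet size on the migration adapter.
    # "*JumboPacket" is the standardised advanced-property keyword.
    Get-NetAdapterAdvancedProperty -Name "Migration" -RegistryKeyword "*JumboPacket"

    # Set a 9K jumbo frame size (9014 is a common driver value).
    Set-NetAdapterAdvancedProperty -Name "Migration" -RegistryKeyword "*JumboPacket" -RegistryValue 9014
    ```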

    I've clean installed Windows Server so many times trying to diagnose this that I've gone dizzy! Does anyone have any bright ideas?

    Thanks


    • Edited by C-Amie Friday, July 19, 2019 3:58 PM Typo
    Friday, July 19, 2019 10:56 AM

All replies

  • Seems this one has stumped even the Gurus of TechNet.

    Does anyone have any ideas for diagnostic approaches? I've put them back into their converged design state on the switches and played with the binding order to no avail.

    Friday, July 26, 2019 9:07 AM