Linux VM KVP IP can't be shown on Hyper-V Manager after querying its KVP/IP information on Hyper-V host several times

  • Question

  • Hello

    [Sorry for asking the same question in the wrong place/forum of "Hyper-V"]

    I am using a CentOS 6.5 VM (Linux kernel 2.6.32-431) and a generation 2 CentOS 7 VM (Linux kernel 3.10.0-123), both with the Hyper-V KVP daemon installed. I periodically query each VM's IP address (by reading its KVP information through WMI or PowerShell) in order to manage it.

    However, after I query the IP (KVP) several times, it can no longer be queried or shown in Hyper-V Manager. (Windows VMs do not have this problem.)

    Here is the vmIntegrationService status of my CentOS 7 VM for reference.

    PS C:\Users\Administrator> (Get-VM -name G2_CentOS7).vmIntegrationService

    VMName     Name                    Enabled PrimaryStatusDescription SecondaryStatusDescription
    ------     ----                    ------- ------------------------ --------------------------
    G2_CentOS7 Time Synchronization    True    OK
    G2_CentOS7 Heartbeat               True    OK
    G2_CentOS7 Key-Value Pair Exchange True    OK                       The protocol version of the component installed ...
    G2_CentOS7 Shutdown                True    OK
    G2_CentOS7 VSS                     True    No Contact
    G2_CentOS7 Guest Service Interface False   OK

    I have attached a simple KVP query PowerShell script below. The problem can be reproduced within a couple of minutes if you run two instances of this script at the same time.

    $VMName = $args[0]
    write-host "$VMName"
    
    filter Import-CimXml
    {
        $CimXml = [Xml]$_
        $CimObj = New-Object -TypeName System.Object
        foreach ($CimProperty in $CimXml.SelectNodes("/INSTANCE/PROPERTY"))
        {
            if ($CimProperty.Name -eq "Name" -or $CimProperty.Name -eq "Data")
            {
                $CimObj | Add-Member -MemberType NoteProperty -Name $CimProperty.NAME -Value $CimProperty.VALUE
            }        
        }
        $CimObj
        $CimObj = $null
    }
    
    for ($i=1 ; $i -le 10000 ; $i++) {    
        $a = Get-Date
        write-host "$i - Time: " $a.ToLocalTime()
        $vm = Get-WmiObject -Namespace root\virtualization\v2 -Query "Select * From Msvm_ComputerSystem Where ElementName='$VMName'"
        $vm.ElementName
        $vmkvp = Get-WmiObject -Namespace root\virtualization\v2 -Query "Associators of {$vm} Where AssocClass=Msvm_SystemDevice ResultClass=Msvm_KvpExchangeComponent"
        $vmkvp.GuestIntrinsicExchangeItems | Import-CimXml
    }

    If your CentOS VM has LIS installed and the KVP daemon is running well, my test script shows more than 4 keys (including the NetworkAddressIPv4 or NetworkAddressIPv6 keys).
    However, once the KVP daemon becomes problematic, the script shows only a few keys (e.g. 4~6 keys). At that point Hyper-V Manager can no longer show the VM's IP address either, and you may need to reboot the CentOS VM to recover.

    For example (the KVP data in iteration 252 is good, but in iterations 253 and 254 it becomes problematic):

    252 - Time:  8/26/2014 7:19:42 PM
    G2_CentOS7
    localhost                                                   FullyQualifiedDomainName
    3.1                                                         IntegrationServicesVersion
    10.1.145.190;192.168.122.1                                  NetworkAddressIPv4
    fe80::215:5dff:fe91:b902                                    NetworkAddressIPv6
    3.10.0-123.el7.x86_64                                       OSBuildNumber
    0                                                           OSDistributionData
    0                                                           OSDistributionName
    199168                                                      OSKernelVersion
    7                                                           OSMajorVersion
                                                                OSMinorVersion
    CentOS Linux                                                OSName
    129                                                         OSPlatformId
    3.10.0                                                      OSVersion
    x86_64                                                      ProcessorArchitecture
    253 - Time:  8/26/2014 7:19:42 PM
    G2_CentOS7
    localhost                                                   FullyQualifiedDomainName
    3.1                                                         IntegrationServicesVersion
    10.1.145.190;192.168.122.1                                  NetworkAddressIPv4
    0                                                           OSDistributionData
    0                                                           OSDistributionName
    199168                                                      OSKernelVersion
    129                                                         OSPlatformId
    254 - Time:  8/26/2014 7:19:44 PM
    G2_CentOS7
    0                                                           OSDistributionData
    0                                                           OSDistributionName
    199168                                                      OSKernelVersion
    129                                                         OSPlatformId
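
    One way to spot the degraded iterations automatically is to count the keys in each sample of a saved log. The following is only a rough sketch and not part of the original test: `flag_bad_samples` and the log file name `kvp.log` are hypothetical, and the default threshold of 14 keys is an assumption taken from the healthy sample 252 above. It expects the same layout as the output shown here (a "N - Time:" header, the VM name, then one key per line).

```shell
#!/bin/sh
# flag_bad_samples [MIN_KEYS]
# Hypothetical helper: reads the query script's output on stdin and prints
# "<iteration> <key count>" for every sample with fewer than MIN_KEYS keys.
# MIN_KEYS defaults to 14, the count seen in the healthy sample (assumption).
flag_bad_samples() {
    awk -v min="${1:-14}" '
        / - Time: / {                       # header, e.g. "253 - Time: ..."
            if (seen && n < min) print iter, n
            iter = $1; seen = 1; n = 0
            skipname = 1                    # next line is the VM name
            next
        }
        skipname { skipname = 0; next }     # skip the VM name line
        NF > 0   { n++ }                    # every other non-blank line is a key
        END      { if (seen && n < min) print iter, n }
    '
}

# Example (kvp.log is a hypothetical file holding the script output):
#   flag_bad_samples 14 < kvp.log
```

    With the output above, samples 253 and 254 would be reported because they carry fewer than 14 keys.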

    I found the following patches and gave them a try, but the problem remains after applying them. A generation 2 Ubuntu 14.04 VM with Linux kernel 3.13 also has this problem.

     - Patch "Drivers: hv: util: Fix a bug in the KVP code" has been added to the 3.14-stable tree

     - Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code

    However, a generation 1 Ubuntu 14.04 VM with Linux kernel 3.17 doesn't encounter this problem after its KVP/IP information is queried on the Hyper-V host several times.

    Does anyone know which changes between Linux kernel 3.13 and 3.17 fix this issue?


    Thanks,

    Paul







    Tuesday, August 26, 2014 12:16 PM

All replies

  • Hi Paul, can you please double check if you applied the two patches correctly to 3.13 (you said  3.13+the 2 patches still didn't work)?

    Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code:
    https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef

    Drivers: hv: util: Fix a bug in the KVP code:
    https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc

    With the 2 patches applied, we need to update hv_utils.ko and hv_vmbus.ko (assuming you didn't have them built into the kernel).

    Special NOTE: for the patches to take effect, after applying them and running 'make; make modules_install' we still need to regenerate the initrd image with mkinitrd or mkinitramfs, depending on the distro. This is because a distro's default initrd image typically includes hv_vmbus.ko. If we only update /lib/modules/`uname -r`/kernel/drivers/hv/hv_vmbus.ko via 'make modules_install', the new hv_vmbus.ko won't actually be used.
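
    To check whether the running hv_vmbus is really the rebuilt one, one option is to compare the srcversion of the loaded module with that of the .ko on disk. This is only a sketch under assumptions: it relies on the kernel exposing /sys/module/hv_vmbus/srcversion (typically the case when the kernel is built with CONFIG_MODULE_SRCVERSION_ALL) and on modinfo being installed. `compare_srcversion` is a hypothetical helper name.

```shell
#!/bin/sh
# compare_srcversion LOADED ONDISK
# Hypothetical helper: prints "match" when both srcversion strings are
# non-empty and identical, otherwise "mismatch".
compare_srcversion() {
    loaded=$1
    ondisk=$2
    if [ -n "$loaded" ] && [ "$loaded" = "$ondisk" ]; then
        echo match
    else
        echo mismatch
    fi
}

mod=hv_vmbus
sysfile=/sys/module/$mod/srcversion                        # loaded module
kofile=/lib/modules/$(uname -r)/kernel/drivers/hv/$mod.ko  # module on disk

if [ -r "$sysfile" ] && [ -r "$kofile" ]; then
    compare_srcversion "$(cat "$sysfile")" \
                       "$(modinfo -F srcversion "$kofile")"
else
    echo "skipped: $mod is not loaded or $kofile was not found"
fi
```

    A "mismatch" after a reboot suggests that the module was loaded from an initrd that still contains the old hv_vmbus.ko.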

    Wednesday, October 15, 2014 7:14 AM
  • Hi Paul

    I had issues when just recompiling the modules while doing some testing on CentOS 6.5 and 7.0, and came to incorrect conclusions.

    Using rpmbuild, it's not too difficult, even for a beginner like me.

    I used

    http://fedoraproject.org/wiki/Building_a_custom_kernel

    I would rebuild the kernel.  That is the safest way to know you are actually running the patched code.

    I am pretty sure your problem will be gone with the patches and a full kernel rebuild.

    And in any case I believe those patches are in 6.6, but I am not a Red Hat customer so I can't confirm.

    Thanks

    Mike

    Wednesday, October 15, 2014 8:26 AM
  • Hi Dexuan and Mike,

    Thanks for your help!

    I applied the patches with rpmbuild on CentOS 6.5 (with kernel 2.6.32-431).

    However, the second patch cannot be applied on CentOS 6.5 since there is no target_cpu in that version.

    So we are using the following similar patch instead, but the issue still occurs (we also replaced the .ko in the initrd).

    https://lists.ubuntu.com/archives/kernel-team/2014-August/047725.html

    Can you tell me whether I might be missing a patch, or do you have any other suggestions?

    Thanks for your time.

    Wednesday, October 15, 2014 9:25 AM
  • Hi Paul

    >also replace ko in initrd

    But you wouldn't need to do this if you rebuild the kernel.  Are you sure you are doing a full kernel rebuild?

    rpmbuild -bb --with baseonly --without debuginfo --target=`uname -m` kernel.spec

    You also need to build the kernel-firmware dependency for 6.5 (no longer needed for 7.0)

    rpmbuild -bb --without doc --target=noarch kernel.spec

    Then install kernel-firmware followed by kernel then reboot.

    Re the patch: are you saying that the diff fails on the context of "target_cpu"? I can't see it in there.  Could you post the link to the failing patch again?

    Thanks

    Mike

    Wednesday, October 15, 2014 9:42 AM
  • Hi Paul,
    Thanks for the quick feedback!

    Then it seems strange it doesn't work for you...
    Can you please add a WARN_ONCE(1, "poll_channel called") in the new function poll_channel(), and add a WARN_ONCE(1, "process_chn_event called")  in process_chn_event() and check 'dmesg' to 100% make sure the patches take effect? :-)

    Wednesday, October 15, 2014 10:07 AM
  • And can you please confirm your code has this one-line patch too (this is an old patch; I suppose the answer is yes, but let us confirm it)?

    Drivers: hv: Turn off batched reading for util drivers:

    --- a/drivers/hv/hv_util.c
    +++ b/drivers/hv/hv_util.c
    @@ -274,6 +274,16 @@ static int util_probe(struct hv_device *dev,
                    }
            }

    +       /*
    +        * The set of services managed by the util driver are not performance
    +        * critical and do not need batched reading. Furthermore, some services
    +        * such as KVP can only handle one message from the host at a time.
    +        * Turn off batched reading for all util drivers before we open the
    +        * channel.
    +        */
    +
    +       set_channel_read_state(dev->channel, false);
    +
            ret = vmbus_open(dev->channel, 4 * PAGE_SIZE, 4 * PAGE_SIZE, NULL, 0,
                            srv->util_cb, dev->channel);
            if (ret)

    Wednesday, October 15, 2014 10:18 AM
  • and in the case the issue still occurs, can you please dump the below info after the issue happens, Paul?

    (run the commands below as root)

    cd /sys/bus/vmbus/devices/vmbus_0_8/
    cat class_id
    cat in_intr_mask  in_read_index  in_write_index  in_read_bytes_avail  in_write_bytes_avail out_intr_mask out_read_index out_write_index out_read_bytes_avail out_write_bytes_avail
    (run the above cat command 3 times)
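
    The three samples could also be collected with a small loop. This is a sketch only: `dump_attrs` is a hypothetical helper, the one-second pause between samples is an assumption, and the device path is the one given above (it can be overridden by passing another path as the first argument).

```shell
#!/bin/sh
# dump_attrs DIR
# Hypothetical helper: prints "name=value" for each channel attribute in DIR,
# or "name=<missing>" when the attribute file cannot be read.
dump_attrs() {
    dir=$1
    for f in class_id \
             in_intr_mask in_read_index in_write_index \
             in_read_bytes_avail in_write_bytes_avail \
             out_intr_mask out_read_index out_write_index \
             out_read_bytes_avail out_write_bytes_avail; do
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s=<missing>\n' "$f"
        fi
    done
}

# Device path from the post above; pass another path as $1 to override.
dev=${1:-/sys/bus/vmbus/devices/vmbus_0_8}

# Take three samples so that changes in the ring-buffer indices are visible.
for i in 1 2 3; do
    echo "--- sample $i ---"
    dump_attrs "$dev"
    sleep 1   # pause between samples (assumed interval)
done
```

    If the read and write indices stay identical across all three samples while queries are still being issued from the host, the channel is probably no longer being serviced.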

    Wednesday, October 15, 2014 10:24 AM
  • Hi Mike,

    I am still trying Dexuan's way of patching it and have not yet rebuilt the full kernel.

    For the link to the patch "Fix a bug in the KVP code" that uses "target_cpu" (and doesn't apply on CentOS 6.5), you can refer to https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc

    We switched to the patch of the same name from Ubuntu, but still encounter the issue I described.

    https://lists.ubuntu.com/archives/kernel-team/2014-August/047725.html

    Thanks,

    Paul

    Wednesday, October 15, 2014 10:34 AM
  • Yes, my code has this line.
    Wednesday, October 15, 2014 10:35 AM
  • HI Paul

    >And in any case I believe those patches are in 6.6 but I am not a Red Hat customer so I cant confirm.

    Looking at https://bugzilla.redhat.com/show_bug.cgi?id=1118123

    Looks like the KVP fix was never picked up. That bug report is a bit muddled and ended up reporting 2 bugs in 1.

    1. fcopy bug

    2. kvp bug

    I think 1. was picked and 2. was meant to be opened as a new bug report but I never saw one.

    If they had picked the KVP fix I guess they would have found the dependency problem on target_cpu you described.  I checked the source for 2.6.32-431.29.2.el6 and couldn't find target_cpu as you said.

    So you could be coming up against something that may only get fixed in 7.1

    Dexuan may find you a patch set that works but I don't think you're going to see a released kernel with that fix in until 6.7 or 7.1

    In any case it may be good to check the fix works on 6.5 or 7.0 so it can be submitted as a tested fix in RHEL environment.

    Cheers

    Mike

    • Edited by Mike Surcouf Wednesday, October 15, 2014 11:16 AM
    Wednesday, October 15, 2014 11:01 AM
  • Hi Dexuan and Mike,

    Many thanks for your help!

    I finally found that I had replaced the .ko in the wrong initrd, which is why it didn't work.

    After I corrected it, these two patches work well.

    Thanks for your time,

    Paul

    Wednesday, October 15, 2014 11:37 AM
  • Great news.

    Hopefully Dexuan can get this pulled (the backported one without target_cpu).

    I think it's too late for 6.6 though.

    Wednesday, October 15, 2014 11:43 AM
  • Great to know it works with the patches applied correctly! :-)

    As far as I know, the 2 patches (with the target_cpu version) should be included in RHEL 6.6 (I checked 2.6.32-502.el6).

    Wednesday, October 15, 2014 1:46 PM
  • According to https://access.redhat.com/articles/3078, RHEL 6.6 was released with 2.6.32-504 on Oct 13, so the 2 patches should be in.

    RHEL7 was released on June 9, before the 2 patches appeared in the upstream (July 9). So the 2 patches are not in RHEL7, meaning RHEL7 doesn't work well with respect to KVP and hopefully RHEL 7.1 will fix this.

    Wednesday, October 15, 2014 1:55 PM