Virtual machine is not reachable during the complete live migration when *VMQ is enabled

    General discussion

  • I have a 4-node Hyper-V 2008 R2 SP1 cluster. Networks are like this:

    PS C:\management>  Get-ClusterNetwork | ft Name, Metric, AutoMetric, Role

    Name                                                 Metric                    AutoMetric                          Role
    ----                                                 ------                    ----------                          ----
    CSV Network                                            1000                          True                             1
    HB / LM Network                                        1100                          True                             1
    iSCSI Network A                                       10200                          True                             0
    iSCSI Network B                                       10300                          True                             0
    Management Network                                    10100                          True                             3

    HB / LM Network is the (only) network that is assigned for Live Migration. All VMs are on CSVs (iSCSI).
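
    For completeness: which networks are enabled for Live Migration can also be read from the "Virtual Machine" cluster resource type. This is just a sketch, and MigrationNetworkOrder returns network GUIDs rather than names:

    # Lists the cluster networks enabled for live migration, in preference
    # order, as GUIDs; match them against Get-ClusterNetwork | ft Name, Id.
    PS C:\management> Get-ClusterResourceType "Virtual Machine" | Get-ClusterParameter MigrationNetworkOrder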

    When migrating a VM via Live Migration from one node to another, the VM is not available from the moment the migration starts (1%) until it ends (100%). A continuous ping shows time-outs. The Live Migration takes about 5 to 10 seconds, so the VM is not reachable for 5 to 10 seconds, which is much too long IMHO.

    What can be wrong? Why is the VM down during the complete LM, and not just for a fraction of a second during its last phase? I need some help troubleshooting this issue.

    Thanks.


    You know you're an engineer when you have no life and can prove it mathematically


    Wednesday, February 22, 2012 3:32 PM

All replies

  • Hi,
     
    Please check the following blog.
     
    Configuring Network Prioritization on a Failover Cluster
    http://blogs.msdn.com/b/clustering/archive/2011/06/17/10176338.aspx
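
    For reference, the key commands from that article (the metric value below is illustrative; assigning a metric manually sets AutoMetric to False for that network):

    # Override the automatically assigned metric so the intended network is
    # preferred for cluster/CSV traffic (lowest metric wins). Example value.
    PS C:\> Import-Module FailoverClusters
    PS C:\> (Get-ClusterNetwork "CSV Network").Metric = 900
    PS C:\> Get-ClusterNetwork | ft Name, Metric, AutoMetric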

    Vincent Hu

    TechNet Community Support

    Thursday, February 23, 2012 1:21 PM
    Moderator
  • Hi,
     
    Have you tried the suggestion? I want to see if the information provided was helpful. Your feedback is very useful for further research. Please feel free to let me know if you have additional questions.

    Vincent Hu

    TechNet Community Support

    Monday, February 27, 2012 5:59 AM
    Moderator
  • I really don't understand it. I already showed this:

    Name                                       Metric               AutoMetric                     Role
    ----                                       ------               ----------                     ----
    CSV Network                                  1000                     True                        1
    HB / LM Network                              1100                     True                        1
    iSCSI Network A                             10200                     True                        0
    iSCSI Network B                             10300                     True                        0
    Management Network                          10100                     True                        3

    HB / LM Network is the (only) network that is assigned for Live Migration. All VMs are on CSVs (iSCSI).


    That is how it should be configured, isn't it? What more would you like me to configure? I think I have configured the cluster networks exactly as suggested in the blog.


    You know you're an engineer when you have no life and can prove it mathematically

    Monday, March 05, 2012 10:42 AM
  • Actually, there is an exception. I have a 4-node cluster (all with the same network settings), but migrating from node 2 to one of the other nodes does not lead to downtime, whereas all other migrations (including migrations to node 2) do have downtime!


    You know you're an engineer when you have no life and can prove it mathematically

    Monday, March 05, 2012 1:43 PM
  • Does nobody have an idea?

    You know you're an engineer when you have no life and can prove it mathematically

    Monday, March 12, 2012 8:45 AM
  • Hi Stephan,

    This is strange. Can you look for any differences on node 2? Maybe it has:

    • a different NIC
    • a different server model
    • a different patch level

    How is your storage connected, iSCSI or FC?


    Grüße/Regards Carsten Rachfahl | MVP Virtual Machine | MCT | MCITP | MCSA | CCA | Husband and Papa | www.hyper-v-server.de | First German Gold Virtualisation Kompetenz Partner ---- If my answer is helpful please mark it as answer or press the green arrow.

    Monday, March 12, 2012 11:23 PM
  • I have found it!

    The difference was that on node 2, *VMQ was disabled on the VM interfaces. I disabled *VMQ on all nodes, after which live migration runs smoothly between all nodes.

    This definitely raises the question: why does live migration not run smoothly when *VMQ is enabled?

    Extra info: the NICs are Intel I340-T4, with the newest 64-bit PROSet drivers installed. The servers run Hyper-V 2008 R2 SP1.
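
    For anyone who wants to script this instead of clicking through the adapters' Advanced properties (which is what I did): *VMQ is a standard NDIS keyword under the network class key in the registry. The following is only a sketch based on that assumption, not the exact steps I used; test it on one host first, and the adapters need to be restarted (or the host rebooted) before it takes effect:

    # Sketch: set the *VMQ keyword to "0" (disabled) on every adapter
    # instance that exposes it; the GUID is the standard network class key.
    $class = 'HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}'
    Get-ChildItem $class -ErrorAction SilentlyContinue | ForEach-Object {
        $p = Get-ItemProperty $_.PSPath -ErrorAction SilentlyContinue
        if ($p -and $p.'*VMQ' -ne $null) {
            Set-ItemProperty $_.PSPath -Name '*VMQ' -Value '0'   # REG_SZ; "1" = enabled
        }
    }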


    You know you're an engineer when you have no life and can prove it mathematically

    Friday, March 23, 2012 8:19 AM
  • Hi Stephan,

    Cool that you found the difference. The question is: what happens if you re-enable VMQ on all nodes?


    Grüße/Regards Carsten Rachfahl | MVP Virtual Machine | MCT | MCITP | MCSA | CCA | Husband and Papa | www.hyper-v-server.de | First German Gold Virtualisation Kompetenz Partner ---- If my answer is helpful please mark it as answer or press the green arrow.


    Friday, March 23, 2012 8:23 AM
  • Carsten,

    The problem still exists with *VMQ enabled on all nodes.

    Also, the VMM job still results in the warning: 11037 There currently are no network adapters with network optimization available.

    This is strange, because the wizard shows a checkmark for network optimization for this host (and all other hosts, although not always...). Why do these contradictions show up? Is there a (third-party) tool to find out whether the optimizations are actually functioning?


    You know you're an engineer when you have no life and can prove it mathematically


    Thursday, April 26, 2012 10:12 AM
  • Has anyone else tried enabling the *VMQ functionality? What are your findings? Positive or negative?

    You know you're an engineer when you have no life and can prove it mathematically

    Tuesday, May 01, 2012 6:30 AM
  • Hi Stephan,

    I'm in the same situation.

    I run Intel ET Quad Port Adapters with Intel PROSet driver version 17.0.200.2.

    My virtual network runs on a "Virtual Machine Load Balancing" Team with VMQ Enabled.

    One of the nodes of my cluster does not have an ET Quad Port Adapter, so it does not have VMQ enabled. I did some testing: live migration from a node with VMQ enabled gives me 5 or 6 ping time-outs, while live migration from the node without VMQ enabled gives only one ping time-out. That single time-out is normal behaviour as far as I know.

    So it definitely seems to have something to do with VMQ.

    Your warning 11037 means that there are no VMQs left. The Intel ET Quad Port Adapter has 8 VMQs available. On the Intel website you can find how many VMQs your adapter has.
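
    The arithmetic also works out: if every virtual NIC takes one queue (and, depending on the hardware, the default queue for the parent partition takes another), eight queues are used up quickly. As a rough sketch, you can count the synthetic VM NICs on a host through the Hyper-V WMI namespace and compare that number against the queues per port (assuming Msvm_SyntheticEthernetPort reflects the VMs running on the host):

    # Rough sketch: count synthetic VM NICs on this host and compare the
    # result against the queues the adapter exposes (8 per ET port).
    PS C:\> @(Get-WmiObject -Namespace root\virtualization -Class Msvm_SyntheticEthernetPort).Count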

    Hope this helps.

    Regards,

    DJITS.

    According to Intel, they have fixed the issue. You can download the latest driver here: http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=21228


    • Edited by DJITS Tuesday, May 15, 2012 7:44 AM New Information.
    Tuesday, May 15, 2012 7:05 AM
  • Hi DJITS,

    Thanks for your reply. As per the Intel documentation, there are indeed 8 VMQs per NIC. Maybe that is the issue. In my case, I have decided not to enable the *VMQ functionality.


    You know you're an engineer when you have no life and can prove it mathematically

    Tuesday, May 15, 2012 1:27 PM