VM's disconnecting from network

    Question

  • Hello,

    I recently created a remote desktop collection using Windows Server 2012 R2 on an HP ProLiant BL620c G7 blade server (with fully up-to-date firmware and drivers). There are 30 VMs running on the server, and it is running great, for the most part. My issue is that, seemingly at random (maybe once every 48 hours), users are disconnected from their sessions and cannot log back in remotely. I can log into their session as a local user through Hyper-V, but once I am in I see that the VM has no internet or network access. To fix it, I have to open the VM's network adapter settings, disconnect the virtual switch assigned to the VM, and reconnect it. Doing this restores remote access to the VM.

    I initially thought the issue came from the configuration of the virtual switch. The HP blade server has four 10 Gb NICs, which were teamed into two teams (each with two physical NICs). I created a separate external virtual switch on each team, and I had “Allow management operating system to share this network adapter” checked. When a disconnect occurred, I noticed a warning on the host, Event 16945: “MAC Conflict: A port on the virtual switch has the same MAC as one of the underlying team members on Team Nic Microsoft Network Adapter Multiplexor Driver.” Because this warning appeared at the same time as a disconnect, I assumed it was causing the problem. I resolved the warning by rebuilding the NIC team so that three of the physical NICs were teamed together, with a virtual switch created on that team without the checkbox that lets the management operating system share the network adapter. This left one NIC available for management, and I haven’t seen the warning since; however, the disconnects still continue to occur.

    I do not see any helpful information in the event logs of the client VM when one of these disconnects occurs, and the users are doing nothing out of the ordinary that would cause it. In fact, this problem has happened when a user had been disconnected from their machine for over 10 hours.

    Anyone have any thoughts as to what is going on here?

    Monday, January 06, 2014 9:01 PM

Answers

  • Hi Peter,

    I am in the same situation with HP BL360C Gen8 servers. As I understand it, this is an issue with the network drivers, which don't work properly with VMQ. I hear many of us have worked around it by disabling VMQ.

    The VMQ-specific issues I faced in connection with this were:

    Event 106 - The processor sets overlap when LBFO is configured with sum-queue mode.

    Fixed this by allocating a different set of processors to each adapter using Set-NetAdapterProcessor.
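
    A minimal sketch of what "a different set of processors" per team member could look like, written in Python only to illustrate the arithmetic. The adapter names, the logical processor count, and the choices to skip the first core and to use only even-numbered logical processors when Hyper-Threading is on are all assumptions, not details from this thread. The sketch prints `Set-NetAdapterVmq` commands for review (assuming that is the cmdlet meant above, since it is the one that takes `-BaseProcessorNumber` and `-MaxProcessors`); it does not change any settings.

```python
def vmq_processor_plan(adapters, logical_processors,
                       hyperthreading=True, reserve_first_core=True):
    """Split the usable logical processors into disjoint ranges, one per adapter.

    Assumptions (not from the thread): with Hyper-Threading only even-numbered
    logical processors are used, and the first core is left to the host.
    """
    step = 2 if hyperthreading else 1
    first = step if reserve_first_core else 0
    usable = list(range(first, logical_processors, step))
    per_adapter = len(usable) // len(adapters)
    if per_adapter == 0:
        raise ValueError("not enough processors for the number of adapters")
    plan = []
    for i, name in enumerate(adapters):
        # Each adapter gets its own contiguous slice, so no two slices overlap.
        chunk = usable[i * per_adapter:(i + 1) * per_adapter]
        plan.append({"name": name, "base": chunk[0],
                     "max": len(chunk), "procs": chunk})
    return plan


def to_powershell(plan):
    """Render the plan as Set-NetAdapterVmq command lines (strings only)."""
    lines = []
    for entry in plan:
        name, base, count = entry["name"], entry["base"], entry["max"]
        lines.append(
            f'Set-NetAdapterVmq -Name "{name}" '
            f'-BaseProcessorNumber {base} -MaxProcessors {count}'
        )
    return lines


if __name__ == "__main__":
    # "NIC1"/"NIC2" and 16 logical processors are hypothetical values.
    for line in to_powershell(vmq_processor_plan(["NIC1", "NIC2"], 16)):
        print(line)
```

    Leftover processors that do not divide evenly are simply left unassigned here; whether that is acceptable depends on the host.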

    The next issue was Event 49.

    HP FlexFabric 10Gb 2-port 554FLB Adapter #5 : RSS is limited to 4 queues. Enable Advanced Mode in the PXE BIOS to use up to 16 queues. This may require a firmware update.

    Fixed this by enabling Advanced Mode in the PXE BIOS.

    After these two steps I saw a significant difference: the disconnects became less frequent, though it is not a full fix. Servers that handle more traffic still get disconnected; in my infrastructure, these are mainly SharePoint front-end servers and SEP (endpoint protection) servers.

    At this point, it may be safer to disable VMQ if your situation doesn't allow you to take that risk.

    According to the update on hyper-v.nu, the latest driver from Emulex doesn't fix this issue. We may need to wait a while longer for Microsoft/HP/Emulex to come up with a fix.

    Good luck !


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Saturday, January 18, 2014 3:01 PM

All replies

  • Hi PeterEdge,

    Please try unchecking "Allow management operating system to share this network adapter" on your external virtual switch.

    Hope this helps

    Best Regards

    Elton Ji


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time.
    Thanks for helping make community forums a great place.

    Tuesday, January 07, 2014 3:07 PM
    Moderator
  • Thanks for the response, Elton.

    As I stated in my initial post, I did uncheck this option, which caused the warning messages to go away on the server side, however the disconnects still continue to occur.

    I notice that these disconnects happen at odd hours, such as 5:00 AM, when the user has not been working on the machine for 10+ hours.

    Any further insight would be much appreciated.

    Thanks.

    Tuesday, January 07, 2014 3:30 PM
  • Hi PeterEdge,

    When the problem arises, please try to ping the disconnected VM from another VM that is connected to the same vSwitch.

    If you have any further information, please post it here.
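
    Because the drops are rare and random, it may help to log the moment a VM stops answering rather than ping by hand. A minimal sketch, assuming it runs on another VM on the same vSwitch; the host names are placeholders, and it relies on the operating system's `ping` binary (the non-Windows flags shown assume Linux).

```python
import subprocess
import sys
import time
from datetime import datetime


def ping_once(host, timeout_s=2):
    """Send one ICMP echo with the OS ping tool; True if the exit code is 0.

    Caveat: Windows ping can exit with 0 on a "Destination host unreachable"
    reply, so treat a logged UP as a hint rather than proof.
    """
    if sys.platform.startswith("win"):
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def detect_transitions(hosts, probe, state):
    """Probe every host once; return (host, is_up) for hosts that changed state."""
    changes = []
    for host in hosts:
        up = probe(host)
        if state.get(host) is not None and state[host] != up:
            changes.append((host, up))
        state[host] = up
    return changes


def monitor(hosts, interval_s=30, probe=ping_once):
    """Loop forever, printing a timestamped line whenever a host flips state."""
    state = {}
    while True:
        for host, up in detect_transitions(hosts, probe, state):
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} {'UP' if up else 'DOWN'}")
        time.sleep(interval_s)
```

    Calling `monitor(["vm01", "vm02"])` (hypothetical names) would then give timestamps to line up against the host's event log.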

    Best Regards

    Elton Ji


    Wednesday, January 08, 2014 10:49 AM
    Moderator
  • Hi Elton,

    The VM only has one virtual switch (external) connected to it. Therefore, when connectivity is lost from that VM, all methods of communication are cut off, so I cannot ping it, even from a VM connected to the same vSwitch.

    Travis

    Wednesday, January 08, 2014 9:14 PM
  • Are all users disconnected at the same time, or is it random?

    I had an issue like this when connecting to VMs that were running in a cluster. If I connected by IP address, I would sometimes lose connectivity when the machine was moved to another node in the cluster and sometimes got a different IP address. My original system would still have the original MAC address and would be unable to connect. So I changed to accessing by DNS name and no longer had the issue. If I had disconnected and reconnected the switch, as you are doing, the result would have been similar, in that DNS would get updated and MAC addresses would have been refreshed across the board.

    But, if all your VMs are getting disconnected at the same time, it doesn't sound like that would be the problem.

    Sounds like you might need to open a case with Microsoft.


    .:|:.:|:. tim

    Wednesday, January 08, 2014 10:20 PM
  • Hi,

    Can you find anything in the event log of the VM?

    Regards

    Thursday, January 09, 2014 2:49 AM
  • Hi

    We have the same problem with IBM hardware and Emulex adapters.

    We are waiting for new drivers that will hopefully fix the problem.

    You can read more about similar problems here:

    http://www.hyper-v.nu/archives/pnoorderijk/2013/11/the-story-continues-vnics-and-vms-loose-connectivity-at-random-on-windows-server-2012-r2/

    http://www.hyper-v.nu/archives/hvredevoort/2013/12/november-2013-ur-kb2887595-causing-network-problems/

    -Robert

    • Proposed as answer by Shabarinath Monday, January 27, 2014 4:01 AM
    Thursday, January 09, 2014 11:29 AM
  • Tim,

    It is random when a user gets disconnected: not only the time of day, but also which user. It doesn't seem to be specific to any single VM in the collection; I feel like it can happen at any point to any VM in the collection.

    That's interesting about your cluster issue. Currently there is only one server hosting the VM's but we are looking to add another. We only connect to the machines through either the machine name or through remote desktop gateway services.

    Thanks for the info...

    Thursday, January 09, 2014 7:59 PM
  • Hello,

    Nothing valuable comes from the event log on the VM. I notice some UDP offloading warning messages but these only occur after the disconnect occurs, never before. Everything leading up to the disconnect seems normal on the VM.

    Thursday, January 09, 2014 8:01 PM
  • Robert,

    Thank you so much for these links. This is exactly my problem! The NIC's on the server are Emulex as well...it seems to be quite the issue. I just received new drivers for them, and will install them as soon as I can. 

    I'll also look into trying the options suggested in those links. I really appreciate the feedback!

    I'll let you know how it goes.

    Travis

    Thursday, January 09, 2014 8:04 PM
  • We have been in direct contact with IBM and Emulex regarding this issue. 

    We have tested with driver 10.0.430.1003 with no luck. 

    What driver version have you got now?

    -Robert

    • Proposed as answer by Shabarinath Monday, January 27, 2014 4:01 AM
    Friday, January 10, 2014 7:55 AM
  • Hi Robert,

    It's nice to know we are not the only ones facing this issue. I'm actually not entirely certain that I've installed the drivers properly. I used the HP Emulex 10GbE iSCSI Driver for Windows Server 2012 R2 found on their website (with help from HP support). The version is 4.9.160.0 from Jan 3, 2014. After installing it, though, we're still experiencing disconnects.

    When I look at the NIC in device manager the driver version says 10.0.430.570 (which is from 5/14/2013). I'm not sure why this isn't updating. 

    I am kind of scared to uninstall that driver since it is in a production environment at the moment. 

    Perhaps I should get in touch with Emulex myself.


    Friday, January 10, 2014 8:26 PM
  • We are not using iSCSI, so we are using the NIC driver, not the iSCSI driver.

    I have been in contact with Emulex today, and they have not yet released the driver that should fix this problem.

    We have heard from a large hosting provider that had the same problem. They tested several NICs and concluded that QLogic NICs were the only ones that worked.

    We can't wait any longer and have now ordered new QLogic cards; hopefully they will work.

    -Robert

    Saturday, January 11, 2014 6:06 PM
  • I've run into this issue before with one of my customers. We resolved it initially by uninstalling and reinstalling Hyper-V. However, the problem started appearing again at multiple sites. The other sites (built with the same image) worked fine.

    As it turns out, we had multiple causes for this issue at different sites:

    • Port Security was turned on in the network switch for the port dedicated to the Hyper-V host.
    • Anti-Virus was causing the issue on another host.
    • Port Mirroring was enabled in the switch connected to the Hyper-V host.

    Hope this helps!

    -Bill


    • Edited by Bill Curtis(MSFT) Sunday, January 12, 2014 11:25 PM Forgot a word (or two)
    Sunday, January 12, 2014 4:25 PM
  • Robert,

    Sorry for the delay. That is rough that you're needing to order qlogic cards at this point...kind of disappointing. I think I am going to try updating the drivers with the Emulex OneInstall kit which includes all drivers (ethernet driver is version 10.0.430.1047) and see how that goes.

    Have you tried anything regarding Virtual Machine Queue? I have read in a number of places that VMQ is the source of problems. I disabled it on a few machines that seemed to be frequent offenders, and so far I've been lucky and not seen a disconnect on one of them. I'll keep you posted because I don't imagine that is the solution. Are you using VMQ on all of your VM's?

    I realize the driver probably won't help, but I figure it's worth a shot. It may be an option for us to order new NIC's but I would consider that a last resort. We are living fine for now.

    We actually used the exact same machine with Windows Server 2012 (not R2) and had no problems with the NIC. I think we would try to figure out a way to revert the OS before we would order new NICs. At this point, though, we are just going to keep testing with VMQ disabled and pray that a driver for the Emulex card comes out that resolves this issue.

    Thanks for your input, Robert.

    Friday, January 17, 2014 6:15 PM
  • This is great insight, Shabarinath. I have disabled VMQ on all VM's right now.

    I just glanced through my event viewer and noticed a number of occurrences of both of those errors. I imagine they will go away after entirely disabling VMQ? I would like to test the fixes you suggested to see if that decreases the number of drops, however things seem to be running smoothly with VMQ disabled, and there seems to be no performance impact. 

    I will reply back in a week to update you all on our status after disabling VMQ. 

    Thanks for your time.

    Wednesday, January 22, 2014 6:25 PM
  • Hi Peter,

    To my understanding, once you disable VMQ on the adapters, the other events should go away. If you enable VMQ again, you may need to apply these settings.

    I disabled VMQ on a few nodes in my cluster and it seems good as of now. I talked with HP about this today and they don't have any clarity yet. However, they confirmed that they have involved the Microsoft and Emulex teams, and hopefully we will get a fix soon.

    Cheers !


    Optimism is the faith that leads to achievement. Nothing can be done without hope and confidence.


    InsideVirtualization.com

    Wednesday, January 22, 2014 6:30 PM
  • This is great to hear!
    Wednesday, January 22, 2014 6:42 PM
  • Too bad I didn't read the mentions of this issue here and on hyper-v.nu before I suffered with this the last 6 months, although I did sort out that disabling VMQ would get us back on level ground a while back. But still, misery loves company, and I thought it was just me (well, me and the rest of my team).

    But....I'd really like to use VMQ. Anyone hear anything new from HP and/or Emulex? The same bad firmware is on the HP download site as of today.

    Funny how I saw no mention of this problem when checking their known issues, nor even a mention from HP after HP escalation worked with us on the Flex 10 pause frame issue we saw prior to the VC firmware that handles that better.

    Is there but a handful of us trying to deploy 2012 R2/Hyper-V on HP G8's using the Emulex NICs? (Don't get me started on adding Flex-10 to the mix.) It really seems so, sometimes, based on the lack of documented real world installs out there.

    Thursday, April 10, 2014 9:38 PM
  • Hi blautens,

    I know how you feel...this is exactly what we were going through a few months ago. There is very little documentation out there regarding 2012 R2 Hyper-V deployments on these machines. I have not revisited the issue for a couple weeks, but to my knowledge, there is still no driver update from HP or Emulex. 

    Here are the most recent drivers from Emulex:

    http://www.emulex.com/downloads/emulex/drivers/windows/windows-server-2012-r2/drivers/

    But it would appear that the most recent release is from February 2014.

    Pretty frustrating experience in all, but at least it is functional without the VMQ option.

    Friday, April 11, 2014 1:41 PM
  • Hi Blautens,

    I opened a case with HP initially, and as per the last update (I think in Feb), HP, Emulex and Microsoft are working together on a fix.

    I haven't had any update on this case since then. In Jan, HP provided a beta firmware, which I was not comfortable applying on my production servers.

    We have a few 2012 R2 clusters now running on HP blades. VMQ is the only issue specific to Hyper-V or the OS that we have faced so far, though we had multiple issues with the BL360c servers and the Flex module in this short span.

    Let's hope that we will get a fix for this issue soon!

    Cheers !

    Friday, April 11, 2014 7:18 PM
  • I also am dealing with the same issue... Saw somewhere that disabling all of the offloads will resolve the issue while keeping VMQ enabled.  Also I am told via the case I have open with MS, that RSS and VMQ are mutually exclusive and must be disabled.  I thought I had it fixed by resolving the processor overlap, Duplicate MAC error and disabling all NIC offloads.  Once I turned offloads back on, they went back to failing immediately (network drops off completely) upon live migration.  Emulex 10GB cards, IBM servers, and Microsoft NIC teaming...

    Any news of a resolution???

    Friday, April 25, 2014 8:17 PM
  • Hi All,

    Emulex has finally posted an update on this issue on their blog.

    http://blogs.emulex.com/implementers/2014/06/19/microsoft-windows-20122012-r2-hyper-vms-losing-network-connectivity-workaround/

    Marcel Van has also put a detailed note on this issue on the blog - http://up2v.nl/2014/06/16/hyper-v-2012-r2-virtual-machines-lose-randomly-network-connections-be-carefull-with-emulex-nics/

    A new firmware is expected to be released by mid-July; we are waiting for that.

    Cheers !

    Sunday, June 22, 2014 6:41 PM
  • IBM already offers updated firmware and driver:

    Firmware: https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5095660&brandind=5000008
    Driver: http://www.ibm.com/support/fixcentral/quickorder?product=ibm%2Fsystemx%2F7875&fixids=elx_dd_nic_ibm14a-10.2.261.11-6_windows_32-64&source=dbluesearch

    File details
    Version: oc11-10.2.261.36-1
    Release Date: 2014-06-16

    ...

    ===============================================================================
    Emulex OCe11xxx UCNA Firmware Package
    ===============================================================================

    Firmware Version: 10.2.261.36
    Supported On: IBM System x, BladeCenter, and Flex

    ...

    Change history

    Emulex OCe11xxx UCNA Firmware Package

    Firmware Version: 10.2.261.36

    Supported On: IBM System x, BladeCenter, and Flex

    Problems Fixed:

    FRU VPD fields are now properly populated on CN4054 and CN4054R
    Fixed UE resulting slow boot at UEFI splash screen and NIC devices removed
    When using VMQ with Windows Server 2012 or 2012R2, the user may experience VM connectivity loss, packet drops, system hangs, inability to shutdown VMs and possible system crashes on shutdown. The complete solution also requires 10.2 based Windows NIC driver.
    ...

    Wednesday, July 16, 2014 6:45 AM
  • http://blogs.emulex.com/implementers/

    http://www.emulex.com/downloads/emulex/drivers/windows/windows-server-2012-r2/previous-releases/july-2014-special-release/

    http://www-dl.emulex.com/support/elx/rt99/b15.5/docs/fw_win_relnotes_be_elx.pdfs

    --
    Resolved Issues
    1. This special release of Windows NIC driver version 10.0.430.1321 and firmware
    version 4.6.142.13 addresses issues that caused Hyper-V VMs to lose network
    connectivity when the VMQ feature was enabled. You must update both the driver
    and the firmware.
    --
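
    To check an installed driver against the release quoted above, the dotted version has to be compared number by number, not as text. A minimal sketch, using only driver versions mentioned in this thread; it assumes versions in the same 10.0.430.x line are ordered numerically, and it says nothing about OEM builds (HP, IBM) that use a different numbering or about the firmware level.

```python
REQUIRED_DRIVER = "10.0.430.1321"  # special release quoted in the notes above


def parse_version(text):
    """Turn '10.0.430.570' into (10, 0, 430, 570) so fields compare as numbers."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, required=REQUIRED_DRIVER):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # Versions reported earlier in the thread, plus the special release itself.
    for version in ("10.0.430.570", "10.0.430.1003",
                    "10.0.430.1047", "10.0.430.1321"):
        verdict = "ok" if meets_minimum(version) else "older than the special release"
        print(version, verdict)
```

    A plain string comparison would get this wrong: as text, "10.0.430.570" sorts after "10.0.430.1321" because "5" comes after "1".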

    Posted June 19th, 2014 by Mark Jones

    UPDATE as of 8/4/14: We are pleased to inform you that the July 2014 Special Release for Windows Server 2012 and Windows Server 2012 R2 CNA Ethernet Driver is now available for Emulex branded (non OEM) OCe111xx model adapters. Please refer to this link to download the driver kit and firmware. Please read and follow the special instructions within the Release Notes.  For non-Emulex branded adapters, please contact Emulex Tech Support here.

    ~~

    UPDATE AS OF 7/23/14: Emulex is in the process of rolling out updated Microsoft Windows 2012 and 2012 R2 VMQ solutions for our customers.  Testing of a Windows WHCK certified NIC driver update will be completed in 1-2 weeks. This initial “hotfix” will be for Emulex branded OCe11102 and OCe11101 products and will include a required firmware update.  As testing completes on hotfix solutions for additional product configurations, notices and links will be posted on this blog.  Thanks for your continued patience.

    Wednesday, August 27, 2014 1:16 PM
  • Thanks for the update, ManServ. I will try this out as soon as possible!
    Wednesday, August 27, 2014 1:25 PM
  • Hello Folks,

    Enjoy the Emulex driver fix for VMs disconnecting from the network, just announced :)

    HP Emulex 10/20 GbE Driver for Windows Server 2012 R2 10.2.452.1 cp025338.exe ftp://ftp.hp.com/pub/softlib2/software1/sc-windows/p485589684/v103795

    HP Emulex 10/20 GbE Driver for Windows Server 2012 10.2.452.1 cp025337.exe ftp://ftp.hp.com/pub/softlib2/software1/sc-windows/p577600570/v103794

    HP Firmware Flash for Emulex Converged Network Adapters - Windows (x64) 10.2.340.25 cp025355.exe

    https://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?spf_p.tpst=swdMain&spf_p.prp_swdMain=wsrp-navigationalState%3Didx%253D0%257CswItem%253DMTX_3c9fe31fd8fa4ee597043e145f%257CswEnvOID%253D%257CitemLocale%253D%257CswLang%253D%257Cmode%253D3%257Caction%253DdriverDocument&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken

    Hope this helps.

    Regards,

    Charbel Nemnom

    MCSE, MCS, MCSA, MCP, MVP

    Blog: www.charbelnemnom.com

    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.


    • Proposed as answer by CHAROX Friday, November 21, 2014 3:58 PM
    Friday, November 21, 2014 3:58 PM
  • Hi Elton Ji,

    I'm facing the same issue. I tried pinging from a server connected to the same node and got no errors. But if I ping from another server or PC, I get "Request timed out".

    A prompt reply will be highly appreciated.

    Tuesday, March 03, 2015 10:21 AM
  • Hello Ramseed,

    Please investigate further: did you change the virtual network adapter MTU to 9000?

    Are you using jumbo frames?

    These problems occur only when the VMs are on different hosts.

    The problem disappears when you reset the TCP stack or place the VMs on the same host.

    Please look into the following article that will help you:

    http://blogs.technet.com/b/askpfeplat/archive/2014/12/01/psa-incorrect-mtu-size-causes-connectivity-issues-with-windows-server-2012-and-windows-server-2012-r2.aspx
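
    One way to see whether a given MTU survives the whole path between two hosts is a ping with the don't-fragment bit set and a payload sized to that MTU. A minimal sketch of the arithmetic, assuming IPv4 with a 20-byte header (no options) and the 8-byte ICMP header; the host name is a placeholder, and `-f` and `-l` are the Windows `ping` switches for "don't fragment" and payload size.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError("MTU is too small to carry an ICMP echo")
    return mtu - overhead


def ping_command(host, mtu):
    """Build a Windows ping command line that tests the given MTU."""
    return f"ping -f -l {max_icmp_payload(mtu)} {host}"


if __name__ == "__main__":
    # "vm-on-other-host" is a hypothetical name; 9000 is the MTU asked about above.
    print(ping_command("vm-on-other-host", 9000))
    print(ping_command("vm-on-other-host", 1500))
```

    If the 1500-byte test gets replies but the 9000-byte one does not, something along the path is not passing jumbo frames.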

    Hope this helps.

     

    Regards,

    Charbel Nemnom

    MCSE, MCS, MCSA, MCP, MVP

    Blog: www.charbelnemnom.com

     


    Tuesday, March 03, 2015 12:50 PM