Windows Server 2012 Hyper-V VMs becoming extremely sluggish

    Question

  • I have recently set up a new system for a client consisting of a single Dell T320 server with a Xeon E5-2350 2.2GHz 6-core CPU and 32GB of RAM.  Windows Server 2012 Standard is installed on the bare metal and runs only the Hyper-V role.  It is not joined to any domain.

    It hosts two VMs.  The first is a Windows Server 2012 Standard VM holding the RDS-related roles and acting as a remote desktop server, with 10GB of RAM and 3 cores allocated.  Only 5 people use the terminal server concurrently.

    The second VM is a Windows Server 2008 R2 based primary domain controller, with 16GB of RAM allocated and 3 cores.  It runs the AD role, as well as Exchange and GFI MailEssentials 2012.  Avira is also installed on this VM.

    The problem is that the system functions properly for 3-4 days, then suddenly goes into a state where a multitude of applications start consuming an inordinate amount of CPU time and become very, very slow.  It is almost as if the hypervisor goes into a software emulation mode where everything runs like molasses.  This affects BOTH VMs equally.  For example, Exchange and GFI on the Windows 2008 R2 VM together consume about 25% CPU, most of it kernel time.  At the same time, in the Windows Server 2012 VM, svchost, Taskmgr, WMIPrvSE, LogonUI, Explorer, sqlservr, wsmprovhost etc. consume a total of about 70% CPU time, and 99% of that is user time, not kernel time.  When one user starts any application, be it Internet Explorer, Firefox or TaxPrep, the CPU spikes to 100%.

    All of this happens while nobody is using the system: it is idle and no mail is flowing in or out.  Rebooting the VMs does not fix anything, and even rebooting the host server does not fix anything.  When I yanked out the two power supplies and performed a hardware diagnostic (which came back 100% OK), it booted up and the slowness was gone for 3 days, until today, when it started acting up again as described above.

    One more thing: when the system behaves normally, the SunSpider JavaScript benchmark in IE10 takes 316ms to execute (average) on the Hyper-V host operating system.  When the system goes slow like this, I pause the two VMs in Hyper-V and run SunSpider again, only to be greeted by an average time of 5084ms.  So this clearly affects not only the virtual machines but the host operating system as well.  My gut still tells me the hypervisor is getting messed up somehow.
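
    The SunSpider comparison above amounts to timing a fixed CPU-bound workload and comparing it against a known-good baseline.  A minimal sketch of that idea follows; it is not SunSpider itself, and the workload, the function names and the 3x threshold are all assumptions made for illustration.

```python
import time


def fixed_workload(iterations: int = 200_000) -> int:
    """Deterministic CPU-bound loop; the arithmetic itself is arbitrary."""
    total = 0
    for i in range(iterations):
        total += (i * i) % 7
    return total


def time_workload(repeats: int = 5) -> float:
    """Return the best-of-N wall-clock time of the workload, in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fixed_workload()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def slowdown_factor(baseline_ms: float, current_ms: float) -> float:
    """How many times slower the current run is than the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return current_ms / baseline_ms


def is_degraded(baseline_ms: float, current_ms: float, threshold: float = 3.0) -> bool:
    """Flag a run as degraded once it is `threshold` times slower (assumed cutoff)."""
    return slowdown_factor(baseline_ms, current_ms) >= threshold


if __name__ == "__main__":
    # SunSpider averages reported in this post: 316 ms healthy, 5084 ms degraded.
    print(f"reported slowdown: {slowdown_factor(316, 5084):.1f}x")
    # Timing of the stand-in workload on whatever machine runs this script.
    print(f"local workload: {time_workload():.2f} ms")
```

    With the two averages from this post the factor comes out at roughly 16x.  Recording the local timing while the host is healthy and re-running the script on a schedule would give a timestamp for when the slowdown begins, which may help correlate it with other events.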

    Any ideas?  This is a major problem and I have no idea what is causing it.  


    • Edited by pwnell Saturday, November 03, 2012 5:32 PM
    Saturday, November 03, 2012 5:19 PM

All replies

  • You say that you are running Avira in one of the VMs.  Are you running it on the parent partition?  If so, I don't see anything on their web page that says it is supported on Windows Server 2012.  At a minimum, you should also exclude all VM-related files.

    3 vCPUs should work, but I don't know how optimal it would be.  Try 4 vCPUs per VM (you are not limited to the number of physical cores on the host).  I know this sounds strange, but I have never tried running with 3, and I don't know how much testing has been done with the operating system with an odd number of vCPUs.


    tim

    Saturday, November 03, 2012 9:04 PM
  • Avira runs in one VM only, the Windows 2008 R2 VM.  It is not compatible with Windows 2012, so it is not installed in the other VM or in the host OS.  I therefore do not believe it affects anything.  Also consider the SunSpider benchmark - in the host OS, with no Avira, it runs more than 15 times slower than usual.  3 vCPUs work perfectly well for a couple of days, then suddenly EVERYTHING goes slow, so I am also not convinced the 3 vCPUs are the issue.
    Sunday, November 04, 2012 3:35 AM
  • Did you resolve this problem?  I'm thinking of going down the same route as you, with Server 2012 and 2 VMs, to replace our aging SBS2003 server setup.

    Monday, November 12, 2012 2:10 PM
  • Nope, no resolution.  It happened once again a week ago.  That time it resolved itself after a couple of hours.  But I have no further insight and it is still a major problem.
    Monday, November 12, 2012 5:33 PM
  • Sounds like you need to log a call with Microsoft support so they can work through debugging the issue.


    tim

    Wednesday, November 14, 2012 2:07 AM
  • Hi Everyone,

    I just configured a 2012 Cluster and got similar performance issues. 
    Did you find out what caused it?

    Best Regards,
    Jens

    Friday, November 16, 2012 2:01 PM
  • "even a reboot of the host server does not fix anything"

    "when I yanked out the two power supplies ... the slowness was gone for 3 days..."

    There's something going on with the hardware, BIOS, drives, controllers, etc. which degrades over time and survives a reboot but is reinitialized when completely powered off. Other than making sure all the firmware, drivers and related software are up to date, and randomly swapping components with different, more compatible components, sorry I can't be more helpful than that.

    I have seen similar issues occasionally: I'll reboot a system and it's noticeably slower even during the POST process, before ever loading an OS; pull the plug and it's good as new.

    Friday, November 16, 2012 3:35 PM
  • Nope, still have not found the cause.  I am not so sure it is hardware: this issue has re-appeared, and a simple hot reboot fixed it twice now.  The components are as compatible as they come - it is a stock Dell server.
    Friday, November 16, 2012 6:48 PM
  • We have had the EXACT same thing happen with a Dell T320 on two separate servers. The first one was running SQL Server 2008 with nothing else funny going on. It ran fine for a month or so, then it developed this slowness issue. We finally installed SQL on a loaner box (without Hyper-V) and it worked fine. We brought the T320 back to our shop and tested the hell out of it with Dell's assistance and it was NTF. Took it back onsite and moved an SBS 2011 VM over to it. After a few days, it ground to a halt. CPU utilization was 100%. I started shedding services, killing Exchange, SharePoint, etc. Even stripped to the essentials, the CPU was still up in the 60-70% range.

    I put a T310 out onsite and downed the guest SBS VM. I copied the HUGE VHD over to the T310 (running Server 2008 R2 with the Hyper-V role) and it ran perfectly. CPU at about 2-4%, spiking to 15% occasionally.

    We finally concocted a story that the USB controller was bad, so they replaced the motherboard (not the CPUs) and it all seems to be working OK now, after we sank at least 100 man-hours into the problem.

    Our second one is a similar setup. T320, Server 2012, Hyper-V role, SBS 2011 Standard and a couple of member servers running basic apps. No one can find any problem and everyone acts like this is something new, although my posting comes one year after Tim's original posting.


    MCP SBSC

    Tuesday, November 05, 2013 3:00 PM
  • Hello All

    We have been experiencing this issue with ALL our recent Dell T320 deployments (3 to be exact).  It appears to be some sort of issue with the system itself.  We had some stability after upgrading to the latest firmware for all system components available from the Dell web site.  We also changed the TCP Offload setting to "Disabled" on the Broadcom NICs, which looked to have helped initially, but the issue came back after a few days.  We still think it has something to do with the Broadcom NIC, but hopefully Dell support will be able to assist us.

    Thanks 

    Sunday, November 17, 2013 6:40 PM
  • We also have 16 Dell T320s with the exact same problem described in this thread. We've tried everything from firmware updates to driver updates. The problem comes, and just as mysteriously leaves. It seems to disappear on its own. The Hyper-V guests are pegged at 100%. The Hyper-V host is very slow. All logs on the Hyper-V host claim everything is normal. We are on Server 2012 for the Hyper-V host and guest.
    Wednesday, November 20, 2013 5:19 PM
  • What does Resource Monitor show on the host machine while the issue happens? What are the chances of I/O hitting a bottleneck on the host machine and causing the slowness?
    Thursday, November 21, 2013 3:46 AM
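
    One way to answer the question above with numbers rather than a screenshot is to export performance counters to CSV (for example with typeperf) and average them.  The sketch below is only an illustration: it assumes a PDH-style CSV whose first column is a timestamp, the host name HOST and the sample values are made up, and the function name is hypothetical.

```python
import csv
import io
from statistics import mean


def average_counters(csv_text: str) -> dict:
    """Average each counter column of a PDH-style CSV export.

    The first column is assumed to be a timestamp and is skipped.
    Blank lines and non-numeric samples are ignored.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return {}
    header = rows[0]
    columns = {name: [] for name in header[1:]}
    for row in rows[1:]:
        for name, value in zip(header[1:], row[1:]):
            try:
                columns[name].append(float(value))
            except ValueError:
                continue  # skip empty samples and trailing status text
    return {name: mean(values) for name, values in columns.items() if values}


# Made-up sample in the assumed layout: kernel (privileged) vs user CPU time.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0)","\\HOST\Processor(_Total)\% Privileged Time","\\HOST\Processor(_Total)\% User Time"',
    '"11/03/2012 17:00:01.000","24.0","2.0"',
    '"11/03/2012 17:00:02.000","26.0","4.0"',
])

if __name__ == "__main__":
    for counter, average in average_counters(SAMPLE).items():
        print(f"{counter}: {average:.1f}")
```

    Capturing the same counters once while the host is healthy and once while it is slow, and adding a disk counter such as "\PhysicalDisk(_Total)\Avg. Disk Queue Length", would show whether the time is going to kernel work, user work or I/O waits.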
  • Hi,

    Did you ever find a solution to this?

    I have exactly the same issue with wsmprovhost.exe using up 50% CPU when the server becomes slow and applications start freezing etc.

    Thank you

    Monday, April 07, 2014 9:37 AM