HealthService.exe CPU Spikes

  • Question

  • Hi all,

    We are using SCOM 2007 R1 to monitor some BizTalk 2009 servers. These are Win2008 64-bit R1 boxes that have 2 CPUs (4 cores per CPU) with 24 GB of RAM. We are seeing some significant CPU spikes on the Health service. We have opened a case with Premier Support and they have advised us that KB http://support.microsoft.com/kb/968967 should fix the issue. We have applied it but are still seeing some spikes. We have been told that these spikes are "normal" but I am not entirely convinced. Has anyone run into issues with CPU spikes, and if so, what did you do? The spikes will hit 90%+ for a period of 20 seconds, then come back down, only to spike again a minute later.

    Also, I have read that a few people have rebuilt WMI as per this article: http://thoughtsonopsmgr.blogspot.com/2008/12/wmi-and-windows-2003-server.html. Has anyone attempted this? These servers are about 3 months old.

    http://kentweare.blogspot.com
    Wednesday, November 4, 2009 12:31 AM

Answers

All replies

  • Hi.

    The blog posting you are referring to was written by me. It is known that WMI has some issues on Windows 2003 Server. However, I don't think this is the cause of what you are experiencing.

    Check out these postings. Perhaps they will aid you further:
    http://blogs.technet.com/kevinholman/archive/2009/07/20/do-you-randomly-see-a-monitoringhost-exe-process-consuming-lots-of-cpu.aspx

    and
    http://thoughtsonopsmgr.blogspot.com/2009/07/opsmgr-sp1-is-process-healthservice-of.html
    Best regards, Marnix Wolf

    (Thoughts on OpsMgr)
    Wednesday, November 4, 2009 8:08 AM
    Moderator
  • Thanks Marnix...will check those links out.

    Also, I have taken a snapshot of Perfmon while this behaviour has been happening and posted it to Google Docs. Would you consider this to be "normal" CPU usage?

    http://docs.google.com/Doc?docid=0AYv1Bn0FGSFuZGduY2hrbTRfOGRqaHd0OGNt&hl=en
    http://kentweare.blogspot.com
    Wednesday, November 4, 2009 1:45 PM
  • Hi Kent,
    you're not alone. I guess you see the spikes when 21025 events are recorded in the event log. It's an XML parsing issue and an MP issue at the same time. I wrote a lot about it; you can read more at this link: http://nocentdocent.wordpress.com/2009/07/21/opsmgr-2007-r2-lessons-learned-reprise/ (it applies to SP1 as well).
    - Daniele, Microsoft MVP OpsMgr This posting is provided "AS IS" with no warranties, and confers no rights. http://nocentdocent.wordpress.com http://www.progel.it
    Wednesday, November 4, 2009 5:17 PM
    Moderator
  • Hi Kent.

    No, this isn't normal behaviour. I'm glad Daniele Grandini has also replied. He has a lot of good information about it. Thanks Daniele!
    Best regards, Marnix Wolf

    (Thoughts on OpsMgr)
    Thursday, November 5, 2009 9:33 AM
    Moderator
  • Hi

    I work with Kent. We have applied the msxml6.dll fix that resolves the spinlock issue, and we are still getting the spikes. In fact, we have a BizTalk machine with 2 quad-core processors and 24 GB of RAM and a SQL Server machine with 4 quad-core processors and 32 GB of RAM that grind to a crawl when working with any GUI tools on the servers, and we are getting 100% CPU spikes from healthservice.exe and monitoringhost.exe.

    When we put each server in maintenance mode we can work with the GUI tools and perform BizTalk deploys, but once maintenance mode ends and monitoring is back on, the GUI tools really struggle.

    Our servers are Windows 2008 64-bit servers; we have 3 clustered BizTalk 2009 servers and 3 clustered SQL 2008 servers.

    Do you guys have any suggestions on what we can do next? We have applied all the recommendations found in forums, such as AV exclusions, disabling Shared Memory and using TCP only, and making some registry edits.

    Tushar
    Thursday, November 5, 2009 6:02 PM
  • Hi Tushar,
    we must split the issue into two parts. Spikes in HealthService are different from the MonitoringHost ones. First of all, I would suggest you check how many 21025 events you have on your RMS and on your agents. Look at a two-hour period and let me know how many.
    Second, check all your nodes and see if there's any difference between them; for example, you may see HS constantly high on one of them but not on the other two. In this case there's a known bug.
    Third, check how many MonitoringHost processes you have running, and check whether the wmiprvse process is having spikes as well.
    Lastly, check whether any of your event logs is registering tons of events in a small amount of time, let's say tens per second.
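
The first and last checks above (counting 21025 events over a two-hour period, and spotting logs that register tens of events per second) could be roughed out as follows. This is only a sketch, not something from the thread: it assumes the events have already been exported from the event log into (timestamp, event ID) pairs by some means not shown here, the function names are made up for illustration, and the threshold of 10 events per second is an assumed reading of "tens per second".

```python
from collections import Counter
from datetime import datetime, timedelta


def count_event_in_window(events, event_id, start, hours=2):
    """Count occurrences of event_id with a timestamp in [start, start + hours)."""
    end = start + timedelta(hours=hours)
    return sum(1 for ts, eid in events if eid == event_id and start <= ts < end)


def burst_seconds(events, per_second_threshold=10):
    """Return {second: count} for every one-second bucket at or above the threshold.

    The default threshold of 10 is an assumption, not a documented limit.
    """
    # Truncate each timestamp to the whole second and count events per bucket.
    per_second = Counter(ts.replace(microsecond=0) for ts, _ in events)
    return {sec: n for sec, n in per_second.items() if n >= per_second_threshold}


if __name__ == "__main__":
    # Hypothetical sample data standing in for an exported event log.
    start = datetime(2009, 11, 5, 12, 0, 0)
    sample = [(start + timedelta(minutes=10 * i), 21025) for i in range(5)]
    sample += [(start + timedelta(hours=1, milliseconds=50 * i), 1000) for i in range(12)]

    print("21025 events in two hours:", count_event_in_window(sample, 21025, start))
    for second, count in sorted(burst_seconds(sample).items()):
        print(f"burst: {count} events at {second}")
```

Running the same functions over an export from the RMS and from each agent gives numbers that can be compared node by node.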
    - Daniele, Microsoft MVP OpsMgr This posting is provided "AS IS" with no warranties, and confers no rights. http://nocentdocent.wordpress.com http://www.progel.it
    Saturday, November 7, 2009 11:30 AM
    Moderator