Console Slow - Hammered with Alerts??

  • Question

  • I have completed a small pilot installation of SCOM.  I only have about 10 servers with agents and a few MPs installed.  Recently the console (specifically the alerts window) has become unusable.  I read the following post:
    http://social.technet.microsoft.com/Forums/en-US/operationsmanagerdeployment/thread/a00c95c1-6975-4892-a489-a95b6cfae92a

    After running the query, it shows I have many repeat counts over 10000 (I assume this is bad).  It appears that I am getting slammed with alerts and SCOM can't handle it.  I am not even able to close the active alerts I have open (over 250).  When I attempt to close one, it hangs the console and I have to kill it from Task Manager.

    Any ideas on what is causing this or what I can do to resolve this issue?

    Thanks!


    xtiyu32n
    • Moved by Rob Kuehfus Tuesday, December 8, 2009 9:17 PM You will get more help in the MP area on tuning. (From:Deployment)
    Tuesday, December 8, 2009 5:22 PM

Answers

  • Hi xtiyu32n,

    Close the network connections on the RMS so that no agents can report to it. Then restart the System Center Management service on the RMS and let it run for, say, a couple of hours so the situation can stabilize. (Remember that if you have more than a single-server solution, you still need access to the OperationsManager database on the SQL server.)

    After the RMS has been able to stabilize the situation, you can open the console and check the alerts with the highest repeat counts. Then you should resolve the issues creating those alerts.

    After this, remove all the unconfigured MPs (for example SQL Server, Active Directory, Exchange, IIS). Then go through the Management Pack guides to check what the prerequisites are and what you need to configure after importing.

    A good practice is to check the discoveries of a management pack after importing it. For example, the IIS MP quite often has discoveries of IIS servers that could lead to problems in bigger environments.

    Hope this helps!

    -Tero


    MCT | MCSE | MCITP | MCTS SCOM & SCCM
    • Marked as answer by knicosia Tuesday, December 8, 2009 9:50 PM
    Tuesday, December 8, 2009 9:48 PM
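
A minimal sketch of the "check the alerts with the highest repeat counts" step from the answer above, assuming the alert list has already been exported into plain tuples. The server names, counts, and the `noisiest` helper are hypothetical and only illustrate the ranking; they are not SCOM data or a SCOM API.

```python
from collections import defaultdict

# Hypothetical alert rows: (alert name, source server, repeat count).
# These values are made up for illustration; they are not real SCOM output.
alerts = [
    ("Send Queue % Used Threshold", "sccm01", 12000),
    ("Hypothetical script failure", "web01", 350),
    ("Send Queue % Used Threshold", "web02", 4100),
    ("Hypothetical disk space warning", "sql01", 3),
]


def noisiest(rows, threshold=100):
    """Sum repeat counts per alert name and return the names whose total
    exceeds the threshold, noisiest first."""
    totals = defaultdict(int)
    for name, _server, repeat_count in rows:
        totals[name] += repeat_count
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, total) for name, total in ranked if total > threshold]


for name, total in noisiest(alerts):
    print(f"{total:>6}  {name}")
```

The alert names at the top of such a list are the first candidates for fixing the underlying problem or for tuning the rule or monitor with an override.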
  • Turned out to be a DB sizing issue.  I did not have autogrowth on, so I had to increase the size and do some DB maintenance.  Once I did that, I was able to manage the alerts again.

    xtiyu32n
    • Marked as answer by knicosia Tuesday, December 8, 2009 9:54 PM
    Tuesday, December 8, 2009 9:54 PM

All replies

  • What is causing an alert with a repeat count over 10000? You are the only one who knows which rule or monitor triggered the alerts.
    So find them and correct the error, or disable the monitor or rule for the server or servers causing the alert flood.




    /Micke
    Tuesday, December 8, 2009 5:52 PM
  • There are several with repeat counts in the hundreds or thousands.  They should not be repeating themselves over and over again, should they?

    xtiyu32n
    Tuesday, December 8, 2009 6:22 PM
  • Yes, the monitor or rule will trigger according to the nature of its configuration. The repeat count tells you how many times this happened (instead of getting 100 separate alerts, you get a repeat count of 100 on one alert).

    No, you do not want to have alerts with that high a repeat count on a regular basis.


    You have to tune your environment: find and correct the problems, or change the rules or monitors with overrides.


    /Micke
    Tuesday, December 8, 2009 6:55 PM
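
The repeat-count behaviour described in the reply above can be sketched as follows. This is only an illustration of the idea (one open alert per source and rule, with a counter instead of duplicates), not how SCOM implements it. The `Alert` class, the `record` helper, and the rule name are hypothetical, and the sketch counts every occurrence including the first, which may differ from how SCOM itself numbers repeats.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    source: str
    name: str
    repeat_count: int = 1  # assumption: the first occurrence counts as 1


def record(open_alerts, source, name):
    """Register one occurrence of a condition. A matching open alert gets
    its counter bumped; otherwise a new alert is created."""
    key = (source, name)
    if key in open_alerts:
        open_alerts[key].repeat_count += 1
    else:
        open_alerts[key] = Alert(source, name)
    return open_alerts[key]


open_alerts = {}
for _ in range(100):
    record(open_alerts, "web01", "Hypothetical rule")

# One alert in the list, carrying the number of occurrences.
print(len(open_alerts), open_alerts[("web01", "Hypothetical rule")].repeat_count)
```

A repeat count in the thousands therefore means the same condition keeps firing on the same object, which points at a rule or monitor that needs fixing or tuning rather than at thousands of separate problems.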
  • Hi.

    Importing an MP is just one step in a whole process. Read this blog post of mine: http://thoughtsonopsmgr.blogspot.com/2009/08/opsmgr-r2-management-pack-guide-rtfm.html
    Best regards, Marnix Wolf

    (Thoughts on OpsMgr)
    Tuesday, December 8, 2009 7:04 PM
  • The main issue appears to be that alerts are not resolving themselves.  I see many alerts, some as old as 30 days, that are no longer valid.  For instance, my SCCM server shows 17 critical alerts spanning the past month.  Most of them are not valid.  I looked on the SCCM server, and there are no issues.  When I look at Health Explorer, the only valid alert I see is Send Queue % Used Threshold.  There are no others.

    I have verified the settings for auto-resolving alerts, yet they still remain.  Why are they not resolving themselves?
    xtiyu32n
    Tuesday, December 8, 2009 8:10 PM
  • As Marnix wrote, read the product manuals and MP guides.


    /Micke
    Tuesday, December 8, 2009 9:20 PM
  • So, I have an unusable console (locked up) getting slammed with thousands of alerts from 10 agents, none of which will auto-resolve, and your solution is "read the manual"??

    Thanks for nothing.

    xtiyu32n
    Tuesday, December 8, 2009 9:30 PM
  • Hi.

    First of all: sorry.

    Perhaps my answer was a bit too short. All I meant to say is that SCOM is a product which needs preparation and a good understanding before it is implemented, whether in a lab or a production environment. Somehow your question gave me the feeling that not everything had been prepared in much detail. Hence my answer. Next time I will give more details.

    In the end you found the cause (the DB size). This detail also needs to be planned in advance. So when you go to production with SCOM, be sure to know in advance how many objects you are going to monitor. Based on that number you can calculate the size of the DB. There are many good resources for this, such as:

    http://blogs.technet.com/momteam/archive/2009/08/12/operations-manager-2007-r2-sizing-helper.aspx

    And be sure to have a good understanding of all aspects of SCOM. It is a great product, but it has a somewhat steeper learning curve than Notepad :)

    All SCOM R2 documentation: http://technet.microsoft.com/en-us/opsmgr/bb498235.aspx

    Hope this helps when going to production with SCOM.

    Best regards, Marnix Wolf

    (Thoughts on OpsMgr)
    Wednesday, December 9, 2009 6:19 AM