Why do some Aggregate Monitors stay in Critical state when subordinate monitors are Healthy?

  • Monday, October 05, 2009 5:31 PM
     
     
    Often a monitored device or service shows a Critical state when it shouldn't.  In Health Explorer I see that an Aggregate monitor (e.g. Availability) is in Critical state, but when I drill down into its subordinate monitors they are all Healthy.  Is there a time lag before an Aggregate monitor switches states?  How do I make Aggregate monitors synchronize with their subordinate monitors?

Answers

  • Tuesday, October 06, 2009 3:53 PM
    Moderator
     
     Answered

    Some upcoming releases aim to improve the reliability of dependency monitors, but I'm still investigating aggregate monitors in one repro case that was provided to me recently. An aggregate monitor supports resetting the state of its unit monitors (dependency monitors, by contrast, cannot be reset), so forcing a state change on the unit monitor (either through a reset or through maintenance mode) may help synchronization.

    I also recently posted a tool called "Runtime health explorer" that shows the health states recorded in the local agent cache. You could try it and compare its results with Health Explorer (the state recorded in the DB).
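
    To make the comparison concrete, here is a minimal Python sketch of the check being described: flag any aggregate monitor whose recorded state differs from the rollup of its child states. It assumes a "worst-of" rollup policy, and the `Health` enum, the input dictionary, and both function names are hypothetical illustrations, not part of any Operations Manager API.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical health states, ordered from best to worst."""
    UNINITIALIZED = 0
    HEALTHY = 1
    WARNING = 2
    CRITICAL = 3


def worst_of(child_states):
    """Assumed worst-of rollup: the aggregate takes its worst child state."""
    return max(child_states, default=Health.UNINITIALIZED)


def find_stale_aggregates(aggregates):
    """Return names of aggregates whose recorded state disagrees with the rollup.

    `aggregates` maps a monitor name to (recorded_state, [child_states]).
    """
    return [
        name
        for name, (recorded, children) in aggregates.items()
        if recorded != worst_of(children)
    ]


if __name__ == "__main__":
    # Made-up sample data mirroring the situation in the question.
    sample = {
        "Availability": (Health.CRITICAL, [Health.HEALTHY, Health.HEALTHY]),
        "Performance": (Health.WARNING, [Health.HEALTHY, Health.WARNING]),
    }
    print(find_stale_aggregates(sample))
```

    In this made-up sample only "Availability" is flagged, because it is recorded as Critical while every child is Healthy. A monitor flagged this way would be a candidate for the reset or maintenance-mode workaround mentioned above.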


    Marius Sutara
    My MSDN blog

    This posting is provided "AS IS" with no warranties, and confers no rights. Use of attachments are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

All Replies

  • Monday, October 05, 2009 6:15 PM
    Moderator
     
     Proposed Answer
    They should do so without your intervention. If the aggregate monitors linger in an error state, try resetting them manually with one of the following tools.

    Bulk Monitor Reset (GreenMachine)
    http://blogs.technet.com/timhe/archive/2009/07/23/greenmachine-r2-updated.aspx

    Restart Monitoring Tool
    http://blogs.msdn.com/mariussutara/archive/2009/01/09/howto-restart-monitoring-of-my-environment.aspx

    Pete Zerger, MVP-OpsMgr and SCE | http://www.systemcentercentral.com
  • Monday, October 05, 2009 8:53 PM
     
     
    I have seen this a lot in my environment. It usually occurs after a computer comes out of maintenance mode. MS is aware of this and I think a public fix is planned. I've also seen it in situations where maintenance mode was not involved. MS says the problem might be related to agents that generate so many state changes that the RMS can't keep up.
  • Monday, October 05, 2009 8:58 PM
     
     
    Thanks, Pete, for the response and the tools to fix this.  But is it a known issue that some aggregate monitors get stuck in an error state that requires these custom tools to resolve?  If I have alerts associated with aggregate monitors, they may falsely report that underlying monitors are down when in fact they may be back up.  I just need to understand more about this situation before I report this anomaly to our admins.  Thanks.
  • Thursday, October 29, 2009 2:52 PM
     
     
    I have the same issue and PSS tells me the same thing. If so, when is the fix planned?
  • Thursday, October 29, 2009 5:16 PM
    Moderator
     
     
    The fix for SP1 is coming soon; the fix for R2 will, I guess, follow a couple of months after that.
    Marius Sutara
    My MSDN blog

  • Thursday, October 29, 2009 5:55 PM
     
     
    What????
    I upgraded to R2 to solve some performance issues in SP1, and now I get a couple of months of this nightmare?

    With operations teams using this product, this is very painful.

    Can you check often with the dev team?