Why do some Aggregate Monitors stay in Critical State when subordinate monitors are Healthy?

  • Question

  • A monitored device or service is often shown in a Critical state when it should not be.  In Health Explorer I see that an aggregate monitor (e.g. Availability) is Critical, but when I drill down, all of its subordinate monitors are Healthy.  Is there a time lag before an aggregate monitor switches state?  How do I make aggregate monitors synchronize with their subordinate monitors?
    Monday, October 5, 2009 5:31 PM

Answers

  • Some upcoming releases aim to improve the reliability of dependency monitors, but I'm still investigating aggregate monitors in one repro case that was provided to me recently. Aggregate monitors support resetting the state of their unit monitors (unlike dependency monitors, which are not resettable), so forcing a state change on the unit monitor (either through a reset or through maintenance mode) may help synchronization.

    I also recently posted a tool called "Runtime health explorer" that shows the health states recorded in the local agent cache. You could try it and compare its results with Health Explorer (the state recorded in the DB).
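
To illustrate the kind of comparison described above, here is a minimal sketch that flags aggregates whose recorded state disagrees with the rollup of their children. It assumes a worst-of rollup policy and uses a hypothetical in-memory snapshot; the names `Health`, `worst_of`, and `find_stale_aggregates` are illustrative and are not part of any Operations Manager API or of the tools mentioned in this thread.

```python
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that max() selects the worst state.
    UNINITIALIZED = 0
    HEALTHY = 1
    WARNING = 2
    CRITICAL = 3


def worst_of(child_states):
    """Expected aggregate state under an assumed worst-of rollup policy."""
    return max(child_states, default=Health.UNINITIALIZED)


def find_stale_aggregates(snapshot):
    """Return names of aggregates whose recorded state differs from the rollup.

    snapshot maps an aggregate name to (recorded_state, [child states]).
    """
    stale = []
    for name, (recorded, children) in snapshot.items():
        if recorded != worst_of(children):
            stale.append(name)
    return stale


# Hypothetical snapshot: Availability is recorded as Critical although
# both of its children are Healthy, which is the symptom in the question.
snapshot = {
    "Availability": (Health.CRITICAL, [Health.HEALTHY, Health.HEALTHY]),
    "Performance": (Health.WARNING, [Health.HEALTHY, Health.WARNING]),
}

print(find_stale_aggregates(snapshot))  # ['Availability']
```

Any aggregate reported this way would be a candidate for a manual reset; how the snapshot is actually collected is left open here.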


    Marius Sutara
    My MSDN blog

    This posting is provided "AS IS" with no warranties, and confers no rights. Use of attachments are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm
    • Marked as answer by Graham Davies Wednesday, October 7, 2009 9:10 AM
    Tuesday, October 6, 2009 3:53 PM

All replies

  • They should do so without your intervention. If the aggregate monitors linger in an error state, try resetting them manually with one of the following tools.

    Bulk Monitor Reset (green machine)
    http://blogs.technet.com/timhe/archive/2009/07/23/greenmachine-r2-updated.aspx

    Restart Monitoring Tool
    http://blogs.msdn.com/mariussutara/archive/2009/01/09/howto-restart-monitoring-of-my-environment.aspx

    Pete Zerger, MVP-OpsMgr and SCE | http://www.systemcentercentral.com
    Monday, October 5, 2009 6:15 PM
  • I have seen this a lot in my environment. It usually occurs after a computer comes out of maintenance mode. MS is aware of this, and I think a public fix is planned. I've also seen it in situations where maintenance mode was not involved. MS says the problem might be related to agents that generate so many state changes that the RMS can't keep up.
    Monday, October 5, 2009 8:53 PM
  • Thanks, Pete, for the response and the tools.  But is it a known issue that some aggregate monitors get stuck in an error state and need these custom tools to clear them?  If I have alerts associated with aggregate monitors, could they falsely report that underlying monitors are down when in fact they are back up?  I just need to understand more about this situation before I report this anomaly to our admins.  Thanks.
    Monday, October 5, 2009 8:58 PM
  • I have the same issue, and PSS tells me the same thing. If a fix is planned, when is it due?
    Thursday, October 29, 2009 2:52 PM
  • The fix for SP1 is coming soon; the fix for R2 will, I guess, follow a couple of months after that.
    Marius Sutara
    My MSDN blog

    Thursday, October 29, 2009 5:16 PM
  • What????
    I upgraded to R2 to solve some performance issues in SP1, and now I face a couple of months of this nightmare?

    For operations teams that use this product, this is very painful.

    Can you check in with the dev team often?
    Thursday, October 29, 2009 5:55 PM