Alerting in hardware monitors off by default. Can someone explain this to me?

  • Question

  • I'm turning up our new environment, and noticing that I've got an agent in a critical state due to Entity Health>Availability>Hardware Availability Rollup>Availability>Dell Server Availability Rollup (from Dell Server Memory)>Availability - Memory>Dell Server Memory Status (Periodic) - Memory (Memory). No alerts are being generated. When I check the Alerting tab on the memory monitor, "Generate an alert for this monitor" is disabled, and the monitor is sealed, so I can't change this option without creating an override. Ditto all the way up the line to the Entity Health monitor. Google searches indicate that this is default behavior for most MPs. I don't understand why this is the case. Why would I want SCOM acknowledging that I have critical or failing hardware without generating an alert to anyone? I could see turning off alerting on a specific monitor if you've got an agent with a known problem that you are remediating and don't want constant alerts for, but turning them all off by default seems counterintuitive.

    We don't have the resources to devote to a full-time OM person whose job is to look for every agent in a critical state. My strategy for this new environment was going to be to turn it all on, and then fine-tune down to what the engineers want or need to see. It would be painful, but surely better in the long run than missing critical events because no alerts were being generated, wouldn't it?

    Monday, March 4, 2013 3:09 PM

Answers

  • Hi Matt

    With Management Packs we are all at the whim of the author ...

    I actually quite like this approach as it means that when I import a new management pack, I'm not flooded with alerts. I get to look at the state views to see which objects have health issues and sort out which are real health issues and what would just generate noise. I can see what monitors I just want to turn off and others where I might want to tweak a threshold.

    The problem I tend to find with the turn-it-all-on approach is that real issues get missed amongst the noise, and it can turn people off the product if they start to get too many alerts, especially false positives and irrelevant ones. It is a fine balancing act.

    But all of this is a personal matter and as end-users we have to take what we are given by the MP author.

    Regards

    Graham


    Regards Graham New System Center 2012 Blog! - http://www.systemcentersolutions.co.uk
    View OpsMgr tips and tricks at http://systemcentersolutions.wordpress.com/

    • Marked as answer by Nicholas Li Friday, March 8, 2013 4:22 AM
    Monday, March 4, 2013 3:20 PM
  • Hi Matt

    So there are many situations where an overt notification (an Alert) may not be needed but a change in Health State is. For example, we work heavily with Distributed Applications and Diagram Views. We use two-stage monitors for disk space issues. When the disk hits a threshold that triggers a Warning state, there is no Alert. The DA just rolls up Yellow, and the Engineer responsible will investigate and remediate. If it is not resolved and the disk continues to fill up and reaches a Critical state, then an Alert is generated to notify additional Operations staff and send the applicable notifications. It's this functionality which at first seems a pain, but it is what allows SCOM to be surgically tuned.
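
    The two-stage behaviour described above can be sketched roughly as follows. This is a conceptual illustration in Python, not SCOM code or any SCOM API: the `DiskMonitor` class, the `rollup` helper, and the 80%/95% thresholds are all hypothetical names and values chosen for the example. It only shows the idea that a Warning changes health state (and rolls up) silently, while the transition into Critical is what raises an alert.

```python
from dataclasses import dataclass, field
from enum import Enum


class Health(Enum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class DiskMonitor:
    """Hypothetical two-threshold monitor (illustration only, not a SCOM API)."""
    warning_pct: float = 80.0    # assumed Warning threshold
    critical_pct: float = 95.0   # assumed Critical threshold
    state: Health = Health.HEALTHY
    alerts: list = field(default_factory=list)

    def evaluate(self, used_pct: float) -> Health:
        """Update health state from a disk-usage sample; alert only on Critical."""
        if used_pct >= self.critical_pct:
            new_state = Health.CRITICAL
        elif used_pct >= self.warning_pct:
            new_state = Health.WARNING
        else:
            new_state = Health.HEALTHY

        # Warning changes state silently; only the transition into
        # Critical generates an alert, and it is not repeated while
        # the monitor stays Critical.
        if new_state is Health.CRITICAL and self.state is not Health.CRITICAL:
            self.alerts.append(f"Disk at {used_pct:.0f}% used")

        self.state = new_state
        return new_state


def rollup(states):
    """Worst-of rollup, as a parent health view might show it."""
    return max(states, key=lambda s: s.value, default=Health.HEALTHY)
```

    In this sketch, a sample of 85% moves the monitor to Warning with no alert, so a parent view built with `rollup` would show Warning while nobody is paged; a later sample of 97% moves it to Critical and records exactly one alert.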

    Regards,

    Walter Chomak | Mobieus Systems | www.mobieus.com


    WpC

    • Marked as answer by Nicholas Li Friday, March 8, 2013 4:22 AM
    Monday, March 4, 2013 9:09 PM
