Help customizing a Management Pack to modify the alert threshold

  • Question

  • Hi Everyone

    I am running System Center Operations Manager 2012 and am having difficulty reducing the noise on some of the alerts that are coming through.

    I am monitoring my Hyper-V 2008 R2 cluster with the Windows Server 2008 Cluster Management Monitoring Management Pack.

    Whenever I back up my Hyper-V cluster, I take a snapshot of the Cluster Shared Volumes (CSV). This temporarily places the CSV in redirected access mode (for a few seconds). The problem is that SCOM considers this a failure and sends out alerts.

    I am unable to use maintenance mode for this, as the backup times and the times the snapshot occurs differ from day to day. I also need to know if the CSV remains in redirected access mode for a long period of time, so I cannot completely override the alert.

    I have looked at the monitor and the alerting, and there is no place to adjust the frequency of the alert or the length of time the state must be critical before it is considered a failure and the alerts are triggered.

    Is there any way of customizing the alert or the management pack using XML so that it waits 5 minutes before it is considered a failure?

    Alert name:

    Shared Volume IO is resumed in no-direct-io mode

    The XML configuration of the alert response is as follows:

    <Configuration>
      <Priority>1</Priority>
      <Severity>2</Severity>
      <AlertMessageId>$MPElement[Name="Microsoft.Windows.2008.R2.Cluster.Shared.Volume.IO.is.resumed.in.no.direct.io.mode.AlertMessage"]$</AlertMessageId>
    </Configuration>

    The XML configuration of the Data Source is as follows:

    <Configuration>
      <Criteria>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery>EventDisplayNumber</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value>5121</Value>
          </ValueExpression>
        </SimpleExpression>
      </Criteria>
      <LogName>System</LogName>
      <PublisherName>Microsoft-Windows-FailoverClustering</PublisherName>
    </Configuration>

    Is it possible to add something like the statement below to make it wait longer?

    <IntervalSeconds>600</IntervalSeconds>

    If I try to override the monitor, the only options I have are:

    Enabled

    Priority

    Severity

    Any help or advice on how to modify or add additional thresholds to this monitor would be greatly appreciated. I hope I have provided enough information. If need be, I can provide additional screenshots or more information.

    Thanks

    Leigh


    Wednesday, September 5, 2012 8:16 AM

Answers

  • Actually, replicating the rule really isn't going to help.  That rule is looking for a Windows event.  I assume that event gets created as soon as the condition occurs.  If there is another event that gets created to indicate that the condition has been cleared, then you could create a monitor that automatically resolves the alert when the second event is received.  The fact that this has been implemented in a rule though is a clue to me that there is no second event.

    So the core issue here is that we really don't know when the condition has been cleared.  If that's the case, then we obviously can't implement that logic in the management pack. 

    Maintenance Mode here really wouldn't help either because if we ignore the event that we know is going to get created, you're never going to be able to handle the situation where it remains in that mode for a long period of time. 

    The only way this would get solved is if you have some means of detecting that the volume is no longer in that state.  Figure out how to detect that, and we can figure out how to implement the logic.
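
    To make the logic concrete, here is a minimal Python sketch of the decision being asked for, assuming some clearing signal can be detected. It is not management pack code. The function name should_alert, its parameters, and the 5-minute grace period (taken from the question) are illustrative assumptions only.

```python
from datetime import datetime, timedelta
from typing import Optional

# Grace period from the original question ("waits 5 minutes").
GRACE = timedelta(minutes=5)


def should_alert(entered_at: datetime,
                 cleared_at: Optional[datetime],
                 now: datetime,
                 grace: timedelta = GRACE) -> bool:
    """Decide whether an alert is warranted at time `now`.

    entered_at: when the volume entered redirected access mode.
    cleared_at: when a (hypothetical) clearing signal was seen, or None.
    The condition only counts as cleared if the signal arrived after the
    volume entered the state and no later than `now`.
    """
    cleared = cleared_at is not None and entered_at <= cleared_at <= now
    if cleared:
        return False
    # Still in the state: alert only once the grace period has elapsed.
    return now - entered_at >= grace
```

    With this shape, a backup snapshot that clears within a few seconds never raises an alert, while a volume that stays redirected past the grace period does. Without a clearing signal, cleared_at is always None and every occurrence eventually alerts, which is the core issue described above.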


    This posting is provided "AS IS" with no warranties, and confers no rights. Use of attachments are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

    • Marked as answer by Cloud_TS Monday, October 1, 2012 7:40 AM
    Wednesday, September 19, 2012 8:36 PM

All replies

  • Hi

    If this is a sealed MP, we cannot customize the XML and update it to make it work. But you could see the following article which could give you some hints:

    SCOM alert notification subscription delay sending for x minutes and don’t sent if alert is auto-resolved within that time

    http://www.maartendamen.com/2009/12/scom-alert-notification-subscription-delay-sending-for-x-minutes-and-dont-sent-if-alert-is-auto-resolved-within-that-time/


    Alex Zhao

    TechNet Community Support

    Thursday, September 6, 2012 8:53 AM
  • Hi Alex

    Thank you very much for your reply. This is a sealed MP. I understand that I can delay the alert notifications to prevent an email or SMS, but that does not stop the console from receiving the alert, meaning I still have to go in every day and clear these alerts manually, which is a bit tedious and frustrating.

    I see I can't export them or copy them to make changes...

    I also see it's not possible to configure auto-resolve on alerts generated by rules.

    How would I work around this? Is it possible to then delete the Rule, and re-create it somehow in a different management pack to allow customization or auto-resolve?

    Thursday, September 6, 2012 9:11 AM
  • Leigh,

    You are right. The best way is to just recreate the rule in your own MP.

    You can copy/paste the rule in the XML, or just recreate it via the console. Whatever works best for you.

    Please try that, and if the issue still persists, please let me know.

    Thanks,

    Jose

    Wednesday, September 19, 2012 5:41 AM
  • Hi Leigh

    Copying and pasting the rule from XML isn't straightforward, as the classes \ rules \ dependencies are spread throughout a management pack.

    And if you try to export the MP to break the seal and make changes, this will cause problems in the long run:

    http://systemcentersolutions.wordpress.com/2010/04/14/unsealing-sealed-management-packs/

    You can use PowerShell to auto-resolve the alerts from a rule. We have some rules \ alerts that administrators like to see but don't want emails for and don't need to action. In theory, we should really disable the rule (or set enable = false, to be accurate), but another option is to override the rule to make it informational and then run a cleanup task every morning that resolves the alerts.

    http://www.systemcentercentral.com/BlogDetails/tabid/143/IndexID/89870/Default.aspx

    Cheers

    Graham


    Regards Graham New System Center 2012 Blog! - http://www.systemcentersolutions.co.uk
    View OpsMgr tips and tricks at http://systemcentersolutions.wordpress.com/

    Wednesday, September 19, 2012 9:57 AM
  • There is a second event that fires when the volume is back in its normal state.  It's in the Microsoft-Windows-FailoverClustering\Operational log and it's Event ID 5122.  The original error event is (for some reason) in the System log so that is probably how this was easily missed when the thread was started.

    I'm running into the same issue and was going to make my own custom monitor, but I'm finding that there is an additional problem.  The same event can indicate an error state for any number of Cluster Shared Volumes.  So if you have 20 volumes and you receive 10 warnings about direct I/O being disabled, then five resolution events (5122), how do you generate 10 separate alerts, resolve the five that are working correctly, and leave the other five alerts alone?  I'm not an MP expert and I'm struggling to see how to do this.

    I do know that the name of the volume is contained in the event as a property, so I'm thinking it must be possible by leveraging the "VolumeName" property field.

    <EventData>
      <Data Name="VolumeName">Volume3</Data>
      <Data Name="ResourceName">HyperV_VM3</Data>
    </EventData>
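
    As a rough illustration of the per-volume correlation (not management pack XML), the Python sketch below keys state on the VolumeName value so that each 5122 event only clears its own volume. The class VolumeTracker, the helper volume_name, and the 5-minute grace period are hypothetical names and values chosen for this example. It also assumes the event payload looks like the fragment above, without an XML namespace.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

REDIRECTED_EVENT_ID = 5121  # from the thread: volume enters no-direct-IO mode
RESUMED_EVENT_ID = 5122     # from the thread: volume is back in its normal state


def volume_name(event_data_xml):
    """Return the VolumeName value from an <EventData> fragment, or None."""
    root = ET.fromstring(event_data_xml)
    for data in root.iter("Data"):
        if data.get("Name") == "VolumeName":
            return data.text
    return None


class VolumeTracker:
    """Track, per volume, how long it has been in redirected mode."""

    def __init__(self, grace=timedelta(minutes=5)):
        self.grace = grace
        self.redirected_since = {}  # volume name -> time of first 5121

    def handle(self, event_id, event_data_xml, when):
        name = volume_name(event_data_xml)
        if name is None:
            return
        if event_id == REDIRECTED_EVENT_ID:
            # Keep the earliest timestamp if the event repeats.
            self.redirected_since.setdefault(name, when)
        elif event_id == RESUMED_EVENT_ID:
            # Clear only the volume named in this event.
            self.redirected_since.pop(name, None)

    def overdue(self, now):
        """Volumes still redirected after the grace period, sorted by name."""
        return sorted(name for name, since in self.redirected_since.items()
                      if now - since >= self.grace)
```

    Feeding it ten 5121 events followed by five 5122 events leaves exactly the five unresolved volumes in overdue() once the grace period has passed. A custom monitor would need the equivalent of this keying on VolumeName to raise and resolve one alert per volume.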


    Wednesday, October 23, 2013 10:26 PM