Threshold for SCOM 2012

  • Question

  • I have authored a custom management pack to monitor some servers. The pack is entirely script based (PowerShell): discovery, monitors, and rules are all implemented through scripts (I couldn't find a better way to do this, as I cannot use registry- or WMI-based discovery). The pack contains around 45 discovery scripts (run at a 30-minute interval) that populate data for 45 different components (classes), and these generate a total of 400 to 500 instances (the general shape of such a discovery script is sketched below).
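
    A script-based SCOM discovery of this kind typically follows the pattern below. This is a minimal, hypothetical sketch rather than the actual script from the pack: the class name MyMP.ServerComponent, its properties, and the Get-ServerComponents helper are placeholders, and the $MPElement references are resolved by SCOM from the module configuration before the script runs.

        param($SourceId, $ManagedEntityId, $ComputerName)

        # MOM.ScriptAPI is available on any machine running a SCOM agent
        $api = New-Object -ComObject 'MOM.ScriptAPI'
        $discoveryData = $api.CreateDiscoveryData(0, $SourceId, $ManagedEntityId)

        # Hypothetical helper that enumerates the server's components
        foreach ($component in Get-ServerComponents -Computer $ComputerName) {
            $instance = $discoveryData.CreateClassInstance("$MPElement[Name='MyMP.ServerComponent']$")
            $instance.AddProperty("$MPElement[Name='MyMP.ServerComponent']/ComponentId$", $component.Id)
            $instance.AddProperty("$MPElement[Name='Windows!Microsoft.Windows.Computer']/PrincipalName$", $ComputerName)
            $discoveryData.AddInstance($instance)
        }

        # Writing the discovery data object to the pipeline returns it to SCOM
        $discoveryData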

    Similarly, I have 45 rules and 45 monitors. For the rules (run at a 5-minute interval) I have an unusual scenario: one component (say Device 1) generates many alerts, so I implemented a script-based rule that collects all of the alerts for Device 1 in a single pass and feeds them into SCOM (a sketch of this kind of rule script follows). The same goes for the monitors. All monitors (run at a 25-minute interval) reference the same data source (I believe this is cookdown). A monitor targeting Device 1 picks up every instance of Device 1 and therefore runs that many times; in each run the script parses all of the alerts related to that instance and assigns it a state. (In effect, my rules and monitors work independently; they don't correlate.)
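
    A "collect everything in one pass" rule data source along these lines could be sketched as follows. Again this is hypothetical: Get-Device1Alerts and the property bag value names are placeholders, and the rule's downstream condition detection and alert write action are assumed to consume the returned bags.

        # Data source script for the rule: gather every pending alert for Device 1
        # in one pass and emit one property bag per alert found.
        $api = New-Object -ComObject 'MOM.ScriptAPI'

        foreach ($alert in Get-Device1Alerts) {    # Get-Device1Alerts is a hypothetical helper
            $bag = $api.CreatePropertyBag()
            $bag.AddValue('AlertId',  [string]$alert.Id)
            $bag.AddValue('Message',  [string]$alert.Message)
            $bag.AddValue('Severity', [string]$alert.Severity)
            $bag    # each bag written to the pipeline becomes one data item for the rule
        }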

    My problem is that, because the discovered classes generate around 400-500 instances, the monitors run that many times, which means roughly 500 script instances running in a single pass. This overloads the management server, and I get errors like this:

    Summary: 291 rule(s)/monitor(s) failed and got unloaded, 1 of them reached the failure limit that prevents automatic reload. Management group "--my-mg-name--". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).

    This error keeps recurring, along with the following:

    The PowerShell script will be dropped because it has been waiting in the queue for more than 10 minutes.
     
    Script Name:    My.Script.For.Monitor.ps1

    One or more workflows were affected by this.  

    Workflow name: XXXXX.XXXXX.XXXXX.UnitMonitor
    Instance name: <InstanceName>
    Instance ID: {D8C031B7-D8AC-0769-7F5D-4F23E38D5889}
    Management group: <MG_Name>

    When I get this error, restarting the system makes the machine work fine again.

    Now I am not sure what the issue is, or how many scripts SCOM can handle at a time. What is the SCOM limit for this "queue"? Is there a better solution for my problem? I am really stuck at this point and not sure how to proceed. Do I have to redesign my MP?


    Regards,
    Ravi

    Sunday, December 2, 2012 7:03 PM

Answers

  • You will not get cookdown if the parameter values vary between instances.  You only get cookdown if the parameter values are the same for all instances.  Your script needs to be designed to gather the results for all instances at once. The next module in your workflow then selects which property bag goes with each instance. This is explained in Cookdown and Scripts Supporting Cookdown.   
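
    As an illustration, a cookdown-friendly data source script might look like the minimal sketch below (the Get-DeviceFaults helper, property names, and state rollup are hypothetical, not from the management pack in question). The script takes no per-instance parameters, so every monitor shares one execution, and it returns one property bag per instance; the monitor type's expression then matches each bag to its own instance, for example by comparing the InstanceId value against a $Target property of the instance.

        # Shared data source script: every monitor passes identical configuration,
        # so SCOM cooks the workflows down to a single execution per interval.
        $api = New-Object -ComObject 'MOM.ScriptAPI'

        # Hypothetical helper returning the current fault list for all devices
        $faults = Get-DeviceFaults

        foreach ($group in $faults | Group-Object -Property DeviceId) {
            $bag = $api.CreatePropertyBag()
            $bag.AddValue('InstanceId', [string]$group.Name)

            # Roll the device's faults up into a single state value
            $worst = ($group.Group | Measure-Object -Property Severity -Maximum).Maximum
            $state = if ($worst -ge 2) { 'Error' } elseif ($worst -ge 1) { 'Warning' } else { 'Healthy' }
            $bag.AddValue('State', $state)

            $bag    # one property bag per instance; the monitor type filters by InstanceId
        }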

    This posting is provided "AS IS" with no warranties, and confers no rights. Use of attachments are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

    • Marked as answer by Ravi_Raj Monday, January 28, 2013 5:16 AM
    Monday, January 28, 2013 4:56 AM

All replies

  • Hi Ravi,

    1. Take a look at this: http://msdn.microsoft.com/en-us/library/ee809360.aspx

    I think you should redesign your MP.


    http://OpsMgr.ru/

    Wednesday, December 5, 2012 11:15 AM
  • OK, but I am not sure how to redesign, as there is no way I can use registry- or WMI-based discovery; this has to be done by script. My server has 500-1000 components to be discovered, and for monitoring I have to parse all of the faults/alerts that arise from these components.

    The only way I can see is to reduce the number of scripts. I am already using cookdown, but in the case of a monitor, I believe the script executes once for every instance its target class creates. Is there a way to set the health of all of these instances in a single pass (at once)?

    Can you suggest how I should go about redesigning?


    Regards,
    Ravi

    Wednesday, December 5, 2012 4:09 PM
  • Ravi,

    Why are there so many instances that must be discovered? It seems to me that even if you fix the performance bottleneck on the agents, you may end up passing that bottleneck to the database, as too much data will first be discovered and then inserted by this MP.

    As a reference, I would check the following:

     - http://technet.microsoft.com/en-us/library/ff381335.aspx

     - http://blogs.technet.com/b/jonathanalmquist/archive/2011/11/08/cookdown-in-system-center-operations-manager.aspx

    hope this helps

    Jose

    Thursday, December 6, 2012 2:59 AM
  • Regarding cookdown: I am using a single script for the monitors, with only the parameters changing to monitor a particular instance. I believe this is what cookdown means; please correct me if I am wrong.

    The problem here is the number of instances generated by the monitor's target class, as the number of instances is directly proportional to the number of times the same script runs (obviously with different parameters).

    Regarding discovery, I have to discover all of the instances, as my servers require all of them to be discovered, so I don't think I can have fewer components.

    Can you tell me whether there is a way to monitor all of the instances in a single run of the script? I can access all of the instances within the script.


    Regards,
    Ravi

    Thursday, December 6, 2012 3:49 AM
  • Ravi,

    I am no expert in cookdown, but I am seeing examples of cookdown in the following links

     - http://myitforum.com/cs2/blogs/vdipippo/pages/workflow-cook-down-in-operations-manager-2007.aspx

     - http://nocentdocent.wordpress.com/category/scom/cookdown/

     

    Hope this helps,

    Jose


    • Edited by jorodas Friday, January 25, 2013 2:06 PM
    Friday, January 25, 2013 2:06 PM
  • You will not get cookdown if the parameter values vary between instances.  You only get cookdown if the parameter values are the same for all instances.  Your script needs to be designed to gather the results for all instances at once. The next module in your workflow then selects which property bag goes with each instance. This is explained in Cookdown and Scripts Supporting Cookdown.   

    This posting is provided "AS IS" with no warranties, and confers no rights. Use of attachments are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

    • Marked as answer by Ravi_Raj Monday, January 28, 2013 5:16 AM
    Monday, January 28, 2013 4:56 AM
  • Thanks Brian for the links; they are very informative. I would also like to add that I am following a good cookdown process, but on further testing of my MP I found that the number of instances it handles is quite large. That's why I am getting the 'rules unloading' issue. I tried to find other alternatives but couldn't, so I am tweaking the IntervalSeconds for the discoveries, rules, and monitors.

    Regards,
    Ravi

    Monday, January 28, 2013 5:16 AM
  • Typically, if you have that many instances, I would question the model design.  Even if you can gather data for that many instances, it's going to be tough to manage. 


    This posting is provided "AS IS" with no warranties, and confers no rights. Use of attachments are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

    Monday, January 28, 2013 1:40 PM
  • The MP I am working on is a template-based MP used to monitor a server and its components. For each server you want to monitor, you must provide its connection details (IP address, username, password, etc.) in the Management Pack Template UI (in the Authoring section of the SCOM console).

    Adding such servers replicates the MP for each server, so all of the rules, monitors, and even the discovery scripts get multiplied. The number of instances to be monitored therefore compounds, as each server contains a minimum of 400 components to be monitored.


    Regards,
    Ravi

    Monday, January 28, 2013 1:51 PM