Monitoring the performance of multi-instance processes

    Question

  • Hello all!

    I have this (unsolvable) challenge for the Mgmt Pack Authoring community (the winner will have nothing but my gratitude)!

     

    Problem Description:

    1.      I want to monitor the Handles Windows Perf Mon counter for 10 processes.
    Each process is multi-instance.

    2.      Because of the multi-instance behaviour I need scripting, so I chose VbScript (PowerShell is still immature).
    Unfortunately the standard Windows Performance Monitor does not support the monitoring of multi-instance processes (for further details see
    http://social.technet.microsoft.com/Forums/en-US/operationsmanagerauthoring/thread/afaffa52-3c69-4554-9042-4f796cfb6ac4)

    3.      I implement the CookDown technique in order to execute the script only once.

    4.      The Handles thresholds differ from process to process. Of course they are common to all the instances of each process.

    5.      The most important requirement of CookDown is that the script must not receive different input arguments (for further details see http://social.technet.microsoft.com/Forums/en-US/operationsmanagerauthoring/thread/9b1d61be-eee8-4c72-96d7-8765c820bc63).

    6.      I have one monitor for each process. Each monitor has a different Target Class. So I have 10 processes with 10 monitors and 10 classes.

    My implementation

    1.      I developed a Custom Data Source that executes the VbScript.

    2.      I developed a Custom Unit Monitor Type that contains that Custom Data Source.

    3.      I developed 10 different monitors based on the Custom Unit Monitor Type.
    Each monitor has 3 Config Parameters

    a.       ProcessName

    b.      Threshold for Handles

    c.       Polling interval

    The Execution

    1.      The 10 monitors are executed. They call the Custom Unit Monitor Type, which calls the Custom Data Source. The script in the Custom Data Source runs only once per polling interval. It returns several property bags, one for each of my 10 processes.

    2.      The value in each property bag is the greatest of all the values allocated by the instances of the same process.
    Example of Handles allocated for two processes:
    - myprocess01.exe with 100 Handles
    - myprocess01.exe#1 with 150 Handles
    - myprocess01.exe#2 with 180 Handles
    - myprocess02.exe with 320 Handles
    - myprocess02.exe#1 with 124 Handles
    - myprocess02.exe#2 with 30 Handles
    Example of returned property bags:
    <Property Name="myprocess01_Handles" VariantType="5">180</Property>
    <Property Name="myprocess02_Handles" VariantType="5">320</Property>

    3.      Each monitor checks (a.k.a. parses) each property bag in order to find its own. If the threshold is crossed, the monitor becomes UnHealthy.
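
    The aggregation described in step 2 can be sketched as follows. This is only an illustration in Python, not the VbScript from the management pack; the function names are hypothetical, and the last sample is assumed to belong to myprocess02.

```python
import re
from collections import defaultdict

# Perf Mon names the extra instances of a process "name#1", "name#2", ...
_INSTANCE_SUFFIX = re.compile(r"#\d+$")


def base_name(instance: str) -> str:
    """Strip the '#n' suffix and a trailing '.exe' to get the process name."""
    name = _INSTANCE_SUFFIX.sub("", instance)
    return name[:-4] if name.lower().endswith(".exe") else name


def max_handles_per_process(samples: dict) -> dict:
    """Return one '<process>_Handles' entry per process with the largest value."""
    grouped = defaultdict(list)
    for instance, handles in samples.items():
        grouped[base_name(instance)].append(handles)
    return {f"{proc}_Handles": max(values) for proc, values in grouped.items()}


# Sample values taken from the example above
# (the 30-Handles instance is assumed to be an instance of myprocess02).
samples = {
    "myprocess01.exe": 100,
    "myprocess01.exe#1": 150,
    "myprocess01.exe#2": 180,
    "myprocess02.exe": 320,
    "myprocess02.exe#1": 124,
    "myprocess02.exe#2": 30,
}
bags = max_handles_per_process(samples)
print(bags)
```

    Because only the maximum survives the aggregation, the information about the other instances is already lost at this point.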

     

    Up to here everything is fine. The implementation is simple and everything works. But now I face the following problem:

    1.      For example I have 8 instances of MyProcess01.exe running with the following Handles counts:
    - myprocess01.exe with 200 Handles
    - myprocess01.exe#1 with 300 Handles
    - myprocess01.exe#2 with 400 Handles
    - myprocess01.exe#3 with 500 Handles
    - myprocess01.exe#4 with 600 Handles
    - myprocess01.exe#5 with 700 Handles
    - myprocess01.exe#6 with 800 Handles
    - myprocess01.exe#7 with 900 Handles
    The value returned in the property bag is 900. The threshold of the Handles monitor for MyProcess01 is set to 450.
    The threshold is crossed and the monitor becomes UnHealthy. The alert says that there is abnormal Handles allocation for (at least) one instance of the myprocess01.exe process. The alert can also report, if I want, the PID of instance myprocess01.exe#7, but here is the catch: I only know the instance with the greatest value of all. In fact I have 5 instances crossing the threshold, not just one, and I am not able to list the remaining four.

     

    What may I do?

    1.      I can list all the running instances of the process (including the ones under the threshold). Then the operator must identify the affected instances himself. That could be OK if I have only one monitor with only one counter. But what if I have several monitors, each monitoring several Windows Performance Counters? The workload for the operator becomes too high.

    2.      I can tell the operator to connect remotely to the Agent and check via Task Manager or via Win Perf Mon. But what if the monitors involve several Agents? The workload for the operator becomes too high.

     

    I have already tried (but in vain) to create a Diagnostic Task as follows:

    Once the monitor becomes UnHealthy, the VbScript Diagnostic Task receives the threshold set in the monitor as an input argument. The VbScript then looks for all the instances crossing that threshold and lists them in a property bag. Unfortunately the Diagnostic Task (for odd reasons) is not able to receive this kind of input argument (for further details see http://social.technet.microsoft.com/Forums/en-US/operationsmanagerauthoring/thread/cd347b82-1dd1-4085-918e-7833b0a9f8e2).

     

    My challenge for you is: do you have any idea for supplying the operator with the list of all the instances crossing the thresholds?

    Tuesday, January 18, 2011 10:28 AM

All replies

  • Hi!

    One way, I can think of, is to put all the threshold-logic inside the script.

    This may not be desirable if you want overrides but should work otherwise.

    That way you can return two property bag values per process, one for its state and one for its instances, like:

    myprocess01OK = 0

    myprocess01Instances = "myprocess01.exe#3 500 Handles, myprocess01.exe#4 600 Handles, myprocess01.exe#5 700 Handles, myprocess01.exe#6 800 Handles, myprocess01.exe#7 900 Handles"

    myprocess02OK = 1

    myprocess02Instances = ""

    Then use the myprocess01Instances in the Alert description.
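
    A minimal sketch of this idea, in Python for illustration only (the real data source would be a VbScript returning property bags). The function name and the way the threshold is passed in are assumptions; the values come from the 8-instance example in the question.

```python
def summarize(process: str, samples: dict, threshold: float) -> dict:
    """Build the '<process>OK' flag and the '<process>Instances' text."""
    over = [(inst, handles) for inst, handles in samples.items() if handles > threshold]
    return {
        f"{process}OK": 0 if over else 1,
        f"{process}Instances": ", ".join(f"{inst} {handles} Handles" for inst, handles in over),
    }


# Values from the question; the threshold lives inside the script here,
# which is exactly what rules out console overrides.
samples = {
    "myprocess01.exe": 200,
    "myprocess01.exe#1": 300,
    "myprocess01.exe#2": 400,
    "myprocess01.exe#3": 500,
    "myprocess01.exe#4": 600,
    "myprocess01.exe#5": 700,
    "myprocess01.exe#6": 800,
    "myprocess01.exe#7": 900,
}
result = summarize("myprocess01", samples, threshold=450)
print(result["myprocess01OK"])
print(result["myprocess01Instances"])
```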

    Best Regards

    Roger


    This posting is provided "AS IS" with no warranties, and confers no rights.
    Tuesday, January 18, 2011 1:41 PM
  • Hi Roger and thank you for your suggestion!

     

    Microsoft (since OM) has implemented overrides that are so reliable, so...why not use 'em?

     

    No..ok...joking aside...hardcoded thresholds are normally avoided because of the nature of thresholds themselves. I excluded this option for many reasons. Here are some examples:

    1 – If you have many customers with the same processes but working with different loads, you need to set different thresholds. The only way to meet this goal with your suggestion would be to replicate the same Mgmt Pack for each customer. But this is not something Microsoft defines as suitable.

    2 – If you want to change a threshold you must take your Mgmt Pack, modify the VbScript and re-seal it. But this is not something Microsoft defines as suitable.

    3 – If you have many customers with many Mgmt Packs (Point 1) and you need to fix your VbScript code, you must take all the Mgmt Packs, modify all the VbScripts and re-seal 'em. But this is not something Microsoft defines as suitable.

     

    Thank you so much for your participation!

    Tuesday, January 18, 2011 2:20 PM
  • There are more ways to set an override than through SCOM. We used to hack into MOM scripts to make them look in the registry for overrides, mainly because we (as in MOM admins) didn't want to make overrides for common overridable parameters and didn't want to give these rights to admins in the tooling.

    If no entries were found in the registry, a default was used. So you still have a single scripted solution (1 MP/script) and can make overrides per computer (or even per process, whatever you script and want to make "overridable"). The override needs to be made on the computer, though, instead of in the SCOM console (of course you can create a task to set the registry key).

    I realize this is far from optimal as well, but it might be suitable for what you're trying to accomplish.
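
    The lookup-with-default pattern can be sketched like this, in Python for illustration only. The registry read is replaced by a caller-supplied function because the thread does not say where the key would live; `resolve_threshold`, `read_override` and the default values are hypothetical.

```python
# Hypothetical defaults that would ship inside the script.
DEFAULT_THRESHOLDS = {"myprocess01": 450.0}


def resolve_threshold(process, read_override, defaults=DEFAULT_THRESHOLDS):
    """Return the local override for a process if one exists, else the default.

    read_override stands in for the registry read on the agent: it takes the
    process name and returns the stored value, or None when no entry exists.
    """
    raw = read_override(process)
    if raw is None:
        return defaults[process]
    return float(raw)


# No local entry: the default is used.
print(resolve_threshold("myprocess01", {}.get))
# A local entry (stored as text, as a registry string would be) wins.
print(resolve_threshold("myprocess01", {"myprocess01": "600"}.get))
```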


    Rob Korving
    http://jama00.wordpress.com/
    Tuesday, January 18, 2011 3:08 PM
  • Hi Rob and thank you for joining this discussion.

     

    Actually your solution also fits all the requirements but, still, it is far from meeting the mysterious and enigmatic Microsoft Guidelines and Best Practices that (sometimes, as in this case) are reasonable.

    Below I explain why.

    Say you start monitoring (for example) 6 customers (so different from each other that it is not possible to fix common thresholds), all of them running the huge application your company develops (so huge that at most it requires four MSCS 2003 and two IIS, excluding the clients). You will need to write the 10 thresholds (one for each monitor) into each Agent registry, 6 times. In detail this means doing the following:

    1. Select the first class of the ten classes
    2. Select the first Path of the 6 available
    3. Select the Task for setting the threshold
    4. Set the threshold
    5. Launch the Task
    6. Wait for the output

    …back to Point 2 and select the next Agent. Do this until the sixth one

    …back to Point 1 and select the next Class. Do this until the tenth one.

     

    Now you have happily finished!

     

    Think of the load you have if you need 30 monitors. Think of it if you also need to monitor Private Bytes, Virtual Bytes and Threads.
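
    The number of manual task runs implied by the loop above is easy to tally. The sketch below is only that arithmetic; it assumes one task run per monitor, per Agent, per counter, and the four-counter case (Handles, Private Bytes, Virtual Bytes, Threads) is an extrapolation from the counters named here.

```python
def task_runs(monitors: int, agents: int, counters: int = 1) -> int:
    """Manual 'set threshold' task launches needed for a full configuration."""
    return monitors * agents * counters


print(task_runs(10, 6))     # 10 classes x 6 Agents -> 60
print(task_runs(30, 6))     # 30 monitors -> 180
print(task_runs(30, 6, 4))  # 30 monitors, 4 counters each (assumption) -> 720
```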

     

    I can guarantee you that if you submit a Mgmt Pack with this solution to Microsoft for analysis, they will answer that it does not meet the Guidelines and Best Practices.

     

    :-)

     

    Thank you, Rob!

    Tuesday, January 18, 2011 4:22 PM
  • Let me add a couple of new constraints I thought of in the last few hours:

    1.      It is not valid to create from scratch some logic that:

    a.       for some business cycle collects performance data

    b.      calculates the averages

    c.       stores them somewhere to be read by the Custom DataSource VbScript

    Reason: it is not acceptable to lose time developing tools for developing monitors. You must use your time for developing monitors only.

     

    2.      It is not valid to create any sort of derivative logic. In other words, it is not valid to create a VbScript that:

    a.       the first time reads the current values and stores them somewhere

    b.      the second time reads the current values and if they are greater than the first time stores them and increases a counter

    c.       the third time reads the current values and if they are greater than the second time stores them and increases the counter

    …this for, for example, 5 times

    d.      when the counter reaches 5 the monitor turns UnHealthy until the current values are under the thresholds; then the counter is set to 0 and the monitor turns Healthy.

    Reason: it is not acceptable to lose time developing complex tools for simple monitoring.
    Microsoft officially gives you Self Tuning Threshold monitors (even though Microsoft advises against their use, and anyway they are bugged and won't be fixed) for monitoring single-instance processes. Why should it not give you Self Tuning Threshold monitors for multi-instance processes?
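
    For reference, the derivative logic ruled out in point 2 would look roughly like the sketch below. It is written in Python for illustration only; the class name is hypothetical and the reset rule is one possible reading of step d.

```python
class GrowthTracker:
    """Flags a value that has kept growing over several polls."""

    def __init__(self, threshold: float, required_increases: int = 5):
        self.threshold = threshold
        self.required = required_increases
        self.last = None       # last stored value
        self.increases = 0     # how many times the value has grown
        self.healthy = True

    def observe(self, value: float) -> bool:
        """Record one poll and return True while the monitor is Healthy."""
        if self.last is None:
            self.last = value              # first poll: just store the value
        elif value > self.last:
            self.last = value              # grew again: store it and count it
            self.increases += 1
        if self.increases >= self.required:
            self.healthy = False
        if not self.healthy and value < self.threshold:
            # Back under the threshold: reset the counter and recover.
            self.increases = 0
            self.last = value
            self.healthy = True
        return self.healthy


tracker = GrowthTracker(threshold=450)
states = [tracker.observe(v) for v in [100, 200, 300, 400, 500, 600, 400]]
print(states)
```

    The state that has to be kept between polls is what makes this kind of script a tool in its own right rather than a simple monitor.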

     

    You pay for having a complete ITIL OpsMgmt Process implementation, not a lame one!!!

    Tuesday, January 18, 2011 4:24 PM
  •  

    "I can guarantee you that if you submit a Mgmt Pack with this solution to Microsoft for analysis, they will answer that it does not meet the Guidelines and Best Practices."

    I'll return them the question: why do their MPs never seem to follow their own guidelines? I can give numerous examples of this, unfortunately.

    This is load on an agent and not on the MG, and a few reg queries shouldn't be that much of a problem. It's a lot better to change a registry key on 1 server than to set an override on a specific object, have the override distributed to the entire class and have each agent figure out whether the override is meant for it or not. In fact I think this is the biggest issue with SCOM: crappy config distribution that doesn't scale well because everything goes through the RMS, and millions of blog posts all about limiting the data, but those are all just workarounds for their flawed design (so I'm done with my flaming for today as well).


    Rob Korving
    http://jama00.wordpress.com/

    Wednesday, January 19, 2011 4:52 PM
  • You said: "I'll return them the question: why do their MPs never seem to follow their own guidelines? I can give numerous examples of this, unfortunately."

    I did it many times. The funniest answer was "If you feel SCOM does not work, change product."

     

     

     

    Anyway...lemme get back to the challenge.

    The biggest point I have against your suggestion is not that the reg queries are a problem from a workload point of view, but that they are a problem from the operator's point of view. The operator would need to spend hours executing configuration / customization tasks instead of monitoring the environment...

    Wednesday, January 19, 2011 5:52 PM
  • I said to have a default, but to have the script check a key to make customization possible, with a task in SCOM to set that key to make the operator's life easy.

    Config you will always have since, like you said, not all environments are the same.


    Rob Korving
    http://jama00.wordpress.com/
    Wednesday, January 19, 2011 7:02 PM
  • Just curious, what do you mean by "due to multi-instance behavior" way at the top of this discussion?
    This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm
    Tuesday, February 01, 2011 1:55 AM
  • Take a look at System.LogicalSet.ExpressionFilter.  Here's an example: http://blogs.technet.com/b/momteam/archive/2010/10/04/unit-monitors-for-multi-instance-perfmon-counters.aspx

     


    HTH, Jonathan Almquist - MSFT
    Tuesday, February 01, 2011 2:49 AM
  • Hi Ake,

    As you can read in the opening post, by multi-instance I mean any Win Perf Mon counter having instances with the same name but with the #1, #2, etc. suffix to distinguish them from each other.

    For example:

    Multi-instance with same name: the object 'Process'. (Launch two or more Notepads and you'll see.)

    Multi-instance with different names: the object 'LogicalDisk'. You have _Total and one instance for each logical disk. You cannot have two disks with the same name.

    Tuesday, February 01, 2011 9:07 AM
  • Hi Jonathan.

    I hope this is not the answer for me... :-)

    The whole Sad SCOM World knows that this does not work if I want to monitor e.g. the PrivateBytes of two processes both called MyProcess.exe.

    If you read my other posts you can check it. If that is not enough I can supply an official position of Microsoft on this point.

     

    Tuesday, February 01, 2011 9:09 AM
  • For native module, this is probably the closest you're going to get.  Otherwise you will need to get creative and develop your own composite.  We cover about 99% of all monitoring scenarios with our native modules, so this is probably that 1% where you'll need to dig in deep and utilize your scripting and authoring skills.

    Good luck to you!


    HTH, Jonathan Almquist - MSFT
    Tuesday, February 01, 2011 4:12 PM
  • Hi! Please clarify a small point for me: are you talking on behalf of Microsoft or on behalf of yourself? This would be very important to me...
    Thursday, February 03, 2011 12:28 PM
  • Hi Elizabeth,

    You should try returning multiple property bags instead of one property bag with the last item for each process in the list. This way you have all of the process instances to work with instead of one row of data for each process.
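
    A rough sketch of that approach, in Python for illustration only: the script emits one record per instance and takes no threshold, so it still receives identical arguments from every monitor; each monitor then filters on its own process name and threshold. The field and function names are hypothetical.

```python
def instance_bags(samples: dict) -> list:
    """Emit one record per Perf Mon instance instead of one per process."""
    bags = []
    for instance, handles in samples.items():
        # "myprocess01.exe#3" -> "myprocess01"
        process = instance.split("#", 1)[0].removesuffix(".exe")
        bags.append({"Process": process, "Instance": instance, "Handles": handles})
    return bags


def breaching(bags: list, process: str, threshold: float) -> list:
    """What one monitor would keep: its own instances above its own threshold."""
    return [b["Instance"] for b in bags
            if b["Process"] == process and b["Handles"] > threshold]


# Values from the 8-instance example in the question.
samples = {
    "myprocess01.exe": 200,
    "myprocess01.exe#1": 300,
    "myprocess01.exe#2": 400,
    "myprocess01.exe#3": 500,
    "myprocess01.exe#4": 600,
    "myprocess01.exe#5": 700,
    "myprocess01.exe#6": 800,
    "myprocess01.exe#7": 900,
}
over = breaching(instance_bags(samples), "myprocess01", threshold=450)
print(over)
```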

    Dan


    Microsoft Corporation
    Thursday, February 03, 2011 4:39 PM
  • As you seem to be pretty at-ease with MP-development, I'll keep this one short.

    You could add a property to your bag containing a nicely formatted text string with all the "above-threshold processes" and include that into the alert. This is assuming that you return one propertybag for each process and have an expressionfilter of sorts between your probe and your DS-output.

    //Sam

    Wednesday, March 14, 2012 2:48 PM