Tell us about your experience with Management Packs and Noise

  • General discussion

  • Let me introduce myself.  I am Dan Rogers, and I work on the Operations Manager team that is looking at management pack design guidance for MP authors.  One consistent piece of anecdotal feedback we receive is that some MPs have a high cost of adoption due to what has come to be known as alert noise.  You may be familiar with alert noise: it is characterized as more alerts than are necessary to tell you about a condition that needs attention, or alerts that are not useful.

    If you have any feedback on alert noise, please respond here and identify the MP and the rule or monitor that you would like brought to the attention of the team that owns the technology, or offer suggestions for improving the MP by removing or disabling certain monitors.

    In particular, if you have issues with the Operations Manager MP or the core OS MP, these are of special interest to my team since they affect the most Operations Manager installations.

    Thanks in advance

    Dan Rogers


    Microsoft Corporation
    Wednesday, May 27, 2009 1:52 AM

All replies

  • Hi Dan

    The way I have tested this in the past is to set up test environments for each MP. Not feasible all the time but I can generally group them together in some sort of logical manner (e.g. AD and DNS, Exchange 2003 and IIS etc). I’ll concentrate on the AD MP ... which I find extremely noisy.

    1)      Script errors – lots of them. Sometimes genuine. Frequently not. I set them all to informational. Keep an eye on the repeat count. And investigate daily any that have a count of more than 10.

    2)      Lack of suppression \ correlation. I shut down 1 Domain Controller and I get a minimum of 5 alerts. 2 health alerts about the server itself and 3 alerts from replication partners. The alerts aren’t even always helpful from a diagnostic point of view (being script errors about not finding a replication partner).  So downgrade both rules to informational.
    - AD Client Side – Script Based test Failed to Complete
    - Could not determine the FSMO holder

    Then update the customer knowledge base for the critical monitor “AD Op Master is inconsistent” so that it says to look for the informational alert “Could not determine the FSMO role holder” for details of which DC is down.

    3)      State monitoring poor – if I shut down a healthy domain controller then that domain controller stays healthy in the console but just greys out. With 600+ servers to monitor this is invisible in the state view. Worse, the domain controller that is healthy goes critical. Now explain that to my boss: the server that is down is healthy and the server that is up and responsive is critical. SLAs become impossible. This is the same for Exchange 2003 mail flow. The SLA availability reports are meaningless. As a workaround, I create a dashboard that shows the health state of the computers plus the health state of the agents on the same screen.

    4)      The AD domain \ site problems monitors where >60% of domain controllers unavailable. Just creates duplicate alerts again. You already know that you have AD problems because of all the other alerts you have. Additionally, any monitor that targets availability will (I think) affect this – not just AD availability. Which, again, makes it worthless from an SLA reporting perspective.

    5)      Maintenance mode doesn’t work – if I shut down a domain controller then the replication partners still generate an alert. Anyone have a script that will look up the replication partners and put their relevant classes into maintenance mode before shutting down a domain controller?

    6)      Trust monitor doesn’t allow overrides by trust – so just turn off and recreate for specific trusts.

    I can share other info with you if you are really interested ... just ping me a mail offline.

    Have fun

    Graham

    Wednesday, May 27, 2009 10:01 AM
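    Graham's item 5 asks for a script that looks up a domain controller's replication partners before a planned shutdown. The lookup step alone can be sketched in Python as below. This is only an illustration under stated assumptions: the `TOPOLOGY` mapping and the DC names are made up, and the sketch does not talk to Active Directory or Operations Manager; it only computes which servers a maintenance-mode script would need to cover.

```python
from typing import Dict, Set

# Hypothetical replication topology: each DC maps to the DCs it replicates with.
# In a real environment this would be read from AD; the names here are made up.
TOPOLOGY: Dict[str, Set[str]] = {
    "DC01": {"DC02", "DC03"},
    "DC02": {"DC01"},
    "DC03": {"DC01", "DC04"},
    "DC04": {"DC03"},
}


def maintenance_targets(dc: str, topology: Dict[str, Set[str]]) -> Set[str]:
    """Return the DC being shut down plus every DC that replicates with it."""
    partners = set(topology.get(dc, set()))
    # Also pick up inbound links, in case the topology was recorded one-way.
    partners |= {name for name, links in topology.items() if dc in links}
    partners.discard(dc)
    return {dc} | partners


print(sorted(maintenance_targets("DC01", TOPOLOGY)))
```

    The resulting set is what would be handed to whatever mechanism actually starts maintenance mode, so that the partners do not raise replication alerts while the DC is down.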
  • I'd say most of the noise came just after importing the AD MP, especially from Replication and Performance, which accounted for about 90% of the errors in the console.

    Ingrifo - We Do SCOM
    Wednesday, May 27, 2009 3:25 PM
  • Hi Dan,

    Script/WMI Errors: Tons of them (1000+ Servers). These drive our Operations guys nuts ... you should find a way:
    - to reduce these somehow? Note, we applied every possible WSH & WMI fix on our systems hoping to ...
    - to 'contain' them to the OpsMgr Agent Alert View and 'disallow' them to pop up all over the general alert view and product related views.

    Glad Graham did all the typing on AD.

    Continuing on the "If I put a Domain Controller in Maintenance and shut it down" track, the whole Exchange 2007 MP goes ballistic, dozens of event based alerts ... manual reset :-(
    These events are 'correct', since they are logged in the Exchange event logs, but totally irrelevant, since Exchange corrects itself in a split second ... leaving the events in the Alert Views.

    Key issue here is that there's absolutely nothing wrong with either the AD or Exchange Service, nevertheless we're spammed with Server alerts!
    In an ideal world when Exchange raises an error event for a DC, OpsMgr should check if the DC is in maintenance and if so drop any alert.   

    I could go on and on :-) ... just ping me a mail offline.

    Cheers,
    Serge
    Wednesday, May 27, 2009 7:19 PM
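    Serge's "ideal world" rule (drop an alert about a DC when that DC is in maintenance) can be sketched as a simple filter. This is a hedged illustration, not how OpsMgr works: the `Alert` fields, the server names, and the `windows` mapping of maintenance windows are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Alert:
    source: str       # server that raised the alert, e.g. an Exchange server
    about: str        # server the alert refers to, e.g. the DC it could not reach
    raised: datetime
    text: str


# Hypothetical maintenance windows: server name -> (start, end)
Windows = Dict[str, Tuple[datetime, datetime]]


def in_maintenance(server: str, when: datetime, windows: Windows) -> bool:
    """True if `server` has a maintenance window covering `when`."""
    window = windows.get(server)
    return window is not None and window[0] <= when <= window[1]


def drop_maintenance_noise(alerts: List[Alert], windows: Windows) -> List[Alert]:
    """Keep only alerts whose subject was not in maintenance when they fired."""
    return [a for a in alerts if not in_maintenance(a.about, a.raised, windows)]


windows = {"DC01": (datetime(2009, 5, 27, 18, 0), datetime(2009, 5, 27, 19, 0))}
alerts = [
    Alert("EXCH01", "DC01", datetime(2009, 5, 27, 18, 5), "DC not reachable"),
    Alert("EXCH01", "DC02", datetime(2009, 5, 27, 18, 5), "DC not reachable"),
]
kept = drop_maintenance_noise(alerts, windows)
print([a.about for a in kept])
```

    The key design point is that the check is made against the server the alert is about, not the server that raised it.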
  • Sorry for drifting off topic, but I have an idea based on the first point B-Serge made. It would be nice if the Active Alerts view (usually the first window an operator has open) left out the SCOM WMI/Script errors. Then just create a second view, Administrative Alerts, and show it only to the Author/Administrator user roles. That would be a nice touch :)

    Ingrifo - We Do SCOM
    Wednesday, May 27, 2009 8:06 PM
  • That is basically what I do in changing the alerts themselves to informational and changing the active alerts view to warning and critical only. I then create alert views just based on the script errors. It takes a bit of work but makes the errors themselves easier to manage in the long run.

    Cheers

    Graham
    Wednesday, May 27, 2009 8:19 PM
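    Graham's approach above (downgrade script errors to informational, restrict the active alerts view to warning and critical, and give script errors their own view) can be modelled in a few lines. This is a sketch under assumptions: the `Alert` class is hypothetical, and recognising script/WMI failures by alert name is an assumption made for illustration, not something the console does this way.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str  # "Informational", "Warning" or "Critical"


def is_script_error(alert: Alert) -> bool:
    # Assumption: script/WMI failures can be recognised by their alert name.
    lowered = alert.name.lower()
    return "script" in lowered or "wmi" in lowered


def downgrade_script_errors(alerts: List[Alert]) -> List[Alert]:
    """Set every script/WMI error to Informational, leave the rest alone."""
    return [
        replace(a, severity="Informational") if is_script_error(a) else a
        for a in alerts
    ]


def active_alerts_view(alerts: List[Alert]) -> List[Alert]:
    """What operators see: warning and critical only."""
    return [a for a in alerts if a.severity in ("Warning", "Critical")]


def script_errors_view(alerts: List[Alert]) -> List[Alert]:
    """What administrators see: script/WMI errors only."""
    return [a for a in alerts if is_script_error(a)]


raw = [
    Alert("Script or Executable failed to run", "Warning"),
    Alert("DNS 2003 Monitor Zone Resolution", "Critical"),
]
tuned = downgrade_script_errors(raw)
print([a.name for a in active_alerts_view(tuned)])
print([a.name for a in script_errors_view(tuned)])
```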
  • Hi Serge,

    Which MP is creating the script/WMI errors?  Can you break these down some: scripts that time out, scripts that don't compile?  Great idea to take the OM script failures and restrict them to a special OM alerts view.

    Keep the thread coming - what I am going to do with the data is provide feedback on individual MPs to the product teams.  OM isn't off limits - we have plenty to do. The more specific the detail, the more actionable we can make it.

    dan


    Microsoft Corporation
    Thursday, May 28, 2009 2:11 AM
  • Hi Dan,

    In my case, most of the WMI\Script alerts come from scripts that timed out. Many management packs (SCCM, for example) have really big discovery scripts, and when these scripts run on heavily loaded servers (at a moment when the server has a CPU spike) they cannot complete in time.
    We can disable such discoveries on servers that will never have any SCCM (for example) components, but this is only a workaround, and it will not prevent alerts from the SCCM servers themselves if the scripts run while the CPU is under a short-lived high load.

    Second: the WSUS3 MP. It looks like a forgotten pack. No fixes, no new releases.
    For example, the Accessibility attribute has the value "Internal" in Microsoft.Windows.Server.UpdateServices.3.Server.ServiceState. I want to use recovery tasks, but I can't.


    http://OpsMgr.ru/
    Thursday, May 28, 2009 4:27 AM
  • Yup, have to agree that script errors are the source of most of the rubbish on our console at the moment. Here's a bunch of script errors I've grabbed off our console at present; most of them are being seen on more than one server:


    Forced to terminate the following process started at 1:11:13 AM because it ran past the configured timeout 30 seconds.
    Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 5\1257\CheckVirtualMachineNameMatchComputerName.vbs"
    Workflow name: Microsoft.Virtualization.VirtualServer.2005R2.VirtualMachineName_does_not_match_computer_name.rule

    The process started at 6:20:07 PM failed to create System.PropertyBagData, no errors detected in the output. The process exited with 128
    Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "NetworkAdapterCheck.vbs" mclovin 2 false true false
    Workflow name: Microsoft.Windows.Server.2003.NetworkAdapter.NetworkAdapterConnectionHealth

    The process started at 18:25:46 failed to create System.PropertyBagData, no errors detected in the output. The process exited with 0
    Command executed: "C:\WINNT\system32\cscript.exe" /nologo "NetworkAdapterCheck.vbs" mclovin 8 false true false
    Workflow name: Microsoft.Windows.Server.2000.NetworkAdapter.NetworkAdapterConnectionHealth

    Forced to terminate the following process started at 11:14:25 because it ran past the configured timeout 300 seconds.
    Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "PrintServerDiscovery.vbs" {52A54AB3-2D5A-AACC-6D82-04AAA42EB595} {C64CCA28-FEAB-98BA-21B1-D58B242C6D9F} mclovin
    Workflow name: Microsoft.Windows.Server.PrintServer.Microsoft_Windows_Print_Servers_Installation.Discovery
    Instance name: Microsoft.Windows.Server.PrintServer.Microsoft_Windows_Servers_with_Print_Service_Installation

    Forced to terminate the following process started at 13:59:39 because it ran past the configured timeout 300 seconds.
    Command executed: "C:\WINNT\system32\cscript.exe" /nologo "DiscoverSQL2005DBEngineDiscovery.vbs" {E71360F6-C12E-8326-4539-FBC9D78862F5} {C9A6FCC7-0603-BEFA-83C7-F11C4DDAC705} mclovin mclovin mclovin "Exclude:"
    Workflow name: Microsoft.SQLServer.2005.DBEngineDiscoveryRule.Server

    Forced to terminate the following process started at 17:59:37 because it ran past the configured timeout 300 seconds.
    Command executed: "C:\WINNT\system32\cscript.exe" /nologo "DiscoverSQL2005RSDiscovery.vbs" {4CC14CC2-E918-1733-726A-32631502C25B} {C9A6FCC7-0603-BEFA-83C7-F11C4DDAC705} mclovin mclovin mclovin
    Workflow name: Microsoft.SQLServer.2005.ReportingServicesDiscoveryRule.Server

    Thursday, May 28, 2009 5:05 PM
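    Dan asked for the script failures to be broken down by type. Descriptions like the ones quoted above fall into two visible groups (timeouts, and scripts that exit without producing a data item), so a rough breakdown can be produced by pattern matching on the alert text. The sketch below is one possible way to do that; the function name `classify` and the category labels are made up for illustration, and the two sample strings are copied from the post above.

```python
import re
from collections import Counter

# Patterns taken from the wording of the alerts quoted above.
TIMEOUT = re.compile(r"ran past the configured timeout (\d+) seconds")
NO_DATA = re.compile(
    r"failed to create (System\.[\w.]+), no errors detected in the output\. "
    r"The process exited with (\d+)"
)
SCRIPT = re.compile(r'([^"\\/]+\.(?:vbs|js))"', re.IGNORECASE)
WORKFLOW = re.compile(r"Workflow name:\s*(\S+)")


def classify(text):
    """Return a dict describing one script failure alert."""
    timeout = TIMEOUT.search(text)
    no_data = NO_DATA.search(text)
    if timeout:
        kind, detail = "timeout", f"{timeout.group(1)} seconds"
    elif no_data:
        kind, detail = "no data item", f"exit code {no_data.group(2)}"
    else:
        kind, detail = "other", None
    script = SCRIPT.search(text)
    workflow = WORKFLOW.search(text)
    return {
        "kind": kind,
        "detail": detail,
        "script": script.group(1) if script else None,
        "workflow": workflow.group(1) if workflow else None,
    }


SAMPLES = [
    r'''Forced to terminate the following process started at 11:14:25 because it ran past the configured timeout 300 seconds.
Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "PrintServerDiscovery.vbs" {52A54AB3-2D5A-AACC-6D82-04AAA42EB595} {C64CCA28-FEAB-98BA-21B1-D58B242C6D9F} mclovin
Workflow name: Microsoft.Windows.Server.PrintServer.Microsoft_Windows_Print_Servers_Installation.Discovery''',
    r'''The process started at 6:20:07 PM failed to create System.PropertyBagData, no errors detected in the output. The process exited with 128
Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "NetworkAdapterCheck.vbs" mclovin 2 false true false
Workflow name: Microsoft.Windows.Server.2003.NetworkAdapter.NetworkAdapterConnectionHealth''',
]

results = [classify(s) for s in SAMPLES]
print(Counter(r["kind"] for r in results))
```

    Grouping by `kind`, `script` and `workflow` gives the per-MP breakdown that makes this kind of feedback actionable.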
  • Well, you asked for it :-)

    Scripts
    x64 SQL Cluster:
    The process started at 3:48:00 PM failed to create System.Discovery.Data, no errors detected in the output. The process exited with 0 Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "DiscoverSQL2005DBEngineDiscovery.vbs"

    The process started at 3:48:00 PM failed to create System.Discovery.Data, no errors detected in the output. The process exited with 0 Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "DiscoverSQL2005RSDiscovery.vbs"

    The process started at 3:48:00 PM failed to create System.Discovery.Data, no errors detected in the output. The process exited with 0 Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "DiscoverSQL2005ASDiscovery.vbs"

    The process started at 15:58:08 failed to create System.Discovery.Data, no errors detected in the output. The process exited with 0 Command executed: "C:\Windows\system32\cscript.exe" /nologo "DiscoverSQL2008FileGroupsAndFiles.vbs"

    The process started at 15:58:11 failed to create System.PropertyBagData, no errors detected in the output. The process exited with 0 Command executed: "C:\Windows\system32\cscript.exe" /nologo "GetSQL2008DBConfig.vbs"

    These pop up on all nodes and easily reach Repeat Counts > 1000 ...

    WMI Event & Probe Modules
    x64 SQL Cluster
    Object enumeration failed Query: 'SELECT * FROM SQLServiceAdvancedProperty WHERE ServiceName = 'MSSQL$WFCD' AND PropertyName = 'SPLEVEL'' HRESULT: 0x80041013 Details: Provider load failure One or more workflows were affected by this. Workflow name: Microsoft.SQLServer.2005.DBEngine.Configuration.ServicePackLevel

    Object enumeration failed Query: 'SELECT * FROM SqlService WHERE ServiceName="SQLAgent$WFCD"' HRESULT: 0x80041013 Details: Provider load failure One or more workflows were affected by this. Workflow name: Microsoft.SQLServer.2005.AgentDiscovery

    Server 2008 x64 Exchange 2007 CCR Cluster
    Module was unable to enumerate the WMI data. Error: 0x80041032 Details: Call cancelled One or more workflows were affected by this.
    Workflow name: Microsoft.Windows.Cluster.Node.StateMonitoring

    Object enumeration failed
    Query: 'SELECT Name, State FROM MSCLUSTER_Resource'
    HRESULT: 0x80041008
    Details: Invalid parameter
    One or more workflows were affected by this.

    Again easily reach Repeat Counts > 1000 ...

    Cheers,
    Serge

    Thursday, May 28, 2009 6:42 PM
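    The WMI failures Serge quotes each carry an error code and a short detail string, so they can be tallied by code to see which failure dominates. The following is a minimal sketch of that idea; the function name `tally_wmi_errors` is made up, and the sample messages are copied from the post above with the workflow names left off.

```python
import re
from collections import Counter

# Matches both "HRESULT: 0x..." and "Error: 0x..." forms seen in the alerts.
WMI_ERROR = re.compile(
    r"(?:HRESULT|Error):\s*(0x[0-9A-Fa-f]{8})\s+Details:\s*(.+?)\s+One or more workflows",
    re.DOTALL,
)


def tally_wmi_errors(messages):
    """Count alerts per (error code, detail text)."""
    counts = Counter()
    for message in messages:
        match = WMI_ERROR.search(message)
        if match:
            counts[(match.group(1).lower(), match.group(2))] += 1
    return counts


SAMPLES = [
    "Object enumeration failed Query: 'SELECT * FROM SQLServiceAdvancedProperty "
    "WHERE ServiceName = 'MSSQL$WFCD' AND PropertyName = 'SPLEVEL'' "
    "HRESULT: 0x80041013 Details: Provider load failure "
    "One or more workflows were affected by this.",
    "Object enumeration failed Query: 'SELECT * FROM SqlService WHERE "
    "ServiceName=\"SQLAgent$WFCD\"' HRESULT: 0x80041013 Details: Provider load failure "
    "One or more workflows were affected by this.",
    "Module was unable to enumerate the WMI data. Error: 0x80041032 "
    "Details: Call cancelled One or more workflows were affected by this.",
    "Object enumeration failed\nQuery: 'SELECT Name, State FROM MSCLUSTER_Resource'\n"
    "HRESULT: 0x80041008\nDetails: Invalid parameter\n"
    "One or more workflows were affected by this.",
]

for (code, details), count in tally_wmi_errors(SAMPLES).most_common():
    print(code, details, count)
```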
  • Noise we are currently looking to reduce:

    A. The Script failure and WMI Probe errors are the number one source of noise in my deployment.  

    B. The SQL Server Service Broker or Database Mirroring transport is disabled or not configured (Special Note:  not everywhere - just SharePoint servers)

    C. IIS Discovery Probe Module Failed Execution

    D. Two particular OS Performance Monitors.  

       1.  Total Percentage Interrupt Time - The average sample value seems to be exceeded quite frequently on Hyper-V or MSVS guests. This has been brought up before in the newsgroups but no one has resolved the issue yet.   This is especially noticeable on a virtual Domain Controller, and it occurs while healthservice.exe is consuming the processor.  This in turn appears to be affecting the DNS Resolution Time alert.  I'm still working on tracking down which specific workflows (management pack) may be causing these spikes, which is not the easiest thing to do.

       2.  Average Disk Seconds Per Transfer - I believe we might have tracked this down yesterday to be an AMD issue.  Apparently, Dual Core or multiprocessor AMD Opteron processors may encounter Time Stamp Counter drift in certain conditions.  We successfully fixed some virtuals by adding the /usepmtimer option in the boot.ini.  We have not yet confirmed this fixes our Physical servers showing this behavior.  But, this is one of our most noisy items.

    Thursday, May 28, 2009 8:12 PM
  • Keep it coming, this is great data.

    Dan
    Microsoft Corporation
    Saturday, May 30, 2009 4:38 PM
  • OK .. not noise but equally frustrating .. reports. The windows reports don't seem to show the dates correctly on the x-axis. I have run a report for last 24 hours and get 1 ... 31 along the x-axis (guess the windows team decided to plot integers rather than dates). The SQL reports and generic reports that I've tested work fine with 31st May coming before 1st June.

    While I'm on my soap box, how about a decent disk space report as well ..... OpsMgr has only been out 2 and a bit years .... how long does it take to write one ;-)

    Cheers

    Graham
    Monday, June 1, 2009 10:21 AM
  • SQL .... the good ... when you enable job discovery the default is not to alert but just to change state. So on green field install that has over 300 job failures, at least we don't get spammed. We sort them out then override enable to alert. This is quite a good idea.

    The bad .... Secure Reference Override Failure .... how to avoid being spammed by 60 SQL Servers when you deploy the management pack? I know how to set up the run as account and profile but the Run As Profile for SQL discovery and monitoring don't get created until I import the SQL MP ... and once I import the MP it is a race against time (which I always lose) to configure this before the discoveries start running. And as the discoveries start running so the alerts stating that SQLProbe doesn't have the correct privileges start flooding in. 

    Still needed: a sensible approach to removing databases that get deleted, so that they aren't still listed as monitored. But that is a common theme across Operations Manager, with bodges along the lines of "override the discovery and run a script". Why can't discovery actually discover that something has gone and update the info appropriately?

    IIS ... don't get me started ;-) 

    Have fun

    Graham

     
    Tuesday, June 2, 2009 6:18 PM
  • Thanks.  Keep it coming - this data is very helpful.
    Microsoft Corporation
    Thursday, June 4, 2009 3:33 PM

  • >> IIS ... don't get me started ;-)

    Absolutely! We would love to get more information on IIS noise in the wild.

    -Nathan
    Thursday, June 4, 2009 6:50 PM

  • Hmmm .... "wild" is probably the wrong word. Usually livid would be more appropriate ;-)

    Try setting up a small test environment - include an exchange 2003 cluster ... and some webservers with half a dozen sites on each.

    Exchange cluster - you get IIS based alerts that the passive node isn't running ... err .. yeah ... it isn't supposed to be. The response from the forums has tended to be that you can set the monitor to check only automatic startup services but the problem with that is that you don't get an alert if the service stops on the active node .... IIS needs to realise that it is sometimes part of a bigger picture.

    Restart IIS on a web server - how many alerts do you get? Really need more intelligence built into the monitoring rather than the current getting spammed by IIS alerts ..

    Have fun

    Graham

    Friday, June 5, 2009 9:51 AM
  • I agree with the WMI & "Script or Executable failed to run" noise - we receive tons of these messages every day.  As for a specific management pack, we've experienced and continue to experience a lot of issues with the DNS management pack.  We receive lots of "DNS 2003 Monitor Zone Resolution" alerts for multiple zone files across a number of servers.  I've tried a number of suggested workarounds that have helped reduce the number of alerts, but they are still obnoxious and always false positives.

    Thanks for asking for our feedback!
    Wednesday, June 10, 2009 12:03 PM
  • I don't have a big issue with the "noise", yes I have it, and I deal with it, but this is a small shop. What I do have issues with is critical alerts that go out to my email recipients even when they are auto resolved almost instantly. Is there any way to delay the email alert for 2 minutes so that if it does auto resolve in that time period no email is sent?

    Sorry for getting off topic.



    Wednesday, June 10, 2009 9:25 PM
  • Hi

    There is an alert aging option on the notification subscription configuration - this delays the notification which is exactly what you need.

    Cheers

    Graham
    Thursday, June 11, 2009 7:20 AM
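    The effect of delaying notification, as asked for above (only send the email if the alert has not auto-resolved within two minutes), can be illustrated with a small filter. This is a sketch of the idea only, not of how the subscription's alert aging setting is implemented; the `Alert` class and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Alert:
    name: str
    raised: datetime
    resolved: Optional[datetime] = None  # set when the alert auto-resolves


def alerts_to_notify(
    alerts: List[Alert],
    now: datetime,
    min_age: timedelta = timedelta(minutes=2),
) -> List[Alert]:
    """Return alerts that are still open and at least `min_age` old."""
    return [
        a for a in alerts
        if a.resolved is None and now - a.raised >= min_age
    ]


t0 = datetime(2009, 6, 10, 21, 0, 0)
alerts = [
    Alert("blip", raised=t0, resolved=t0 + timedelta(seconds=20)),
    Alert("real problem", raised=t0),
    Alert("too new", raised=t0 + timedelta(seconds=90)),
]
due = alerts_to_notify(alerts, now=t0 + timedelta(minutes=2))
print([a.name for a in due])
```

    A complete version would also need to remember which alerts have already been notified so that each one is only sent once.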
  • >> B. The SQL Server Service Broker or Database Mirroring transport is disabled or not configured (Special Note:  not everywhere - just SharePoint servers)

    I was told this is a known bug when running VMware, and one of our SQL people told me to just disable the alert. Is that what you might be using?

    John Bradshaw
    Saturday, June 13, 2009 6:36 AM
  • When you create a distributed application, the icon for IIS and AD looks the same.  It is often hard to give out the application views as they don't give easy to understand icons.

    The live maps option of having much more customisable views would be great

    Paul .
    paulk
    Tuesday, June 16, 2009 3:41 PM
  • The NetworkAdapterCheck.vbs script still doesn't work for Windows 2000 network cards? I'm seeing this in our cert environment. I thought this would be resolved by now.
    Tuesday, June 23, 2009 9:45 PM

  • Exchange 2007 .. mostly good .. very good ... just takes a week to read the documentation. BUT!

    a) Discovery seems to detect Exchange System Tools, which means that the Exchange 2007 Computer Group contains servers that aren't running Exchange roles. I can override discovery (enable = false) against the non-Exchange servers and then use the Remove-DisabledMonitoringObject PowerShell cmdlet ... but it would be better if the discovery was more accurate.

    b) If a Mailbox Database store is removed, the discovery doesn't seem to automatically remove it (perhaps I haven't waited long enough, as I realise most discoveries run every 24 hours). It isn't possible to use the above approach (override discovery + PowerShell cmdlet) as the discovery isn't granular enough. At present I have to disable the TestMapiConnectivity Monitor that runs against this Mailbox Database store.

    This "discovery" and "undiscover" process is perhaps in need of a think so that it is consistent across all management packs and objects. At present some objects seem to get removed automatically and others need the process in step a) above .... and when neither happens I need to fall back to option b) above.
    Monday, June 29, 2009 4:50 PM
  • A lot of our noise comes from the Script/WMI Errors, as everyone else has said.  SQL provides the majority of our broken scripts.  Checking my console at the moment, just a few are:

    GetSQL2005DBSpace.js
    HostCPUUtilizationProvider.vbs (ProHostCPUMonitor)
    GetSQL2008BlockingSPIDs.vbs
    GetSQL2008DBConfig.vbs
    DiscoverSQL2005DB.vbs (Microsoft.SQLServer.2005.DatabaseDiscoveryRule)
    Query: 'SELECT Name FROM Win32_ServerFeature WHERE Name = "Print Services"'
    (Workflow name: Microsoft.Windows.Server.2008.PrintServerRole.Discovery) (this occurs on several R2 Hyper-V hosts for some reason)

    Also, we get tons of low disk space alerts on Windows 2008 R2 system reserved partitions.  They always say Volume {GUID} has less than 72MB free.  We also get tons of low disk space alerts on Data Protection Manager volumes, which are of course always very small and often full.  A single DPM server generates 100+ low disk space alerts that we have to tune out.

    Janssen Jones - http://www.janssenjones.com -- Don't forget to mark answers as answers. :)
    Wednesday, July 1, 2009 3:18 AM
  • For the Script/WMI errors, I disabled the noisiest script and WMI error alerts and converted them to event collection rules, keeping those events only in the Operations Manager database and not in the DW (write action). I then created an event view and selected the rules that collect the script and WMI errors. This way operators can't see them, but administrators know about the event view.  I think something like this could be built into future MP development for troubleshooting purposes.
    Orhan Taskin
    Thursday, July 2, 2009 2:43 AM
  • In the latest AD MP there are still problems with collecting AD Replication Latency performance data: http://blogcastrepository.com/blogs/francoisd/archive/2009/07/06/scom-2007-ad-mp-reasons-why-replication-latency-performance-won-t-show-up.aspx

    For the rest, Graham mentioned the main problems.

    Monday, July 6, 2009 1:51 PM
  • I've noticed what people have mentioned above and I've found the documentation, specifically the AD MP, to be frustrating.  Regarding the step-by-step instructions, in some places it seems as though they were written for a different version of OpsMgr...maybe a beta or RC...and in other places the documentation is completely wrong.

    Here's an example from the AD MP.  The following are instructions on how to configure the client monitoring modes:


    1.   Open the Operations console, and then click Authoring.

    2.   Expand Management Pack Objects, and then click Object Discoveries.

    3.   Locate the AD Client Monitoring Discovery rule. If you do not see the rule, check that your scope is set to include the Active Directory Client Perspective by clicking the Change Scope link at the top of the Actions pane.

    4.   Right-click the rule.

    5.   In the Override Properties window, select Override, and then click Override the rule. The user interface provides a list of client computers that are currently using client monitoring.

    6.   In the list of overrides, choose a mode. The mode can be 1, 2, 3, or 4 (see the table of modes described earlier in this section).


    These instructions are wrong.  You can't find these options in the AD Client Monitoring Discovery rule, as there is no rule with this name.  There is a monitor with this name, but it has no options for configuring client monitoring modes, and you can't choose a mode in the list of overrides.  The only place I've found these options is in the AD Client Update DCs rule.

    Also, my co-worker called in about a problem he was having with a hotfix: the instructions were different depending on where he looked on Microsoft's site.  Not only that, but when he spoke with the technician, they told him to ignore both sets of instructions and that there was a different hotfix with its own instructions.

    Having to spend hours searching for answers, then re-searching for correct answers, when trying to tune a management pack makes adopting OpsMgr a real chore.

    Friday, July 24, 2009 5:45 PM
  • Hey Graham, just checking whether the db was eventually removed.
    This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm
    Friday, July 31, 2009 5:44 PM
  • Hi Åke


    Looks like I was a bit eager - am onsite with the customer at the moment and it has been removed. I guess it is a wait of up to 24 hours when the discoveries run??

    Cheers


    Graham

    View OpsMgr tips and tricks at http://systemcentersolutions.wordpress.com/
    Monday, August 3, 2009 12:08 PM
  • SYSTEM CENTER CORE MP

    I guess this is where REPORTING is kept.
    In a word or two, Reporting in OpsMgr is downright appalling.

    It turns my grey hairs white to think that any graph can be produced without correctly labelled axes, yet EVERY graph in SCOM is unlabelled. This is kindergarten maths.
    And as for hunting around for Object pickers or whatever.....Don't get me started.

    This is a CORE functionality which is shot and should have been seriously addressed before now.
    Well I will leave it there.....for the time being.

    Thank you,
    John Bradshaw
    Wednesday, August 5, 2009 10:01 PM
  • Hi Dan,

    I see quite a few of the following alerts in the MP for SMS 2003.

    SMS 2003 Inbox: Monitor inbox script error
    Resolution:
    - Check the script parameters on the appropriate inbox monitoring rule.
    - Check that the SMS Identification registry key exists and can be accessed.
    - Check that the Installation Directory and Site Code registry values under the Identification key exist and are valid.
     
    In every case that I have seen this error, the registry keys and/or values exist, are populated, and are properly permissioned for the agent to read them.

    Shawn

    Thursday, August 6, 2009 6:30 PM
  • Right, all discoveries run every 24 hours in that MP.
    Monday, August 10, 2009 4:36 PM
  • Hi Dan.

    Even though I have already answered once, I would like to add that there is a shining example out there which should become the blueprint for every new/updated (Microsoft) MP.

    I am talking about the native Exchange 2007 MP. Compared to what it has to monitor, it is relatively easy to configure. When the Discovery Helper has run, one can easily see whether any servers are being seen as Exchange servers when they aren't. Since no monitoring has started, it is easy to rectify that and then enable this MP step by step, making the monitoring by this MP become, as I call it, multi-layered.

    With the new templates added one can easily configure monitoring of OWA/EAS/WebServices and so on. With the Exchange 2003 MP in mind, this MP is really a walk in the park.
    Best regards, Marnix Wolf

    (Thoughts on OpsMgr)
    Monday, August 10, 2009 5:02 PM
  • I'm reiterating the comments of others,  but here's my two cents:

    The latest DNS pack was very problematic, generating a ton of false positives and script/WMI errors. For something as basic as DNS, it was surprising that this pack was so troublesome and noisy.

    The AD pack is also very, very noisy with script/WMI errors, and the rule set generally lacks intelligence. For example, the intentional downing of one DC should not trigger replication alerts from 30 other DCs or alerts from a dozen Exchange servers. There needs to be more intelligence or dependency options between these management packs. This "awareness" needs to extend to other highly distributed packs, like Exchange, SMS, etc.
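    The kind of dependency awareness asked for above could look roughly like the following. This is only a minimal sketch in Python, not anything SCOM or the AD MP provides: the `Alert` fields, the `suppress_expected` function, and the server names are all hypothetical, and it assumes each alert already records which server it is about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str  # server that raised the alert (hypothetical field)
    about: str   # server the alert refers to; may equal source
    name: str


def suppress_expected(alerts, in_maintenance):
    """Split alerts into (kept, suppressed) given a set of servers taken down on purpose."""
    kept, suppressed = [], []
    for alert in alerts:
        # Treat the alert as expected noise if the server that raised it,
        # or the server it complains about, is in a planned outage.
        if alert.source in in_maintenance or alert.about in in_maintenance:
            suppressed.append(alert)
        else:
            kept.append(alert)
    return kept, suppressed


# Hypothetical example: DC01 is shut down intentionally.
alerts = [
    Alert("DC01", "DC01", "Health service heartbeat failure"),
    Alert("DC02", "DC01", "Replication partner not found"),
    Alert("EX01", "DC01", "Cannot contact domain controller"),
    Alert("DC03", "DC04", "Replication partner not found"),
]
kept, suppressed = suppress_expected(alerts, in_maintenance={"DC01"})
```

    In this sketch only the DC03 alert about DC04 survives, since it has nothing to do with the planned outage. The hard part in practice is knowing the `about` relationship, which is exactly the cross-pack awareness the packs would need to supply.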

    It's generally very difficult to troubleshoot the script/WMI errors. There needs to be more of a knowledge base on what to check when the errors occur. There should be more options to override the script errors on a specific pack or server without disabling all script warnings, so that the perpetual alerts associated with a given server/pack pairing can be silenced. I also agree with other comments that the script errors should have an informational severity by default and should not be warnings. Yellows and reds should be kept to a minimum in the console.

    The discovery processes need to be capable of backing out discovery objects when a given role is removed from a server (i.e. DC demoted, SQL uninstalled, etc.). The lack of a back-out capability generates tons of alerts when a given component is intentionally removed, and having to run the remove-disabledmonitoringobject script is not acceptable with hundreds of servers and different administrative groups and business units doing their own thing.

    Finally, on several large-scale deployments, we've abandoned the "severity" level concept altogether because of all the noise and the difficulty of turning down the default settings in hundreds of rules and monitors. We've instead chosen a "bottom up" approach: we identified the specific alerts that were important to us and overrode them to "High Priority". We then create views and tweak our subscriptions based on the priority level and not the severity level. This concept is fine for alert views and subscriptions, but all the graphical views in SCOM (i.e. state and diagram views) trigger solely on the rollup of severity level. It would be nice to create graphical views that change color based on something other than severity, like the priority level.
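    To make the "bottom up" approach concrete, here is a minimal Python sketch of the idea. It is an illustration only and does not use any SCOM API: the `Alert` class, the two helper functions, and the alert name "AD replication is failing" are hypothetical stand-ins for overrides and views configured in the console.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str  # default shipped with the MP, left untouched
    priority: str  # "Low", "Medium" or "High"


def promote_important(alerts, important_names):
    """Return the alerts with the hand-picked ones raised to High priority."""
    return [
        replace(a, priority="High") if a.name in important_names else a
        for a in alerts
    ]


def high_priority_view(alerts):
    """Keep only High-priority alerts, ignoring severity entirely."""
    return [a for a in alerts if a.priority == "High"]


raw = [
    Alert("Script or Executable Failed to run", "Warning", "Medium"),
    Alert("AD replication is failing", "Critical", "Medium"),  # hypothetical name
    Alert("WMI Probe Module Failed Execution", "Warning", "Medium"),
]
important = {"AD replication is failing"}  # the short list chosen by the team
view = high_priority_view(promote_important(raw, important))
```

    The point of the design is that the short list of important alerts is maintained in one place, instead of turning down severity on hundreds of rules and monitors one by one.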
    Tuesday, September 15, 2009 5:42 PM
  • Hi Dan,

    We have a lot of noise from the AD Management Pack. Scripts time out when run by the SCOM agent.
    When we execute the scripts on a command line they behave without problems.

    The scripts mostly perform WMI queries and they fail very often. This situation is not so good. We spend more time analysing the noisy alerts than doing productive work.

    ( Please note that we have increased the memory and buffer sizes for WMI, and we also installed the WMI patch as recommended in this article:
    http://blogs.technet.com/kevinholman/archive/2009/06/29/errors-alerts-from-the-dns-mp-script-failures-wmi-probe.aspx

    I just see that Kevin Holman also found out that things are not better with his recommendations. ) 

    I hoped that the larger memory and the WMI-Patch would also improve the situation for AD-Scripts. But this hope was not fulfilled...

    There is a probability that things are mixed up, as we have DNS and AD on the same servers...


    number33

    Thursday, September 24, 2009 11:10 AM
  •  We're having nothing but endless noise from our management packs. 95% noise and 5% useful!

    Using the SQL Server Core/2005/2008 management packs version 6.0.6648.0:
    1) Service Check Probe Module Failed Execution.
    Error getting state of service
    Error: 0x8007007b
    Details: The filename, directory name, or volume label syntax is incorrect
    Workflow name: Microsoft.SQLServer.2005.DBEngine.FullTextSearchServiceMonitor

    2) Script or Executable Failed to run
    The process started at <> failed to create System.PropertyBagData. Errors founds in output: GetSQL2005DBSpace.js(67, 12) Microsoft JScript runtime error: 'Databases' is null or not an object.
    Command executed: cscript.exe /nologo GetSQL2005DBSpace.js <machinename>\SQLEXPRESS
    Workflow name: many

    3) Service Check Data Source Module Failed Execution
    Error getting state of service
    Error: 0x8007007b
    Error getting state of service
    Error: 0x8007007b
    Details: The filename, directory name, or volume label syntax is incorrect
    Workflow name: Microsoft.SQLServer.2005.DBEngine.FullTextSearchServiceMonitor

    These errors occur on all SQL Server (2005 and 2008) instances that do not have the full text service installed. It's completely aggravating since *all* basic SQL Express instances start creating noise. It's mentioned in the release notes of the management pack, but why hasn't it been fixed? I'm constantly trying to filter this one out and it's driving me crazy.

    Here are some others I see often:
    OleDB: Results Error
    OleDbProbe: Results Error
    OleDb Module encountered a failure 0x80004005 during execution and will post it as output data item. Unspecified error: Login timeout expired
    Workflow name: Microsoft.SystemCenter.SqlBrokerAvailabilityMonitor

    Script: failed to login
    GetSQL2008BlockingSPIDs.vbs: Cannot login to database [<servername>][<instancename>:master]

    We also see the following in one of our Windows 7 desktops with the latest client management packs installed:
    1) WMI Probe Module Failed Execution
    Object enumeration failed
    Query: 'SELECT * FROM Win32_ComputerSystem'
    HRESULT: 0x800706be
    Details: The remote procedure call failed

    2) WMI Probe Module Failed Execution
    Object enumeration failed
    Query: 'Select * from Win32_SystemEnclosure'
    HRESULT: 0x800700a4
    Details: No more threads can be created in the system

    3) WMI Probe Module Failed Execution
    Object enumeration failed
    Query: 'Select * from Win32_BIOS'
    HRESULT: 0x800700a4
    Details: No more threads can be created in the system

    The machine itself is fine and definitely not out of resources. Looks like a management pack bug - I've seen quite a few other people mention the issue with the "no more threads" result.
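    For anyone triaging the error codes quoted above, an HRESULT can be split into its standard fields: bit 31 is the failure flag, bits 16-26 are the facility, and the low 16 bits are the code. The following is a small Python sketch of that decoding; the function name `decode_hresult` is made up for illustration. Facility 7 is FACILITY_WIN32, meaning the low word is an ordinary Win32 error number.

```python
def decode_hresult(hr):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    failed = bool(hr & 0x80000000)   # severity bit: set means failure
    facility = (hr >> 16) & 0x7FF    # 11-bit facility field
    code = hr & 0xFFFF               # low 16 bits: the error code itself
    return failed, facility, code


# The HRESULT values that appear in the alerts quoted in this post.
for hr in (0x8007007B, 0x800706BE, 0x800700A4, 0x80004005):
    failed, facility, code = decode_hresult(hr)
    print(f"0x{hr:08X}: failed={failed} facility={facility} code={code}")
```

    Decoded this way, 0x8007007b, 0x800706be and 0x800700a4 are Win32 errors 123, 1726 and 164, which match the "Details" text in each alert. 0x80004005 is the generic E_FAIL, which is why it only says "Unspecified error".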

    Tuesday, November 17, 2009 9:06 PM
  • Dan,

    I have a lot of the same problems documented throughout this thread. Have you consolidated all of this feedback into one place with your (Microsoft's) resolutions / recommendations / responses so we're not beating the same issues to death and we don't have to search for the answers?

    Monday, December 14, 2009 3:10 PM
  • Better support for clustered apps (msmq, biztalk, etc)?
    Tuesday, December 15, 2009 6:18 AM
    I'd like to re-emphasise the points that Graham Davies made on the AD management pack, which are all pains we are feeling too. They all ring true, and it's becoming very difficult to capture real AD-related issues due to the noise generated by the out-of-the-box AD management pack. There seems to be little that can be done with customizations using overrides; we've even considered other products but really don't want to go this route if we can help it.

    Getting AD replication monitoring fine-tuned and working well is currently a major goal of mine, so any advice is welcome. We've actually needed to turn off replication monitoring in the MP until we can reduce the noise, as it's increasing the likelihood of our operations team missing a genuine alert.

    There seems to be very little that can be done with overrides; there is not enough granularity for tweaking in specific environments. We would like to be able to monitor replication only on specific DCs and to suppress alerts generated on all other DCs, which doesn't appear to be possible. Can you offer any advice? Is this possible? How do others manage replication monitoring with SCOM?

    We too get lots of 'could not determine FSMO role holders' alerts. When you connect to the DC in question and run netdom query FSMO, it immediately returns the right FSMO holders, so again this is a false alert. We are very reluctant to turn this off, as we need to know about genuine issues with FSMO location.

    Same with Trust monitoring: no granularity, which is what Graham also touched on.

    We're running SCOM 2007 R2.  
    Wednesday, January 6, 2010 10:57 PM
  • Dan

    I note that your goal with all this was to create guidelines for MP authors that result in better MPs. A worthy task.

    Are there fixes / are there folks working on fixes for the above?

    This thread started in May 2009. I just completed a fresh install of SCOM 2007 R2 over the past few days, downloaded all the latest MPs, and note that each and every issue documented above is still happening on my new system some 8 months after the start of this thread. I don't know if I can add anything that hasn't already been provided by others, except that the issues above have yet to see fixes from the top.

    Would love to know that at least SOME of this noise had been addressed by Microsoft, but I can't seem to find anything but hand-rolled solutions.

    Is there another link where solutions to these issues have been provided?

    Monday, February 1, 2010 10:12 PM
  • +1 folks
    Thursday, March 11, 2010 2:47 PM
  • Dan,

    I have to admit I've got a beef with the new Exchange 2010 management pack and its use of a new "correlation service" whose primary function is to control noise from the pack. I don't like the idea of having to use separate services for this purpose: the product (SCOM) and the MP should have enough capabilities built in to avoid installing additional services for noise control. I've found the Exchange correlation service is a big resource hog and have had to move it off the RMS because of this. (It also doesn't run on clustered RMSs.) It's also a single point of failure, because no Exchange alerts will be logged if the correlation service fails, and it can only be installed on one server. My concern is that this same approach will be used for other highly distributed/complex applications that have the potential to generate noise. So what's next - separate correlation services for AD, SQL, BizTalk?

    Microsoft needs to come up with a better solution.    

     

    Tuesday, April 13, 2010 5:55 PM
  • Dan,

    This seemed like a very good thread for getting feedback from customers using SCOM 2007. Has there been any feedback from the various teams who develop the management packs referenced that can be shared? Future plans, updates, etc.?

    Regards,

    Kevin

    Wednesday, June 23, 2010 11:36 AM
  • A very interesting thread... Be very interested to hear some feedback, as Kevin mentions above.

    Speaking of Kevins, Mr. Holman recently posted this blog entry about an update to the core Management Pack for SCOM 2007 R2.
    Section 4 looks very exciting for our WMI noise needs.

     http://blogs.technet.com/b/kevinholman/archive/2010/07/21/opsmgr-2007-r2-core-mp-s-updated-6-1-7672-0.aspx


    I haven't installed it yet, so I can't vouch for its effectiveness, but thought I'd highlight it to the interested parties on this thread.

    Wednesday, July 28, 2010 12:46 PM
  • Alerts for Query: 'SELECT Name FROM Win32_ServerFeature WHERE Name =

    These are run against Windows 2008 systems. There should be some basic checking so that the monitor or management pack does not try to monitor a system that doesn't have the appropriate WMI class. In this case, the alert was generated for two Windows 2008 R2 servers. When I run a manual WMI check against the systems, the Win32_ServerFeature class doesn't exist.

    Saturday, February 4, 2012 1:04 AM
  • According to MSDN, this class exists on 2008 and later - this includes 2008R2 - http://msdn.microsoft.com/en-us/library/windows/desktop/cc280268(v=vs.85).aspx

    I suggest you verify why your WMI doesn't return it - it is probably an environmental issue.

    Saturday, February 4, 2012 2:20 AM
  • Hi,

    I am having a problem with my "The Domain controller has been Stopped/Started" alerts. I put a few servers in Maintenance Mode and restarted them after 15 minutes, but I still received alerts for both Stopped and Started. In the Maintenance Mode window, I use the "Selected objects and all other contained objects" option.

    Could you please help me find a solution to this?

    Thanks in advance.


    Regards, Suresh

    Monday, May 21, 2012 2:10 PM