Exchange 2010 Management Pack - ExecuteDiagnosticScript Scripts Failing

  • Question

  • All, we recently imported the Exchange 2010 (SP1) management pack into our management group. It didn't take long before it started raising alerts about scripts that couldn't be run or initialized. The following errors can be found in the Operations Manager event log on the Exchange Server 2010 server.

    Command executed: "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -Command "& '.\ExecuteDiagnosticScript.ps1' -MonitoringDataSource 'MSExchange Monitoring Troubleshoot-CI' -MaxStartDelaySeconds '15' -DiagnosticScriptName '.\Troubleshoot-CI.ps1' -DiagnosticScriptArguments '-Action DetectAndResolve -MonitoringContext'"

     

    In the (related) logfiles I see the following entries:

    [TIMESTAMP] The 'Microsoft.EnterpriseManagement.OperationsManager.Client' snap-in was not added: No Windows PowerShell snap-ins matching the pattern 'Microsoft.EnterpriseManagement.OperationsManager.Client' were found. Check the pattern and then try the command again.

    Does this error mean that we have to install the Operations Manager shell on all Exchange 2010 servers?

    Hope someone can shed some light on this.

    Regards,

    Mark Verbaas

    Tuesday, May 3, 2011 11:56 AM

All replies

  • You need to read the guide BEFORE importing the MP and thoroughly prepare for this MP. You cannot just import this MP and go.

     

     


    Microsoft Corporation
    Tuesday, May 3, 2011 3:26 PM
  • Hi Mark, this is the link to the mp guide that will help you further:

    http://www.microsoft.com/downloads/en/details.aspx?FamilyID=7150bfed-64a4-42a4-97a2-07048cca5d23

    And of course, what Dan said :-) The MP guides are a really good resource, written specifically for the technical people implementing these management packs. Beyond those, there are also community-written best-practice guides (for instance the OpsMgr by Example series, and plenty more on various blogs and on systemcentercentral). Reading them will save a lot of time in trying to understand errors and so on. Hope it helps.


    Bob Cornelissen - BICTT (My BICTT Blog)
    Tuesday, May 3, 2011 4:31 PM
  • You cannot just import this MP and go.


    Of course you can, but you end up asking questions on the forum and, as the final solution, reading the guide :)
    Regards,
    Marc Klaver
    http://jama00.wordpress.com/
    Tuesday, May 3, 2011 5:12 PM
  • All,

    Thank you for replying. I've read the MP guide, and did so again this morning. Unfortunately I have no clue why these alerts appear.

    The correlation engine is installed and running (on the RMS). Agents are installed on the Exchange Servers and running under local system credentials. Patches are installed.

    Regards,

    Mark Verbaas

    Wednesday, May 4, 2011 9:30 AM
  • We have the same problem too, Mark. I've read the MP guide a few times. We have had a support case open with tier 3 about the new SP1 MP for almost 3 weeks. Maybe we are all overlooking something simple. There is also a known issue with the correlation engine and we have requested a hotfix. I'll update as we learn more.

    Edit: I dug into what this is doing, and that specific message is logged every time the script executes. It looks like it can be safely ignored. Our issues with timeouts and failures run deeper....

    Thursday, May 5, 2011 4:26 PM
  • Ok, it's great to know I'm not alone in this. Is the case still open?
    Friday, May 6, 2011 8:10 AM
  • Case is still open, and very little progress. I have done quite a bit of testing to try and determine what is happening with the PS scripts. There will be a directory under Monitoring Host Temporary Files x that contains ExecuteDiagnosticScript. The code causing what you see to be logged is this:

    $momSnapin = Get-PSSnapin -Name $momSnapinName -ErrorAction SilentlyContinue;
    if ($momSnapin -eq $null)
    {
        # Attempt to add the MOM snapin but don't fail if it cannot be added.
        # When executing scripts on a machine where the SCOM console is not installed,
        # it is expected that the snapin won't be added.
        Add-PSSnapin -Name $momSnapinName -ErrorAction SilentlyContinue;

        $momSnapin = Get-PSSnapin -Name $momSnapinName -ErrorAction SilentlyContinue;
        if ($momSnapin -eq $null)
        {
            Write-Log "The '$($momSnapinName)' snap-in was not added: $($Error[0])";
        }
    }

    In essence, that message is a red herring. What is not a red herring are the occasional 21xxx warnings written to the event log in a sequence of three events:

    First entry..

    The process started at x:xx:00 PM was terminated because the HealthService requested the workflow to stop, some data may have been lost.

    Second..

    Forced to terminate the following process started at 7:40:00 PM because it ran past the configured timeout 300 seconds.

    Last..

    Data was found in the output, but has been dropped because the Event Policy for the process started at 7:40:00 PM has detected errors. The 'ExitCode' policy expression:

    [^0]+

    matched the following output:

    259
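
    To see why an exit code of 259 trips that policy, here is a minimal sketch that applies the expression from the event above to a few exit codes. The function name Test-ExitCodeFlagged is made up for illustration, and PowerShell's -match operator is only a stand-in for however the agent evaluates the policy, which is an assumption. For what it's worth, 259 is the Win32 STILL_ACTIVE value, which would fit a process that was still running when it was terminated at the timeout, but that reading is not confirmed anywhere in this thread.

```powershell
# Hypothetical helper: applies the 'ExitCode' policy expression quoted in the
# warning event to a numeric exit code. Not part of the management pack.
function Test-ExitCodeFlagged {
    param(
        [int]$ExitCode,
        [string]$Policy = '[^0]+'   # expression copied from the event text
    )
    # -match succeeds when any part of the string matches the expression,
    # so any exit code containing a digit other than 0 is flagged.
    return (([string]$ExitCode) -match $Policy)
}

Test-ExitCodeFlagged -ExitCode 0     # False: a clean exit passes the policy
Test-ExitCodeFlagged -ExitCode 259   # True: matches, so the output is dropped
```

    In other words, the expression flags every non-zero exit code, and 259 is simply one of them.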

    Analysis of the MP shows a hard coded 300 seconds for these scripts to run. I can run them manually all day within the context of local system (using psexec) and it never times out. In addition, we have a number of agents that will fail to collect perf counters periodically, for example:

    In PerfDataSource, could not resolve counter MSExchangeIS Mailbox, Delivery Blocked: Low Log Space

    I'm committed to logically analyzing this to figure out what's happening and why. The Exchange product team folks are hard to get a hold of, and I would imagine OpsMgr support is frustrated as well. If you're following this, MSFT, please try to find the happy medium:

    OpsMgr team writing all MPs = problematic

    Product teams writing their respective MPs = sounded great, but is actually more problematic with certain MPs

    Sunday, May 8, 2011 1:44 AM
  • Update:

    We have multiple incidents open for the following items:

    1. Getting hotfix for CE bug with SDK

    2. Workaround for timeouts with the Troubleshoot-DatabaseSpace.ps1 script. It runs as a module inside a composite module, so it is not a simple rule that can be disabled via an override and replaced with a newly authored rule.

    3. We are finding that over time (maybe 7 days), workflows for some Exchange KHI monitors targeting mailbox servers will not reset health when the condition no longer exists. These are workflows where a clearing condition is defined, and that condition is present. The health service does not generate any errors and is healthy. KHI alerts will continue to update the repeat count, which makes no sense. Flushing the agent cache resolves the problem, and no more alerts are generated.

    Tuesday, May 17, 2011 1:08 AM
  • I'm having the same problems as well and would be interested in what you guys find out.  Although these are only Warning messages, I hate to see these errors even listed in my Active alerts.
    John K. Boslooper Windows Server Administrator UT Dallas
    Tuesday, May 31, 2011 3:53 PM
  • I am seeing these exact same timeouts and 'value cannot be null' sometimes.

    Which are then followed instantly by alerts that my DB performance is impacted.

    I have to say that the Exchange 2010 SP1 MP is far from being the flagship MP that many SCOM admins proclaim it to be.

    It would be nice if one of the product teams acknowledged that there were issues and gave a reasonable date for releasing an updated MP.

    Monday, June 6, 2011 1:41 PM
  • UPDATE:

     

    I found that it was caused by a McAfee bug that blocked the scripts even when McAfee was set to disabled or the scripts were excluded. Version 8.8 Patch 1 fixed the issue. If you don't have it available yet, contact McAfee support.


    John K. Boslooper Windows Server Administrator UT Dallas
    Monday, October 10, 2011 2:47 PM
  • John,

    Thank you for your reply. We don't have McAfee installed on the system(s), but I will check whether our virus scan engine blocks these scripts.

    Best regards,

    Mark

    Monday, October 10, 2011 3:08 PM
  • It could be the PowerShell execution policy. It could also be that the console is not installed on the agent-managed computers (a console install is required to get the snap-in). I know it sounds weird, but it is a possible remedy if the snap-in is not there. The antivirus path is often the culprit here. Later on in this thread there is talk of timeouts: 300 seconds is 5 minutes. If a script runs much longer than that, the agent will kill it as a zombie anyhow.

    If this is occasional, ignore it (safe to do)

    If it is constant, you have a setup/security/antivirus issue.


    Microsoft Corporation
    Tuesday, October 11, 2011 9:41 PM
  • Dan,

    The powershell policy is set to RemoteSigned.
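
    For anyone wanting to verify the same thing, here is a small sketch using the built-in Get-ExecutionPolicy cmdlet. A policy set through Group Policy (the MachinePolicy and UserPolicy scopes) takes precedence over the local setting, so listing every scope shows what actually applies. It only reports on the session it is run in; running it under the agent's Local System account is an assumption about how you would want to test this.

```powershell
# Show the execution policy configured at each scope, in order of precedence.
Get-ExecutionPolicy -List | Format-Table -AutoSize

# Show the policy that is effective for the current session.
Get-ExecutionPolicy
```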

    You mention the console, but could you elaborate on which console you're referring to? Is this the Operations Manager Shell, the Exchange Management Console, or some other console I'm missing here?

    The 'normal' exclusion paths for the antivirus are active.

    Regards,

    Mark

    Thursday, October 13, 2011 7:55 AM
  • It is probably your PowerShell policy then. The agent writes unsigned PS scripts to the file system and then executes them. I was referring to the Operations Manager console.
    Microsoft Corporation
    Monday, October 17, 2011 8:25 PM
  • I recently encountered the same problem with the Exchange 2010 MP. Has anyone found a solution other than the antivirus fix or installing the SCOM console?
    Tuesday, March 13, 2012 11:40 AM
  • Hi all,

    I ran the script manually and it ran for slightly more than 5 minutes, so I suppose it's a timeout problem that depends on the Exchange server configuration (the one I'm looking after at my customer is quite big).
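
    A rough way to repeat that measurement is sketched below. Measure-AgainstTimeout is a hypothetical helper name, and the 300-second default is the hard-coded timeout reported earlier in this thread; the script and arguments in the usage comment are the ones from the command line in the first post, so substitute whichever diagnostic script you are testing.

```powershell
# Hypothetical helper: times a script block and compares the elapsed time
# with the workflow timeout (300 seconds as reported in this thread).
function Measure-AgainstTimeout {
    param(
        [Parameter(Mandatory = $true)][scriptblock]$ScriptBlock,
        [int]$TimeoutSeconds = 300
    )
    # Measure-Command runs the block and returns the elapsed TimeSpan.
    $elapsed = Measure-Command -Expression $ScriptBlock
    New-Object PSObject -Property @{
        ElapsedSeconds = [math]::Round($elapsed.TotalSeconds, 1)
        TimeoutSeconds = $TimeoutSeconds
        ExceedsTimeout = ($elapsed.TotalSeconds -gt $TimeoutSeconds)
    }
}

# Usage (run from the folder that holds the diagnostic script):
# Measure-AgainstTimeout { & '.\Troubleshoot-CI.ps1' -Action DetectAndResolve -MonitoringContext }
```

    A manual run in an interactive session will not necessarily behave the same as a run started by the agent, so treat the result as an indication only.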

    The missing snap-in is not a problem:

    # Attempt to add the MOM snapin but don't fail if it cannot be added.
    # When executing scripts on a machine where the SCOM console is not installed,
    # it is expected that the snapin won't be added.
    Add-PSSnapin -Name $momSnapinName -ErrorAction SilentlyContinue;

    but I wasn't able to find out the rule/monitor executing the ExecuteDiagnosticScript.ps1 script, can anyone help me please?

    as per the antivirus exclusion, I applied the ones suggested on Kevin's blog, but for this case would you suggest to exclude powershell.exe too?

    thanks a lot

    Thursday, March 15, 2012 11:49 AM
  • PhirePhil is correct. It does take slightly longer than 5 minutes for this script to complete. How can the timeout be adjusted so there is no alert?
    Friday, May 11, 2012 9:21 PM
  • You can't. It's not an overridable parameter. So you are stuck with it essentially, like the rest of us!

    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Saturday, May 12, 2012 3:41 AM
  • All,

    At the end of it all, we opened a case with Microsoft support and in agreement with the Exchange administrators we've disabled a number of monitors:

    • KHI: The database copy is low on database volume space and continues to grow. The volume is under 25% free.
    • KHI: The database copy is low on database volume space and continues to grow. The volume has reached error levels under 16% free.
    • KHI: The database copy is low on database volume space and continues to grow. The volume has reached critical levels 8% free.
    • KHI: Failed to execute Troubleshoot-DatabaseSpace.ps1.

    Disabling these monitors will break monitoring of the database volume space.

    Disabling these stopped the errors in our environment, at the cost of losing insight into our Exchange environment. Rumor has it that this should be fixed in an OM12 Exchange management pack.

    Regards,

    Mark

    Saturday, May 12, 2012 10:38 AM
  • but I wasn't able to find out the rule/monitor executing the ExecuteDiagnosticScript.ps1 script, can anyone help me please?

    as per the antivirus exclusion, I applied the ones suggested on Kevin's blog, but for this case would you suggest to exclude powershell.exe too?

    Phil,

    The rule is 'Script event collection: Execute: Troubleshoot-DatabaseSpace diagnostic script.' I believe that the ExecuteDiagnosticScript.ps1 simply sets up the environment for rules of this kind.

    Sadly, I think Blake is right where he writes below that it's not overridable, and Dan (MSFT) suggests that the agent will kill anything running over 5 minutes as a zombie as well. The unfortunate part is that in our environment this scan takes just over 5 minutes; I think someone earlier in the thread mentioned this as well. It seems that perhaps this is an issue on much larger Exchange installations? Although I would think that Microsoft would have noticed this in their own environment and made adjustments somehow.

    Since Microsoft eats their own dogfood, it would be nice to hear how this is handled in their environment. Internally I've recommended that it's safe to simply clear these out. And before someone tells me to RTFM, like several others who've posted to this thread: I have, a few times.

    Thanks,


    Jeffrey S. Patton Systems Specialist, Enterprise Systems University of Kansas 1001 Sunnyside Ave. Lawrence, KS. 66045 (785) 864-0242 | http://patton-tech.com

    Friday, June 15, 2012 1:43 PM
  • It is not overridable. They didn't expose the timeout, let alone much of anything else. If you unseal the MP and look at it in the Authoring Console, go to where all the data sources are and you will find this script.

    Regardless, I would suspect this has been mentioned several times to MSFT, but I just don't know if this is something they can fix with an updated mp, might be one of those rip and replaces, which is hard for this MP because of the correlation engine (ok not so hard, but more steps involved).


    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/

    Friday, June 15, 2012 4:37 PM
  • Another one having the same issue. We need to update the timeout to 450 seconds (and a lower frequency would be nice too, as it will run for too long).

    I also found the rule 'Script event collection: Execute: Troubleshoot-DatabaseSpace diagnostic script.', which should be the one that executes the script. Funny thing, though: it seems to be disabled by default with no overrides, so I have no clue how it gets run :).


    Rob Korving
    http://jama00.wordpress.com/

    Tuesday, June 26, 2012 12:50 PM
  • Apparently Microsoft released Service Pack 2 for the Exchange 2010 management pack where they changed the timeout from 300 seconds to 1200 seconds. However there is an issue where user mailboxes are getting quarantined, so Microsoft has pulled the service pack until they can provide a more long-term solution: http://blogs.technet.com/b/exchange/archive/2012/06/28/mailboxes-on-a-database-are-quarantined-in-an-environment-with-system-center-operations-manager.aspx

    In short - a fix for this VERY annoying situation is in the works, and should be available to all of us once they fix this bug!


    UPDATE - Microsoft's updated version of the management pack can be found here: http://www.microsoft.com/en-us/download/details.aspx?id=692
    Thursday, June 28, 2012 5:25 PM
  • I've solved this issue by just making sure the logs were backed up and truncated. But I guess it still depends on how many DBs you have... (the command finished in 30 seconds for that customer).


    Rob Korving
    http://jama00.wordpress.com/

    Monday, July 16, 2012 1:34 PM
  • Hi all,

    Any update on this? I've recently installed the latest version of the Exchange 2010 MP (14.3.38.4) for SP2 and use it with SCOM12. It's all installed and running, and I'm working through the tuning stage.

    The problem I have might be related, or a step further on if we assume this latest MP has fixed the timeout problems for the OP.

    I'm struggling to tune the alert "The database copy is low on database volume space and continues to grow. The volume has reached error levels under 16% free." It is generated by running the "Troubleshoot-DatabaseSpace.ps1" script using the variables in "StoreTSConstants.ps1". Both of these are originally located in "C:\Program Files\Microsoft\Exchange Server\V14\Scripts", but the MP/SCOM12 seems to generate its own copy of the files for use with the rule and places it in the "C:\Program Files\System Center Operations Manager\Agent\Health Service State\Monitoring Host Temporary Files XXXX" folder. This newly generated version of the script always has the default values of:

    # The percentage of disk space for the EDB file at which we should start quarantining users.
    $PercentEdbFreeSpaceDefaultThreshold = 25

    # The percentage of disk space for the logs at which we should start quarantining users.
    $PercentLogFreeSpaceDefaultThreshold = 25

    # The percentage of disk space for the EDB file at which we are at alert levels.
    $PercentEdbFreeSpaceAlertThreshold = 16

    # The percentage of disk space for the EDB file at which we are at critical levels.
    $PercentEdbFreeSpaceCriticalThreshold = 8

    All of which I want to adjust to my environment. Is anyone aware of which file SCOM or the SCOM agents on the Exchange servers use to create these temp files? I may be able to amend the values in that location and so finally tune the script! :)

    Failing that, could we write a copy script to search for and then overwrite the "StoreTSConstants.ps1" file with a pre-amended version from a network share every 30 minutes...

    Tuesday, November 20, 2012 11:06 AM
  • @CurbysanPan - The issue you are reporting has nothing to do with this thread which was a timeout issue of the scripts running, which Microsoft fixed in the updated management pack: http://www.microsoft.com/en-us/download/details.aspx?id=692

    I suggest you start a new thread about your desire to modify the scripts/values to fit your environment.

    Tuesday, November 20, 2012 6:43 PM
  • @HotFix - Okie doke. I did allude to that in my post, but since I found this thread when searching for my problem, I thought I'd post here to help any future searcher researching the same problem :o)

    @everyone - Incidentally, I did write a script that might be useful for someone else. Please use it if you need it:

    @echo off
    setlocal

    :: ### Location of amended exchange script ###
    set "exchangescript=\\[server]\[share]\[folder]\StoreTSConstants.ps1"

    :: ### Select start folder ###
    pushd "C:\Program Files\System Center 2012\Operations Manager\Agent\Health Service State"

    :: ### Search for known file and return full path inc filename - store result in variable ###
    :: ### (if several copies exist, only the last one found is kept) ###
    set "file="
    FOR /F "tokens=*" %%A IN ('Dir /s /b StoreTSConstants.ps1 ^| FIND "Store"') DO SET "file=%%A"

    :: ### Stop if no copy of the file was found ###
    if not defined file (popd & exit /b 1)

    :: ### Trim the filename from end of the full path - reset the variable ###
    set "file=%file:\StoreTSConstants.ps1=%"

    :: ### Copy and overwrite the discovered file with a pre-amended version ###
    copy /V /Z /Y "%exchangescript%" "%file%"
    popd

    You would need to schedule a task to copy the file to each exchange mailbox server or set the task individually on each mailbox server.
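
    For anyone who prefers PowerShell, the same idea could be sketched roughly as follows. Copy-AmendedConstants is a made-up function name, and the paths in the usage comment are the placeholders from the batch script above, not verified locations. Unlike the batch version, this overwrites every copy of StoreTSConstants.ps1 it finds under the search root rather than only the last one.

```powershell
# Hypothetical helper: overwrites every StoreTSConstants.ps1 found below
# $SearchRoot with the pre-amended file and returns the paths it replaced.
function Copy-AmendedConstants {
    param(
        [Parameter(Mandatory = $true)][string]$SourceFile,
        [Parameter(Mandatory = $true)][string]$SearchRoot
    )
    # Collect matching files only (PSIsContainer filters out directories).
    $targets = @(Get-ChildItem -Path $SearchRoot -Recurse -Filter 'StoreTSConstants.ps1' -ErrorAction SilentlyContinue |
        Where-Object { -not $_.PSIsContainer })

    foreach ($target in $targets) {
        Copy-Item -Path $SourceFile -Destination $target.FullName -Force
        $target.FullName   # emit the path that was overwritten
    }
}

# Usage (placeholder paths, adjust to your environment):
# Copy-AmendedConstants -SourceFile '\\[server]\[share]\[folder]\StoreTSConstants.ps1' `
#     -SearchRoot 'C:\Program Files\System Center 2012\Operations Manager\Agent\Health Service State'
```

    As with the batch script, the agent may regenerate its own copy of the file at any time, so this is a workaround rather than a supported way of changing the thresholds.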

    Thanks

    Wednesday, November 21, 2012 4:00 PM