Microsoft.Windows.Server.MonitorClusterDisks.vbs failed

  • Question

  • Hi 

    I have a question, because I get this warning on a lot of our servers:

    Data was found in the output, but has been dropped because the Event Policy for the process started at 12:53:29 has detected errors.   The 'ExitCode' policy expression:
    [^0]+
     matched the following output:


    Command executed: "C:\Windows\system32\cscript.exe" /nologo "Microsoft.Windows.Server.MonitorClusterDisks.vbs" false "ClusterDiskMonitoring" "XXXXXX.XXXXX.com" "XXXXXX"
    Working Directory: C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 5057\3300\ 

    One or more workflows were affected by this.  

    Workflow name: many 
    Instance name: many 
    Instance ID: many 
    Management group: XXXXXXXX
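The 'ExitCode' policy in this 21414 warning is a regular expression checked against the exit code of cscript.exe. Below is a minimal Python sketch of that check. It assumes the exit code is compared as a decimal string and that any match of `[^0]+` counts as an error. The agent's exact matching semantics are not documented in this thread, and `exit_code_is_error` is a hypothetical name.

```python
import re

# Policy expression quoted in the 21414 warning.
EXIT_CODE_POLICY = re.compile(r"[^0]+")


def exit_code_is_error(exit_code: int) -> bool:
    """Return True if the exit code, written as a decimal string, contains
    any character other than '0'. This is assumed to mirror the agent's
    ExitCode policy; it is not the agent's actual implementation."""
    return EXIT_CODE_POLICY.search(str(exit_code)) is not None


for code in (0, 3, 10):
    print(code, exit_code_is_error(code))
```

Under this reading, only exit code 0 passes. Any non-zero exit from the script causes its output to be dropped, even if the script printed usable data.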

    So I tried running the command from the working directory, and after approximately 15 seconds I got a result.

    What could be the problem here? I don't think a timeout is the problem...
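A manual run can be compared more directly with what the agent does by timing it under the same 300-second limit that the agent enforces (see event 21402 in the reply below). Below is a minimal Python sketch. It assumes Python is available on the node and that the arguments are copied from the "Command executed" line, which stay redacted here. `run_with_timeout` is a hypothetical helper, not part of SCOM.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, cwd=None, timeout_s=300):
    """Run cmd and kill it if it exceeds timeout_s seconds.

    Returns (exit_code, elapsed_seconds, timed_out). exit_code is None when
    the process was killed. Hypothetical helper for manual testing only."""
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout_s
        )
        return result.returncode, time.monotonic() - start, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return None, time.monotonic() - start, True


if __name__ == "__main__":
    # On the affected node, cmd would be the redacted command from the event:
    # [r"C:\Windows\system32\cscript.exe", "/nologo",
    #  "Microsoft.Windows.Server.MonitorClusterDisks.vbs", "false",
    #  "ClusterDiskMonitoring", "XXXXXX.XXXXX.com", "XXXXXX"]
    demo = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_with_timeout(demo, timeout_s=300))
```

You can then check the returned exit code against the ExitCode policy and the elapsed time against the 300-second limit.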

    Tuesday, November 19, 2019 12:12 PM

All replies

  • Hi,

    Do you see any errors in the Operations Manager event log?

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com LinkedIn:

    Tuesday, November 19, 2019 12:29 PM
  • Hi Leon

    The warning above is from the Operations Manager log and has event ID 21414. This and event 21402 are the only warning/critical events in the log. Event 21402 is a timeout event. I don't know why it times out, because when I run the script manually in the console it completes within 15 seconds.

    Event 21402 is:

    Forced to terminate the following process started at 15:23:29 because it ran past the configured timeout 300 seconds.
    Command executed: "C:\Windows\system32\cscript.exe" /nologo "Microsoft.Windows.Server.MonitorClusterDisks.vbs" false "ClusterDiskMonitoring" "XXXXXX.XXXXX.com" "XXXXX"
    Working Directory: C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 5057\3300\ 
    One or more workflows were affected by this.  

    Workflow name: many 
    Instance name: many 
    Instance ID: many 
    Management group: XXXXXXX


    There is always one 21402 event followed by two 21414 events, every 15 minutes.
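Event 21402 contains enough information to work out when the agent killed the process. Below is a minimal Python sketch that extracts the start time and the configured timeout from the message text. The regular expressions are assumptions based on the message format quoted above, not a documented schema.

```python
import re
from datetime import datetime, timedelta

EVENT_21402 = (
    "Forced to terminate the following process started at 15:23:29 "
    "because it ran past the configured timeout 300 seconds."
)

START_RE = re.compile(r"started at (\d{2}:\d{2}:\d{2})")
TIMEOUT_RE = re.compile(r"configured timeout (\d+) seconds")


def kill_time(message: str) -> tuple[str, int, str]:
    """Return (start time, timeout in seconds, approximate kill time)
    parsed from a 21402 message."""
    start = START_RE.search(message).group(1)
    timeout = int(TIMEOUT_RE.search(message).group(1))
    killed = datetime.strptime(start, "%H:%M:%S") + timedelta(seconds=timeout)
    return start, timeout, killed.strftime("%H:%M:%S")


print(kill_time(EVENT_21402))  # ('15:23:29', 300, '15:28:29')
```

Comparing the computed kill time with the timestamps of the two 21414 warnings that follow can show whether all three events belong to the same 15-minute cycle.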

    Kind Regards

    Tuesday, November 19, 2019 2:50 PM
  • Hi,

    What version is your SCOM server? Which Windows Server version are you running? Has the issue happened before?

    Best regards.
    Crystal


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Wednesday, November 20, 2019 5:36 AM
  • Hi Crystal

    I have SCOM 1807 running on Windows Server 2016. The issue occurs on Windows Server 2016 and Windows Server 2012. It happens all the time, but not on all servers.

    Kind Regards

    Wednesday, November 20, 2019 6:52 AM
  • The special thing about the issue is that it only occurs on the SQL Servers. I monitor my SQL Servers with the service SID approach, which Kevin Holman describes here: https://kevinholman.com/2016/08/25/sql-mp-run-as-accounts-no-longer-required/
    Wednesday, November 20, 2019 8:48 AM
  • There is a confirmed bug in the Microsoft.Windows.Server.MonitorClusterDisks.vbs script, and according to the SCOM UserVoice it is being worked on:

    https://systemcenterom.uservoice.com/forums/293064-general-operations-manager-feedback/suggestions/35694637-fix-the-script-used-to-monitor-the-cluster-disks



    • Marked as answer by StatelyElf Wednesday, November 20, 2019 9:39 AM
    • Unmarked as answer by StatelyElf Wednesday, November 20, 2019 9:39 AM
    Wednesday, November 20, 2019 9:26 AM
  • Thanks for your answer. I also saw this on the SCOM UserVoice, but in my case the script doesn't need a lot of time to run when I start it manually; it completes after 10-20 seconds. The issue also occurs on Windows Server 2012 servers. The special thing is that all SQL Servers are affected, but no other servers. I don't know if this is the same issue.

    Is it possible that I have to give a user more rights because of the service SID configuration? I find it very strange that it only affects the SQL Servers.

    • Edited by StatelyElf Wednesday, November 20, 2019 9:47 AM
    Wednesday, November 20, 2019 9:47 AM
  • If you're running with a low privilege mode, you could try running with higher privileges.

    When you ran the script manually, you probably ran it with your own account which has different privileges.



    Wednesday, November 20, 2019 9:51 AM
  • I run the script under the Local System account for my tests. How can I see which privilege mode SCOM uses to run the script?

    This is the configuration of the Microsoft Monitoring Agent Service:

    

    Wednesday, November 20, 2019 9:58 AM
  • When you configured the service SID as per Kevin's blog post (https://kevinholman.com/2016/08/25/sql-mp-run-as-accounts-no-longer-required/), you had the option to choose between low privilege and sysadmin privilege.

    The easiest way to check which privileges your service SID has is to look in SQL Server Management Studio: find the service SID login that was created and check its permissions.



    Wednesday, November 20, 2019 10:04 AM
  • The Account has the sysadmin privilege...
    Wednesday, November 20, 2019 10:13 AM
  • Hi,
     
    Thanks for your confirmation. From your description, I understand the issue only occurs on SQL servers. Could you confirm whether there are other clusters in your environment that do not have the issue? Also, are there any performance issues on the SQL servers? Check the warning, find the related monitor or rule, review its settings, and see whether the timeout can be increased.

    In addition, the warning shows that the 'ExitCode' value is 3. Could you go through “Microsoft.Windows.Server.MonitorClusterDisks.vbs” to find the description of this exit code?
     
    Best regards.
    Crystal


    Thursday, November 21, 2019 9:04 AM
  • Hi, 

    How's everything going? Is there any update on the issue? If so, please let us know.

    Best regards.

    Crystal



    Tuesday, November 26, 2019 8:57 AM
  • Hi Crystal

    I have increased the timeouts, and since this change it seems to be better. I'm currently testing it on some systems and will give an update as soon as I can confirm it definitively.

    Best Regards

    Tuesday, November 26, 2019 9:38 AM
    Hi,

    Thanks for your reply. I am glad to hear that things are better after the modification. Since you are still testing on some systems, we will keep monitoring this thread. If there's any update, feel free to let us know.

    Thanks and have a nice day!

    Best regards.

    Crystal



    Tuesday, November 26, 2019 9:44 AM
  • This reminds me of an issue I had at one of my customers... could you try running this PowerShell command on one of the cluster nodes?

    Get-WmiObject -Query "Select * FROM MSCluster_Resource" -Namespace 'ROOT\MSCluster'

    Does it run successfully or does it fail?


    • Edited by CyrAz Tuesday, November 26, 2019 10:39 AM
    Tuesday, November 26, 2019 10:39 AM
  • Hi CyrAz

    It ran successfully with no errors.

    Unfortunately the issue still exists. But I realized something special:

    For example, I have the following error:

    Forced to terminate the following process started at 14:29:25 because it ran past the configured timeout 300 seconds.
    Command executed: "C:\Windows\system32\cscript.exe" /nologo "Microsoft.Windows.Server.MonitorClusterDisks.vbs" false "ClusterDiskMonitoring" "XXXXXX.XXXX.com" "XXXXX"
    Working Directory: C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 5641\28882\ 
    One or more workflows were affected by this.  
    Workflow name: Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.Monitoring.CollectPerfDataSource.AvgDiskSecPerTransfer 
    Instance name: XXXX: 
    Instance ID: XXXXXXX} 
    Management group: XXXXX

    The workflow Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.Monitoring.CollectPerfDataSource.AvgDiskSecPerTransfer belongs to the rule "Collection Rule for Average Disk Seconds Per Transfer Windows Server Cluster Disk". But I created an override for this rule for all objects of the class "Cluster Disk" and set Timeout Seconds to 600. Why do the alerts still show 300 seconds?
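One way to narrow this down is to check which workflows still report 300 seconds after the override. Below is a minimal Python sketch. It assumes the 21402 messages have been exported as plain text (for example, from Event Viewer). `timeouts_by_workflow` is a hypothetical helper, and the sample messages reuse the redacted values from this thread.

```python
import re
from collections import Counter

TIMEOUT_RE = re.compile(r"configured timeout (\d+) seconds")
WORKFLOW_RE = re.compile(r"Workflow name:\s*(\S+)")


def timeouts_by_workflow(messages):
    """Count (workflow name, timeout in seconds) pairs across 21402 messages."""
    counts = Counter()
    for msg in messages:
        timeout = TIMEOUT_RE.search(msg)
        workflow = WORKFLOW_RE.search(msg)
        if timeout and workflow:
            counts[(workflow.group(1), int(timeout.group(1)))] += 1
    return counts


sample = [
    "Forced to terminate the following process started at 14:29:25 "
    "because it ran past the configured timeout 300 seconds.\n"
    "Workflow name: Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk."
    "Monitoring.CollectPerfDataSource.AvgDiskSecPerTransfer",
    "Forced to terminate the following process started at 15:23:29 "
    "because it ran past the configured timeout 300 seconds.\n"
    "Workflow name: many",
]

for (workflow, timeout), n in timeouts_by_workflow(sample).items():
    print(f"{n} x {timeout}s  {workflow}")
```

A workflow that still shows 300 seconds is one the override has not reached. The earlier events list the workflow name as "many", which suggests that several workflows share this script.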

    Tuesday, December 3, 2019 3:49 PM