Microsoft.Windows.Server.MonitorClusterDisks.vbs script repeatedly fails to run, reporting exit code 3

  • Question

  • Hi,

    A customer of mine has a problem with cluster monitoring in SCOM.

    They run SQL Server on Windows Server, hosting about 20 database instances and 5 DTC instances. The cluster contains about 155 cluster disks.

    Configuration:

    Server CPU: Intel Xeon E5-2667 v3, sockets: 2, cores: 16, logical processors: 32

    Server memory: 768 GB

    Server OS: Windows Server 2012 R2 Standard

    SQL Server: Microsoft SQL Server Enterprise (64-bit), version 11.0.6020.0

    Operations Manager: 2012 R2 7.1.10226.1239 with Update Rollup 11

     

    Management packs involved:

    Windows Server 2012 Cluster Management Library;  version: 6.0.7291.0

    Windows Cluster Management Monitoring;  version: 6.0.7291.0

    Windows Server 2012 R2 Cluster Management Library;  version: 6.0.7291.0

    Windows Server 2012 R2 Cluster Management Monitoring;  version: 6.0.7291.0

    Windows Cluster Management Library;  version: 6.0.7291.0

    Windows Server Cluster Disks Monitoring;  version: 6.0.7316.0

    Windows Cluster Library;  version: 7.0.8433.0

    Alert Detail example:

    Data was found in the output, but it has been dropped because the Event Policy for the process started at 10:19:10 detected errors.

    The 'ExitCode' policy expression:

    [^0]+

    matched the following output:

    3

    Command executed: "C:\Windows\system32\cscript.exe" /nologo "Microsoft.Windows.Server.MonitorClusterDisks.vbs" false "Cluster Disk Monitoring" "server.infra.local" "CLU02"

    Working directory: C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Monitoring Host Temporary Files 13836\55674\

    This affects one or more workflows.

    Workflow Name: Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.Monitoring.CollectPerfDataSource.FreeSpaceMB

    Instance name: Cluster Disk 92_\\?\Volume{6d883905-fff8-452f-8eea-9ecb4606c784}

    Instance ID: {-B92-31F2-DEFA-5F89441DC5C5}

    Management Group: SCOM
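For readers unfamiliar with the exit code policy quoted in the alert, the following is a minimal sketch of why an exit code of 3 is flagged while 0 is not. It assumes the expression is `[^0]+` (the spaces in the alert text look like copy artifacts) and that it is tested against the exit code as text; how the SCOM module applies the expression internally is not shown in this thread, and `exit_code_is_error` is a hypothetical helper name.

```python
import re

# Policy expression from the alert, with the stray spaces removed (assumption).
EXIT_CODE_POLICY = re.compile(r"[^0]+")


def exit_code_is_error(exit_code: int) -> bool:
    """Return True when the exit code, rendered as text, matches the policy."""
    return EXIT_CODE_POLICY.search(str(exit_code)) is not None


print(exit_code_is_error(0))  # False: "0" contains no character other than 0
print(exit_code_is_error(3))  # True: "3" is a character other than 0
```

In other words, the alert only says that the script returned a nonzero exit code; it does not say what went wrong inside the script.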

    We applied overrides as described in https://blogs.technet.microsoft.com/kevinholman/2013/02/21/healthservice-restarts-still-a-challenge-in-opsmgr-2012. They have some effect, but the problem persists.

    This MP has never done its job.

    Multiple clusters are monitored, and the problem occurs on all of them.

    The problem is not related to backups, maintenance, or anything similar.

    I can’t figure out what the problem could be.

    Thanks!

    Wednesday, November 30, 2016 7:54 PM

All replies

  • WOW - that is a LOT of disks, and SQL instances (which result in cluster resource groups)

    I am not sure the current cluster disk monitoring scripts really are designed to handle an environment that large.

    I am surprised it finishes at all, I'd expect it to time out.  A lot of the cluster WMI namespaces are SLOW and take forever to query.  It is likely this server is just too big.

    I'd recommend opening a case with Microsoft and pushing for an update to address this for very large clusters like this.  You can test this by running the script manually, and trying to debug why or where it is failing.

    "C:\Windows\system32\cscript.exe" /nologo "Microsoft.Windows.Server.MonitorClusterDisks.vbs" false "Cluster Disk Monitoring" "server.infra.local" "CLU02"
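When running the script manually, it helps to record the exit code and the elapsed time together, since the suspicion here is slow cluster WMI queries. The sketch below is one way to do that from Python. `run_and_time` and `CSCRIPT_COMMAND` are hypothetical names, the 300-second timeout is an arbitrary assumption rather than the value the management pack uses, and the arguments are simply copied from the alert.

```python
import subprocess
import time

# Command line copied from the alert. The script name is relative, so this
# must be started from the folder that contains the .vbs file.
CSCRIPT_COMMAND = [
    r"C:\Windows\system32\cscript.exe",
    "/nologo",
    "Microsoft.Windows.Server.MonitorClusterDisks.vbs",
    "false",
    "Cluster Disk Monitoring",
    "server.infra.local",
    "CLU02",
]


def run_and_time(command, timeout_seconds=300):
    """Run a command and return (exit_code, elapsed_seconds, stdout, stderr).

    Raises subprocess.TimeoutExpired if the command runs longer than
    timeout_seconds (an assumed limit, not the management pack's).
    """
    started = time.monotonic()
    completed = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    elapsed = time.monotonic() - started
    return completed.returncode, elapsed, completed.stdout, completed.stderr
```

On a cluster node, calling `run_and_time(CSCRIPT_COMMAND)` from the folder holding the script returns the exit code alongside the run time and any error text the script wrote, which is a starting point for finding where it fails.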


    Kevin Holman http://blogs.technet.com/b/kevinholman

    Thursday, December 1, 2016 1:58 AM
    Moderator
  • Hi Kevin,

    thanks for your reply. I will open a case with Microsoft for this problem. I will definitely share the findings.

    Regards,

    André

    Thursday, December 1, 2016 3:02 PM
  • Hi,

    I did not open a case, but I did some investigation.

    I found a few things.

    The Workflow Name is: Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.Monitoring.CollectPerfDataSource.FreeSpaceMB

    Get-SCOMRule points to the rule used in this workflow: Cluster Disk - Free space / MB

    There is also an aggregate rollup monitor, “Cluster Disk - Free Space Rollup Monitor”, with two monitors of the Cluster Disk State Monitor Type: Cluster Disk - Free Space Monitor (%) and Cluster Disk - Free Space Monitor (MB).

    The descriptions are “This monitor checks the free space in % of the targeted cluster disk” and “This monitor checks the free space in MB of the targeted cluster disk”.

    Workflows for these monitors: Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.FreeSpacePercent (Cluster Disk - Free Space Monitor (%)) and Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.FreeSpaceMB (Cluster Disk - Free Space Monitor (MB))

    So these are different workflows, and they apparently are not failing.
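One way to check that only the collection rule's workflow is failing is to tally the "Workflow Name:" lines across the alert descriptions. The following is a rough sketch, assuming the alert text has been exported to a plain string; `count_workflows` is a hypothetical helper and the sample text is made up from the single alert quoted in this thread.

```python
import re
from collections import Counter

# Matches lines such as "Workflow Name: Some.Workflow.Id" in alert text.
WORKFLOW_LINE = re.compile(
    r"^\s*Workflow Name:\s*(\S+)", re.IGNORECASE | re.MULTILINE
)


def count_workflows(alert_text: str) -> Counter:
    """Count how often each workflow name appears in the alert text."""
    return Counter(WORKFLOW_LINE.findall(alert_text))


# Illustrative sample only: the same alert repeated twice.
sample = """
Workflow Name: Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.Monitoring.CollectPerfDataSource.FreeSpaceMB
Management Group: SCOM
Workflow Name: Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk.Monitoring.CollectPerfDataSource.FreeSpaceMB
Management Group: SCOM
"""

for name, occurrences in count_workflows(sample).most_common():
    print(occurrences, name)
```

If the two monitor workflows never show up in such a tally, that supports the observation that only the performance collection rule is affected.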

    Is it safe to say that if I disable the rule Cluster Disk - Free space / MB (which, I assume, collects performance data), there will be no data for the “Performance view”, but disk space on the cluster disks will still be monitored?

     

    Wednesday, December 7, 2016 2:40 PM
  • Hi,

    I ran a test: with the rule Cluster Disk - Free space / MB disabled, there is no data for the “Performance view”, but disk space on the cluster disks is still monitored!

    Thursday, December 8, 2016 8:08 PM