Disable the Cluster MP on selected servers without removing the MP

    Question

  • SCOM 2012 SP1

    We recently had multiple issues occur on our Server 2008 R2 Hyper-V servers in a cluster that brought down our cluster services and in one instance led to a BSOD on one of the hosts. 

    The cluster MP's queries against the cluster service appear to have deadlocked or timed out the cluster service on the host, causing the VMs to fail over.  It happened twice; the event errors are shown at the end of this post.  Subsequently, we removed the SCOM agent from our Hyper-V (HPV) servers and they are working fine.

    We wish to reinstall the SCOM agent to enable monitoring, but we want to stop the cluster component from interrogating these servers.  We wish to keep the MP because it's also monitoring our SQL cluster successfully.  We will somehow need to disable cluster monitoring while the agent is being reinstalled, so that the management pack information is not transferred to the HPV servers, and then re-enable it so it continues to monitor our other cluster resources.

    I have version 6.0.7063.0 of Windows Cluster Management Library and Monitoring installed.

    Recommendations?

    The second question is also related to SCOM interrogation of our HPV servers.  We want to change how often the NICs are queried.  I think the default is around every 5 minutes.  Where can I find this setting, and how do I change it so that they are queried, say, every half hour?

    --------------------------------------------------------------------------------------------------------------------------------------

    Log Name:      Operations Manager

    Source:        Health Service Modules

    Date:          5/8/2014 4:36:34 PM

    Event ID:      10409

    Task Category: None

    Level:         Warning

    Keywords:      Classic

    User:          N/A

    Computer:      Server1 of cluster

    Description:

    Object enumeration failed

    Query: 'SELECT Name, State FROM MSCLUSTER_Resource'

    HRESULT: 0x80071716

    Details: The call to the cluster resource DLL timed out.

    One or more workflows were affected by this. 

    Workflow name: many

    Instance name: many

    Instance ID: many

    Management group: SCOM GROUP NAME

    Log Name:      System

    Source:        Microsoft-Windows-FailoverClustering

    Date:          5/8/2014 4:36:34 PM

    Event ID:      1146

    Task Category: Resource Control Manager

    Level:         Critical

    Keywords:     

    User:          SYSTEM

    Computer:      SERVER 1 of Cluster

    Description:

    The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.

    At the same time, all the VMs failed.  Here's the first one:

    Log Name:      System

    Source:        Microsoft-Windows-FailoverClustering

    Date:          5/8/2014 4:36:34 PM

    Event ID:      1230

    Task Category: Resource Control Manager

    Level:         Error

    Keywords:     

    User:          SYSTEM

    Computer:      SERVER 1 of Cluster

    Description:

    Cluster resource 'SCVMM SERVER-SQL-NAME (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor

    Wednesday, May 14, 2014 11:30 PM

Answers

  • Create a group of Windows Computer objects (the seed class for the cluster MP, I believe) containing the servers you want to exclude from cluster monitoring.  Override the seed discovery so it is disabled for this group of computers.  Done and done.  You may have to run Remove-DisabledMonitoringObject afterwards if they have already been discovered.

    To find the seed class (I suspect it's Windows.Computer), open the MP in the Authoring Console, look at the discoveries, and find the seed.
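As an alternative to browsing the MP by hand, the discoveries and their target classes can be listed from the Operations Manager Shell. This is a minimal sketch, not a tested procedure: it assumes an existing management group connection, and the `Microsoft.Windows.Cluster*` name pattern is an assumption about how the cluster MPs are named in your environment. The seed discovery should be one whose target is a Windows computer or server class.

```powershell
Import-Module OperationsManager

# Assumption: the cluster management packs have internal names that
# start with 'Microsoft.Windows.Cluster'. Adjust the pattern if not.
$pattern = 'Microsoft.Windows.Cluster*'

Get-SCOMManagementPack |
    Where-Object { $_.Name -like $pattern } |
    ForEach-Object { Get-SCOMDiscovery -ManagementPack $_ } |
    Select-Object DisplayName, Name, Enabled,
        @{ Name = 'TargetClass'; Expression = { (Get-SCOMClass -Id $_.Target.Id).Name } } |
    Sort-Object TargetClass |
    Format-Table -AutoSize
```

Discoveries that target other cluster classes (nodes, networks, resources) depend on the seed, so disabling the seed for a group should be enough to keep those from running against the group's members.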

    Regarding the NICs:

    I "believe" you have two NIC discoveries: one comes from the Windows OS MP, the other from the cluster MP.  Look at both; it would probably be a good idea to modify both, but just for clusters.  I disabled network card discovery for all our servers because it generates a lot of config churn, especially if you have a lot of clusters.  If you look at the config data that is updated, you get IPv4 addresses being changed to IPv6, computer names changing from node name to FQDN node name, etc.  It's nasty.  NIC config changes were among our noisiest config data.  I suspect this is because a cluster discovery is automatically kicked off when you fail a resource over to another node, so if you have a lot of failovers, a lot of config data is discovered and sent to the DW.

    I also modified the Windows server discovery to run once a week.  Our config data changes in a 24-hour period are now greatly reduced.  Turning off the NIC discoveries doesn't impact performance collection, most of the NIC monitors are disabled out of the box, and cluster heartbeat and network connectivity monitoring are still working.

    As for those events you posted, I have yet to see them, but maybe they are happening in our labs where we run clusters on Hyper-V; I'm not sure, I haven't looked.  To find the NIC discoveries, go into Authoring, clear all the check boxes in the scope, then search for "network adapter".  You should see all the matching classes and can find their discoveries, which you can override as you wish.  Just make sure you create instance groups of those TARGETED classes, so you can override properly.
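The same search can be sketched in the Operations Manager Shell to see which MPs host the network adapter discoveries and how often they run. This assumes a management group connection, that the discoveries have "Network Adapter" in their display name, and that the interval appears as an `IntervalSeconds` or `Frequency` element in the data source configuration; a discovery that uses a different element name will show a blank interval.

```powershell
Import-Module OperationsManager

# Assumption: the NIC discoveries contain 'Network Adapter' in the display name.
Get-SCOMDiscovery -DisplayName '*Network Adapter*' |
    Select-Object DisplayName, Enabled,
        @{ Name = 'ManagementPack'; Expression = { $_.GetManagementPack().Name } },
        @{ Name = 'Interval'; Expression = {
            # Pull the interval out of the data source configuration XML, if present.
            if ($_.DataSource.Configuration -match '<(IntervalSeconds|Frequency)>([^<]+)<') {
                $Matches[2]
            }
        } } |
    Format-Table -AutoSize
```

Changing the interval itself is still done with an override on the discovery, targeted at a group, in the console.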


    Regards, Blake Email: mengotto<at>hotmail.com Blog: http://discussitnow.wordpress.com/ If my response was helpful, please mark it as so, if it answered your question, then please also mark it accordingly. Thank you.

    Thursday, May 15, 2014 6:55 AM

All replies

  • Great reply, but it leads me to another question.

    I want to set up the computer group before installing an agent on the HPV servers.  I can't find a way to create a group of machines I can't see, i.e. that are not managed yet.

    Thursday, May 15, 2014 3:32 PM
  • Right, so to create the dynamic group, you could key off something like a common naming structure, OU membership, things of that nature.  Otherwise, you can add a registry key to the servers and make any machine that has that key a member of the instance group.

    So there are ways to do this, but it will take some planning if you have a very dynamic virtual environment.  If the Hyper-V boxes follow a naming convention or are stored in a common OU, or the OU structure is very good in your environment, this shouldn't really be an issue.
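For the registry-key route, the marker itself could be placed on each Hyper-V host with something like the sketch below, run in an elevated PowerShell session on the host. The key path and value name are hypothetical; pick your own. Note that this only creates the marker: SCOM still needs an attribute discovery that reads it into a class property before a dynamic group rule can use it.

```powershell
# Hypothetical marker key and value name; these are illustrative only.
$keyPath   = 'HKLM:\SOFTWARE\Contoso\Monitoring'
$valueName = 'ExcludeClusterMonitoring'

# Create the key if it does not exist yet.
if (-not (Test-Path $keyPath)) {
    New-Item -Path $keyPath -Force | Out-Null
}

# Set the marker value (DWORD 1) and read it back to confirm.
New-ItemProperty -Path $keyPath -Name $valueName -Value 1 -PropertyType DWord -Force | Out-Null
Get-ItemProperty -Path $keyPath -Name $valueName
```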


    Thursday, May 15, 2014 3:48 PM
  • Create Group / Authoring Group

        2008 R2 Hyper-V Servers

            Dynamic Members

    ( Object is Windows Computer AND ( Organizational Unit Equals domainname/Servers/Hyper-V ) AND True )

    Authoring/Management Pack Objects/Object Discovery

        Searched for Cluster

        This finds a lot of discoveries.  Can it be narrowed, or do I need to touch each object discovery?

        For each object that is Windows Server 2008 R2 (Cluster)

            EX: I’m seeing Cluster Network, Cluster Node, etc.  Or can I drill down to just Windows Cluster?

        Override for All Objects of Another Class/View All Targets

        Select 2008 R2 Hyper-V Servers

        Disable

    Thursday, May 15, 2014 4:13 PM
  • OK, I want to run through this and get confirmation that I've done it correctly.

    Create Group / Authoring Group

                    2008 R2 Hyper-V Servers

                                    Dynamic Members

    ( Object is Windows Computer AND ( Organizational Unit Equals datadirect.datadirectnet.com/Servers/Hyper-V ) AND True )

    Using Management Pack Viewer, I opened the Windows Server 2008 Cluster Management Library.  While it doesn’t show anything named SEED, it does list Windows Server 2008 R2 Cluster Discovery.

    In SCOM, under Object Discoveries, I scoped to cluster and only saw Windows Server 2008 R2 Monitoring Cluster Server.  I did not see Windows Server 2008 R2 Cluster Discovery (the seed?).  I would then select as shown in the image.

    Continued in the next message, as I can only post 2 images per message.

    Thursday, May 15, 2014 5:13 PM
  • I would then set the override to disable.

    Thursday, May 15, 2014 5:14 PM
  • Also, regarding the comment "I also modified windows server discovery to run once a week": where did you make this change?  I'm unable to locate it.
    Thursday, May 15, 2014 6:22 PM
  • Hopefully this is my last question.  Regarding Remove-DisabledMonitoringObject: this is a destructive process that is irreversible.  Given the steps above, do those steps alone disable the monitoring, or is the cmdlet necessary?

    Thursday, May 15, 2014 8:57 PM
  • Don't use MP Viewer for this.  Export the MP, or use the Authoring Console to open the MP (it could be a separate discovery MP, but I can't recall).  Look at the CLASSES in the Authoring Console (where the discoveries are), and you will see all the classes, and one will be targeting the Windows Computer class.

    That is the seed class, and its discovery is what you will want to disable for this group.

    For Windows server discovery, you don't have to make that change.  I made it because our Windows servers are not changing properties on an hourly basis.  Windows discovery picks up IP addresses and the like every four hours.  If you have clusters that are flip-flopping and NICs that are populating WMI with IPv6 addresses in place of IPv4, those changes get detected and sent to the RMS, which I really don't care about and don't want to know about.  So we made the Windows server discovery run once every seven days, which cut down on a lot of garbage discovered data we didn't need (WE, our organization, not you; who knows, maybe someone in your company wants that data refreshed on an hourly basis).  The Windows server discovery is going to be found in one of the base management packs.  Scope to the Windows Computer object and look at which MP hosts that class, and then you will know, in general, where to look for the discovery.

    Remove-DisabledMonitoringObject is only going to remove discovered objects for disabled discoveries; it's not going to destroy anything.  If it removes things you didn't want removed, you remove the override, wait for discovery to kick off, and they will be found again.

    So you have to find the CLASS that the seed discovery is targeting.  I am certain it's Windows.Server or Windows.Computer.  You have to create your instance group of the computers that fit the criteria (clustered Hyper-V servers or whatever it is) and VERIFY that the group is populated with the right class and the right servers that you want to disable the discovery for.
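One way to check the group's membership before creating the override is from the Operations Manager Shell. A minimal sketch, assuming a management group connection and the group name used earlier in this thread:

```powershell
Import-Module OperationsManager

# Group display name taken from this thread; change it to match yours.
$group = Get-SCOMGroup -DisplayName '2008 R2 Hyper-V Servers'

if (-not $group) {
    throw 'Group not found. Check the display name.'
}

# List the current members so you can confirm only the intended servers are in it.
$group | Get-SCOMClassInstance |
    Sort-Object DisplayName |
    Select-Object DisplayName, FullName
```

The `FullName` column includes the class of each member, which helps confirm the group contains objects of the class the seed discovery targets.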

    Then you would create the override.  Right-click the discovery and choose to override for a group, or use the last option (I forget the exact wording, but it is something like "for all objects of another class"), pick the instance group that you created, and set it to disabled.  Save the override in an override management pack named after the MP that houses the original discovery.  So if the discovery is in the Windows Cluster Discovery management pack, you would create "Windows Cluster Discovery Management Pack Overrides" and save it in there.

    After you create the override, this config is going to be sent out, so it takes time depending on your environment.  Change the Discovered Inventory scope to the class that you are hoping to remove these servers from (the cluster seed class).  You should see your servers.  Open up the SCOM command shell, run the Remove-DisabledMonitoringObject command, and refresh the Discovered Inventory view; you should see them dropping out.  You may have to run the remove command a few times before they are all gone.  The remove command can take a bit of time to complete, so be patient.
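The override and cleanup steps above can also be sketched with the SCOM 2012 cmdlets. In the 2012 OperationsManager module, the cmdlet formerly called Remove-DisabledMonitoringObject is named Remove-SCOMDisabledClassInstance. The group name comes from this thread; the discovery and override MP display names are placeholders you must fill in after identifying the seed discovery and creating an unsealed override MP.

```powershell
Import-Module OperationsManager

# Placeholders: replace with the names from your environment.
$groupName      = '2008 R2 Hyper-V Servers'                # group from this thread
$discoveryName  = '<display name of the seed discovery>'   # placeholder
$overrideMpName = '<display name of your override MP>'     # placeholder, must be unsealed

$group     = Get-SCOMGroup -DisplayName $groupName
$discovery = Get-SCOMDiscovery -DisplayName $discoveryName
$mp        = Get-SCOMManagementPack -DisplayName $overrideMpName |
                 Where-Object { -not $_.Sealed }

if (-not ($group -and $discovery -and $mp)) {
    throw 'Group, discovery, or unsealed override MP not found. Check the names.'
}

# Create an enforced "disabled" override on the discovery for the group.
Disable-SCOMDiscovery -Group $group -Discovery $discovery -ManagementPack $mp -Enforce

# Later, once the new configuration has reached the agents, remove the
# objects that were discovered by discoveries that are now disabled.
Remove-SCOMDisabledClassInstance
```

Running the removal right after creating the override may find nothing to remove yet; as noted above, it can take several runs.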



    Friday, May 16, 2014 12:31 AM