Management server getting greyed out again and again

    Question

  • Hi guys,

    We have installed SCOM 2016 with three management servers (MS). One of them keeps getting greyed out.

    Please help me with this.

    When we try the solutions below, the management server becomes healthy, but after about 1 minute it is greyed out again.

    1. Restart all the services.

    2. Clear the cache by renaming the Health Service State folder and restarting the Microsoft Monitoring Agent.
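
    A minimal sketch of the cache-clearing step in item 2, written in Python purely for illustration. It assumes the agent's Windows service name is HealthService and that the cache is the "Health Service State" folder that appears in the event text further down; the function name and the timestamped backup suffix are hypothetical. The command runner is passed in so the logic can be tried without touching a real server.

```python
import subprocess
import time
from pathlib import Path


def flush_health_service_cache(state_dir, run=subprocess.run, service="HealthService"):
    """Stop the service, rename the cache folder, then start the service again.

    Returns the path the old folder was renamed to, or None if there was
    no folder to rename. `service` is an assumed service name.
    """
    state_dir = Path(state_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = state_dir.with_name(f"{state_dir.name}.old-{stamp}")
    renamed = None

    run(["net", "stop", service], check=True)
    try:
        if state_dir.exists():
            # Renaming instead of deleting keeps the old cache for inspection.
            state_dir.rename(backup)
            renamed = backup
    finally:
        # Always try to bring the service back, even if the rename failed.
        run(["net", "start", service], check=True)
    return renamed
```

    On the affected server this would be called from an elevated prompt with the full path to the Health Service State folder. As the thread shows, this only helps temporarily if a failing workflow keeps being reloaded.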

    Please suggest any other solution I can apply to make the MS healthy.

    When I checked the event viewer, I found the errors below:

    1. Event id - 1103

    Summary: 1 rule(s)/monitor(s) failed and got unloaded, 1 of them reached the failure limit that prevents automatic reload. Management group "MG". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).

    2. Event id - 21406

    The process started at 2:09:59 PM failed to create System.PropertyBagData. Errors found in output:

     

    H:\Program Files\Microsoft System Center 2016\Operations Manager\Server\Health Service State\Monitoring Host Temporary Files 1\1981\HotFixValidation.vbs(117, 10) Microsoft VBScript runtime error: Subscript out of range: 'count'

     

    Command executed:            "C:\Windows\system32\cscript.exe" /nologo "HotFixValidation.vbs"

    Working Directory:               H:\Program Files\Microsoft System Center 2016\Operations Manager\Server\Health Service State\Monitoring Host Temporary Files 1\1981\

     

    One or more workflows were affected by this. 

     

    Workflow name: ExchangeRequiredHotfixesNotInstalled

    Instance name: Instance name

    Instance ID: {3C122109-526B-CDD3-CCB5-0580829AB47F}

    3. Event id - 4502

    A module of type "Microsoft.EnterpriseManagement.Mom.Modules.SubscriptionDataSource.InstanceSpaceSubscriptionDataSource" reported an exception System.ArgumentNullException: Value cannot be null.

    Parameter name: value

       at System.Collections.CollectionBase.OnValidate(Object value)

       at System.Collections.CollectionBase.System.Collections.IList.Add(Object value)

       at Microsoft.EnterpriseManagement.Mom.Modules.SubscriptionDataSource.HttpRESTClient.PostDataAsync(Byte[] data, Object context)

       at Microsoft.EnterpriseManagement.Mom.Modules.SubscriptionDataSource.SubscriptionDataSource`2.WriteToCloud(List`1 items, DateTime firstTryDateTime)

       at Microsoft.EnterpriseManagement.Mom.Modules.SubscriptionDataSource.SubscriptionDataSource`2.PostAsync(List`1 items, DateTime firstTryDateTime) which was running as part of rule "Microsoft.SystemCenter.CollectInstanceSpace" running for instance "All Management Servers Resource Pool" with id:"{4932D8F0-C8E2-2F4B-288E-3ED98A340B9F}" in management group "MG".

    4. Event id - 10103

    In PerfDataSource, could not resolve counter instance OpsMgr DW Writer Module, Dropped Data Item Count, All Instances. Module will not be unloaded.

     

    One or more workflows were affected by this. 

     

    Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectionRule.Performance.Writer.DroppedDataItemCount

    Instance name: Instance name

    Instance ID: {3C122109-526B-CDD3-CCB5-0580829AB47F}

    Management group: MG

    Thanks in Advance!


    AD

    Thursday, June 21, 2018 9:03 AM

All replies

  • Hi,

    You've mentioned that the server is getting greyed out again and again. How does it turn back to "Monitored" and "green" again? Automatically, or do you reset the workflow monitors? Does it stay permanently greyed out?

    Some other questions:

    - Does the affected server have the same UR level as the rest of the management servers?
    - Is there anti-virus software installed? The minimum requirement would be to exclude the Health Service State folder from scanning, but I have seen worse behaviour.

    From an older thread I answered on the topic of AV solutions and SCOM:

    "Antivirus solutions use so-called "filter drivers" to provide the anti-virus filtering functionality of their software. The important point is that when you want to rule out the AV solution as a possible cause, disabling it is sometimes NOT sufficient. If you disable the service, the filter driver is still active in the kernel of the OS and filters requests at a low level. This is all explained here:

    How to temporarily deactivate the kernel mode filter driver in Windows

    So, because of this, it is very, very important to:

    - either disable the filter driver as mentioned in the KB article or
    - Fully uninstall the AV software (for testing purposes) and then test. You can install it back at any time."

    The next thing to do would be to check the logs. You can enable verbose logging on the affected management server as described here:

    How to Debug SCOM agent

    You can collect the traces and check what is found there. Unfortunately, from the events alone it will be pretty tough to tell exactly what is going on and why so many workflows are being unloaded.

    Regards,


    (Please take a moment to "Vote as Helpful" and/or "Mark as Answer" where applicable. This helps the community, keeps the forums tidy, and recognizes useful contributions. Thanks!) Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov


    Thursday, June 21, 2018 9:19 AM
    Moderator
  • Below are the answers to your questions

    1. How does it turn back to "Monitored" and "green" again? Automatically or do you reset the workflow monitors? Does it stay permanently greyed out? - Whenever I reset the health and restart the services, it turns green automatically, and then after a few minutes it turns grey again.

    2. Does the affected server have the same UR level as the rest of the management servers? - Yes, all have UR5.

    3. Is there an anti-virus software installed? - No.


    AD

    Thursday, June 21, 2018 10:55 AM
  • Hi,

    Can you please check for other events that could help troubleshoot this? The events you've posted could also just be a consequence of what is going on with the management server. Also, don't underestimate the logging; it could give a clue to what is happening.

    Regards,


    Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov

    Thursday, June 21, 2018 12:46 PM
    Moderator
  • Hi,

    I suspect this might be happening due to some rule that is running on that MS: whenever the rule runs, it fails and turns the MS grey; when you restart the services, it re-initializes and fails again. I have seen this happen before.

    The 3rd event you posted could be the culprit here. There is a rule named "Microsoft.SystemCenter.CollectInstanceSpace" which, according to the System Center Wiki, "sends instancespace up to the cloud." I suggest you find this rule (search for "Send Instancespace to the Cloud" in rules) and disable it, then flush the cache on the MS and check.
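
    To narrow down which rule keeps failing, the workflow and rule names can be pulled out of the event descriptions and counted. The following is a rough Python sketch, not an official tool: the function name is hypothetical, and the two patterns only cover the phrasings seen in the events quoted in this thread ("Workflow name: ..." and "part of rule "..."").

```python
import re
from collections import Counter

# Patterns for the two phrasings that appear in the events quoted in this thread.
_WORKFLOW = re.compile(r"Workflow name:\s*(\S+)")
_RULE = re.compile(r'part of rule "([^"]+)"')


def failing_workflows(event_descriptions):
    """Count how often each workflow or rule name appears across event texts."""
    counts = Counter()
    for text in event_descriptions:
        counts.update(_WORKFLOW.findall(text))
        counts.update(_RULE.findall(text))
    return counts
```

    Feeding it the exported description text of the Operations Manager events and looking at `most_common()` gives a quick list of candidates to disable or override first.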

    Hope this helps

    Cheers


    Sam (Please take a moment to "Vote as Helpful" and/or "Mark as Answer" wherever applicable. Thanks!) Blog:AnalyticOps Insights Twitter:Sameer Mhaisekar

    • Marked as answer by AD_SC Friday, June 22, 2018 8:07 AM
    • Unmarked as answer by AD_SC Friday, June 22, 2018 8:40 AM
    Thursday, June 21, 2018 1:57 PM
  • Maybe one of the quicker methods is to uninstall this management server and reinstall it.
    roger

    Friday, June 22, 2018 4:21 AM
  • Thanks a lot Sameer. This solution helped me. I disabled this rule and now my MS has become healthy.


    AD


    Glad to know it worked. Happy to help :)

    Cheers


    Sam

    Friday, June 22, 2018 8:18 AM
  • Hi Sameer,

    Thanks for the reply.

    I tried disabling this rule, but the same issue persisted.

    After that, I disabled the rule below, which fixed the issue. Maybe this rule is different for SCOM 2016.

    "Send TypeSpace to the Cloud" rule

     

    Thanks for your help. It helped me a lot.


    AD


    • Edited by AD_SC Friday, June 22, 2018 8:45 AM
    Friday, June 22, 2018 8:43 AM
  • Hi,

    I pointed to that rule since that was the one that popped up in the event description you provided in the question.

    "Send TypeSpace to the Cloud" rule is a different rule. And yes, this happens because of that as well :)

    If I remember right, these rules are part of the System Center Advisor MP. If you're not using this MP, you can consider deleting it.

    Please mark the thread as answered, so that others can benefit from it :)

    Hope this helps

    Cheers


    Sam

    • Marked as answer by AD_SC Friday, June 22, 2018 9:20 AM
    Friday, June 22, 2018 9:19 AM
  • Yes, this rule is related to the Advisor MP. Thanks a lot.

    AD

    Friday, June 22, 2018 9:22 AM