Management servers went into a grayed-out state and OpsMgr SDK event IDs are triggering frequently on management servers

  • Question

  • We are facing an issue with our SCOM management servers: 2 of the servers went into a grayed-out state and are not returning to a healthy state.

    The event IDs below appear frequently on most of the management servers.

    Some of the client servers are not being monitored properly: some are in a healthy state but all of their monitoring parameters show Unknown, and others are in a Not monitored state.

    Help needed...

    --------------------------------------------------------------------------------------------------------------------------------

    MANAGEMENT SERVER

    Log Name:      Operations Manager
    Source:        Health Service Modules
    Date:          6/25/2019 2:35:26 PM
    Event ID:      31551
    Task Category: Data Warehouse
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      C**SCOM001PD.h*q.h***y.com
    Description:
    Failed to store data in the Data Warehouse. The operation will be retried.
    Exception 'InvalidOperationException': The given value of type String from the data source cannot be converted to type nvarchar of the specified target column. 

    One or more workflows were affected by this.  

    Workflow name: Microsoft.Exchange.15.MailboxStatsSubscription.Rule 
    Instance name: C**SCOM001PD.h*q.h***y.com 
    Instance ID: {1EE4544E-32BC-65BB-D4A1-E7525C61C10C} 
    Management group: HUSKY_SCOM2016

    ----------------------------------------------------------------------------------------------------------------------------------------
    MANAGEMENT SERVER

    Log Name:      Operations Manager
    Source:        OpsMgr Connector
    Date:          6/25/2019 2:34:36 PM
    Event ID:      20034
    Task Category: Availability
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      C**SCOM001PD.h*q.h***y.com
    Description:
    The health service {D3B679C2-390B-69A3-C8DA-24EE57C3CD0F} running on host CGW***04PD.hq.ha**y.com and serving management group hasuy_SCOM2016 with id {D7B0417C-E590-184F-0767-23427F4A27A1} is not healthy.  Entity state change flow is stalled with pending acknowledgement.


    ----------------------------------------------------------------------------------------------------------------------------------------
    MANAGEMENT SERVER

    Log Name:      Operations Manager
    Source:        OpsMgr SDK Service
    Date:          6/25/2019 2:32:18 PM
    Event ID:      26319
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      C**SCOM001PD.hq.h***y.com
    Description:
    An exception was thrown while processing GetUserRolesForOperationAndUser for session ID uuid:849e1102-d445-4993-8b4c-618f02e35991;id=22369.
     Exception message: Value does not fall within the expected range.
     Full Exception: System.ArgumentException: Value does not fall within the expected range.
       at Microsoft.EnterpriseManagement.Interop.Security.Auth.IAzApplication2.InitializeClientContextFromStringSid(String SidString, Int32 lOptions, Object varReserved)
       at Microsoft.EnterpriseManagement.Mom.Sdk.Authorization.AzManHelper.GetScopedRoleAssignmentsForUser(Int32 operationNumericId, String userName)
       at Microsoft.EnterpriseManagement.Mom.Sdk.Authorization.AuthManager.GetUserRolesForOperationAndUser(Guid operationId, String userName)
       at Microsoft.EnterpriseManagement.Mom.Sdk.Authorization.AuthorizationService.GetUserRolesForOperationAndUser(Guid operationId, String userName)
       at Microsoft.EnterpriseManagement.ServiceDataLayer.SecurityConfigurationService.GetUserRolesForOperationAndUser(Guid operationId, String userName)
       at Microsoft.EnterpriseManagement.Mom.ServiceDataLayer.SdkDataAccessBackCompatProxy.GetUserRolesForOperationAndUser(Guid operationId, String userName)

    ----------------------------------------------------------------------------------------------------------------------------------------
    MANAGEMENT SERVER

    Log Name:      Operations Manager
    Source:        OpsMgr Connector
    Date:          6/25/2019 2:38:17 PM
    Event ID:      20038
    Task Category: Availability
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      C**SCOM001PD.hq.h***y.com
    Description:
    The health service {D3B679C2-390B-69A3-C8DA-24EE57C3CD0F} running on host CGWSCOM004PD.hq.hasuy.com and serving management group hasuy_SCOM2016 with id {D7B0417C-E590-184F-0767-23427F4A27A1} is not healthy.  Alert flow is stalled with pending acknowledgement.

    ----------------------------------------------------------------------------------------------------------------------------------------
    MANAGEMENT SERVER

    Log Name:      Operations Manager
    Source:        HealthService
    Date:          6/25/2019 2:45:01 PM
    Event ID:      2115
    Task Category: None
    Level:         Warning
    Keywords:      Classic
    User:          N/A
    Computer:      C**SCOM004PD.h*q.h***y.com
    Description:
    A Bind Data Source in Management Group hasuy_SCOM2016 has posted items to the workflow, but has not received a response in 1500 seconds.  This indicates a performance or functional problem with the workflow.
     Workflow Id : Microsoft.SystemCenter.CollectPublishedEntityState
     Instance    : CGWSCOM004PD.hq.hasuy.com
     Instance Id : {D3B679C2-390B-69A3-C8DA-24EE57C3CD0F}

    ----------------------------------------------------------------------------------------------------------------------------------------
     GATEWAY SERVER


    Log Name:      Operations Manager
    Source:        HealthService
    Date:          7/11/2019 7:59:37 AM
    Event ID:      2120
    Task Category: Health Service
    Level:         Warning
    Keywords:      Classic
    User:          N/A
    Computer:      W**SCOM011PD.h*q.h***y.com
    Description:
    The Health Service has deleted one or more items for management group "h*suy_SCOM2016" which could not be sent in 1440 minutes.
    ------------------------------------------------------------------------------------------------------------------------------------------------
    GATEWAY SERVER

    Log Name:      Operations Manager
    Source:        HealthService
    Date:          7/11/2019 7:59:37 AM
    Event ID:      2120
    Task Category: Health Service
    Level:         Warning
    Keywords:      Classic
    User:          N/A
    Computer:      C**SCOM010PD.d*nstrm.h***y.com
    Description:
    The Health Service has deleted one or more items for management group "h*suy_SCOM2016" which could not be sent in 1440 minutes.



    Thanks, Shiva ravichandran.



    Thursday, July 11, 2019 3:14 PM
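To see which of the events above recur most often, and on which servers, one option is to copy the events out of Event Viewer as text and tally them. The following is a minimal Python sketch, assuming the copied text uses the same "Log Name:", "Source:", "Event ID:" and "Computer:" layout as the entries above; `tally_events` is a hypothetical helper, not part of SCOM, and the sample host names are made up.

```python
import re
from collections import Counter

# Matches "Key:   value" lines in the layout Event Viewer uses when copying as text.
FIELD_RE = re.compile(r"^\s*(Source|Event ID|Computer):\s*(.+?)\s*$")
REQUIRED = ("Source", "Event ID", "Computer")


def tally_events(text: str) -> Counter:
    """Count (computer, source, event_id) triples in pasted Event Viewer text.

    Assumes each record starts with a "Log Name:" line and contains
    Source, Event ID and Computer lines in any order.
    """
    counts: Counter = Counter()
    record: dict = {}

    def flush() -> None:
        # Only count records that carry all three fields.
        if all(key in record for key in REQUIRED):
            key = (record["Computer"], record["Source"], int(record["Event ID"]))
            counts[key] += 1

    for line in text.splitlines():
        if line.strip().startswith("Log Name:"):
            flush()
            record.clear()
        match = FIELD_RE.match(line)
        if match:
            record[match.group(1)] = match.group(2)
    flush()  # the last record has no following "Log Name:" line
    return counts


# Hypothetical sample in the same layout as the events above.
sample = """
Log Name:      Operations Manager
Source:        HealthService
Event ID:      2120
Computer:      GW1.example.com
Log Name:      Operations Manager
Source:        HealthService
Event ID:      2120
Computer:      GW1.example.com
Log Name:      Operations Manager
Source:        OpsMgr Connector
Event ID:      20034
Computer:      MS1.example.com
"""

for (computer, source, event_id), n in tally_events(sample).most_common():
    print(f"{n:3}  {event_id:>6}  {source:<20} {computer}")
```

Grouping by computer as well as event ID makes it easier to tell whether, for example, the 2120 warnings come from every gateway or only from specific ones.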

All replies

  • Hi Shiva,

     

    From your description, I understand that 2 management servers are greyed out, and that some agents are not being monitored or show some parameters as Unknown. If I have misunderstood anything, please let us know.

     

    Before going further, I would like to confirm some information.

    1. Did it work well before?
    2. When did this happen?
    3. Has anything changed?

     

    Meanwhile, try the following steps to see if the issue can be fixed.

    1. Try to uninstall and reinstall the agent on the management servers.
    2. Check whether the System Center Management Configuration and System Center Data Access services are running under the same domain account, and that this account has sufficient permissions.
    3. Restart the following services on the 2 management servers:

           System Center Management Configuration

           System Center Data Access Service

           Microsoft Monitoring Agent

    4. Test the network connectivity between the management servers and the gateway servers, and make sure TCP port 5723 is open between them.
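The port check in step 4 can also be scripted. The following is a minimal Python sketch using only the standard library; `port_open` is a hypothetical helper name, and it only shows that a TCP handshake to port 5723 completes from the machine where it runs, not that the SCOM channel itself is healthy. Because a gateway opens the connection to its management server, run it on each gateway against each management server.

```python
import socket


def port_open(host: str, port: int = 5723, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and name-resolution failures (socket.gaierror).
        return False
```

For example, on a gateway, `port_open("ms01.example.com")` (a hypothetical host name) returns `False` if the connection is refused, times out after 3 seconds, or the name does not resolve.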

     

    Please try the steps above and let us know of any updates.

     

    Best regards.

    Crystal



    Friday, July 12, 2019 2:08 AM
  • 1) Flush the Health Service state and cache from the SCOM console; for a management server, you can do this from the Operations Manager folder -> Management Group Health.
    2) Delete the ‘Health Service State’ folder under the Operations Manager -> Server folder on the management server.
    Roger

    Friday, July 12, 2019 2:20 AM
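Step 2 in the reply above is normally done with the Microsoft Monitoring Agent (HealthService) service stopped; the folder is recreated when the service starts again. The following is a minimal Python sketch of that step, assuming the service is already stopped and that you pass the Operations Manager `Server` directory yourself (its exact path depends on the SCOM version and install location, so treat any path you use as an assumption). `rename_state_folder` is a hypothetical helper, and this sketch renames the folder instead of deleting it so that it can be restored if needed.

```python
from datetime import datetime
from pathlib import Path


def rename_state_folder(server_dir: Path) -> Path:
    """Rename <server_dir>/'Health Service State' to a timestamped backup.

    Assumes the HealthService (Microsoft Monitoring Agent) service is stopped.
    Returns the path of the renamed folder.
    """
    state = server_dir / "Health Service State"
    if not state.is_dir():
        raise FileNotFoundError(f"No state folder at {state}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = state.with_name(f"Health Service State.old-{stamp}")
    state.rename(backup)
    return backup
```

Once the service is started again, the backup folder can be removed after the server reports a healthy state.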
  • The troubleshooting steps above did not work; those servers are still in a grayed-out state.

    1. Did it work well before? Yes, it was working before.
    2. When did this happen? 1 month ago.
    3. Has anything changed? We haven't changed anything.

    1. Try to uninstall and reinstall the agent on the management servers.

               I don't think we should install the SCOM agent on management servers (correct me if I'm wrong).

    2. Check whether the System Center Management Configuration and System Center Data Access services are running under the same domain account, and that this account has sufficient permissions.

                The services are running with sufficient permissions.

    3. Restart the following services on the 2 management servers:

           System Center Management Configuration

           System Center Data Access Service

           Microsoft Monitoring Agent

                 I restarted all the SCOM-related services, but it did not help.

    4. Test the network connectivity between the management servers and the gateway servers, and make sure TCP port 5723 is open between them.

                 The ports are open in both directions.

      


    Thanks, Shiva ravichandran.

    Thursday, August 22, 2019 5:52 PM
  • Was this resolved? I'm having the same issue in Operations Manager 2019. In my case, I completely uninstalled both management servers. I cleaned up all alerts except 31551, because I did not want to disable the rule. I expanded both the Operations Manager and data warehouse databases. I verified that the RMS stayed healthy with no errors for a full day. I flushed the system health and rebooted; there were no 2115 events. I then completely rebuilt one server and patched it. Afterwards, I installed SCOM 2019 with the management server role and management console, and everything went fine. After about 10 or 20 minutes, the new server's state went grey and I got an event 5300. My heart was broken.

    Wednesday, November 13, 2019 6:47 PM