HealthService Errors

  • Question

  • Hello Everyone,

    I am hoping somebody can shed some light on an issue I am having. First, my HealthService seems to randomly stop working by itself (even though the service still shows as running). My notifications stop working, Service Requests get stuck in progress, and so on, until I restart the service manually.

    Second, I opened the Event Viewer and noticed a very large number of warnings and errors.

    The warning is Event 1103, HealthService: Summary: 1 rule(s)/monitor(s) failed and got unloaded, 1 of them reached the failure limit that prevents automatic reload. Management group "SMAdmins". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).

    The error is Event 4502, HealthService: A module of type "Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.GroupCalculationModule" reported an exception Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.DatabaseQueryModuleException: ManagedTypeId = a604b942-4c7b-2fb2-28dc-61dc6f465c68 does not have a property name = Customer_List. ---> Microsoft.EnterpriseManagement.Common.DataItemDoesNotExistException: ManagedTypeId = a604b942-4c7b-2fb2-28dc-61dc6f465c68 does not have a property name = Customer_List.
       at Microsoft.EnterpriseManagement.DataAccessLayer.TypeSpaceData.GetManagedTypePropertyId(Guid managedTypeId, String propertyName)
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.ManagementPackElementResolver.ResolveManagedTypePropertyId(String token)
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.ManagementPackElementResolver.ReplacePathProperty(String path)
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.ManagementPackElementResolver.ReplacePropertyInPath(String path)
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.ManagementPackElementResolver.ResolveAllGuidsInPath(String tokenElement)
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.PathQueryParser.FixThePath(XPathNavigator membershipRuleNav)
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.PathQueryParser.FixAllPaths(XPathNavigator pathDocumentNav)
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.MembershipSubscription.ValidateAndInitialize(MembershipSubscription subscription)
       --- End of inner exception stack trace ---
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.MembershipSubscription.ValidateAndInitialize(MembershipSubscription subscription)
       at Microsoft.EnterpriseManagement.Mom.DatabaseQueryModules.MembershipCalculationManager.UpdateAddedSubscriptions() which was running as part of rule "WorkItemGroup.90d9b6a2880c4821b4645878e732dd6a.Discovery" running for instance "Incident - CUSTOMERNAME" with id:"{2B277A89-7024-0A6C-0970-645049010DC7}" in management group "SMAdmins".

    I replaced the customer's name with CUSTOMERNAME, as the same error occurs for each customer we have in the system.

    Before these, I get another error, Event 7016, HealthService: The Health Service cannot verify the future validity of the RunAs account DOMAIN\SirLearnAlot for management group SMAdmins due to an error retrieving information from Active Directory (for Domain Accounts) or the local security authority (for Local Accounts).  The error is The RPC server is unavailable.(0x800706BA).

    Below this is yet another error with the same event number, but a different user...

    Event 7016, HealthService: The Health Service cannot verify the future validity of the RunAs account DOMAIN\scsmservice for management group SMAdmins due to an error retrieving information from Active Directory (for Domain Accounts) or the local security authority (for Local Accounts).  The error is The RPC server is unavailable.(0x800706BA).

    Not sure if this is related, but below this is another warning that gets repeated multiple times: Event 29104, OpsMgr Config Service: OpsMgr Config Service failed to send the dirty state notifications to the dirty OpsMgr Health Services.  This may be happening because the Root OpsMgr Health Service is not running.


    Thursday, March 5, 2015 6:16 PM

All replies

  • Hi,

    Please check whether your DC is online and your management server can reach your DC.

    You should also have SCOM installed in your environment; look in your Operations Manager console and check whether there are any related alerts.

    You can also flush the Health Service state and cache on your management servers.

    In addition, I hope the link below is helpful:

    http://blog.tyang.org/2011/09/30/event-id-29104-on-scom-rms-cluster/

    Regards,

    Yan Li



    Friday, March 6, 2015 5:48 AM
  • Thanks for your reply,

    The DC is online, of course, and the management server has to be able to reach the DC, otherwise it wouldn't have internet connectivity. Regardless, I did a ping and there are no problems there.

    I don't have SCOM installed, nor am I planning on installing it in the near future. That would require a separate license, and it would not make sense to get one just to solve my errors.

    Flushing the health service state doesn't solve anything. I am pretty sure this is credentials/authentication related, as it keeps listing problems with the SCSM management group (SMAdmins). Additionally, there may be a problem with Customer_List and some discovery rule. Unfortunately, I am not sure exactly what.

    Friday, March 6, 2015 1:54 PM
  • Event 1103, HealthService: Summary: 1 rule(s)/monitor(s) failed and got unloaded, 1 of them reached the failure limit that prevents automatic reload. 

    This is a summary event telling you that one of the workflows failed and is being ignored. We'd need to find the other related events showing what failure occurred and how to correct it.

    ManagedTypeId = a604b942-4c7b-2fb2-28dc-61dc6f465c68 does not have a property name = Customer_List. 

    This might be the failed workflow from the first event (1103). Check the groups and queues you have defined, and see if one of them references this removed property. The console shouldn't allow you to create a group or queue against a non-existent property, but if the group already existed and the property was removed afterwards, you might see this error.
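
    One way to see which groups or queues are affected is to pull the type ID, property name, rule, and instance out of each Event 4502 message and tally them. The following is a minimal Python sketch, assuming the event messages have been exported as plain text in the format quoted in this thread. The function names and the `SAMPLE` string are hypothetical, and the regular expressions are only known to fit the message shown above.

```python
import re
from collections import Counter

# Matches the "missing property" part of the Event 4502 text quoted in this thread.
MISSING_PROPERTY = re.compile(
    r"ManagedTypeId = (?P<type_id>[0-9a-fA-F-]{36}) "
    r"does not have a property name = (?P<prop>\w+)"
)
# Matches the rule and instance named at the end of the same message.
RULE_INSTANCE = re.compile(
    r'part of rule "(?P<rule>[^"]+)" running for instance "(?P<instance>[^"]+)"'
)

# Shortened copy of the message from this thread, used only for illustration.
SAMPLE = (
    "ManagedTypeId = a604b942-4c7b-2fb2-28dc-61dc6f465c68 does not have a "
    "property name = Customer_List. which was running as part of rule "
    '"WorkItemGroup.90d9b6a2880c4821b4645878e732dd6a.Discovery" running for '
    'instance "Incident - CUSTOMERNAME" with id:'
    '"{2B277A89-7024-0A6C-0970-645049010DC7}" in management group "SMAdmins".'
)


def parse_4502(message):
    """Return type_id, prop, rule and instance as a dict, or None if no match."""
    missing = MISSING_PROPERTY.search(message)
    if missing is None:
        return None
    details = missing.groupdict()
    where = RULE_INSTANCE.search(message)
    details["rule"] = where.group("rule") if where else None
    details["instance"] = where.group("instance") if where else None
    return details


def summarize(messages):
    """Count failures per (type_id, prop) pair across many event messages."""
    counts = Counter()
    for message in messages:
        details = parse_4502(message)
        if details is not None:
            counts[(details["type_id"], details["prop"])] += 1
    return counts
```

    If every failing instance reports the same type ID and property, that points at one class or property definition rather than at each individual group or queue.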

    Event 7016, HealthService: The Health Service cannot verify the future validity of the RunAs account DOMAIN\SirLearnAlot for management group SMAdmins due to an error retrieving information from Active Directory (for Domain Accounts) or the local security authority (for Local Accounts).  The error is The RPC server is unavailable.(0x800706BA)

    This might also be the failed workflow from the first event (1103). An unavailable RPC server when attempting to check an account critical for Service Manager operations will definitely cause issues. Make sure you have domain connectivity and that the domain controller is replicating properly. I doubt this is a Service Manager error; it's more likely that Service Manager is the victim of whatever domain authentication issue underlies this.
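
    A ping only shows that the host answers ICMP; "The RPC server is unavailable" is about RPC, which starts with a TCP connection to the endpoint mapper on port 135. The following is a rough Python sketch for testing TCP reachability from the management server, assuming Python is available there. The host name in the usage comment is hypothetical, and the port list is an assumption about which standard services are worth probing.

```python
import socket

# Well-known TCP ports on a domain controller (assumed list for this sketch).
DC_PORTS = {
    "Kerberos": 88,
    "RPC endpoint mapper": 135,
    "LDAP": 389,
    "SMB": 445,
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_dc(host, ports=DC_PORTS, timeout=3.0):
    """Return a dict mapping each service name to its reachability."""
    return {name: can_connect(host, port, timeout) for name, port in ports.items()}


# Example usage (hypothetical host name):
# print(check_dc("dc01.example.local"))
```

    A successful connection to port 135 does not prove that RPC works end to end, because RPC then moves to a dynamically assigned high port that a firewall may still block. A failure on 135, however, would be consistent with error 0x800706BA.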

    Event 29104, OpsMgr Config Service: OpsMgr Config Service failed to send the dirty state notifications to the dirty OpsMgr Health Services.  This may be happening because the Root OpsMgr Health Service is not running

    The Health Service is how workflows get run. This is legacy behavior from Operations Manager, and it directly relates to how the workflow server starts its workflows. The "Root OpsMgr Health Service" in this case refers to the Service Manager Data Access Service running on the workflow server. If it has failed due to the above domain authentication issues, you would see this error. Additionally, I would recommend clearing the health state and config state on your workflow server and restarting the Service Manager services. Once again, this is far more likely to indicate that Service Manager is a victim of another, deeper authentication issue rather than the cause of one.

    Monday, March 9, 2015 3:37 PM
  • The second event (suggesting a missing property) confuses me because it occurs for EVERY group/queue I have defined, and the name of the property is correct (Customer_List). Any suggestions on how to narrow this down?

    I do have domain connectivity. However, the only thing I can think of that changed is this: I had a group called SMAdmins as the Service Manager management group, but I needed to change the group type to "domain local" (vs. global) to be able to add AD members from a different domain (with a one-way trust). This kind of change cannot be made to a security group once it has already been created, so I just deleted the group, recreated it as a "domain local" group, and re-added all the members. I realize I may have messed something up if that group is not identified by name alone but by some kind of container ID or other property that must have changed when I created the new group. Any ideas on how to fix this?

    I checked my DC and it seems to be replicating properly. DNS is working fine as well, otherwise none of the servers would be able to access public addresses. What else can I check?

    Thanks in advance Thomas!
    Tuesday, March 10, 2015 11:57 AM
  • Deleting a group and creating a new one with the same name results in a new group with a new SID. None of the permissions of the old group will apply to the new group. On the same subject, a domain local group can't be used on member servers, only domain controllers. You'll have to re-add this new group to the correct permissions, and you might need to convert it to universal in order to get your membership and usage correct.
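
    To illustrate why the recreated group loses its permissions, here is a toy Python model. It is not how Active Directory or Service Manager is implemented; the `ToyDirectory` class, the SID values, and the starting RID are all made up for the example. The only point it carries over from the explanation above is that permissions are recorded against the SID, not the name.

```python
import itertools


class ToyDirectory:
    """Toy stand-in for a directory that issues a fresh SID per created group."""

    def __init__(self, domain_sid):
        self.domain_sid = domain_sid
        self._next_rid = itertools.count(1104)  # arbitrary starting RID
        self.groups = {}  # group name -> current SID

    def create_group(self, name):
        sid = f"{self.domain_sid}-{next(self._next_rid)}"
        self.groups[name] = sid
        return sid

    def delete_group(self, name):
        del self.groups[name]


def has_access(acl, directory, group_name):
    """Access is decided by the group's current SID, never by its name."""
    return directory.groups.get(group_name) in acl


def demo():
    directory = ToyDirectory("S-1-5-21-1111111111-2222222222-3333333333")
    old_sid = directory.create_group("SMAdmins")
    acl = {old_sid}  # permission granted to the original group
    before = has_access(acl, directory, "SMAdmins")

    # Delete and recreate under the same name: the SID changes.
    directory.delete_group("SMAdmins")
    new_sid = directory.create_group("SMAdmins")
    after_recreate = has_access(acl, directory, "SMAdmins")

    # Re-granting the permission to the new SID restores access.
    acl.add(new_sid)
    after_regrant = has_access(acl, directory, "SMAdmins")
    return old_sid, new_sid, before, after_recreate, after_regrant
```

    In this model the stale entry for the old SID stays in the ACL and matches nothing, which is why the new group has to be granted its permissions again wherever the old one was used.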

    Tuesday, March 10, 2015 3:39 PM
  • The groups are all created in AD on the DC, so this should be fine as domain local, right? The reason I created it as domain local is that we have a one-way trust relationship with the company's main domain (we created another domain for SCSM), so that we can add our analysts from our own AD.

    This is because our analysts log in from within our main domain (at the workplace). It makes things easier by automatically logging the user into the console using the current user's credentials from the domain AD, versus having to connect as DOMAIN\user every time.

    How can I re-add the group w/ correct permissions? Is this within the SMDB? Console Administrative Settings?

    Thanks again for the help mate.

    Tuesday, March 10, 2015 3:53 PM