SCOM 2012 agent authentication issues

  • Question

  • Hi all,

    I have a SCOM 2012 environment (7.1.10226.0) with a newly added remote site. The agent I have deployed in the remote site installs fine but then fails to authenticate after installation (both sites are in the same AD forest, and the action account has the correct permissions on the relevant domains). The two sites trust one another, so from a firewall perspective I am allowing all traffic in both directions. The latency between the sites is just over 250 ms. From the research I have done, I assume the issue I am experiencing is related to the articles below, as I do not have a DC for the remote domain in the site where the SCOM server resides:

    https://nocentdocent.wordpress.com/2012/10/26/opsmgr-2012-agents-across-slow-wan-links-are-unable-to-communicate/

    http://stefanroth.net/2012/12/09/scom-2012-event-id-20070-agent-across-slow-wan-links/

    Deploying a DC in the same site as the SCOM server might be an issue due to the number of additional remote sites I plan to add. Does anyone know of any other fix for this (these articles date back to 2012)?

    Is there any way around this issue, perhaps by using a SCOM gateway? I implemented a gateway in the remote site, but it has the same authentication issues with the SCOM server. I guess I could implement a gateway with certificate authentication, but will that resolve the agent authentication issue? (Sorry, I don't have much experience with SCOM gateways.)

    Your views on this issue would be much appreciated!

    Monday, January 19, 2015 1:37 PM
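The ~250 ms figure is central to the slow-WAN theory in the two articles. As a rough cross-check from an agent in the remote site, timing a few TCP handshakes to the management server's port 5723 approximates one network round trip without relying on ICMP. This is a minimal Python sketch; the helper names are hypothetical, and the result only approximates WAN latency rather than diagnosing the authentication failure itself.

```python
import socket
import statistics
import time


def tcp_connect_times(host: str, port: int = 5723, samples: int = 5,
                      timeout: float = 5.0) -> list[float]:
    """Time `samples` TCP handshakes to host:port, in milliseconds."""
    # Resolve once up front so DNS lookup time is not counted in the samples.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    times_ms = []
    for _ in range(samples):
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            start = time.perf_counter()
            # connect() returns once the handshake completes, i.e. after
            # roughly one network round trip. Raises OSError on failure.
            sock.connect(sockaddr)
            times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms


def median_ms(times_ms: list[float]) -> float:
    """Median of the samples; less sensitive to one slow handshake than the mean."""
    return statistics.median(times_ms)


# Usage, from an agent in the remote site ("MyScomServer" is the
# placeholder name used in this thread):
#   print(f"{median_ms(tcp_connect_times('MyScomServer')):.0f} ms")
```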

Answers

  • Update: even after updating the MPs, the errors on the agents remained. It is a bit bizarre, because some of the agents will briefly show "green and connected" on the management server and then, minutes later, stop heartbeating again with the error:

    The OpsMgr Connector connected to MyScomServer, but the connection was closed immediately after authentication occurred.  The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration.  Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.

    I tried uninstalling and reinstalling the agents, manual installations etc. but still no luck. 

    I resolved the issue by configuring the gateway server in the remote site to use certificate authentication to the management server and then reinstalled all the agents and set their management server as the gateway. I'm still monitoring the situation but so far it looks to be sorted.

    Thanks everyone for your valuable input. I will update this post if anything changes over the next few hours.    

    Wednesday, January 21, 2015 2:17 PM
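The fix above combines two steps: approving the gateway with the management server, and binding each side to a certificate. A minimal Python sketch, assuming Microsoft's documented `Microsoft.EnterpriseManagement.GatewayApprovalTool.exe` and `MOMCertImport.exe /SubjectName` syntax, that assembles those command lines in one place; the helper functions and host names are hypothetical. It also assumes certificates whose subject name matches each server's FQDN, issued by a CA trusted on both ends, with the Server Authentication and Client Authentication usages.

```python
import subprocess


def gateway_approval_cmd(ms_fqdn: str, gw_fqdn: str) -> list[str]:
    """Run on the management server: registers the gateway with the management group."""
    return [
        "Microsoft.EnterpriseManagement.GatewayApprovalTool.exe",
        f"/ManagementServerName={ms_fqdn}",
        f"/GatewayName={gw_fqdn}",
        "/Action=Create",
    ]


def cert_import_cmd(subject_name: str) -> list[str]:
    """Run on the gateway and on the management server: tells the local
    Health Service which installed certificate (by subject name) to use."""
    return ["MOMCertImport.exe", "/SubjectName", subject_name]


def plan(ms_fqdn: str, gw_fqdn: str) -> list[tuple[str, str]]:
    """Return (host to run on, command line) pairs; import certs, then approve."""
    steps = [
        (gw_fqdn, cert_import_cmd(gw_fqdn)),
        (ms_fqdn, cert_import_cmd(ms_fqdn)),
        (ms_fqdn, gateway_approval_cmd(ms_fqdn, gw_fqdn)),
    ]
    # list2cmdline applies Windows command-line quoting rules.
    return [(host, subprocess.list2cmdline(cmd)) for host, cmd in steps]


# Hypothetical host names; substitute your own FQDNs:
# for host, line in plan("myscomserver.corp.example", "gw01.remote.corp.example"):
#     print(f"[{host}] {line}")
```

After the gateway is approved and connected, the remote agents are reinstalled (or reconfigured) with the gateway as their management server, as described in the post.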

All replies

  • Implementing a SCOM gateway with certificate-based authentication will work; however, it shouldn't be needed. Have you checked your Pending Management view? If you installed the agent manually, the default behavior is for it to go to pending management, where you can just right-click it and select Approve.
    Monday, January 19, 2015 5:17 PM
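The same approval can be done from the OperationsManager PowerShell module with `Get-SCOMPendingManagement` piped into `Approve-SCOMPendingManagement`. A minimal Python sketch that builds that PowerShell command for a single agent rather than approving everything pending; it assumes the pending-action objects expose an `AgentName` property, and the helper names are hypothetical.

```python
import subprocess


def approve_agent_script(agent_fqdn: str) -> str:
    """PowerShell that approves only the pending action for one agent."""
    # Double any single quotes so the name is safe inside a '...' literal.
    safe = agent_fqdn.replace("'", "''")
    return ("Import-Module OperationsManager; "
            f"Get-SCOMPendingManagement | Where-Object {{ $_.AgentName -eq '{safe}' }} "
            "| Approve-SCOMPendingManagement")


def powershell_cmd(script: str) -> list[str]:
    """Argument list for running a script with Windows PowerShell."""
    return ["powershell.exe", "-NoProfile", "-Command", script]


def run(script: str) -> str:
    """Windows only: execute on the management server and return stdout."""
    result = subprocess.run(powershell_cmd(script),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Running `Get-SCOMPendingManagement` on its own first shows what is waiting, which also confirms the poster's later observation that nothing is pending.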
  • Hi There,

    Can you analyse the event logs on the agent? Do they point to an issue with authentication?

    If yes, can you post the event contents here, please?


    Gautam.75801

    Monday, January 19, 2015 7:21 PM
  • Hi Senah,

    The agents were deployed successfully from the SCOM server (pushed). The issue occurs after installation, when the agent tries to authenticate back to the management server. There are no agents pending approval; I tried manual agent installations as well, with the same result. The agent installs but never reaches the pending management stage. The errors are below:

    The OpsMgr Connector connected to MyScomServer, but the connection was closed immediately after authentication occurred.  The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration.  Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.

    Followed a few entries later by:

    OpsMgr was unable to set up a communications channel to MyScomServer and there are no failover hosts.  Communication will resume when MyScomServer is available and communication from this computer is allowed.

    What confuses me is that these notifications also show up in the logs:

    The Health Service has validated all RunAs accounts for management group production.

    and 

    All credential references resolved successfully. 

    Any idea what could be causing this? Based on the two articles in the original post, my assumption was that the latency causes this. The agents do show up in the SCOM server, but their state never changes; it just stays "not monitored".

    Thank you

    Tuesday, January 20, 2015 7:51 AM
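The connector error tells you to look for event 20000 on the management server. A minimal Python sketch, assuming the default "Operations Manager" event log and the built-in Windows `wevtutil` tool, for listing the newest occurrences; the helper names are hypothetical. Running the same query with 20070 on an agent looks for the event ID named in the second article's URL.

```python
import subprocess


def build_event_query(event_id: int, log: str = "Operations Manager",
                      count: int = 20) -> list[str]:
    """wevtutil command listing the newest `count` events with `event_id`."""
    xpath = f"*[System[(EventID={event_id})]]"
    # qe = query events, /rd:true = newest first, /f:text = readable output.
    return ["wevtutil", "qe", log, f"/q:{xpath}", f"/c:{count}",
            "/rd:true", "/f:text"]


def query_events(event_id: int, **kwargs) -> str:
    """Windows only: run the query and return wevtutil's text output."""
    result = subprocess.run(build_event_query(event_id, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


# On the management server:  print(query_events(20000))
# On an affected agent:      print(query_events(20070))
```

Passing an argument list rather than a single string lets `subprocess` handle quoting of the space in the log name.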
  • Hi Gautam,

    The errors are below:

    The OpsMgr Connector connected to MyScomServer, but the connection was closed immediately after authentication occurred.  The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration.  Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.

    Followed a few entries later by:

    OpsMgr was unable to set up a communications channel to MyScomServer and there are no failover hosts.  Communication will resume when MyScomServer is available and communication from this computer is allowed.

    What confuses me is that these notifications also show up in the logs:

    The Health Service has validated all RunAs accounts for management group production.

    and 

    All credential references resolved successfully. 

    Thank you


    Tuesday, January 20, 2015 7:52 AM
  • Hi There,

    This seems to be an authentication issue. Can you try the following?

    1. Ping the MS from the agent where the issue occurs and see whether the ping succeeds (NetBIOS name, FQDN and IP).

    2. Can you telnet from the agent where the issue occurs to port 5723 on the MS and see whether the connection is allowed (NetBIOS name, FQDN and IP)?

    3. Also, what type of trust do the domains have here (the domain where the MS is and the domain of the agent with the authentication issue)?

    4. Also, what is the state of the agent in Agent Managed? Is it Healthy in green, Healthy in gray, or in the Not monitored state?

    5. Are there any events related to this issue in the MS event logs?

    Can you check and post the results here?


    Gautam.75801


    Tuesday, January 20, 2015 8:15 AM
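Checks 1 and 2 (name resolution and port 5723 for the NetBIOS name, FQDN and IP) can be run in one pass. A minimal Python sketch standing in for `ping` plus `telnet`: it resolves each name and attempts a TCP connection, reporting which step failed. The helper names and example host names are hypothetical, and a successful connect only proves reachability, not that authentication will succeed.

```python
import socket


def check_endpoint(name: str, port: int = 5723,
                   timeout: float = 5.0) -> tuple[bool, str]:
    """Resolve `name` and try a TCP connection, like `telnet name 5723`."""
    try:
        infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"name resolution failed: {exc}"
    ip, resolved_port = infos[0][4][:2]
    try:
        with socket.create_connection((ip, resolved_port), timeout=timeout):
            return True, f"connected to {ip}:{resolved_port}"
    except OSError as exc:  # refused, timed out, unreachable, ...
        return False, f"resolved to {ip} but connect failed: {exc}"


def check_all(names: list[str], port: int = 5723) -> dict[str, tuple[bool, str]]:
    """Run check_endpoint for every name form of the management server."""
    return {name: check_endpoint(name, port) for name in names}


# Hypothetical names; substitute the MS's NetBIOS name, FQDN and IP:
# for name, (ok, detail) in check_all(
#         ["MYSCOMSERVER", "myscomserver.corp.example", "10.0.0.10"]).items():
#     print(f"{name}: {'OK' if ok else 'FAIL'} - {detail}")
```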
  •    This seems to be an authentication issue, but it is not. We spent quite a lot of time on it and found that updating to the latest management packs solved it; it should solve your issue too. We had a similar situation last week.


    Shahid Roofi

    • Proposed as answer by Maurador1 Tuesday, January 20, 2015 8:54 AM
    Tuesday, January 20, 2015 8:48 AM
  • What is your AD functional level?
    You may consider deploying a read-only DC (RODC), supported on Windows 2008 R2 and Windows 2012 DCs, at the remote site.
    Roger
    Tuesday, January 20, 2015 8:52 AM
  • Hi Gautam,

    1. From the agents in the remote domain (they are all in the same state), I can successfully ping the NetBIOS name, FQDN and IP. In my opinion, DNS looks 100%.

    2. I can telnet to the management server from the remote site on port 5723.

    3. The remote site and the site where the management server resides are both child domains of the same root domain. The action accounts all reside in the root domain and have the required access in the child domains. The "main" action account has Enterprise Admin permissions.

    4.  Agent state

    5. Nothing in the event logs (system / app) indicating any other issues.

    Thanks!

    Tuesday, January 20, 2015 9:33 AM
  • Hi Shahid,

    What are the versions of your agents (or the version of the management packs that resolved the issue for you)? Mine are 7.1.10184.0 and the management server is 7.1.10226.0.

    Thanks!
    • Edited by AvdVyver Tuesday, January 20, 2015 9:54 AM
    Tuesday, January 20, 2015 9:54 AM
  • Hi Roger,

    Domain functional level is 2012.

    Thanks!

    Tuesday, January 20, 2015 9:58 AM
  •  For me, the latest (2015) version of this MP helped: http://www.microsoft.com/en-us/download/details.aspx?id=9296

     I had the same error on the agent machines.


    Shahid Roofi

    Tuesday, January 20, 2015 10:00 AM
  • Thanks Shahid,

    I've imported the updated management packs - now holding thumbs. Will keep you posted.

    Thanks

    Tuesday, January 20, 2015 1:36 PM