Newly Installed Gateway Server cannot connect to the management servers

  • Question

  • Hello Guys,

    We have a new SCOM 2019 gateway (GW) installation. We have the right certificate (Server and Client authentication), issued from the environment where the management servers are located, and it has been installed and imported. We have verified the registry value HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings\ChannelCertificateSerialNumber, which is also correct. Communication on port 5723 works as expected. The cache in C:\Program Files\System Center Operations Manager\Gateway\Health Service State\Connector Configuration Cache has been cleared. However, when we restart the health service, the following error events appear in the event log:


    The server also does not appear as healthy on the management servers. Please suggest what else we can try to resolve this issue.
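
    One way to double-check the ChannelCertificateSerialNumber value is sketched below. It assumes (this is not stated in the thread) that the registry stores the certificate serial number as binary data with the bytes in reverse order, so the value is reversed before it is compared with the serial number shown in the certificate details. The function names are hypothetical.

```python
def registry_bytes_to_serial(reg_value: bytes) -> str:
    """Return the serial number as uppercase hex.

    Assumption: the registry value holds the serial number bytes in
    reverse order, so they are flipped back here.
    """
    return reg_value[::-1].hex().upper()


def serial_matches(cert_serial: str, reg_value: bytes) -> bool:
    """Compare a serial copied from the certificate with the registry bytes."""
    # Certificate viewers often show the serial with spaces or colons.
    normalized = cert_serial.replace(" ", "").replace(":", "").upper()
    return normalized == registry_bytes_to_serial(reg_value)


if __name__ == "__main__":
    # Made-up example bytes, not taken from a real certificate.
    example = bytes.fromhex("0a1b2c")
    print(registry_bytes_to_serial(example))
    print(serial_matches("2c 1b 0a", example))
```

    The registry bytes would have to be read separately on the gateway, for example by exporting the value with regedit.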

    Thursday, August 6, 2020 3:05 PM

Answers

  • Hi All,

    Deyan is a colleague of mine and was the owner of this task, so I encouraged him to post here and ask for help. Back from vacation, I was able to take over, and today, after some troubleshooting, I found the cause of the issue and resolved it. All the settings and tools had been used properly, and the script helped confirm that there are no issues with the certificates. Then, after verifying everything, I stumbled on an article that suggests checking the TLS communication between both servers.

    The management server had all the registry entries that allow TLS 1.2 communication (Server and Client), but the Gateway had no entries under the respective key. So I exported the whole TLS 1.2 key from the management server:

    Registry location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2

    and imported it on the Gateway. After restarting some of the services, communication was established.
    I would like to thank you for the support and the effort!
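
    For readers who cannot export the key from a working server, here is a minimal sketch that generates an equivalent .reg file. It assumes the commonly documented SCHANNEL values (Enabled=1 and DisabledByDefault=0 under both the Client and Server subkeys); the exact values in the exported key are not listed in this thread, and the function name is hypothetical.

```python
BASE_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\SecurityProviders\SCHANNEL\Protocols\TLS 1.2"
)


def build_tls12_reg() -> str:
    """Build the text of a .reg file that enables TLS 1.2 for SCHANNEL.

    Assumption: Enabled=1 and DisabledByDefault=0 for both roles.
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for role in ("Client", "Server"):
        lines.append(f"[{BASE_KEY}\\{role}]")
        lines.append('"Enabled"=dword:00000001')
        lines.append('"DisabledByDefault"=dword:00000000')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Review the output before importing it with regedit on the gateway.
    print(build_tls12_reg())
```

    Compare the output with the key on a working management server before importing it, and restart the affected services afterwards.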

    Cheers,

    Stoyan






    Tuesday, August 11, 2020 6:52 PM

All replies

  • Hi,

    It would help to see the descriptions of the errors in your Operations Manager event log. Did you run the Microsoft.EnterpriseManagement.GatewayApprovalTool.exe tool?

    Go through the official step-by-step guide for Gateway server installation here:

    Install a gateway server
    https://docs.microsoft.com/en-us/system-center/scom/deploy-install-gateway-server?view=sc-om-2019

    Here's also a step-by-step guide:
    Step by Step Gateway Server Installation - SCOM 2016
    (Same steps for SCOM 2019)

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com


    • Edited by Leon Laude Thursday, August 6, 2020 3:18 PM
    Thursday, August 6, 2020 3:18 PM
  • Hi Leon,

    The deployment steps from the guide have been followed. We also ran Microsoft.EnterpriseManagement.GatewayApprovalTool.exe successfully. The errors are from the Operations Manager event log on the gateway server:

    Information 21023 - OpsMgr has no configuration for management group GROUPNAME and is requesting new configuration from the Configuration Service.

    Error 20071 - The OpsMgr Connector connected to Servername, but the connection was closed immediately without authentication taking place. The most likely cause of this error is a failure to authenticate either this agent or the server. Check the event log on the server and on the agent for events which indicate a failure to authenticate.

    Error 21001 - The OpsMgr Connector could not connect to SPN/Servername because mutual authentication failed. Verify the SPN is properly registered on the server and that, if the server is in a separate domain, there is a full-trust relationship between the two domains.

    Error 20057 - Failed to initialize security context for target SPN/Servername. The error returned is 0x80090303 (The specified target is unknown or unreachable). This error can apply to either the Kerberos or the SChannel package.

    BR,

    Deyan

    Thursday, August 6, 2020 3:36 PM
  • Make sure you have configured the SPNs first of all, you can follow along here:
    https://kevinholman.com/2011/08/08/opsmgr-2012-what-should-the-spns-look-like

    The following blog post also goes through most of the events you're receiving:
    https://blog.ctglobalservices.com/operations-manager-scom/msk/common-issues-when-working-with-certificates-in-opsmgr



    Thursday, August 6, 2020 3:47 PM
  • SPN doesn't matter at all since you're using certificates.

    However, it likely means the certificates are wrong somehow.

    You can use this script to check whether they are properly configured: https://gallery.technet.microsoft.com/scriptcenter/Troubleshooting-OpsMgr-27be19d3
    • Edited by CyrAz Thursday, August 6, 2020 5:23 PM
    Thursday, August 6, 2020 5:22 PM
  • Over 95% of the issues with Gateway servers are caused by the certificates :-) Double-check that they are indeed correct, and verify that the host names in the certificates use the Full Computer Name.

    Here's yet another great blog post that describes everything in detail:
    https://docs.microsoft.com/en-us/archive/blogs/stefan_stranger/monitoring-non-domain-members-with-om-2012



    Thursday, August 6, 2020 5:41 PM
  • Hi,


    Most errors like this on a Gateway are caused by the certificate, so check that the certificate is correct.

    Also check whether the FQDN is supplied for the Networkname and AuthenticationName.
    https://michelkamp.wordpress.com/2012/01/05/solving-the-gateway-20071-event/
    Note: Non-Microsoft link, for reference only.

    Hope it can help.

    Tips: This SCOM Forum will be migrating to a new home on Microsoft Q&A, please refer to this sticky post for more details.


    Best regards.
    Crystal


    Friday, August 7, 2020 6:01 AM
  • Hi, 

    thank you all for the responses. I have checked the certificate and it seems to be fine:

    Checking that there are certs in the Local Machine Personal store...
    Verifying each cert...

    Examining cert - Serial number 1..............
    ---------------------------------------------------
    Cert subjectname
    Private key
    Expiration
    Enhanced Key Usage Extension
    Key Usage Extensions
    KeySpec
    Serial number written to registry
    Certification chain
    There is a valid certification chain installed for this cert,
    but the remote machines' certificates could potentially be issued from
    different CAs.  Make sure the proper CA certificates are installed
    for these CAs.

    ***This certificate is properly configured and imported for Ops Manager use.***

    The remote machine certificates are issued by the same CAs.

    I am waiting for the customer's AD team to check / create the SPNs.


    Friday, August 7, 2020 7:21 AM
  • Can you verify on which servers you have the certificates? And did you import them using the MOMCertImport.exe tool?

    You mentioned that the communication works on port 5723, how did you test it and from which server to where?



    Friday, August 7, 2020 7:30 AM
  • Hi Leon,

    I ran the standard telnet test in both directions, from the GW to the management servers and back, and it works. MOMCertImport.exe should be used only when importing the certificate on the gateway server, or am I wrong? On the management servers the certificates were only imported into the certificate store.
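
    The telnet test can also be scripted. The sketch below only opens a TCP connection to port 5723 (the port named in this thread) and closes it again; the host name in the usage example is a placeholder and the function name is hypothetical. A successful connection shows that the port is reachable, but it does not prove that authentication or TLS negotiation will succeed.

```python
import socket


def can_connect(host: str, port: int = 5723, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Placeholder host name; replace it with a management server FQDN.
    print(can_connect("ms01.example.local"))
```

    Run it from the gateway against each management server, and from a management server against the gateway, to mirror the two-way telnet test.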

    Friday, August 7, 2020 1:30 PM
  • You need to use the MOMCertImport.exe tool on each gateway server, management server, and computer that will be agent-managed and that is in a domain that is not trusted.


    Friday, August 7, 2020 1:33 PM
  • Do I understand this correctly: if I do not have an AD trust, the certificate should be imported with MOMCertImport.exe on all servers in order to get the communication flowing? All of the installation guides point only to the tenant's gateway.
    Friday, August 7, 2020 2:05 PM
  • Each SCOM Management Server communicating with a gateway or off-domain agents must have its own certificate, imported with momcertimport.

    Each Gateway must have its own certificate, imported with momcertimport.

    Each Gateway must trust the certificate authority that created the MS certificate, and each MS must trust the certificate authority that created the Gateway certificate (they can be different).
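
    The mutual-trust rule above can be expressed as a small check. This is only an illustrative model: the Node class, its fields, and the CA names are hypothetical, and the trusted CA lists would have to be collected from each machine's certificate store by other means.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a management server or gateway."""
    name: str
    cert_issuer: str                        # CA that issued this node's certificate
    trusted_cas: set = field(default_factory=set)


def trust_problems(ms: Node, gw: Node) -> list:
    """List every direction in which the CA trust is missing."""
    problems = []
    if ms.cert_issuer not in gw.trusted_cas:
        problems.append(f"{gw.name} does not trust {ms.cert_issuer}")
    if gw.cert_issuer not in ms.trusted_cas:
        problems.append(f"{ms.name} does not trust {gw.cert_issuer}")
    return problems


if __name__ == "__main__":
    ms = Node("MS01", "CA-A", {"CA-A"})
    gw = Node("GW01", "CA-B", {"CA-A", "CA-B"})
    # MS01 lacks CA-B in its trusted store, so one problem is reported.
    print(trust_problems(ms, gw))
```

    An empty list means both sides trust each other's issuing CA, even when the two certificates come from different CAs.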


    • Edited by CyrAz Friday, August 7, 2020 2:41 PM
    Friday, August 7, 2020 2:14 PM
  • Do I understand this correctly: if I do not have an AD trust, the certificate should be imported with MOMCertImport.exe on all servers in order to get the communication flowing? All of the installation guides point only to the tenant's gateway.

    "All of the installation guides point only to the tenant's gateway."



    Friday, August 7, 2020 2:35 PM
  • Hi Leon,

    OK, that was a poorly worded statement on my part. However, even after installing the certificate on all servers, the communication is still not working. Everything has been checked, including the certificates.

    Friday, August 7, 2020 4:19 PM
  • Can you clear the Health Service cache once again, and then report which events (errors) in the Operations Manager event log you're receiving, and in which exact order?

    Also give us the error descriptions of each error event, thanks.



    Monday, August 10, 2020 6:56 AM
  • Hi,

    I also suggest restarting the Gateway server after clearing the Health Service cache.

    Best regards.

    Crystal

    Monday, August 10, 2020 8:30 AM