TLS authentication on our Lync Front End servers

  • May 1, 2012 15:24

    I am continuously receiving errors in SCOM related to TLS authentication on our Lync Front End servers.

    Over the past 19 minutes, Lync Server has experienced TLS outgoing connection failures 3 time(s). The error code of the last failure is 0x80090322 (The target principal name is incorrect.) while trying to connect to the server hostname2.domain.local at address 192.168.1.1, and the display name in the peer certificate is "unavailable"

    Cause: Most often a problem with the peer certificate or perhaps the DNS A record used to reach the peer server. Target principal name is incorrect means that the peer certificate does not contain the name that the local server used to connect. Certificate root not trusted error means that the peer certificate was issued by a remote CA that is not trusted by the local machine.
    Resolution:
    Check that the address and port matches the FQDN used to connect, and that the peer certificate contains this FQDN somewhere in its subject or SAN fields. If the FQDN refers to a DNS load balanced pool then check that all addresses returned by DNS refer to a server in the same pool.
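The name check described in the alert text can be illustrated with a minimal sketch. It assumes a simplified matching rule (exact match, or a single-label wildcard) and uses hypothetical certificate names modeled on this thread; it is not how Lync or Schannel implement the check.

```python
from typing import Iterable


def name_matches(pattern: str, fqdn: str) -> bool:
    """Return True if one certificate name (optionally '*.' prefixed) covers fqdn."""
    pattern = pattern.lower().rstrip(".")
    fqdn = fqdn.lower().rstrip(".")
    if pattern.startswith("*."):
        # Simplified rule: a wildcard covers exactly one left-most label.
        head, _, tail = fqdn.partition(".")
        return bool(head) and tail == pattern[2:]
    return pattern == fqdn


def cert_covers(fqdn: str, subject_cn: str, san_dns_names: Iterable[str]) -> bool:
    """Check the FQDN against the subject common name and every SAN DNS entry."""
    return any(name_matches(name, fqdn) for name in [subject_cn, *san_dns_names])


# Hypothetical certificate installed on hostname1 (names are assumptions).
cn = "pool.domain.local"
sans = ["pool.domain.local", "sip.domain.local", "hostname1.domain.local"]

print(cert_covers("hostname1.domain.local", cn, sans))  # True
print(cert_covers("hostname2.domain.local", cn, sans))  # False
```

If a server connects to its own address while expecting hostname2, the second call is the situation it ends up in, which would produce "the target principal name is incorrect".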

    What's strange in this example is that the server generating the alert is hostname1 and the target is hostname2, but the IP address the alert shows for hostname2 is actually the IP address of hostname1 (itself). This is not always the case, though.
    The other strange thing is 'the display name in the peer certificate is "unavailable"'.
    The certificates are all configured correctly and there are no noticeable issues with the platform.

    I have checked DNS A records, reverse lookups and the hosts file. All are correct.
    These errors pop up from time to time and then close on their own.

    Anyone able to help with this?

    Regards
    J


    Best Regards J

All replies

  • May 2, 2012 5:26

    What are Server 1 and Server 2?

    Maybe you have dual-homed servers?


    - Belgian Unified Communications Community : http://www.pro-lync.be -

  • May 3, 2012 4:21
    Moderator

    It may be due to the certificate missing a SAN entry.

    You can try creating a new certificate on the FE server and restarting the FE service, then check again.


    Noya Lau

    TechNet Community Support

  • May 3, 2012 5:54

    Server 1 and server 2 are both FE servers. This alert appears for all our FE servers from time to time.
    The FE servers are not dual homed and the SANs contain the required names.
    However, if the server tries to connect to itself using the name of another FE server, which appears to be the case in the above alert unless I am reading it wrong, then that name would not be on the SAN. Or does the "at address ##.##.##.##" simply refer to the location where the event was generated?

    It is worth noting that I do not see a 'Display Name' field anywhere in the certificate, but I do see a 'Friendly Name' field. Could this be the cause of the alert?

    Best Regards J



    • Edited by JG- May 3, 2012 6:19
  • May 3, 2012 7:10

    Are you talking about an Enterprise pool, and are these servers Front Ends of that pool?

    Does the certificate on each FE server have all the names in its SANs?

    That is, the pool name, the FQDN of the local server, and possibly the pool name for all SIP domains?

    Are you using a hardware load balancer?


    - Belgian Unified Communications Community : http://www.pro-lync.be -

  • May 3, 2012 7:26

    Correct, it is an Enterprise pool and these are FE servers. The certificate SAN on each FE server contains the pool name, SIP domains etc., and the hostname of the server the cert is installed on.
    Everything is working 100% fine and has done for a long time.
    These messages do not appear to serve any useful purpose at the moment, but we need to understand why we are receiving them.
    Yes, we are using an HLB, but this appears to be related to direct communication between the FE servers. That all looks to be working fine, and the alerts do not correspond to any known issues on the platform.

    Reg

    J


    Best Regards J


    • Edited by JG- May 3, 2012 7:28
  • May 5, 2012 4:16

    Well, you might try running OCSLogger for a longer period to see what the SIP logs say about this, and use Snooper to view the logs.

    The log file may get large depending on the number of users and the usage of the pool.


    - Belgian Unified Communications Community : http://www.pro-lync.be -

  • May 8, 2012 14:48

    I have run some logging and used Snooper to try to identify the cause of these alerts, but I have failed to turn up any clues.
    All I can see around the time of the alert is user lookups.

    Looking through the event logs, it would appear that the above alert is related to the following event.

    ‘The certificate received from the remote server does not contain the expected name. It is therefore not possible to determine whether we are connecting to the correct server. The server name we were expecting is server1.domain.net. The SSL connection request has failed. The attached data contains the server certificate.’

    The certificates on each Front end server are issued to the FE Enterprise pool the server is part of. The SAN also contains the pool name along with the name of the server, the SIP entry for each SIP domain and the name required for meet and dialin etc.

    Still no issues with the platform though.


    Best Regards J


    • Edited by JG- May 8, 2012 14:48
  • May 9, 2012 2:46
    What does the attached data say in the eventlog entry?

    - Belgian Unified Communications Community : http://www.pro-lync.be -

  • May 9, 2012 13:05

    The only other information available for the alert is as below:

    - System
      - Provider
          [ Name]  Schannel
          [ Guid]  {1F673192-5838-5676-8EBC-C6FF67E26B85}
        EventID  36884
        Version  0
        Level  2
        Task  0
        Opcode  0
        Keywords  0x8000000000000000
      - TimeCreated
          [ SystemTime]  2012-05-09T02:42:01.036406300Z
        EventRecordID  63725
        Correlation
      - Execution
          [ ProcessID]  592
          [ ThreadID]  8316
        Channel  System
        Computer  FEhost1.domain.net
      - Security
          [ UserID]  S-1-5-20
    - UserData
      - EventXML
          Name  FEhost2.domain.net

    Host1 is the host the event was generated on and host2 is the target of the outgoing connection.


    Best Regards J


    • Edited by JG- May 9, 2012 13:05
  • May 15, 2012 13:07
    We're seeing the exact same problem after applying the CU5 updates. We have two FE servers in an Enterprise pool using DNS load balancing for SIP traffic and an HLB for web traffic. Each FE is reporting TLS errors (Event 14428) while attempting to connect to the other FE. The specific error is that the target principal name is incorrect. The IP address and FQDN in the details are mismatched. For example, FE1 at IP 10.20.5.1 reports that it cannot connect to FE2 at IP 10.20.5.1; the IP of FE2 is actually 10.20.5.2.
  • May 16, 2012 8:49
    It would appear that we also started to see these errors following the deployment of CU5 updates.

    Best Regards J

  • May 16, 2012 9:10

    Hi,

    Would it be possible for you to request a fresh certificate for the pool and assign it to each FE server?

    I understand that all services are working; however, it's a good idea to check with a fresh certificate.

    Note: As you know, this requires a service restart once the new certificate is assigned.

    Thanks

    Salesh


    If answer is helpful, please hit the green arrow on the left, or mark as answer.

  • May 16, 2012 22:00

    We are also getting this same error.  We noticed it via SCOM, and it seems that the IP address and hostname of our LYNC02 server are mismatched.  For example, LYNC01 is 10.10.10.1 and LYNC02 is 10.10.10.2.  We are using DNS round robin.  The SAN cert on each server has the names of all Lync services and the names of the hosts.

    In my estimation, the error is generated because LYNC01 is attempting a TLS connection to LYNC02 but is connecting to IP 10.10.10.1, NOT 10.10.10.2. If it connects to 10.10.10.1, of course it will receive a certificate error, because at that point it would be checking its own certificate and getting LYNC01 in the name rather than LYNC02.

    Error:

    Over the past 26 minutes, Lync Server has experienced TLS outgoing connection failures 2 time(s). The error code of the last failure is 0x80090322 (The target principal name is incorrect.) while trying to connect to the server "LYNC02.LYNCTEST.LOCAL" at address [10.10.10.1:5061], and the display name in the peer certificate is "Unavailable".
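The mismatch described above can be spotted mechanically. The following is a rough sketch, assuming the alert wording quoted in this thread and a hand-maintained map of FQDN to A-record addresses (the `dns` dictionary and the `check_alert` helper are hypothetical, not part of Lync or SCOM):

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Matches both quoted and unquoted server names, with or without [ip:port].
ALERT_RE = re.compile(
    r'connect to the server "?(?P<fqdn>[\w.-]+)"? '
    r'at address \[?(?P<ip>\d{1,3}(?:\.\d{1,3}){3})'
)


def check_alert(
    message: str, dns: Dict[str, Set[str]]
) -> Optional[Tuple[str, str, List[str]]]:
    """Return (target FQDN, address used, FQDNs that really own that address)."""
    match = ALERT_RE.search(message)
    if match is None:
        return None
    fqdn, ip = match.group("fqdn").lower(), match.group("ip")
    owners = sorted(name for name, ips in dns.items() if ip in ips)
    return fqdn, ip, owners


# Assumed A records for the two Front Ends in this example.
dns = {
    "lync01.lynctest.local": {"10.10.10.1"},
    "lync02.lynctest.local": {"10.10.10.2"},
}
alert = (
    '... while trying to connect to the server "LYNC02.LYNCTEST.LOCAL" '
    'at address [10.10.10.1:5061], and the display name ...'
)
print(check_alert(alert, dns))
# ('lync02.lynctest.local', '10.10.10.1', ['lync01.lynctest.local'])
```

When the target FQDN is not in the list of owners, the alert names one server but the address of another, which is the pattern reported here.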

    I will try with a "fresh" certificate, but I do not see how that could resolve it, and if it does then it makes no sense.


  • May 17, 2012 9:50
    Same problem here. Just installed CU5 last night. Also using DNS Load Balancing.

    Ed

  • May 17, 2012 12:44
    I reissued a new certificate for FE Pool with all FE server names defined in the SAN list and assigned it to all FE servers and restarted services. Since then the 14428 errors have stopped. I'm not proposing this as a fix but it is certainly masking the error.
  • May 17, 2012 12:48

    Thanks Carolyn for the information.

    Hope it would be helpful for others.

    Thanks
    Saleesh


    If answer is helpful, please hit the green arrow on the left, or mark as answer.

  • May 17, 2012 14:15
    Yes, that would certainly work, Carolyn, because while the FE server thinks it is looking up the cert on the other FE server, it is in reality looking at itself.  But as you said, it only masks the error and does not resolve the underlying issue.  Also, a new SAN cert would need to be issued whenever a server is added to the topology.  Not a big deal for orgs that run a CA, but it can be a hassle for orgs that use public certs.


  • May 21, 2012 12:25

    Agree with the above. This does not address the underlying issue and is something I would prefer not to do.


    Best Regards J

  • May 22, 2012 18:44

    Hello,

    I've encountered the same TLS issue on a customer's Lync infrastructure after upgrading to Lync CU5.
    I generated and assigned new certificates on the Front End servers and the TLS error was resolved.

    Regards,

    Nicolas Picard

  • May 23, 2012 11:12

    Hi Nicolas,
    Can you confirm that you did this without adding the hostnames of all Front End servers to the SAN, please?

    If so, then maybe simply re-assigning the existing certs would resolve the issue too. Did you try that by any chance?


    Best Regards J

  • May 23, 2012 12:38

    Hello J,

    First, I tried to reassign the original certificate. That did not work.

    I did not try to unassign and then reassign the original certificate.

    As the objective for my customer is to implement the Lync Mobility service, I requested a new certificate with 2 new SANs (lyncdiscover and lyncdiscoverinternal).

    To reply to your question, I added 2 new SANs in my new certificate and assigned the new certificates to the Front End servers, and the TLS error disappeared from the Lync FE event log and from SCOM.

    Regards,

    Nicolas

  • May 23, 2012 13:26
    Yes, but did you include all host names in the new SAN as well? For example, LyncFEServer1 and LyncFEServer2?
  • May 23, 2012 14:51

    Hello,

    I generated a new certificate with exactly the same subject name and SAN names that were used in the previous certificate, plus the SAN FQDNs for the Lync Mobility service.

    I have 4 Front End servers, and my customer's policy does not allow the same certificate to be used on all FEs.

    This means that the SAN of each certificate contains only the name of the server for which it was generated, not the names of all 4 Front End servers (in my case).

    Server A:
    Common name: fqdn of the pool
    SAN
    -fqdn of the pool
    -sip.domain.com
    -serverA.domain.com
    -simple url's
    -lyncdiscover.domain.com
    -lyncdiscoverinternal.domain.com

    Server B:
    Common name: fqdn of the pool
    SAN
    -fqdn of the pool
    -sip.domain.com
    -serverB.domain.com
    -simple url's
    -lyncdiscover.domain.com
    -lyncdiscoverinternal.domain.com

    Server C:
    Common name: fqdn of the pool
    SAN
    -fqdn of the pool
    -sip.domain.com
    -serverC.domain.com
    -simple url's
    -lyncdiscover.domain.com
    -lyncdiscoverinternal.domain.com

    Server D:
    Common name: fqdn of the pool
    SAN
    -fqdn of the pool
    -sip.domain.com
    -serverD.domain.com
    -simple url's
    -lyncdiscover.domain.com
    -lyncdiscoverinternal.domain.com
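To make the per-server layout above concrete, here is a small sketch that builds the four SAN sets and lists, for each server, which peer names its own certificate does not contain. The pool FQDN `pool.domain.com` is a placeholder assumption and the simple URLs are left out for brevity.

```python
from typing import Dict, List, Set

POOL = "pool.domain.com"  # placeholder; the real pool FQDN is not given above
SHARED = ["sip.domain.com", "lyncdiscover.domain.com", "lyncdiscoverinternal.domain.com"]
SERVERS = [f"server{letter}.domain.com" for letter in "ABCD"]

# One certificate per server: pool name, shared names, and only its own FQDN.
certs: Dict[str, Set[str]] = {srv: {POOL, srv, *SHARED} for srv in SERVERS}


def uncovered_peers(certs: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each server, list the server FQDNs missing from its own SAN set."""
    return {owner: sorted(set(certs) - sans) for owner, sans in certs.items()}


for owner, missing in uncovered_peers(certs).items():
    print(owner, "does not cover", missing)
```

Under this layout every certificate lacks the other three server names, so a server that reaches its own address while expecting a peer's FQDN cannot pass the name check. That is consistent with the earlier observation that putting all FE names in the SAN only masks the alert.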

    Regards,

    Nicolas Picard

  • May 23, 2012 15:46

    Thanks for confirmation.
    As it happens we will also be updating the certs for mobility capability in the next couple of weeks or so.

    I will report back on the status of these alerts following completion.

    Thanks


    Best Regards J

  • May 24, 2012 17:35
    Proposed answer
    I had a ticket open with MS Support. It's a known issue and will be patched in a future update. There is no known negative service impact.
    • Edited by CCsysad May 24, 2012 17:36
    • Proposed as answer by JedE July 20, 2012 10:49
  • June 11, 2012 17:31

    Hi CCsysad,

    Did you get any update from MS PSS? Please update us. Along with this issue, I have another issue with the certificates on the Edge server.

    After the CU5 upgrade, the Access Edge service on the second Edge server is not starting, while the other services (Conf-Edge and A/V-Edge) start with the same public certificate.

    It throws the below error:

    1. A configured certificate could not be loaded from store. The serial number is attached for reference. Extended error code: 0x800B0109 (A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider)

    2. Unable to use the certificate configured for the external edge of the access edge server. Error 0x800B0109 (A certificate chain processed but terminated in a root certificate which is not trusted by the trust provider).

    Cause: The certificate may have been deleted or may be invalid or permission are not set correctly.

    Resolution: Ensure that a valid certificate is present in the local computer certificate store. Also ensure that the server has sufficient privileges to access the store.

    FYI: The First Edge server in the environment is working fine, the services came up nicely after CU5 upgrade and reboot.

  • June 12, 2012 14:54
    No, the ticket was closed.  There's nothing to update.  It's a known issue and will be patched in the future. There are no known negative impacts.
  • July 20, 2012 10:51
    Proposed answer

    The fix is planned for the September CU7 release. MS have also confirmed to me that there is no need to reissue the certificates to fix this issue, as it does not have any impact.

    Jed


    Jed Please take a second to hit the green arrow on the left if the post was helpful, or mark it as an answer if it resolved your issue.

    • Proposed as answer by Jason Diaz November 13, 2012 21:36
  • January 7, 2013 15:47
    I have this exact same problem, and CU7 did not fix it, even though it is listed as a known problem that it will fix.
  • May 16, 2013 13:34

    We finally got around to applying CU7 (just in time to see CU8 released) and are seeing these messages too.  Has anyone determined what the cause is or how to get rid of them?

    I would consider re-applying the certificates, but technically there should be nothing wrong with them since they have not changed.