Exchange 2010 AD Topology Discovery Failed

    Question

  • Approximately every 3 days, Exchange loses contact with all the domain controllers and fails. The only way to resolve the issue is to restart the server. Once restarted, it functions perfectly, with clean event logs, right up until the next failure. I've been working on this for around 4 weeks, since the problem began, but I have been unable to find the root cause. I am also unable to tie the start of these problems to any particular change in the network configuration.

    We have 2 x 2008R2 domain controllers and a domain running at 2008R2 functional level. Exchange 2010 is also installed on Server 2008R2, and all three of these servers are virtualised on VMware ESXi 4.1.

    -----

    The last good application event log entry looks like this:

    Event 2080, MSExchange ADAccess

    Process MSEXCHANGEADTOPOLOGYSERVICE.EXE (PID=1992). Exchange Active Directory Provider has discovered the following servers with the following characteristics:
    (Server name | Roles | Enabled | Reachability | Synchronized | GC capable | PDC | SACL right | Critical Data | Netlogon | OS Version)
    In-site:
    DC01.domain.com CDG 1 7 7 1 0 1 1 7 1
    DC02.domain.com CDG 1 7 7 1 0 1 1 7 1
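To read Event 2080 rows like the two above at a glance, here is a minimal Python sketch. It splits each row into the eleven columns named in the header. It assumes the commonly cited reading of the Reachability, Synchronized and Netlogon columns as bitmasks: 1 = as a Global Catalog, 2 = as a domain controller, 4 = as the configuration domain controller, so 7 means all three. Treat that bit mapping, and the helper names `TopologyEntry` and `parse_2080_row`, as assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: bit meanings for the Reachability, Synchronized and Netlogon columns.
ROLE_BITS = {1: "GC", 2: "DC", 4: "Config DC"}


@dataclass
class TopologyEntry:
    """One server row from an Event 2080 listing (hypothetical helper type)."""
    server: str
    roles: str          # C = configuration DC, D = domain controller, G = Global Catalog
    enabled: int
    reachability: int
    synchronized: int
    gc_capable: int
    pdc: int
    sacl_right: int
    critical_data: int
    netlogon: int
    os_version: int

    def missing(self, column: str) -> list[str]:
        """Return the roles whose bit is NOT set in a bitmask column."""
        mask = getattr(self, column)
        return [name for bit, name in ROLE_BITS.items() if not mask & bit]


def parse_2080_row(line: str) -> TopologyEntry:
    """Split one 'server roles n n n ...' row into its eleven columns."""
    parts = line.split()
    if len(parts) != 11:
        raise ValueError(f"expected 11 columns, got {len(parts)}: {line!r}")
    server, roles, *numbers = parts
    return TopologyEntry(server, roles, *(int(n) for n in numbers))


row = parse_2080_row("DC01.domain.com CDG 1 7 7 1 0 1 1 7 1")
for column in ("reachability", "synchronized", "netlogon"):
    print(column, row.missing(column) or "all roles OK")
```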

    Then an error:

    Event 1009, MSExchangeMailSubmission

    The Microsoft Exchange Mail Submission service is currently unable to contact any Hub Transport servers in the local Active Directory site. The servers may be too busy to accept new connections at this time.

    Another error:

    Event 6003, MSExchange SACL Watcher

    SACL Watcher servicelet encountered an error while monitoring SACL change.
    Got error 1722 opening group policy on system DC01.domain.com in domain domain.com

    A warning:

    Event 1007, MSExchange Mailbox Replication

    The Mailbox Replication service was unable to determine the set of active mailbox databases on a mailbox server.
    Mailbox server: EXCHANGE.domain.com
    Error: MapiExceptionNetworkError: Unable to make admin interface connection to server. (hr=0x80040115, ec=-2147221227)
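Event 1007 reports the same failure code twice. hr=0x80040115 is the unsigned hexadecimal HRESULT, and ec=-2147221227 is the same 32 bits read as a signed integer. The following Python sketch converts between the two forms and splits an HRESULT into its standard fields (failure bit, facility, code). This helps when matching codes across logs and search results. The function names are hypothetical.

```python
def hresult_to_signed(hr: int) -> int:
    """Reinterpret an unsigned 32-bit HRESULT as a signed 32-bit integer."""
    hr &= 0xFFFFFFFF
    return hr - (1 << 32) if hr & 0x80000000 else hr


def signed_to_hresult(ec: int) -> int:
    """Reinterpret a signed 32-bit error code as an unsigned HRESULT."""
    return ec & 0xFFFFFFFF


def decode_hresult(hr: int) -> dict:
    """Split an HRESULT into its failure bit, facility and code fields."""
    return {
        "failure": bool(hr & 0x80000000),
        "facility": (hr >> 16) & 0x1FFF,  # 4 is FACILITY_ITF
        "code": hr & 0xFFFF,
    }


print(hresult_to_signed(0x80040115))        # -2147221227, the "ec" value in Event 1007
print(hex(signed_to_hresult(-2147221227)))  # 0x80040115
print(decode_hresult(0x80040115))           # {'failure': True, 'facility': 4, 'code': 277}
```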

    An informational event:

    Event 2070, MSExchange ADAccess

    Process STORE.EXE (PID=5012).  Exchange Active Directory Provider lost contact with domain controller .  Error was 0x80040951 (LDAP_SERVER_DOWN (Cannot contact the LDAP server)) ().  Exchange Active Directory Provider will attempt to reconnect with this domain controller when it is reachable.

    An error:

    Event 2104, MSExchange ADAccess

    Process STORE.EXE (PID=5012). Topology discovery failed due to LDAP_SERVER_DOWN error. This event can occur if one or more domain controllers in local or all domains become unreachable because of network problems. Use the Ping or PathPing command line tools to test network connectivity to local domain controllers. Run the Dcdiag command line tool to test domain controller health.

    A warning:

    Event 2121, MSExchange ADAccess

    Process STORE.EXE (PID=5012). Exchange Active Directory Provider is unable to connect to any domain controller in domain domain.com although DNS was successfully queried for the service location (SRV) resource record used to locate a domain controller for that domain.
    The query was for the SRV record for _ldap._tcp.dc._msdcs.domain.com
    The following domain controllers were identified by the query:
    dc02.domain.com
    dc01.domain.com

    Meanwhile, the system log shows:

    Event 5719, NETLOGON

    This computer was not able to set up a secure session with a domain controller in domain DOMAIN due to the following:
    The RPC server is unavailable.
    This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

    -----

    From this point onwards, the event log just fills up with errors and warnings, and all the Outlook clients get disconnected. From the Exchange server, via RDP, I can still ping both domain controllers without any problem.
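A successful ping only shows that ICMP gets through. The failing components need TCP sessions to the DCs. As a rough check during a failure, the following Python sketch opens plain TCP connections to the ports assumed relevant here:

- 389 (LDAP)
- 3268 (Global Catalog)
- 135 (RPC endpoint mapper, since Events 6003 and 5719 report RPC errors)

It only tests that a connection can be opened, not that LDAP or RPC requests succeed. The port list and helper names are assumptions.

```python
import socket

# ASSUMED port list: LDAP, Global Catalog, RPC endpoint mapper.
DEFAULT_PORTS = {389: "LDAP", 3268: "GC", 135: "RPC endpoint mapper"}


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect; return 'open' or a short description of the failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError as exc:  # covers timeouts, refusals and DNS resolution errors
        return f"failed ({exc.__class__.__name__})"


def probe_dcs(hosts, ports=DEFAULT_PORTS) -> dict:
    """Probe every (host, port) pair and return the results keyed by that pair."""
    return {(host, port): probe(host, port) for host in hosts for port in ports}
```

Running `probe_dcs(["dc01.domain.com", "dc02.domain.com"])` on the Exchange server while it is in the failed state, and again after a restart, would show whether new TCP connections to the DCs are refused or time out while ICMP still works.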

    One other error I am seeing is an SMB error when I try to browse the network from the Exchange server (while it has failed). I get a message: "The name limit for the local computer network adapter card was exceeded". The documentation I have found for this is quite old and the suggested registry key changes for TcpTimedWaitDelay and MaxUserPort are already set as recommended.
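The registry values mentioned above bound how fast a server can open outbound connections, because each closed connection holds its local port in TIME_WAIT for TcpTimedWaitDelay seconds. The following back-of-the-envelope Python sketch makes two assumptions:

- the Windows Server 2008 R2 default dynamic port range of 49152-65535 (on Server 2008 and later this range is set with `netsh int ipv4 set dynamicport` rather than MaxUserPort);
- the default 240-second TIME_WAIT.

It treats all connections as going to one remote endpoint, so the result is only a rough, conservative ceiling.

```python
def max_new_connections_per_second(first_port: int, last_port: int,
                                   time_wait_seconds: float) -> float:
    """Rough ceiling on sustained new outbound TCP connections.

    Every closed connection keeps its local port in TIME_WAIT for
    time_wait_seconds, so at most (ports in range) / time_wait_seconds new
    connections per second can be opened before the range is exhausted.
    """
    ports = last_port - first_port + 1
    return ports / time_wait_seconds


# ASSUMED defaults: dynamic range 49152-65535, TIME_WAIT 240 s.
print(round(max_new_connections_per_second(49152, 65535, 240), 1))  # 68.3
# TcpTimedWaitDelay at its minimum of 30 s:
print(round(max_new_connections_per_second(49152, 65535, 30), 1))   # 546.1
```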

    Once the server has been restarted, I have little avenue for further investigation as the event logs run clean and everything seems fine. Even when the server fails, everything else on the network functions perfectly and there are no errors in the domain controllers' event logs.

    I've been down numerous avenues here, but I'm running out of ideas and I would really really appreciate some help with this problem.

    Many thanks in advance,
    Steve...
    Tuesday, September 18, 2012 12:27 AM

All replies

  • I've experienced something similar. I never found a root cause, but I did discover that running netstat on the Exchange server showed connections to the DC hung in a CLOSE_WAIT state. IIRC, on a virtual server, toggling the NIC off and on and restarting the AD discovery service made it work again.
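To check where CLOSE_WAIT connections point, group the output of `netstat -ano` (all connections, numeric addresses, owning PID) by remote host. The following Python sketch assumes English-language Windows netstat output. The sample lines and addresses are made up for illustration. The PID column can also be matched against the process IDs named in the ADAccess events.

```python
from collections import Counter


def close_wait_by_remote_host(netstat_output: str) -> Counter:
    """Count CLOSE_WAIT TCP connections per remote host in `netstat -ano` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows rows look like: TCP  local:port  remote:port  STATE  PID
        if len(fields) >= 4 and fields[0] == "TCP" and fields[3] == "CLOSE_WAIT":
            remote_host = fields[2].rsplit(":", 1)[0]  # drop the port, keep [v6] brackets
            counts[remote_host] += 1
    return counts


# Hypothetical sample output.
sample = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.5:443           10.0.0.50:51234        CLOSE_WAIT      4321
  TCP    10.0.0.5:443           10.0.0.51:51240        CLOSE_WAIT      4321
  TCP    10.0.0.5:60001         10.0.0.2:389           CLOSE_WAIT      1992
  TCP    10.0.0.5:60002         10.0.0.2:389           ESTABLISHED     1992
"""
print(close_wait_by_remote_host(sample).most_common())
```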

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, September 18, 2012 12:37 AM
  • Many thanks for your suggestion. I have followed up on CLOSE_WAIT connections, but I only see a bunch of port 443 CLOSE_WAITs to client computers, not to the DCs, and that only occurs after Exchange fails.

    Tuesday, September 18, 2012 12:55 AM
  • May not be the same problem then. 


    Tuesday, September 18, 2012 12:58 AM
  • Make sure your virtual NICs are up to date.

    ExchangeGeek (MCITP,Enterprise Messaging Administrator)

    ***Don't forget to mark helpful or answer***

    Tuesday, September 18, 2012 9:40 AM
  • Hi,

    Please check whether the Netlogon service stops suddenly.

    Please also run “ping -t” and check for packet loss.

    Please test connectivity between the DCs and the Exchange server using the Exchange Best Practices Analyzer.
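A long `ping -t` run is easier to judge when the losses are counted. The following Python sketch tallies replies and failures from saved output. It assumes English-language Windows ping messages; the function name and sample are hypothetical. A "Destination host unreachable" reply still starts with "Reply from", so it is counted as lost.

```python
from __future__ import annotations


def ping_loss(ping_output: str) -> tuple[int, int, float]:
    """Count probes in saved Windows `ping -t` output; return (sent, lost, loss %)."""
    failure_prefixes = ("Request timed out", "PING: transmit failed", "General failure")
    sent = lost = 0
    for raw in ping_output.splitlines():
        line = raw.strip()
        if line.startswith("Reply from"):
            sent += 1
            # "Reply from x: Destination host unreachable." is a reply but not a success.
            if "unreachable" in line.lower():
                lost += 1
        elif line.startswith(failure_prefixes):
            sent += 1
            lost += 1
    loss_pct = 100.0 * lost / sent if sent else 0.0
    return sent, lost, loss_pct


sample = """Pinging dc01.domain.com [10.0.0.2] with 32 bytes of data:
Reply from 10.0.0.2: bytes=32 time<1ms TTL=128
Request timed out.
Reply from 10.0.0.2: bytes=32 time<1ms TTL=128
Reply from 10.0.0.5: Destination host unreachable.
"""
print(ping_loss(sample))  # (4, 2, 50.0)
```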


    Wendy Liu

    TechNet Community Support

    Tuesday, September 18, 2012 3:44 PM
    Moderator
  • In reply to ExchangeGeek (“Make sure your virtual NICs are up to date.”):

    Many thanks for your suggestion. We are currently running VMware ESXi 4.1 Update 1, but I can see that Dell has just released (two days ago!) VMware ESXi 4.1 Update 3, customised with the latest drivers for our servers.

    I will download these and schedule some downtime to update the hosts as soon as possible. I will post back here after updating the hosts, but in the meantime I'd like to keep exploring other possibilities.

    • Edited by DCMIT Wednesday, September 19, 2012 1:31 AM
    Wednesday, September 19, 2012 1:29 AM
  • In reply to Wendy Liu (check whether the Netlogon service stops, run “ping -t” to look for packet loss, test connectivity with the Exchange Best Practices Analyzer):

    There are no problems pinging between servers, even during a failure. Also, the Best Practices Analyzer results are clean on all DCs and on the Exchange server.

    I will check for Netlogon service stopping at the next failure and post back here. Many thanks for your suggestions.

    Wednesday, September 19, 2012 1:34 AM
  • Any update?

    ExchangeGeek (MCITP,Enterprise Messaging Administrator)


    Tuesday, September 25, 2012 1:47 PM
  • Hi, it's hard to say. I've been working on AD and DNS, making sure everything is running smoothly. DCDIAG was reporting some errors, which led me to migrate SYSVOL replication from FRS to DFS-R, and since then the error hasn't recurred.

    http://blogs.technet.com/b/filecab/archive/2008/02/08/sysvol-migration-series-part-1-introduction-to-the-sysvol-migration-process.aspx
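The SYSVOL migration described in the linked series is staged through global states with `dfsrmig.exe`: 0 Start, 1 Prepared, 2 Redirected, 3 Eliminated. Eliminated cannot be rolled back. It also requires the Windows Server 2008 domain functional level, which the 2008R2 domain described at the top already meets. The following Python sketch only builds the command sequence for moving forward one state at a time, with a `/getmigrationstate` check after each step. It does not run anything, and the function name is hypothetical.

```python
from __future__ import annotations

# Global migration states used by dfsrmig.exe for FRS -> DFS-R SYSVOL migration.
STATES = {0: "Start", 1: "Prepared", 2: "Redirected", 3: "Eliminated"}


def migration_plan(current: int, target: int) -> list[str]:
    """Build (but do not run) the dfsrmig commands to move forward one state at a time."""
    if current not in STATES or target not in STATES:
        raise ValueError("states must be 0-3")
    if current == 3 and target != 3:
        raise ValueError("Eliminated (3) is final and cannot be rolled back")
    if target < current:
        raise ValueError("this sketch only plans forward steps")
    plan = ["dfsrmig /getglobalstate"]
    for state in range(current + 1, target + 1):
        plan.append(f"dfsrmig /setglobalstate {state}")
        # Wait until every DC reports the new state before the next step.
        plan.append("dfsrmig /getmigrationstate")
    return plan


for command in migration_plan(0, 3):
    print(command)
```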


    Wednesday, September 26, 2012 12:32 PM
  • Hello Steve,

    Maybe this could help.

    I found it because the same error described here occurred at a customer site with the same configuration.

    http://www.petenetlive.com/KB/Article/0000664.htm

    Regards

    Wednesday, September 4, 2013 3:34 PM
  • Have you checked in Active Directory Sites and Services that all of your DCs are in the appropriate site? That was the cause in my case. Below is what I tried; I hope it helps someone.

    http://www.mountainvistatech.com/2013/10/08/exchange-2010-topology-discovery-failed-error-0x8007077f-v-issue/
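Active Directory maps an IP address to the site whose subnet object contains it most specifically (longest prefix wins). The "In-site:" section of Event 2080 lists the DCs that Exchange considers to be in its own site. The following Python sketch cross-checks each DC's site placement against its subnet mapping. It assumes you type the values in by hand from Active Directory Sites and Services. All site names, subnets, addresses and function names below are hypothetical.

```python
from __future__ import annotations

import ipaddress


def site_for(ip: str, subnets: dict[str, str]) -> str | None:
    """Return the site whose subnet most specifically contains ip (longest prefix wins)."""
    addr = ipaddress.ip_address(ip)
    best_net, best_site = None, None
    for cidr, site in subnets.items():
        net = ipaddress.ip_network(cidr)
        if addr.version == net.version and addr in net:
            if best_net is None or net.prefixlen > best_net.prefixlen:
                best_net, best_site = net, site
    return best_site


def misplaced_dcs(dc_ips: dict[str, str], dc_sites: dict[str, str],
                  subnets: dict[str, str]) -> dict[str, tuple]:
    """Return {dc: (site the DC object is in, site its IP maps to)} for mismatches."""
    result = {}
    for dc, ip in dc_ips.items():
        by_subnet = site_for(ip, subnets)
        if dc_sites.get(dc) != by_subnet:
            result[dc] = (dc_sites.get(dc), by_subnet)
    return result


# Hypothetical values copied by hand from Active Directory Sites and Services.
subnets = {"10.0.0.0/16": "HQ", "10.0.5.0/24": "Branch"}
dc_ips = {"DC01": "10.0.0.2", "DC02": "10.0.5.10"}
dc_sites = {"DC01": "HQ", "DC02": "HQ"}
print(misplaced_dcs(dc_ips, dc_sites, subnets))  # {'DC02': ('HQ', 'Branch')}
```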

    Monday, April 7, 2014 1:39 AM
  • Were you able to figure out this issue? 

    About 22 days ago I experienced the same problem, and the only solution was to restart the server. Yesterday the problem happened again. Any suggestions would be greatly appreciated.

    Monday, May 25, 2015 11:18 PM
  • Hi, I would appreciate an update on the status of your problem. We have been facing the same issue for 4 days, so any further details you can share would help.

    Thank you

    Wednesday, July 18, 2018 12:43 PM
  • Hello, do you still remember how you fixed this problem? Thank you.
    Wednesday, July 18, 2018 1:07 PM
  • Hello Tech,

    This issue is caused by the following updates being installed on the Exchange server:

    KB4338830, KB4338818

    Uninstalling them fixes the issue.

    If you don't want to uninstall them, refer to this table and install the corresponding fix instead.

    Operating system            | Impacted update | Update that must be applied
    Windows Server 2016         | KB 4338814      | KB 4345418
    Windows Server 2012 R2      | KB 4338824      | KB 4345424
    Windows Server 2012 R2      | KB 4338815      | KB 4338831
    Windows Server 2012         | KB 4338820      | KB 4345425
    Windows Server 2012         | KB 4338830      | KB 4338816
    Windows Server 2008 R2 SP1  | KB 4338823      | KB 4345459
    Windows Server 2008 R2 SP1  | KB 4338818      | KB 4338821
    Windows Server 2008         | KB 4295656      | KB 4345397
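To check a server against the table above, compare its installed update IDs with the impacted/fix pairs. The IDs can come, for example, from the HotFixID column of PowerShell's `Get-HotFix`. In the following Python sketch, the pairs are copied from the table and the function name is hypothetical. The mapping is only as accurate as the table in this reply.

```python
from __future__ import annotations

# Impacted update -> update that must be applied, copied from the table above.
FIXES = {
    "KB4338814": "KB4345418",  # Windows Server 2016
    "KB4338824": "KB4345424",  # Windows Server 2012 R2
    "KB4338815": "KB4338831",  # Windows Server 2012 R2
    "KB4338820": "KB4345425",  # Windows Server 2012
    "KB4338830": "KB4338816",  # Windows Server 2012
    "KB4338823": "KB4345459",  # Windows Server 2008 R2 SP1
    "KB4338818": "KB4338821",  # Windows Server 2008 R2 SP1
    "KB4295656": "KB4345397",  # Windows Server 2008
}


def required_fixes(installed: set[str]) -> dict[str, str]:
    """Return {impacted update: missing fix} for impacted updates that are installed."""
    normalized = {kb.upper().replace(" ", "") for kb in installed}
    return {bad: fix for bad, fix in FIXES.items()
            if bad in normalized and fix not in normalized}


# The two updates named in this reply, written as they might be pasted in.
print(required_fixes({"KB4338818", "kb 4338830"}))
# {'KB4338830': 'KB4338816', 'KB4338818': 'KB4338821'}
```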
     
