Exchange 2010 AD Topology Discovery Failed

    Question

  • Approximately every 3 days, Exchange loses contact with all the domain controllers and fails. The only way to resolve the issue is to restart the server. Once restarted, it functions perfectly with clean event logs right up until the next failure. I've been working on this for around 4 weeks now, since the problem began, but I am unable to find the root cause. I am also unable to tie the start of these problems with any particular change to the configuration of the network.

    We have 2 x 2008R2 domain controllers and a domain running at 2008R2 functional level. Exchange 2010 is also installed on Server 2008R2, and all three of these servers are virtualised on VMware ESXi 4.1.

    -----

    The last good application event log entry looks like this:

    Event 2080, MSExchange ADAccess

    Process MSEXCHANGEADTOPOLOGYSERVICE.EXE (PID=1992). Exchange Active Directory Provider has discovered the following servers with the following characteristics:
    (Server name | Roles | Enabled | Reachability | Synchronized | GC capable | PDC | SACL right | Critical Data | Netlogon | OS Version)
    In-site:
    DC01.domain.com CDG 1 7 7 1 0 1 1 7 1
    DC02.domain.com CDG 1 7 7 1 0 1 1 7 1
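For comparing a healthy Event 2080 entry with a later one, the server rows can be split into named fields. This is only a sketch: the column names are taken from the header line quoted above, while `parse_2080_row` and `changed_fields` are hypothetical helper names, and the row layout is assumed to be whitespace-separated exactly as shown in the post.

```python
# Column names copied from the Event 2080 header line quoted in the post.
COLUMNS = [
    "server", "roles", "enabled", "reachability", "synchronized",
    "gc_capable", "pdc", "sacl_right", "critical_data", "netlogon",
    "os_version",
]


def parse_2080_row(line):
    """Split one whitespace-separated Event 2080 server row into a dict."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    # Every column after 'roles' is numeric in the quoted log entry.
    for name in COLUMNS[2:]:
        row[name] = int(row[name])
    return row


def changed_fields(good, current):
    """Return {column: (good_value, current_value)} for columns that differ."""
    return {
        name: (good[name], current[name])
        for name in COLUMNS
        if good[name] != current[name]
    }
```

Parsing the last known-good row for each DC and diffing it against a row logged closer to the failure would show which characteristic (for example reachability or Netlogon) changed first, if any did.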

    Then an error:

    Event 1009, MSExchangeMailSubmission

    The Microsoft Exchange Mail Submission service is currently unable to contact any Hub Transport servers in the local Active Directory site. The servers may be too busy to accept new connections at this time.

    Another error:

    Event 6003, MSExchange SACL Watcher

    SACL Watcher servicelet encountered an error while monitoring SACL change.
    Got error 1722 opening group policy on system DC01.domain.com in domain domain.com

    A warning:

    Event 1007, MSExchange Mailbox Replication

    The Mailbox Replication service was unable to determine the set of active mailbox databases on a mailbox server.
    Mailbox server: EXCHANGE.domain.com
    Error: MapiExceptionNetworkError: Unable to make admin interface connection to server. (hr=0x80040115, ec=-2147221227)

    An informational event:

    Event 2070, MSExchange ADAccess

    Process STORE.EXE (PID=5012).  Exchange Active Directory Provider lost contact with domain controller .  Error was 0x80040951 (LDAP_SERVER_DOWN (Cannot contact the LDAP server)) ().  Exchange Active Directory Provider will attempt to reconnect with this domain controller when it is reachable.

    An error:

    Event 2104, MSExchange ADAccess

    Process STORE.EXE (PID=5012). Topology discovery failed due to LDAP_SERVER_DOWN error. This event can occur if one or more domain controllers in local or all domains become unreachable because of network problems. Use the Ping or PathPing command line tools to test network connectivity to local domain controllers. Run the Dcdiag command line tool to test domain controller health.

    A warning:

    Event 2121, MSExchange ADAccess

    Process STORE.EXE (PID=5012). Exchange Active Directory Provider is unable to connect to any domain controller in domain domain.com although DNS was successfully queried for the service location (SRV) resource record used to locate a domain controller for that domain.
    The query was for the SRV record for _ldap._tcp.dc._msdcs.domain.com
    The following domain controllers were identified by the query:
    dc02.domain.com
    dc01.domain.com

    Meanwhile, the system log shows:

    Event 5719, NETLOGON

    This computer was not able to set up a secure session with a domain controller in domain DOMAIN due to the following:
    The RPC server is unavailable.
    This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

    -----

    From this point onwards the event log just fills up with errors and warnings and all the Outlook clients get kicked off. From the Exchange server, via RDP, I can ping both domain controllers no problem.
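A successful ping only demonstrates ICMP reachability, whereas the LDAP_SERVER_DOWN and "RPC server is unavailable" errors above concern TCP services. A minimal sketch of a TCP-level check that could be run during a failure follows. The port list (LDAP 389, global catalog 3268, RPC endpoint mapper 135, SMB 445, Kerberos 88) is an assumption about which services matter here, and the function names are illustrative.

```python
import socket

# Assumed set of DC-facing TCP ports worth probing; adjust as needed.
DC_PORTS = {
    "ldap": 389,
    "gc": 3268,
    "rpc-epmap": 135,
    "smb": 445,
    "kerberos": 88,
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, name resolution failures,
        # and local socket errors such as running out of ports.
        return False


def check_dc(host, ports=DC_PORTS, timeout=3.0):
    """Probe each named port on one domain controller."""
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}
```

For example, `check_dc("DC01.domain.com")` returns a dict of port name to True/False. If every port fails from the Exchange server while the same check succeeds from another machine, that would point towards the Exchange server's own TCP stack rather than the DCs, though it would not prove it.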

    One other error I am seeing is an SMB error when I try to browse the network from the Exchange server (while it has failed). I get a message: "The name limit for the local computer network adapter card was exceeded". The documentation I have found for this is quite old and the suggested registry key changes for TcpTimedWaitDelay and MaxUserPort are already set as recommended.

    Once the server has been restarted, I have little avenue for further investigation as the event logs run clean and everything seems fine. Even when the server fails, everything else on the network functions perfectly and there are no errors in the domain controllers' event logs.

    I've been down numerous avenues here, but I'm running out of ideas and I would really really appreciate some help with this problem.

    Many thanks in advance,
    Steve...
    Tuesday, September 18, 2012 12:27 AM

All replies

  • I've experienced something similar. I never did find a root cause, but I did discover that running netstat on the Exchange server showed connections to the DC hung in a CLOSE_WAIT state. IIRC, if it was on a virtual server, toggling the NIC off and on and restarting the AD discovery service made it work again.

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, September 18, 2012 12:37 AM
  • Many thanks for your suggestion. I have followed up on CLOSE_WAIT connections, but I only see a bunch of port 443 CLOSE_WAITs to client computers, not to the DCs, and that only occurs after Exchange fails.
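To tell CLOSE_WAIT connections to the DCs apart from those to clients, netstat output can be grouped by remote endpoint. The following is a rough sketch that assumes the text comes from `netstat -ano` on Windows (numeric addresses, state in the fourth column); `count_states` is a hypothetical helper name.

```python
from collections import Counter


def count_states(netstat_text, state="CLOSE_WAIT"):
    """Count TCP connections in `state`, keyed by (remote host, remote port)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows look like: TCP <local> <remote> <state> [<pid>]
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        if fields[3].upper() != state:
            continue
        # rpartition keeps IPv6 addresses such as [::1]:445 intact.
        remote_host, _, remote_port = fields[2].rpartition(":")
        counts[(remote_host, remote_port)] += 1
    return counts
```

The input text could be captured with `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout`. Logging the result at intervals before a failure would show whether connections to the DCs (ports 389 or 3268) pile up beforehand, or only the port 443 client connections described above.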

    Tuesday, September 18, 2012 12:55 AM
  • May not be the same problem then. 

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, September 18, 2012 12:58 AM
  • Make sure your virtual NICs are up to date.

    ExchangeGeek (MCITP,Enterprise Messaging Administrator)

    ***Don't forget to mark helpful or answer***

    Tuesday, September 18, 2012 9:40 AM
  • Hi ,

    Please check whether the Netlogon service stops suddenly.

    Please also run "ping -t" and verify whether any packets are lost.

    Please test connectivity between the DCs and the Exchange server using the Exchange Best Practices Analyzer.


    Wendy Liu

    TechNet Community Support

    Tuesday, September 18, 2012 3:44 PM
    Moderator
  • Make sure your virtual NICs are up to date.

    ExchangeGeek (MCITP,Enterprise Messaging Administrator)

    ***Don't forget to mark helpful or answer***

    Many thanks for your suggestion. We are currently running VMware ESXi 4.1 Update 1, but I can see that Dell has just released (two days ago!) VMware ESXi 4.1 Update 3, customised with the latest drivers for our servers.

    I will download these and schedule some downtime to update the hosts as soon as possible. I will post back here after updating the hosts, but in the meantime I'd like to keep exploring other possibilities.

    • Edited by DCMIT Wednesday, September 19, 2012 1:31 AM
    Wednesday, September 19, 2012 1:29 AM
  • Hi ,

    Please check whether the Netlogon service stops suddenly.

    Please also run "ping -t" and verify whether any packets are lost.

    Please test connectivity between the DCs and the Exchange server using the Exchange Best Practices Analyzer.


    Wendy Liu

    TechNet Community Support

    There are no problems pinging between servers, even during a failure. Also, the best practice analyser is clean on all DCs and Exchange.

    I will check for Netlogon service stopping at the next failure and post back here. Many thanks for your suggestions.

    Wednesday, September 19, 2012 1:34 AM
  • Any update?

    ExchangeGeek (MCITP,Enterprise Messaging Administrator)

    ***Don't forget to mark helpful or answer***

    Tuesday, September 25, 2012 1:47 PM
  • Hi, it's hard to say. I've been working on AD and DNS, making sure it is all working really smoothly. DCDIAG was throwing some errors, which led me to upgrade SYSVOL replication from FRS to DFS-R, and since then the error hasn't recurred.

    http://blogs.technet.com/b/filecab/archive/2008/02/08/sysvol-migration-series-part-1-introduction-to-the-sysvol-migration-process.aspx


    Wednesday, September 26, 2012 12:32 PM
  • Hello Steve,

    Maybe this could help.

    I found it because the same error as described occurred at a customer site with the same setup.

    http://www.petenetlive.com/KB/Article/0000664.htm

    Greets

    Wednesday, September 04, 2013 3:34 PM
  • Have you tried opening Active Directory Sites and Services and checking whether all of your DCs are in the appropriate site? That's what happened to me. Below is what I tried; I hope it helps someone.

    http://www.mountainvistatech.com/2013/10/08/exchange-2010-topology-discovery-failed-error-0x8007077f-v-issue/

    Monday, April 07, 2014 1:39 AM
  • Were you able to figure out this issue?

    About 22 days ago I experienced the same problem, and the only solution was to restart the server. Yesterday the problem recurred. Any suggestions would be greatly appreciated.

    Monday, May 25, 2015 11:18 PM