Experiencing regular, intermittent failures of Outlook and Outlook Web App connectivity: AD Topology errors in event log
Monday, May 14, 2012 12:24 AM
For about 3 days in a row, roughly every 12 hours, Outlook and OWA connectivity would suddenly fail. Outlook Web App would display the message "no server is available that contains information about your mailbox" and Outlook would prompt for a password. In each case, restarting the Exchange AD Topology service fixed it. Although my first thought was a network connectivity issue, we have a monitoring system in place, and the server kept responding to pings even during the outages. Switching the VM to a secondary NIC also didn't fix the issue. I found the following errors in the event log:
Process MSEXCHANGEADTOPOLOGY (PID=2364). When updating security for a remote procedure call (RPC) access for the Microsoft Exchange Active Directory Topology service, Exchange could not retrieve the security descriptor for Exchange server object SERVERNAME- Error code=80040934.
The Microsoft Exchange Active Directory Topology service will continue starting with limited permissions.
This is followed by several other errors that seem to be related to this first issue.
Process MSEXCHANGEADTOPOLOGYSERVICE.EXE (PID=2364). All Domain Controller Servers in use are not responding:
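The error code in the first event, 80040934, has the shape of a 32-bit HRESULT. As a hedged aside, the short Python sketch below splits such a value into its standard fields (failure bit, 11-bit facility, 16-bit code). It assumes the event's code is an HRESULT. Facility 4 is FACILITY_ITF, which means the low 16 bits are defined by the component that raised the error (here, presumably Exchange), so this decode only tells you the category of failure, not its root cause.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its documented bit fields."""
    return {
        "failure": bool((value >> 31) & 1),  # S bit: 1 means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility code (4 = FACILITY_ITF)
        "code": value & 0xFFFF,              # 16-bit facility-specific code
    }


print(decode_hresult(0x80040934))
# {'failure': True, 'facility': 4, 'code': 2356}
```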
The CAS is a Hyper-V VM. It does not have a dedicated NIC, but it's on a lightly used NIC in the host. Just prior to the problems appearing, I removed a hub transport server from our install.
Any thoughts on what may be causing the issue? Did I miss something when uninstalling Exchange from the hub transport server, and could it cause these symptoms? Is it a permissions or certificate issue? Any troubleshooting steps to try that may eliminate that as a cause?
The problem has not recurred for about 3 days now (since Friday morning), but I am concerned that it may happen again when the work week resumes.
Monday, May 14, 2012 3:31 AM
You have a performance issue, either with CAS server availability or with a domain controller. Let's try to find the exact problem by doing the following:
Are you running your Exchange or AD environment on virtual machines?
Steps: First, choose a PC on which you can reproduce the problem. On that PC, open the hosts file ("C:\Windows\System32\drivers\etc\hosts") in Notepad and add an entry mapping the CAS array FQDN to the IP address of one specific Exchange 2010 CAS server. Try to reproduce the problem, then repeat the test with the other server. Checking the servers one at a time will tell you exactly which server is causing the problem.
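The hosts-file test above can be scripted. The following is a minimal Python sketch, not a tool from this thread: pin_host is a hypothetical helper that rewrites hosts-file text so the CAS array FQDN resolves only to one chosen CAS IP, commenting out (rather than deleting) any existing mapping so it can be restored after testing. The FQDN and addresses used below are placeholders. Note that commenting out a line also disables any other aliases listed on it, and writing the real hosts file requires administrator rights.

```python
def pin_host(hosts_text: str, fqdn: str, ip: str) -> str:
    """Return hosts-file text in which `fqdn` resolves only to `ip`.

    Existing mappings of `fqdn` are commented out, not deleted, so the
    original entry can be restored once testing is finished.
    """
    target = fqdn.lower()
    out = []
    for line in hosts_text.splitlines():
        # Ignore any trailing comment, then split into "IP name [alias ...]".
        fields = line.split("#", 1)[0].split()
        # Host names are case-insensitive, so compare in lower case.
        if len(fields) >= 2 and target in (name.lower() for name in fields[1:]):
            out.append("# " + line)
        else:
            out.append(line)
    # Add the pinned mapping for the CAS server under test.
    out.append(f"{ip}\t{fqdn}")
    return "\n".join(out) + "\n"


# Hypothetical example: point the CAS array name at a single CAS server.
original = "127.0.0.1\tlocalhost\n10.0.0.20\tcasarray.contoso.local\n"
print(pin_host(original, "casarray.contoso.local", "10.0.0.21"), end="")
```

The example keeps the localhost line, comments out the old casarray.contoso.local entry, and appends a line pointing that name at 10.0.0.21. Run it once per CAS IP, reproduce the problem against each, and compare the results.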
Second, to troubleshoot the performance issue further, you will need to read network traces. You can use Microsoft Network Monitor 3.0 (similar to Wireshark). Install it on both the CAS server and the client, reproduce the issue, and see what happens when it occurs.
I have seen this problem occur when Exchange or a domain controller is running on a VM.
Read this article to make sure your CAS VM is configured correctly:
Read this for Hyper-V performance optimization:
If you have a virtualized DC, you should follow the best practices shown here:
Outlook connectivity problems:
It seems that slow directory authentication responses are causing this issue. Try manually configuring a static domain controller for the Exchange server, one DC at a time, and see which DC the problem occurs with.
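Alongside testing each DC in turn, a quick reachability check can show whether any candidate DC fails to answer on the ports a directory client needs. The Python sketch below rests on stated assumptions: it only attempts TCP connections to LDAP (389) and the Global Catalog (3268); it performs no LDAP bind or authentication, and it does not change which DC Exchange uses. probe_dc and unresponsive_dcs are hypothetical helpers, and the DC names in the usage note are placeholders.

```python
import socket

# TCP ports a domain controller normally listens on: LDAP (389) and the
# Global Catalog (3268). This checks reachability only; no LDAP bind is done.
DC_PORTS = (389, 3268)


def probe_dc(host: str, ports=DC_PORTS, timeout: float = 3.0) -> dict:
    """Return {port: bool} showing whether a TCP connect to host:port succeeded."""
    results = {}
    for port in ports:
        try:
            # Open and immediately close a connection; any OSError
            # (refused, timed out, name not resolved) counts as a failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


def unresponsive_dcs(dcs, ports=DC_PORTS, timeout: float = 3.0) -> list:
    """Return the DCs from `dcs` that failed on at least one port, in input order."""
    return [dc for dc in dcs if not all(probe_dc(dc, ports, timeout).values())]
```

For example, unresponsive_dcs(["dc1.contoso.local", "dc2.contoso.local"]) returns whichever of those (hypothetical) DCs refused or timed out on at least one port. A port-level check goes a step beyond ping, but a passing result still does not rule out authentication or AD replication problems.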
Zahir Hussain Shah | MVP - Exchange Server | Senior Infrastructure Consultant - Messaging | My blog: http://zahirshahblog.com [If my answer fixes your problem, mark it as the answer so it helps others find a solution.]
Monday, May 14, 2012 12:26 PM
Thanks for your suggestions. Can you help me narrow down your response, though?
I already have the hotfix applied for Hyper-V network issues. As discussed in my question, I also have an active network monitoring solution in place. None of the machines involved (the CAS or the DCs) has had any network connectivity failures near the time of the CAS AD Topology failure. I've also tried switching the VM to a different physical NIC on the Hyper-V host, and the failure still recurred. The physical connectivity angle seems to be a dead end. I'm also not experiencing any of the problems described in the articles you linked. My domain is replicating fine and I'm not having logon problems; only this single CAS server has had problems locating the domain controllers.
What do you mean by poor directory authentication response? I agree that it seems like an authentication problem of some kind, but I'm hoping for some suggestions on what to check. I'm not sure how to add a static domain controller manually. Do you mean specifying it in the hosts file?
We do have three CAS servers: one at a remote site and two at our main site, one of which is experiencing the problem. Exchange Autoconfiguration is set to use the FQDN of the CAS server that has been experiencing the problems. The second CAS server at the main site is one I've set up to begin testing DAGs.
Thursday, June 07, 2012 8:23 PM
We did end up having some AD replication issues. Fixing those appears to have resolved the symptoms we were experiencing.