DNS queries failing when primary server is being rebooted

  • Question

  • Hi All,

    We recently had an issue with our primary DNS server (top-level domain). As part of our quarterly maintenance, the primary DNS server was patched and rebooted. During the reboot window, users complained of DNS queries failing and applications being unavailable. These applications are published via Citrix, so DNS must be available for them to launch.

    Summary:

    We have 2 DNS servers on our root domain. All clients have the primary DNS server, say 192.168.1.1, as the preferred DNS server in their IP settings and the secondary DNS server, 192.168.1.2, as the alternate. Ideally, during the reboot of the primary server, the clients should send their DNS queries to the secondary after timing out on the primary. It seems that didn't happen.

    Is there a way to reproduce the issue in a test environment without an actual reboot? Say, by manually changing the primary DNS IP in the client's TCP/IP settings to a non-existent address?

    How can we find out when a DC is ready to serve DNS queries after a reboot? Netlogon services?

    We also have a secondary UNIX zone on our Windows DNS servers. Most of the complaints came from this zone. Not really sure how Infoblox would handle the switch while the primary DNS server is unavailable.

    How will a Windows DNS server handle DNS queries for primary and secondary zones while it is booting up (the server has started responding to pings but is still showing "Applying settings", or is waiting for all services to start)?

    Regards,

    Ochen

    Tuesday, September 30, 2014 3:53 AM

Answers

  • Hi,

    Unfortunately, it doesn't work that way. A client will not automatically switch from the preferred entry to a secondary or tertiary DNS entry. The reason lies in the client-side resolver algorithm. Each operating system has a client-side resolver service (Windows clients, servers, DCs, Linux, Unix, Apple - Linux anyway, etc...), and they all work the same way because they all follow the industry-standard RFCs that state how they should resolve DNS queries. The resolver caches lookup results in the local cache for the length of the TTL on the DNS records. The only way to clear the cache before the TTL expires is to have your users reboot, or run ipconfig /flushdns, which forces the client to query again.

    And what's worse, some applications don't play well, such as Outlook and BES! I have examples in my blog, below.

    More on it from my notes below:

    ***********************************************************************
    Client side resolver blurb:

    How the client side resolver service works with multiple DNS entries in the NICs.
     
    There's a general misconception that using multiple DNS addresses is a failover configuration. It is not. That's why in any AD environment, we must only specify DNS servers that either are authoritative for the zone (meaning they host a copy), or have a reference to the zone (either by forwarding, stubs, or secondaries). I realize you may already know this, but I thought to mention it for others reading this. You would be surprised how common this misconception is.

    To summarize:
     
    If the first entry responds but doesn't have an answer, which is what we call an NXDOMAIN response (when the DNS server doesn't have an answer but it responded), it won't go to the second entry, because it got an answer, even though it is not the answer we wanted.
     
    If the DNS server does not respond at all, which we call a NULL response (when the DNS server is down and can't respond), the client will go to the subsequent DNS entries in the order entered in the NIC after a timeout period, which can last 15 seconds or more as it keeps trying the first one. At that point it REMOVES the first entry from the "eligible resolvers" list until the list is reset after 15 minutes, when it goes back to the top. You can also manually reset the list by restarting the DHCP Client service on 2000/2003/XP, restarting the DNS Client service on Vista/2008 and newer, or simply restarting the machine.
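
    The two cases summarized above can be sketched as a small Python simulation. This is only an illustration of the behavior described in this post, not the actual Windows resolver code: the names ResolverSketch, NoResponse, query_fn and demo_query are made up for the example, and the 15-minute reset is simply the figure quoted above.

```python
import time


class NoResponse(Exception):
    """Hypothetical signal: the server did not answer at all (timeout)."""


class ResolverSketch:
    """Toy model of the client-side resolver behavior described above."""

    def __init__(self, servers, query_fn, reset_after=900.0, clock=time.monotonic):
        self.servers = list(servers)      # order as entered on the NIC
        self.query_fn = query_fn          # hypothetical: (server, name) -> answer
        self.reset_after = reset_after    # 15 minutes, per the post above
        self.clock = clock
        self.removed = {}                 # server -> time it stopped responding

    def eligible(self):
        """Servers still on the eligible list, in NIC order."""
        now = self.clock()
        self.removed = {s: t for s, t in self.removed.items()
                        if now - t < self.reset_after}
        return [s for s in self.servers if s not in self.removed]

    def resolve(self, name):
        for server in self.eligible():
            try:
                # Any response ends the search, even NXDOMAIN.
                return server, self.query_fn(server, name)
            except NoResponse:
                # No response: drop this server and try the next entry.
                self.removed[server] = self.clock()
        return None, None


def demo_query(server, name):
    """Made-up backend: the primary is down, the secondary answers."""
    if server == "192.168.1.1":
        raise NoResponse()
    return "NXDOMAIN" if name == "missing.example" else "10.0.0.5"


resolver = ResolverSketch(["192.168.1.1", "192.168.1.2"], demo_query)
print(resolver.resolve("app.example"))
print(resolver.eligible())
```

    In this toy run the first lookup falls through to 192.168.1.2, and 192.168.1.1 stays off the eligible list until the reset interval has passed. A real client also spends the timeout period waiting on the first server before it moves on, which the sketch does not model.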


    Here's a good read on the client side resolver:
    Scroll down to "DNS Client side resolver service Algorithm"
    http://blogs.msmvps.com/acefekay/2009/11/29/dns-wins-netbios-amp-the-client-side-resolver-browser-service-disabling-netbios-direct-hosted-smb-directsmb-if-one-dc-is-down-does-a-client-logon-to-another-dc-and-dns-forwarders-algorithm/

    And here is some additional reading on the subject:
     
    Technet Thread: "problem with secondary dns"
    http://social.technet.microsoft.com/Forums/en-US/winserverNIS/thread/8fc4597c-d64e-4a87-9cfe-5fe159df5735/
     
    DNS Clients and Timeouts (Part 1 & Part 2)
    karammasri [MSFT] Dec 2011 6:18 AM
    http://blogs.technet.com/b/stdqry/archive/2011/12/02/dns-clients-and-timeouts-part-1.aspx
    http://blogs.technet.com/b/stdqry/archive/2011/12/15/dns-clients-and-timeouts-part-2.aspx
     
    DNS Client side resolver service
    http://technet.microsoft.com/en-us/library/cc779517.aspx
     
    The DNS Client Service Does Not Revert to Using the First Server in the List in Windows XP (applies to other operating systems, too)
     http://support.microsoft.com/kb/320760
    ***********************************************************************


    Ace Fekay
    MVP, MCT, MCSE 2012, MCITP EA & MCTS Windows 2008/R2, Exchange 2013, 2010 EA & 2007, MCSE & MCSA 2003/2000, MCSA Messaging 2003
    Microsoft Certified Trainer
    Microsoft MVP - Directory Services
    Complete List of Technical Blogs: http://www.delawarecountycomputerconsulting.com/technicalblogs.php

    This posting is provided AS-IS with no warranties or guarantees and confers no rights.



    Monday, October 6, 2014 3:42 AM

All replies

  • Hi, anyone experienced similar issues? Much appreciated. Thanks....
    Tuesday, September 30, 2014 4:27 PM
  • Hi Ochen,

    Does this issue occur only while the server is booting up, or does it continue after it has booted?

    If the issue only occurs during boot, please check whether the client sends queries to the secondary DNS server. To verify this, we can perform a network capture.

    As you have mentioned, if the primary DNS server doesn't respond to the query, the client should send the query to the secondary DNS server. If the client doesn't send the query to the secondary DNS server, it is likely a client issue.

    To download Network Monitor, please click the link below,

    http://www.microsoft.com/en-us/download/details.aspx?id=4865

    To find out when a DC is ready to serve DNS queries after a reboot, please use nslookup on the client side to resolve a hostname against that server.

    • nslookup
    • server <IP address of the rebooted server>
    • <hostname to resolve>
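
    The same check can be repeated in a loop until the rebooted server answers. Below is a rough Python sketch, not an official tool: it hand-builds a minimal DNS A query, sends it over UDP, and treats an answer with RCODE 0 as "ready". The function names, the 5-second interval and the 10-minute limit are assumptions picked for illustration.

```python
import random
import socket
import struct
import time


def build_query(name, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    txid = random.randint(0, 0xFFFF)
    # Flags 0x0100 = recursion desired; one question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split("."))
    return txid, header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def server_answers(server_ip, name, timeout=2.0, port=53):
    """Return the RCODE of the reply, or None if nothing usable came back."""
    txid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server_ip, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # timeout, unreachable, connection reset, ...
            return None
    if len(data) < 12:
        return None
    reply_id, flags = struct.unpack("!HH", data[:4])
    if reply_id != txid:
        return None
    return flags & 0x000F  # low four bits of the flags are the RCODE


def wait_until_ready(server_ip, name, interval=5.0, max_wait=600.0):
    """Poll until the server answers with RCODE 0 (NOERROR) or time runs out."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if server_answers(server_ip, name) == 0:
            return True
        time.sleep(interval)
    return False


# Example (addresses and names are placeholders):
# wait_until_ready("192.168.1.1", "host.yourdomain.com")
```

    Testing with a name from each zone, including the UNIX secondary zone, would show whether one zone becomes available later than the others.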

    If the issue continues after booting up, please check the items below,

    1. Is the DNS service started?
    2. Has the DNS server finished loading zones?
    3. Is there any warning or error in the event viewer of the DNS server?

    Best Regards.



    Steven Lee

    TechNet Community Support

    Wednesday, October 1, 2014 3:50 AM
    Moderator
  • Hi Steve,

    Thanks. The issue occurs only while booting up. Once the DNS server is completely online and all zones are loaded, there are no further issues. I already tried a network capture from a Windows client while reproducing the issue, and I can see that the switch-over to the secondary DNS server happens (at least in a Windows environment).

    The only anomaly I could see was a UNIX secondary zone that took a long time to come fully online, even after the server and the rest of the zones were up. When I checked, there was a red X icon against the zone, but it disappeared after some time.

    No errors on event viewer.

    Regards,

    Ochen

    Wednesday, October 1, 2014 4:20 AM
  • Hi Ochen,

    What's the master server of the UNIX zone? Is there any information in that server?

    Have you tried to perform zone transfer manually?

    To perform zone transfer manually, please follow the steps below,

    1. Open the DNS console tree
    2. Right-click the secondary zone
    3. Click Transfer from Master

    If the server shows any message, please check whether there is detailed information in Event Viewer.

    If nothing happens, please try to reload the secondary zone.

    Best Regards.



    Steven Lee

    TechNet Community Support

    Friday, October 3, 2014 1:55 AM
    Moderator
  • Hi,

    Check whether port 53 on the local name server is reachable via telnet. You can also use the following tool for FQDN troubleshooting.

    http://www.atomicorp.com/wiki/index.php/Local_DNS_resolver
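
    Where telnet is not installed, the port check can be approximated with a few lines of Python. This is a minimal sketch using only the standard library; the function name tcp_port_open is made up for the example. Keep in mind that most DNS queries travel over UDP, so a successful TCP connection only shows that something is listening on TCP port 53.

```python
import socket


def tcp_port_open(host, port=53, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


# Example (address is a placeholder):
# print(tcp_port_open("192.168.1.1"))
```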


    Regards, Koustov Choudhury

    Friday, October 3, 2014 4:00 AM
    A few things to look at:

    As mentioned previously, the best way to start troubleshooting is to do a network capture. I personally like Wireshark for that. You can capture either on the client or at the switch level using port mirroring.

    As to how to reproduce the issue, the best way would be to add a static route on the machine that forces it to use a null route for the IP address of the primary server. This way the primary server becomes unreachable from your client, leaving only the secondary server available.

    Now, coming to the actual issue: how have you configured the Unix zone? Is it set up as an AD-integrated secondary zone, or is it simply a secondary zone? If you have configured it as an AD-integrated zone, AD partition replication should take care of propagating it to your secondary server. But if it is not AD-integrated and exists only on your primary server, the zone will obviously be unavailable to clients during an outage of the primary server. To check zone propagation, you can use the nslookup command:

    nslookup

    lserver <IP address of your secondary server>

    set type=soa

    YourDomain.Com

    In the results, note the SOA serial number, then switch to the other server and query the domain again:

    lserver <IP address of your primary server>

    YourDomain.Com

    Provided that your AD replication is complete, which you can either invoke or confirm via repadmin.exe, the SOA serial numbers should be the same.
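
    If you would rather script the serial comparison than read it off nslookup, the following Python sketch shows one way to do it. It is an illustration only: it sends a hand-built SOA query over UDP and reads the serial from the first SOA record in the answer section. All function names are hypothetical, and it assumes the reply fits in a single UDP datagram.

```python
import socket
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed DNS labels."""
    labels = name.rstrip(".").split(".")
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def skip_name(data, offset):
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = data[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer is two bytes long
            return offset + 2
        offset += 1 + length


def parse_soa_serial(data):
    """Extract the serial from the first SOA record in the answer section."""
    if len(data) < 12:
        return None
    qdcount, ancount = struct.unpack("!HH", data[4:8])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(data, offset) + 4  # skip QTYPE and QCLASS
    for _ in range(ancount):
        offset = skip_name(data, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", data[offset:offset + 10])
        offset += 10
        if rtype == 6:  # SOA
            end = offset + rdlength
            # RDATA ends with five 32-bit fields; serial is the first of them.
            return struct.unpack("!I", data[end - 20:end - 16])[0]
        offset += rdlength
    return None


def query_soa_serial(server_ip, zone, timeout=3.0, port=53):
    """Ask one server for the zone's SOA record and return its serial."""
    query = (struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
             + encode_name(zone) + struct.pack("!HH", 6, 1))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, port))
            data, _ = sock.recvfrom(4096)
        except OSError:
            return None
    return parse_soa_serial(data)


# Example (placeholders): compare the two servers from the original post.
# print(query_soa_serial("192.168.1.1", "YourDomain.Com"),
#       query_soa_serial("192.168.1.2", "YourDomain.Com"))
```

    Matching serials on both servers suggest the zone copy on the secondary is current; None from either call means that server did not return an SOA record for the zone.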

    If all this seems a lot, do a packet capture with the static route I mentioned and post the results. DNS resolution issues are very easy to troubleshoot given enough information.

    Friday, October 3, 2014 4:12 AM