Intermittent DNS problems on Windows Server 2008 R2 (DNS cache returning serv fail)

  • Question

  • So I'm trying to implement a load balanced split/split DNS infrastructure to replace the current infrastructure.

    I've got the environment more or less in place at this point (not in production) and I'm trying to slowly roll out my caching only resolvers to select user groups, but I'm running in to some nagging problems.

    First I was having issues reliably resolving many sites that use Akamai for hosting/DNS. For example, if you do an nslookup for www.bing.com you will receive CNAMEs like search.ms.com.edgesuite.net. These name servers use EDNS, and our firewall was not playing nice with the packets. I made the registry change suggested here, which disables EDNS, and that resolved those problems:
    http://support.microsoft.com/kb/832223

    Now we're having another problem. Once in a while a group of sites (all seemingly owned by AOL, in particular engadget.com) becomes unresolvable. I can run an nslookup -d2 and it definitively shows a SERVFAIL. If I look at the cache on an affected server I see NS records, but typically no other cached records. If I delete the cache, the sites are immediately resolvable. Also, the problem typically resolves itself after an undetermined amount of time (15-ish minutes). If it had happened one time I'd have forgotten about it, but the problem recurs on a roughly weekly basis.

    I've also made this change because our caching servers are using root hints even though it doesn't exactly describe our problem:
    http://support.microsoft.com/kb/968372/en-us

    Additionally I can add a forwarder to the server and it immediately starts resolving properly again. So I'm wondering if this bug is still affecting me and what options I have to alleviate it.

    Friday, August 5, 2011 7:35 PM

All replies

  • Hi nangar J,


    Thanks for posting here.

     

    So which DNS servers are these affected servers pointing to? Did you configure the server to use root hints for Internet name resolution? What’s the OS version running on this server?

    Could you also verify whether any DNS or name resolution issue has been recorded in the event log on these hosts, and post back here? Could you also post the nslookup result here?

     

    Meanwhile, have you also disabled IPv6 on a Windows Server 2008 R2 DNS server ?

     

    DNS Server service randomly cannot resolve external names and returns a "Server Failure" error if IPv6 is disabled in Windows Server 2008 R2

    http://support.microsoft.com/kb/2549656

     

    Thanks.

     

    Tiger Li


    Monday, August 8, 2011 6:38 AM
  • Hey thanks for the reply!

    The servers are using root hints.

    Server OS is Windows Server 2008 R2 Standard (SP1 NOT applied)

    IPV6 has been disabled on the adapter and in the registry: http://support.microsoft.com/kb/929852

    I don't have an NSLookup handy (surprising because I've taken a million), but I do have a recent DiG failure.  If this pops up again (and it will) I'll grab an nslookup set d2.

    C:\>dig @ProblemServer www.aol.com

    ; <<>> DiG 9.8.0-P4 <<>> @ProblemServer www.aol.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34000
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;www.aol.com.                   IN      A

    ;; Query time: 0 msec
    ;; SERVER: ProblemServerIP#53(ProblemServerIP)
    ;; WHEN: Mon Aug 01 10:22:50 2011
    ;; MSG SIZE  rcvd: 29
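
    Since the failure is intermittent, it can help to capture dig output like the above on a schedule and pull out just the status field. The following is a minimal sketch, assuming Python is available on a test machine; the function name dig_status is hypothetical and the sample text is trimmed from the output shown above.

```python
import re
from typing import Optional


def dig_status(output: str) -> Optional[str]:
    """Return the status word from dig's ->>HEADER<<- line, or None if absent."""
    # '.' does not cross newlines, so the match stays on the header line.
    match = re.search(r"->>HEADER<<-.*?\bstatus:\s*([A-Z]+)", output)
    return match.group(1) if match else None


# Lines trimmed from the dig run quoted in this thread.
sample = """\
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34000
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
"""

print(dig_status(sample))  # prints SERVFAIL
```

    Logging this value with a timestamp would show how often and for how long the SERVFAIL window lasts.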

     

    Monday, August 8, 2011 1:30 PM
  • Hi nangar j,

    I would like to point out that disabling IPv6 is NOT recommended. It's actually part of the OS. Here's why:

    The Cable Guy - Support for IPv6 in Windows Server 2008 R2 and Windows 7, by Joseph Davies, Microsoft, Inc.
    "IPv6 is a mandatory part of the Windows operating system and it is enabled and included in standard Windows service and application testing during the operating system development process. Because Windows was designed specifically with IPv6 present, Microsoft does not perform any testing to determine the effects of disabling IPv6. If IPv6 is disabled on Windows Vista, Windows Server 2008, or later versions, some components will not function."
    http://technet.microsoft.com/en-us/magazine/2009.07.cableguy.aspx

     

    As far as EDNS0, it makes resolution more efficient. DNS queries initially go over UDP. If a response is larger than 512 bytes, the classic UDP limit, it comes back truncated and the resolver has to retry over TCP to get the whole response. When the IETF came out with EDNS0 in the late 90s, it was slow to be adopted by many in the industry. Windows 2003 was the first Windows version with it. EDNS0 supports UDP packet sizes well beyond the 512-byte limit, which makes resolution more efficient than switching the request to TCP and resending it. Some firewalls treat these larger packets as spoofed. If EDNS0 is not enabled on the firewall, or if the firewall does not support it, then it can be an issue. Internally, some responses may be larger than 512 bytes too, so disabling EDNS0 can itself cause issues in some cases. It really should be addressed at the firewall.

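    For example, on a Cisco PIX/ASA firewall that still uses the fixup syntax, the following command raises the permitted DNS message length to 4000 bytes (newer ASA releases configure the same limit under DNS inspection, so check the documentation for your version):

    fixup protocol dns maximum-length 4000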

     

    Here's a quick command to test if there's an EDNS0 restriction in your firewall:
    nslookup -type=TXT rs.dns-oarc.net
    Look for the part of the response that says "...DNS reply size limit is at least xxxx." The xxxx is the size it will support. If it's under 512, then EDNS0 is being blocked.

    You can test each individual DNS server by appending its IP address as the last argument of the command, for example: nslookup -type=TXT rs.dns-oarc.net IpAddress
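
    If you run that test against several servers, the reply-size line can be picked out of the captured output with a few lines of script. This is only a sketch, assuming Python: the function names and the sample string (including its address and the 1403 figure) are hypothetical, and treating a limit of exactly 512 as blocked is my assumption on top of the "under 512" rule above.

```python
import re
from typing import Optional

LIMIT_RE = re.compile(r"DNS reply size limit is at least (\d+)")


def reply_size_limit(nslookup_output: str) -> Optional[int]:
    """Extract the advertised reply size limit in bytes, or None if not found."""
    match = LIMIT_RE.search(nslookup_output)
    return int(match.group(1)) if match else None


def edns0_looks_blocked(limit: Optional[int]) -> Optional[bool]:
    """True if the limit suggests EDNS0 is being blocked; None if unknown."""
    if limit is None:
        return None
    # 512 bytes is the classic non-EDNS0 UDP ceiling (assumption: treat
    # a limit at or below it as a sign that EDNS0 is not getting through).
    return limit <= 512


# Hypothetical TXT line; real output will carry your resolver's address.
sample = '"192.0.2.10 DNS reply size limit is at least 1403"'

limit = reply_size_limit(sample)
print(limit, edns0_looks_blocked(limit))
```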

     

    So basically, disabling IPv6 really is not related to what you're seeing. Disabling EDNS0 and still not getting proper responses indicates an issue elsewhere. It could be the firewall, even if you had disabled EDNS0 on the DNS server.

    Tell you what, re-enable EDNS0, re-enable IPv6, configure a Forwarder to 4.2.2.2 and re-run your nslookup tests. If it works, then it clearly indicates there's more going on in the firewall that must be addressed.

     

    Ace

     

     


    Ace Fekay
    MVP, MCT, MCITP EA, MCTS Windows 2008 & Exchange 2007 & Exchange 2010, Exchange 2010 Enterprise Administrator, MCSE & MCSA 2003/2000, MCSA Messaging 2003
    Microsoft Certified Trainer
    Microsoft MVP - Directory Services
    Complete List of Technical Blogs: http://www.delawarecountycomputerconsulting.com/technicalblogs.php

    This posting is provided AS-IS with no warranties or guarantees and confers no rights.

    Tuesday, August 9, 2011 2:04 AM
  • Thanks Ace.  I will re-enable IPv6 and work with the network team to enable EDNS0.  I have already configured a forwarder at various times and alleviated the issue that way, but that is not an acceptable solution.  My intention is to run load balanced DNS servers for the enterprise that utilize root hints.

    I am in the process of using SCOM to monitor DNS resolution on these servers in order to gain greater visibility into the problem frequency as it is intermittent and self resolving.  Currently only 30 or so users are utilizing these servers... I'm sure as the servers are rolled out for thousands of users the resolution issues would escalate.
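
    As an illustration of the kind of check such monitoring could run, here is a minimal sketch of a probe that sends one query straight to a caching server and reports the response code, optionally advertising an EDNS0 payload size. It assumes Python and plain UDP; the names build_query, rcode_of and probe are hypothetical, and this is not how SCOM itself performs the check.

```python
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, edns_payload=None, query_id=0x1234):
    """Build a raw DNS query; add an EDNS0 OPT record if edns_payload is set."""
    arcount = 1 if edns_payload else 0
    # Header: ID, flags (0x0100 = recursion desired), QD/AN/NS/AR counts.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    packet = header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, class IN
    if edns_payload:
        # OPT pseudo-record: root name, TYPE 41, CLASS carries the UDP
        # payload size, TTL 0 (no extended flags), empty RDATA.
        packet += b"\x00" + struct.pack("!HHIH", 41, edns_payload, 0, 0)
    return packet


def rcode_of(response):
    """Return the response code name from the low 4 bits of the flags field."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, str(code))


def probe(server, name, timeout=2.0, edns_payload=None):
    """Send one UDP query to server:53 and return the response code name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # a timeout raises an exception to the caller
        sock.sendto(build_query(name, edns_payload=edns_payload), (server, 53))
        data, _ = sock.recvfrom(4096)
    return rcode_of(data)
```

    For example, probe("192.0.2.53", "www.aol.com") against an affected server (the address is a placeholder) would be expected to return "SERVFAIL" during a failure window, and passing edns_payload=4000 exercises the EDNS0 path through the firewall.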

    Tuesday, August 9, 2011 2:16 PM
  • I don't believe load balancing DNS servers using NLB is the way to go. It introduces some complexities adding additional IPs on to a DNS server. If you have two DNS servers, that should do the trick.

    Can you elaborate on what the intended load balanced servers are for? Are they for AD or non-AD? If for AD, and the DNS servers are DCs, no, that is definitely not a good idea at all!

    If this is intended for AD, just using the default mechanisms with two DNS servers, putting one as the first and one as the second, should do the trick. Then it's up to the client-side resolver algorithm to handle resolution. If this is what you mean, take a look at the following thread for more info:

    Good discussion on DC locator process and how the client handles AD Sites, when a DC goes down, and when a client moves between sites.
    Thread Question: "how to control sequence of domain controllers a client computer logging on"
    http://social.technet.microsoft.com/Forums/en-US/winserverDS/thread/77bc547f-4d0d-4a0c-b463-359b1c771a81/

     

    And a big factor is the client-side resolver (the Windows, Linux, Unix, etc. client-side resolver algorithms work pretty much the same). Read more:

    This article discusses:
    WINS NetBIOS, Browser Service, Disabling NetBIOS, & Direct Hosted SMB (DirectSMB).
    The DNS Client Side Resolver algorithm.
    If one DC or DNS goes down, does a client logon to another DC?
    DNS Forwarders Algorithm (if you've configured more than one forwarders)
    Client side resolution process chart
    Published by Ace Fekay, MCT, MVP DS on Nov 29, 2009 at 10:28 PM
    http://msmvps.com/blogs/acefekay/archive/2009/11/29/dns-wins-netbios-amp-the-client-side-resolver-browser-service-disabling-netbios-direct-hosted-smb-directsmb-if-one-dc-is-down-does-a-client-logon-to-another-dc-and-dns-forwarders-algorithm.aspx

    DNS Client side resolver service
    http://technet.microsoft.com/en-us/library/cc779517.aspx 

     

    I can post more, but I want to make sure that I understand your reasoning before I go further.

     

    Ace


    Ace Fekay

    Tuesday, August 9, 2011 8:51 PM
  • The DNS servers we're talking about would not be DCs.  They also do not host forward look up zones.  They only perform queries to the internet and cache responses.  They are load balanced by an F5 device which monitors up/down via dig requests.

    Internal resources will first hit DC/DNS servers that have copies of all our AD Integrated zones, any zones for which we are authoritative, and any zones that we transfer with business partners.  These servers then query the above mentioned servers for general internet lookups.  

    Still working with the network team to allow the EDNS0 requests through the appropriate firewalls ... and then will need to put the change through our change control process.  I'll update the thread as soon as I can.

    Wednesday, August 10, 2011 1:16 PM
  • Oh, I see. Thanks for elaborating. The load balanced DNS servers are just caching servers that the internal AD DC/DNS servers are forwarding to.

    You'll need to look at the perimeter firewall that the caching servers use, not the internal firewall (second prong) that the AD infrastructure uses. This is because forwarding to the caching servers overcomes any limitations the internal firewall may have.

    Matter of fact, that is where you want to run that nslookup test I provided earlier - from the caching servers.

    Ace


    Ace Fekay

    Wednesday, August 10, 2011 4:05 PM
  • We have an issue at work where Win 7 clients accessing shares on a server in a different, external domain get prompted for credentials, with a message about a possible compromise in security. At the same time this started, one of our older applications stopped printing to network printers, and the application hangs. The app is also on a different domain (the same one as above). If the Win 7 client has its DNS cache flushed and/or we do a klist purge with a reboot, printing works for the remainder of the day and breaks the next morning. All this happened around the time the other business removed several domains from our shared forest root; we now have a transitive trust. Any help is greatly appreciated; we have been trying to solve this for months. Not sure how many will see this post, but you can reply directly at jsg @ zoomtown.com. Thanks -Jim

    jim_ea

    Tuesday, September 6, 2016 11:45 PM
  • Sounds like a client-side issue with the DNS addresses listed. Please post an ipconfig /all from a sample client, and let us know whether the DNS addresses on the client can all resolve the same exact domains, or if they are mixed and resolve different domains.

    Ace Fekay
    MVP, MCT, MCSE 2012, MCITP EA & MCTS Windows 2008/R2, Exchange 2013, 2010 EA & 2007, MCSE & MCSA 2003/2000, MCSA Messaging 2003
    Microsoft Certified Trainer
    Microsoft MVP - Directory Services
    Complete List of Technical Blogs: http://www.delawarecountycomputerconsulting.com/technicalblogs.php

    This posting is provided AS-IS with no warranties or guarantees and confers no rights.



    Wednesday, September 7, 2016 2:13 AM
  • How do you post screen pics here? It could be that these issues are totally separate, and possibly not related to the domain uninstalls. I have a ton of information to post, but screen pics would be easier. Regarding the printing issue: after a klist purge of the Kerberos tickets with a reboot, it always works after the initial login. It breaks the next morning, after the token or session expires overnight: the app can't print to the network printer, hangs, and crashes. As for the prompting when accessing shares across the domain, it seems that after 15 minutes of idle time you can get it to prompt again, but if you retry right away it works, and when it works it seems to keep working for 15-30 minutes. Sometimes it works even after 15-30 minutes, but out of 5 tests, each spanning 15-30 minutes, it will prompt at least 2-3 times on the first try after the idle time. To note, all DNS nslookups to the external domain servers work from clients and on DCs. I've added host entries on the DCs to aid in the resolution, so this could be misleading. I'll try to post other data via screen pics.


    jim_ea


    • Edited by jim_ea Thursday, September 8, 2016 8:54 PM
    Thursday, September 8, 2016 8:51 PM
  • Jim,

    You can easily post an ipconfig /all from a client by copying and pasting from the command line. No screenshots necessary. If you can, post one from a sample client and one from a DC.

    If the resource is on a different domain or forest, there must be cross-DNS resolution between the forests. If there are restrictions on the resources (the printer), then that can cause it, too. I'm not sure how the printers are setup.

    But this sounds like a classic example of not being able to resolve across domains and forests. Here's what I mean:

    DNS Design Options in a Multi-Domain Forest – How to create a Parent-Child DNS Delegation, and How to Configure DNS to create a new Tree in the Forest
    http://blogs.msmvps.com/acefekay/2010/10/01/dns-parent-child-dns-delegation-how-to-create-a-dns-delegation/


    As for screenshots, they can usually be pasted with a right-click and paste, if the forum gives you that capability. But if you have many, they will just clutter up the thread, and it would be easier to post the pictures at a sharing site, such as OneDrive or Photobucket, and paste the links here.

    If there are many text files to post, host them there too, and provide a link to them here.

    Besides, this thread belongs to a person named nangar j, and it appears it's over 5 years old. Perhaps starting a new thread would be in your best interest, so you own it and can modify or delete posts, mark posts as Answer, etc.


    Ace Fekay

    Thursday, September 8, 2016 9:15 PM
  • 1

    • Edited by jim_ea Wednesday, September 21, 2016 3:26 PM
    Tuesday, September 13, 2016 7:46 PM
  • Hi Jim,

    I assume domain.com is the AD domain name, and (IPs edited out per request) are DCs in that domain and that host DNS for the domain.

    TCP/UDP 88 is a necessary Kerberos port. I'm not sure why you would have that disabled or not permitted. Are there any other ports blocked?

    For DCs to communicate, we usually just allow everything, 1 - 1024, plus the ephemeral ports (Windows 2008 and newer use TCP/UDP 49152 - 65535, and Windows 2000 & 2003 use TCP/UDP 1025 - 5000).


    Ace Fekay



    Wednesday, September 14, 2016 1:32 AM

  • edit

    • Edited by jim_ea Wednesday, September 21, 2016 3:27 PM
    Wednesday, September 14, 2016 3:24 PM
  • Ace -- can you remove the domain/IPs from your post for now.

    jim_ea

    Wednesday, September 21, 2016 3:28 PM
  • I'd like to create a new DNS zone for 2-3 servers that sit outside our internal domain, on an external domain (non-public, internal). If the external domain is, e.g., dog.bark, and the servers that reside there are, say, sam, dave and gary, what would my new zone name be? I don't want my server to be authoritative for the external domain, so I don't want "dog.bark" as the zone name, correct? Or should it just be any subdomain name such as "auth.dog.bark", with my A records then specified for each server (in this case sam, dave and gary)? But then the name would be "sam.auth.dog.bark", which isn't the real FQDN of that server. Thanks

    jim_ea

    Wednesday, September 21, 2016 7:43 PM
  • The question is a bit vague. From what I understand, I would just create a dog.bark zone on the external DNS servers, assuming that what you want is a shadow zone of the actual dog.bark internal domain on an external DNS server. Then just create the necessary A (host) records for auth, sam, etc., giving them the external IPs. But you would need to make external machines use the external DNS server.

    If this is incorrect, please provide a Visio drawing explaining what you have, what you want, and why, so we can better understand what the end result should be.


    Ace Fekay

    Wednesday, September 28, 2016 8:01 PM