DNS resolution issues when attempting to connect to login.microsoftonline.com

  • Question

  • I work for an ISP, and recently our users have been experiencing issues when trying to reach certain Microsoft subdomains. The most recent examples were passwordreset.microsoftonline.com, login.microsoftonline.com and portal.office.com.

    The issues happen fairly regularly but intermittently. We have narrowed the problem down to the home router level.

    Our DNS servers have both UDP and TCP port 53 open, and all DNS requests are answered properly.

    However, if the device uses its DHCP-assigned DNS server, which is the home router's default gateway address (usually a 192.168.0.1-type address) acting as a DNS forwarder, the DNS request times out.

    After running a few packet captures under a few different conditions, here is what I found:

    In all scenarios, the exchange starts with a DNS lookup request over UDP port 53, which goes through fine, and the computer receives a truncated answer. Instead of displaying the results of the truncated answer, the client then sends a second DNS request, this time over TCP port 53.
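
To make the UDP-then-TCP sequence above concrete, here is a minimal sketch of the wire format involved, based on RFC 1035. The function names (`build_query`, `is_truncated`, `frame_for_tcp`) and the transaction ID are illustrative assumptions, not part of any capture tool or OS resolver.

```python
import struct


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query: one question, recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return header + question


def is_truncated(response: bytes) -> bool:
    """Return True when the TC bit (0x0200) is set in the header flags."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & 0x0200)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with a 2-byte length field."""
    return struct.pack("!H", len(message)) + message
```

A client that sees `is_truncated(...)` return True on the UDP reply would normally resend the same query over TCP, framed as in `frame_for_tcp`.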

    When querying the DNS server directly (DNS server settings set statically), the lookup results are displayed once the DNS server responds to the request.

    However, when using the router as a forwarder with the default, out-of-the-box DHCP-assigned DNS values (the router acts as the DHCP server and hands out its own gateway address as the DNS server in the DHCP lease options), the router seems to reject the TCP port 53 DNS lookup packets. It responds with RST-flagged packets without forwarding the TCP DNS request, which leads to a timeout and the inability to load the pages in question.
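
The router behaviour described above can be checked without a packet capture. The following is a rough sketch, assuming the reset arrives at connection time; `probe_tcp` is a hypothetical helper name, and 192.168.0.1 in the usage note is only the example gateway address mentioned earlier.

```python
import socket


def probe_tcp(host: str, port: int = 53, timeout: float = 3.0) -> str:
    """Classify how a host reacts to a TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"
    except ConnectionRefusedError:
        # The peer answered the connection attempt with a RST.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (packets silently dropped).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"
```

For example, `probe_tcp("192.168.0.1")` returning "refused" would be consistent with a forwarder that does not listen on TCP port 53. A forwarder that completes the handshake and resets the connection later would need a full query to detect, which this sketch does not attempt.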

    I understand that the switch from UDP to TCP happens when a DNS response is larger than 512 bytes, and that this is why the request is sent again over TCP, but I do not understand why the truncated answer is not used when the full answer cannot be obtained. The issue occurs on Windows, macOS and iOS with all sorts of browsers (Firefox, Chrome, Safari, IE, Edge, etc.), as well as when simply running nslookup for those domains from the OS terminal.
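
On why the truncated answer is not used: RFC 2181, section 9, says a client that receives a reply with TC set should ignore that response and query again over a mechanism, such as TCP, that permits larger replies. Below is a simplified sketch of that decision logic, with the transports passed in as callables. `resolve`, `ResolutionError`, `send_udp` and `send_tcp` are hypothetical names for illustration; this is not the actual resolver code of any operating system.

```python
from typing import Callable

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags


class ResolutionError(Exception):
    """Raised when no complete answer could be obtained."""


def resolve(
    query: bytes,
    send_udp: Callable[[bytes], bytes],
    send_tcp: Callable[[bytes], bytes],
) -> bytes:
    """Try UDP first; on a truncated reply, discard it and retry over TCP."""
    response = send_udp(query)
    flags = int.from_bytes(response[2:4], "big")
    if not flags & TC_FLAG:
        return response
    # Truncated: the partial answer is dropped, not returned to the caller.
    try:
        return send_tcp(query)
    except OSError as exc:
        raise ResolutionError(
            "TCP retry failed; truncated UDP answer is not used"
        ) from exc
```

With a forwarder that resets TCP connections, `send_tcp` fails, so under this model the lookup ends in an error even though a partial UDP answer was received.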

    I also tested on Android, using the Termux app and pointing nslookup at the gateway address. There were no issues with the browsers, as it looks like Android uses its own DNS settings and seems to ignore the DHCP-provided DNS configuration (at least on the test device).

    What can we do to ensure that our customers can access those pages without encountering this issue and without having to fully reconfigure their routers to provide custom DNS settings, which would be pretty backwards and would cause multiple issues whenever a router is reset to factory default?

    Wednesday, January 17, 2018 10:42 PM

All replies