Shares on DFS server become non-responsive, failover does not happen unless server rebooted, domain logons become slow


  • Hi, we have recently had an intermittent issue that I am having trouble finding a root cause for. I'm hoping to get some ideas here to point me in the right direction. First, a basic description of the environment.

    Site A - Main office, ~250 users

    Site B - Datacenter, 0 users

    Site C - Branch office, ~60 users

    Sites A and C are each connected to Site B by a dedicated 100 Mb line. All sites are reachable by all users via a routed network. All networking within each site is on 1 Gb equipment.

    Each site has a file server configured with DFS (let's call them File Server A, B, & C). We have 7 replication groups, full mesh topology on all. Network drives are mapped using group policy preferences to the \\domain\files\replicationgroup namespace. All workstations are Win 7.

    So, the problem. Every so often, the shares on the file server at Site A become non-responsive. We can still RDP to the file server and browse files on it locally, but anything accessed through a UNC share does not respond. This includes \\domain\files\replicationgroup, as well as \\fileserverA\shares\replicationgroup. At the same time, anyone attempting to log on to our domain (all Server 2008 R2 DCs) at Site A will find that their logon takes 15 minutes or more. Anyone logging on at Site C does not have a problem. In addition, all mapped drives become non-responsive and cause Windows Explorer to hang. We have to reboot File Server A before the drives will fail over to File Server B via the DFS namespace.

    Since this is our most important server, I was not able to do any troubleshooting while the problem was occurring today; I simply needed to get the users working again. I am now trying to understand why:

    A: File shares would become non-responsive.

    B: DFS does not fail over properly.

    C: This would affect the ability to log on to the domain (my thought is that it has to do with the drives being mapped to the namespace, which is not failing over properly).

    I have looked at the event logs on File Server A from just before the shares became non-responsive and there don't appear to be any unusual errors. I'm really scratching my head. Can anyone suggest a possible cause or a course of action to try and dig into it? I'm happy to provide more details about the environment if necessary.
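    Since I could not collect anything during the outage, one idea is to leave a small probe running on a workstation at Site A so that the next occurrence gets a timestamp and a record of which path stopped answering first. Below is a rough sketch of that, assuming Python is available on the workstation. The function names and the 5-second timeout are made up for illustration, and the two UNC paths are just the placeholder names used above. It only tells us whether a directory listing returns in time; it does not explain why it hangs.

```python
import os
import queue
import threading
import time


def probe_path(path, timeout_s=5.0):
    """Try to list `path` in a background thread.

    Returns (status, elapsed_seconds) where status is "ok",
    "error: <ExceptionName>", or "timeout" if the listing did not
    return within timeout_s (a hung share typically blocks, not errors).
    """
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            os.listdir(path)
            result.put("ok")
        except OSError as exc:
            result.put(f"error: {exc.__class__.__name__}")

    start = time.monotonic()
    # Daemon thread so a call that never returns cannot keep the script alive.
    threading.Thread(target=worker, daemon=True).start()
    try:
        status = result.get(timeout=timeout_s)
    except queue.Empty:
        status = "timeout"
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder paths from the description above; adjust to the real ones.
    paths = [
        r"\\domain\files\replicationgroup",
        r"\\fileserverA\shares\replicationgroup",
    ]
    for p in paths:
        status, elapsed = probe_path(p)
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {p} {status} {elapsed:.2f}s")
```

    Run from a scheduled task every minute or so and appended to a log file, this would at least show whether the namespace path and the direct server path stop responding at the same moment.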

    Wednesday, January 9, 2013 19:33