Windows Server 2008 R2 DFS replication doesn't start initial replication

  • Question

  • Hi there,

    We have 2 datacenter locations with 1 fileserver (Windows Server 2008 R2) in each datacenter. We decided to add another fileserver in each datacenter and to replicate the data on the fileservers between the datacenters using DFS Replication. Once the replication is running smoothly we can start using DFS Namespaces.

    Fileserver1 is in datacenter1
    Fileserver2 is in datacenter2
    Fileserver3 (new) is in datacenter2
    Fileserver4 (new) is in datacenter1

    We would like to create 2 DFS replication groups: Fileserver1 with Fileserver3, and Fileserver2 with Fileserver4. The fileservers have a C:\ partition which contains the Windows installation and a D:\ partition which contains the data. On both Fileserver1 and Fileserver2 we have the folder D:\Homedirectories. We have approximately 40 customers on each fileserver. The folder D:\Homedirectories is approximately 300 GB on each fileserver.

    The replication group for datacenter1 (Fileserver1 with Fileserver3) works, finally. I found out there was an active share on Fileserver1 for a folder that no longer existed. With Share and Storage Management I could remove the share. After that, Fileserver1 started the initial replication to Fileserver3. That was 3 weeks ago and it is still running smoothly.

    I created the same replication group for Fileserver2 and Fileserver4 in datacenter2. This replication group won't start the initial replication. First I ran a DFS diagnostic report. It showed more than 100 files that could not be replicated. I found out these were temporary files, all of which seemed to be pictures or other kinds of images (.png, .tif, .jpg). I received a PowerShell script that checks all files in D:\Homedirectories and its subfolders and removes the temporary attribute. This solved the problem for the more than 100 files that could not be replicated.
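
    The PowerShell script itself is not shown in this thread. As a rough illustration of the same idea, a minimal Python sketch could look like the one below. The function names `without_temporary` and `clear_temporary_attribute` are hypothetical, and the directory walk only works on Windows because it relies on `st_file_attributes` and the Win32 `SetFileAttributesW` call.

```python
import ctypes
import os

# Win32 attribute bit for "temporary"; DFS Replication does not replicate
# files that carry this attribute.
FILE_ATTRIBUTE_TEMPORARY = 0x100


def without_temporary(attrs):
    """Return the attribute bitmask with the temporary bit cleared."""
    return attrs & ~FILE_ATTRIBUTE_TEMPORARY


def clear_temporary_attribute(root):
    """Walk root and clear the temporary attribute where it is set (Windows only).

    Returns the list of files whose attributes were changed.
    """
    changed = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            attrs = os.stat(path).st_file_attributes  # only exists on Windows
            if attrs & FILE_ATTRIBUTE_TEMPORARY:
                # SetFileAttributesW returns non-zero on success.
                if ctypes.windll.kernel32.SetFileAttributesW(path, without_temporary(attrs)):
                    changed.append(path)
    return changed
```

    On the fileserver this would be called from an elevated prompt as `clear_temporary_attribute(r"D:\Homedirectories")`; the other attribute bits of each file are left as they were.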

    The Diagnostic Report showed another error. This error was quite specific. The error tells me there is a folder with "repeated sharing violations". See the content of the corresponding Windows Event Viewer event below; (the *** are censored customer names)

    =====

    The DFS Replication service failed to get folder information when walking the file system on a journal wrap or loss recovery due to repeated sharing violations encountered on a folder. The service cannot replicate the folder and files in that folder until the sharing violation is resolved.

    Additional Information:

    Folder: D:\Homedirectories\***\Tijdelijk\

    Replicated Folder Root: D:\Homedirectories

    File ID: {00000000-0000-0000-0000-000000000000}-v0

    Replicated Folder Name: Homedirectories

    Replicated Folder ID: CA05B7CE-AF89-4E43-8797-CDB9F871B778

    Replication Group Name: Datacenter2

    Replication Group ID: 2C182A82-041D-4F5A-A147-FE9107E954DE

    Member ID: 4DD60FC5-4E4B-4FCE-929E-17995616F83C

    =====
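
    When several of these events have to be compared (for example to check which folder or replication group ID each one names), the "Name: value" lines can be pulled into a dictionary. The following is only a sketch; `parse_event_details` is a hypothetical helper, and the sample text is copied from the event above.

```python
def parse_event_details(message):
    """Collect the 'Name: value' lines of an event message into a dict."""
    details = {}
    for line in message.splitlines():
        # Split at the first ": " so drive letters such as "D:\" stay intact.
        name, sep, value = line.strip().partition(": ")
        if sep and name and value:
            details[name] = value.strip()
    return details


sample = """Additional Information:
Folder: D:\\Homedirectories\\***\\Tijdelijk\\
Replicated Folder Root: D:\\Homedirectories
Replication Group Name: Datacenter2
Replication Group ID: 2C182A82-041D-4F5A-A147-FE9107E954DE
"""

details = parse_event_details(sample)
print(details["Folder"], details["Replication Group Name"])
```

    Header lines such as "Additional Information:" carry no value and are skipped.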

    I checked Share and Storage Management to see if there were "corrupt" shares, as there were for the first replication group. I did not see any, but just to be sure I deleted the share on the folder named in the event. I restarted the DFS Replication service, but after 45 minutes the same event showed up again, telling me there are "repeated sharing violations" on the folder D:\Homedirectories\***\Tijdelijk (keep in mind the *** is just to censor our customer's name; in our situation it is a folder name consisting of 3 alphabetic characters).

    I would really like to know what else can cause the "repeated sharing violations" for this exact folder. It's the only error left before the initial replication can start. I googled a lot to see if anyone has encountered the same problem, and found some possible causes that unfortunately do not apply to my situation:
    - Event 4312 can show up when the pagefile is on the same partition as the folder you want to replicate.
    This does not apply here because the pagefile is located on the C:\ partition and the folder to be replicated is located on the D:\ partition.

    - Event 4312 can show up when you have third-party software trying to replicate the folder.
    This does not apply here because I do not have third-party software trying to replicate the folder. There are also no active Robocopy scripts, and no backup software is running at that time. Even if that were the case, it would not explain why this exact folder keeps giving "repeated sharing violations".

    - Event 4312 can show up when you do not have sufficient rights to access the folder.
    This does not apply here because the share and NTFS permissions are the same as on the other folders, which do not give errors, and the same as on the folders on Fileserver1, which is actually replicating. When I copy this folder with Robocopy I do not get any errors. SYSTEM and Administrators have full control on this folder.

    So again; what else can cause the error of "repeated sharing violations"?

    Thanks in advance for any suggestions.
    Roy

    Thursday, October 10, 2013 9:20 AM

Answers

  • Hi Mandy Ye,

    I have checked the URL you suggested. The thread starter does not mention his operating system so I do not know if he is using the same operating system and the same DFS version. The first reply from Shaon Shan refers to a Microsoft KB article. The symptoms in my case are described in Cause 2; the file is in use. I cannot end this process since we are talking about the SYSTEM process. It would make my server crash.

    The second reply is from Ravikumar Pulagouni. I have seen his name a lot in different threads. It seems like he does not read the full description of the problem and just copy-pastes his text about the page file. The thread starter mentioned that he found the corrupt file which is blocking the DFS replication but that he cannot delete it. Obviously this person is not talking about the page file. Still the moderator marks this as an answer. To be honest this is kind of annoying. I wish there was a way for me to down-moderate his answers or to make a complaint. I know that the thread starter does not give enough details about his setup, but a Microsoft professional, as he claims to be in his forum signature, should then ask for more details instead of providing irrelevant information.

    I really do appreciate your reply as it is relevant for my situation. Too bad the file is still in use by the SYSTEM process. To release this file I see no other way than to restart the server. We have planned this restart for 8 November 2013.

    I will keep you informed.

    Kind regards,
    Roy

    • Marked as answer by Mandy Ye Friday, November 1, 2013 9:53 AM
    Tuesday, October 22, 2013 11:17 AM

All replies

  • Hi,

    Is there an Event ID 4004 in the DFSR debug log? If so, you can install the hotfix from the KB article below on both servers to see if the issue still exists.

    The DFS Replication service may stop responding when it initializes the replication process for the replicated folders on a computer that is running Windows Server 2003 R2, Windows Server 2008, or Windows Server 2008 R2
    http://support.microsoft.com/kb/977381

    CAUSE:

    When the DFS Replication service initializes the replicated folders for the replication process, it traverses all related paths to check whether the replicated folders are reparse points that act as symbolic links or that act as mount points. 
    The DFS Replication service expects to open synchronous handles to access these paths. However, it uses the asynchronous handles incorrectly. The DFS Replication service cannot handle the I/O requests that are held by a filter driver. Therefore, the DFS Replication service stops responding.

    In the meantime, you could try to move the affected folder out of the replicated folder to see if replication works again without it. A sharing violation can occur if a file is locked when DFSR tries to replicate it.

    Regards,



    Monday, October 14, 2013 3:34 AM
  • Hi Mandy Ye,

    Thank you for your reply. I did find a 4004 error in the DFS Replication log. The last time this error occurred was 3 weeks ago. After that I restarted the DFS Replication service multiple times. Now the only error that still occurs is the 4312.

    I will try the hotfix you proposed. The only problem is that I have to restart the fileserver after applying this hotfix. The fileserver is not redundant (yet, because DFS Replication is not working) and our customers will experience some downtime because of this restart. We cannot afford this, so I will have to wait until we have a maintenance window.

    Regards,
    Roy

    Monday, October 14, 2013 7:25 AM
  • Hi Mandy Ye,

    As I figured Fileserver2 would not be restarted in the near future because of the inconvenience for our customers, I took another look at the "sharing violations". The problem had to have something to do with those and with the given folder, D:\Homedirectories\***\Tijdelijk. Someone proposed to use the tool "Handle.exe" on the folder D:\Homedirectories\***\Tijdelijk. It showed an executable file in use by the process "SYSTEM".

    I investigated the file. I could not see its attributes, and I, as an Administrator, could not see the NTFS permissions. The owner information of this file could not be retrieved either, and I could not change the owner. I'm 99% sure this "corrupted" file is the root cause of DFS not starting the replication.

    I used several methods and tools trying to move, rename or delete the corrupted file but every time it tells me I do not have sufficient rights to do so. I also used the tool Process Explorer to close the handle in Process SYSTEM. Again no success. I also stopped the DFS Replication service to see if this is the process that's "locking" the file but again, no success.

    It seems like rebooting the server is the fastest way to unlock the file so I can delete or move it. I would still prefer not to reboot this server, as it is essential for our customers and some of them work at night too.

    Maybe someone has been in this situation before and knows how to unlock this file. To be honest I doubt anybody knows how to unlock this file since I can't even close the handle using Process Explorer.

    Kind Regards,
    Roy

    Monday, October 14, 2013 1:19 PM
  • As a follow-up to my post of 22 October 2013:

    I restarted the file server on 8 November 2013. After the restart I could remove the corrupted file. DFS Replication still did not replicate any files to my other fileserver. I checked the Event Viewer and found an error, Event ID 4004. This information was inside the event:

    The DFS Replication service stopped replication on the replicated folder at local path D:\Homedirectories.

    Additional Information:

    Error: 9003 (The replication group is invalid)

    Additional context of the error:  

    Replicated Folder Name: Homedirectories

    Replicated Folder ID: 1834D93E-93AF-444D-93C8-F34824BFAA19

    Replication Group Name: TeleCity4

    Replication Group ID: 10B7BE01-5ACA-417E-96E2-B9903B4E3762

    Member ID: 41301DB2-A383-4974-8ABA-6D3DAE7F882C

    I checked the replication groups on both fileservers, and the only replication group I have on these fileservers was configured correctly. I headed to the internet and quickly found 2 websites that could help me out:

    http://www.leversuch.co.uk/solved-dfs-error-the-replication-group-is-invalid/
    http://netself.com/wp/?p=434

    The replication database on my primary file server seemed to be corrupted. I found out that every partition on a Windows server that contains a DFS replicated folder has a folder called "DFSR" inside the folder "System Volume Information" in the root of that partition. So in my case I had to look in "D:\System Volume Information\DFSR". First, stop the "DFS Replication" service. By default you cannot access this folder, so you have to edit the security settings of "D:\System Volume Information" and give yourself "Full Control" rights. Once you can access it, the "DFSR" folder has to be deleted. This cannot be done from Windows Explorer, so open an elevated Command Prompt, change to "D:\System Volume Information" and use the following command:

    rd DFSR /s /q

    You have now deleted the "DFSR" folder. After starting the "DFS replication" service the folder will be recreated.
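
    For reference, the steps above can be sketched as one scripted sequence. This is only an outline under some assumptions: the helper names `dfsr_reset_commands` and `run` are hypothetical, "DFSR" is used as the service name of the "DFS Replication" service, and `icacls` is just one way to grant the Full Control rights mentioned above (I set them through the security settings dialog). The sketch only prints the commands unless `dry_run` is set to False.

```python
import subprocess


def dfsr_reset_commands(volume="D:"):
    """Build the command sequence for removing the DFSR folder on one volume."""
    svi = volume + "\\System Volume Information"
    return [
        ["net", "stop", "DFSR"],                          # stop the DFS Replication service
        ["icacls", svi, "/grant", "Administrators:F"],    # assumption: grant Full Control via icacls
        ["cmd", "/c", "rd", "/s", "/q", svi + "\\DFSR"],  # same effect as "rd DFSR /s /q" inside the folder
        ["net", "start", "DFSR"],                         # the service recreates the DFSR folder
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: shows what would be executed without touching the system.
run(dfsr_reset_commands("D:"))
```

    Running it with `dry_run=False` requires an elevated prompt on the affected fileserver and should only be done after checking the printed commands.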

    For me everything works fine now. The case can be closed.

    Kind regards,
    Roy

    Monday, November 11, 2013 1:06 PM