DFS - Large file has frozen replication

    Question

  • Hi Everyone,

    We have a site-to-site VPN connection that allows our two Windows Server 2008 file servers to replicate with each other. Recently we put a rather large (19 GB) hard drive image in the file share folder and have been experiencing replication issues ever since. The large file doesn't seem to have been replicated to the other server and, to the best of our knowledge, the replication process has halted. Here is the error in the logs:

    The DFS Replication service failed to clean up old staging files for the replicated folder at local path d:\files\Development. The service might fail to replicate some large files and the replicated folder might get out of sync. The service will automatically retry staging space cleanup in 30 minutes. The service may start cleanup earlier if it detects some staging files have been unlocked.
     
    Additional Information:
    Staging Folder: d:\files\Development\DfsrPrivate\Staging\ContentSet{7308DF43-DD49-4361-934D-86CF949F4C97}-{4DE7DEC0-D875-477C-898D-C356E164B698}
    Configured Size: 4096 MB
    Space in Use: 18826 MB
    High Watermark: 90%
    Low Watermark: 60%
    Replicated Folder Name: Development

    From the error above the problem seems pretty clear. These gave me good information:

    • http://technet.microsoft.com/en-us/library/cc754229.aspx
    • http://blogs.technet.com/b/askds/archive/2007/10/05/top-10-common-causes-of-slow-replication-with-dfsr.aspx

    It looks like we have to increase our staging quota size, BUT we're worried that it might mess up DFS, and we don't know if it will actually solve the problem at this point. Has anyone done this, or does anyone know the effects? Any information would be much appreciated! Thanks - Mike
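The numbers in the event log entry can be checked with a little arithmetic. The sketch below is only an illustration: the values are copied from the log above, and the function name `staging_summary` is made up for this example. It shows where the 90% and 60% watermarks fall for a 4096 MB quota and how far the reported usage exceeds the quota.

```python
def staging_summary(configured_mb, in_use_mb, high_watermark, low_watermark):
    """Return the watermark thresholds in MB and the usage-to-quota ratio."""
    return {
        # Staging cleanup is triggered once usage passes this amount.
        "high_mb": configured_mb * high_watermark,
        # Cleanup aims to bring usage back down to this amount.
        "low_mb": configured_mb * low_watermark,
        # How many times larger the staged data is than the quota.
        "usage_ratio": in_use_mb / configured_mb,
    }


# Values taken from the event log entry in the post.
summary = staging_summary(
    configured_mb=4096, in_use_mb=18826, high_watermark=0.90, low_watermark=0.60
)

print(f"High watermark: {summary['high_mb']:.1f} MB")
print(f"Low watermark:  {summary['low_mb']:.1f} MB")
print(f"Space in use is {summary['usage_ratio']:.1f}x the configured quota")
```

With these inputs the space in use is roughly 4.6 times the configured quota, which is consistent with a single 19 GB file being staged against a 4 GB limit.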



     
    Monday, September 26, 2011 11:07 PM

All replies

  • Hi Degromp,

    Answering your questions:

    • It looks like we have to increase our staging quota size: yes, this is the problem. As you already indicated, this article http://technet.microsoft.com/en-us/library/cc754229.aspx shows what is, in my view, the smartest strategy for sizing the Staging Quota: base it on the 9 largest files in the replicated folder. I strongly recommend that you increase the Staging Quota. From what I can see, you are using the default 4 GB size. That is fine for small files, but a file larger than 4 GB will cause trouble.
    • BUT, we're worried that it might mess up DFS and we don't know if it will actually solve the problem at this point. No, changing the Staging Quota will not harm replication. I have more than 3000 replicated folders, and almost daily I have to grow some Staging Quotas and shrink others because of strict storage requirements. Increasing the Staging Quota will allow this large file to replicate and will improve your replication health.
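The "largest files" sizing rule mentioned above can be estimated with a short script. This is a minimal sketch, not an official tool: the function names are made up for this example, skipping the `DfsrPrivate` folder is an assumption (so that staged copies are not counted), and the file count `n` is left as a parameter because the reply cites 9 while Microsoft guidance for other Windows Server versions may use a different count.

```python
import heapq
import os


def total_of_largest(sizes, n):
    """Return the sum of the n largest values in sizes."""
    return sum(heapq.nlargest(n, sizes))


def suggested_staging_bytes(root, n=9):
    """Sum the sizes of the n largest files under root.

    The DfsrPrivate folder is skipped (an assumption for this sketch)
    so that DFSR's own staging data does not inflate the estimate.
    """
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune DfsrPrivate in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d.lower() != "dfsrprivate"]
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                # File vanished or is inaccessible; ignore it.
                continue
    return total_of_largest(sizes, n)
```

For the folder in this thread, a call such as `suggested_staging_bytes(r"d:\files\Development") / 1024**2` would give the estimate in MB, which can then be compared with the configured 4096 MB quota.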

     


    Thank you,

    F. Schubert
    System Administrator

    MCT | Microsoft Certified Technology Specialist: Windows Server 2008 Network Infrastructure, Configuration

    • Edited by CoffeineNerd Tuesday, September 27, 2011 6:30 AM
    Tuesday, September 27, 2011 6:29 AM
  • CaffeineNerd,

    Thanks a lot for such a quick response! I will talk to my boss to make sure he wants to do that, and then I will post the results and mark your reply as the answer. Thanks a lot, I'll check back soon.

    Tuesday, September 27, 2011 3:25 PM
  • One more article:

    http://blogs.technet.com/b/filecab/archive/2006/03/20/422544.aspx

    It provides suggestions about setting the size of the staging folder, along with the reasoning.


    TechNet Subscriber Support in forum |If you have any feedback on our support, please contact tnmff@microsoft.com.
    Wednesday, September 28, 2011 6:49 AM
    Moderator
  • Thanks to all those who have replied!

    Increasing the staging and deletion quota sizes seems to have worked. I have checked the logs, and they have been error free since the change last Friday. The backlog has gone back down to zero except for one file.

    That brings me to my last question: the original hard drive image that initially caused the issue is still in the backlog of the server I originally uploaded it to. How can I remove it? I searched Google to no avail for how to clear a backlog properly. See the issue below:

    C:\Windows\system32>dfsrdiag backlog /rgname:"domaingoeshere.local\files\development" /rfname:development /smem:Server1 /rmem:Server2
    
    Member <Server2> Backlog File Count: 1
    Backlog File Names (first 1 files)
         1. File name: 38EDDE375097BC56-00-00.mrimg
    
    Operation Succeeded
    
    
    

    I would like to clear this file from the backlog somehow. Not opposed to removing the file completely. Thank you everyone!
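When checking the backlog repeatedly, it can help to pull the count and file names out of the `dfsrdiag backlog` text automatically. The sketch below assumes the output looks like the sample pasted above; other versions of the tool may format it differently. The function name `parse_backlog` is made up for this example, and the script only reads text; it does not clear or change the backlog.

```python
import re

# Sample text copied from the dfsrdiag output in the post.
SAMPLE_OUTPUT = """\
Member <Server2> Backlog File Count: 1
Backlog File Names (first 1 files)
     1. File name: 38EDDE375097BC56-00-00.mrimg

Operation Succeeded
"""


def parse_backlog(text):
    """Return (count, file_names) parsed from dfsrdiag backlog output.

    count is None if no "Backlog File Count" line is found.
    """
    count_match = re.search(r"Backlog File Count:\s*(\d+)", text)
    count = int(count_match.group(1)) if count_match else None
    # Each listed file appears on its own "File name: ..." line.
    names = [name.strip() for name in re.findall(r"File name:\s*(.+)", text)]
    return count, names


count, names = parse_backlog(SAMPLE_OUTPUT)
print(count, names)
```

Feeding the command's captured output into `parse_backlog` on a schedule would make it easy to see whether the one remaining file ever leaves the backlog.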

    Friday, September 30, 2011 7:59 PM
  • Hi everyone,

    No updates on this last issue?

    Thursday, October 06, 2011 9:54 PM
  • Hi Degromp,

     

    Please take a look at the following articles: some hints from Shaon and an article from Ned Pyle:

    DFSR issues - Backlog reset

    and

    Top 10 Common Causes of Slow Replication with DFSR

    I hope this helps you remove this file from the backlog.

    Please Vote as Helpful or Mark as Answer if the content was helpful.


    Thank you,

    F. Schubert
    System Administrator

    MCP | Microsoft Certified Professional
    MCTS 70-642 | Microsoft Certified Technology Specialist: Windows Server 2008 Network Infrastructure, Configuration

    Friday, October 07, 2011 2:48 AM