DFSR: Recommended Staging File Size and Strategy for Clearing DFSR Backlog

    Question

  • We recently deployed a branch office DFSR system, and a sizable backlog (3,500 files) developed while we shipped the branch office server.  The server has since been up and running, but nothing seems to be replicating over DFSR from the main office to the branch.  I have a sneaking suspicion that it's just trying to build up a massive staging file, but I don't know and could use some confirmation.

    The main server is running Windows Server Std. 2008 SP1, and the branch Windows Server Std. 2008 R2.  I properly pre-seeded and verified replication prior to shipping.  Additionally, once set up at the branch office, AD and DNS pass the best practice tests, as does DCDIAG.  DFSR from the branch to the main office is working and timely.  The branch office firewall and the server's LAN connection status show that 2.4 GB has been received by the branch office in the last 5.5 hours, which could only be attributable to the DFSR sync.  I've been watching the event logs on both the main and branch servers, and nothing wrong shows up there.

    However, the backlog count continues to grow, and when I run a DFS diagnostic report the branch office server "Summary of replicated folder status" shows:

    • Backlogged Receiving Transactions: 3902
    • # of Files Received: 0
    • DFS Replication Bandwidth Savings: 0.00%

    I've been running DFSRDIAG replicationstate /member:<branch office server> fairly frequently, and the results typically look like this (see the backlog command after the list for exact per-folder counts):

    • Active inbound connections: 1
    • Number of Updates: 124
    • Total number of inbound updates being processed: 0
    • Updates Scheduled: <file list> (varies every now and then)
    • Total number of inbound updates scheduled: 124
    • Active inbound connections: 1
    • Updates received: 124
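
    For anyone else watching a backlog: dfsrdiag can also report the exact per-folder backlog count (and the first 100 backlogged file names) for one direction; the replication group, folder, and member names below are placeholders for your own:

        dfsrdiag backlog /rgname:"Branch RG" /rfname:"Data" /smem:MAINSRV /rmem:BRANCHSRV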

     So:

    1. Is this normal behaviour for clearing a backlog?
    2. What should my staging file size strategy be?  Does the staging quota affect how often the system actually does a transfer?  Is there a way to set it so that it flushes frequently, even if that's not the optimal bandwidth usage?  I'm aware of the best practice (start at 4 GB and slowly increase until you stop getting 4202 warnings), but I'm concerned that it's just going to keep staging without ever actually distributing; a way to check the current quota is shown below.  For both servers, assume processing power and hard-disk space are plentiful but bandwidth is restrictive.  I'm additionally concerned that I can't maintain 100% uptime between the branch office and the main office.
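
    The configured staging quota can be read from the local DFSR WMI namespace (read it here, but change it through the DFS Management console, since the value comes from Active Directory):

        wmic /namespace:\\root\microsoftdfs path DfsrReplicatedFolderConfig get ReplicatedFolderName,StagingPathQuotaInMB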

    Thanks for any advice.

    Sunday, June 12, 2011 7:26 PM

Answers

  • As a follow-up, here's how the situation was resolved, although I'm not sure what actually happened...

    Situation:

    • At the main office, configured and pre-staged the two DFSR member servers as per MSDN best practices.  Allowed them to run for several days to verify that all synchronization was occurring properly.
    • Shipped one server to the branch office.  As expected, a backlog of files developed while the server was in transit.
    • After setting the server up at the branch office, not a single file was being synced, even after days.  Verified all AD, DNS, firewall, networking issues, etc.  The backlog just grew, and a lot of bandwidth was being used, but zero files were actually being transferred.  (Again, this was after 4-5 days -- not just hours, but days.)  All the diagnostics, event logs and BPAs said everything was working properly.

     

    Resolution: 

    I removed the replicated folder and member servers from DFSR and began recreating it.  Luckily it still "recognized" my pre-staging and only had to replicate the new content created since the server had shipped.  I began by adding replicated folders that were sub-folders of my final goal; these started syncing files within minutes.  Because you can't have nested replicated folders, this was essentially a 10-day process of replicating sub-folders, waiting for them to fully sync, removing the replication, and replicating their parent folders until the top-level replicated folder was synced.  I have no idea if this was strictly necessary (as opposed to just recreating the top-level replicated folder), but it worked.
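
    One note if anyone repeats this: the members only pick up each remove/recreate on their next Active Directory poll, which can be forced rather than waited for (the server name is a placeholder):

        dfsrdiag pollad /member:BRANCHSRV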

    Also, there was one issue that cropped up which might have been the problem:  when I recreated the replicated folder on one of the subfolders, it looked like the same issue was appearing.  On closer examination, this subfolder was shared on the main office server but not on the branch office server (since no one at the branch office needed to access the share).  The security permissions were the same, but the folder wasn't shared.  I created the same share on that folder on the branch office server, recreated the replicated folder, and the problem seemed to go away.  However, this configuration never caused a problem during the initial setup and testing when both servers were at the main office; the backlog always cleared quickly.  Maybe the bandwidth difference...?  I don't know.  Either way, it works now, and I'm quite happy with its performance and functionality.
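
    If anyone hits the same thing, a quick way to compare the two members' share lists (the server name is a placeholder); net share lists the local server's shares and net view lists the remote one's:

        net share
        net view \\MAINSRV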

    • Marked as answer by obrienw Wednesday, June 29, 2011 3:36 PM
    Wednesday, June 29, 2011 3:36 PM

All replies

  • Hi,

    1. The staging folder is cleaned up automatically when the staging space in use rises above the high watermark (90% of the quota by default); the service then purges the oldest staged files until usage drops below the low watermark (60% by default). Generally we do not need to clean it manually.
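
    This cleanup is visible in the DFS Replication event log: event 4202 is logged when staging crosses the high watermark, and 4204 when cleanup completes. For example, to pull the most recent entries:

        wevtutil qe "DFS Replication" /q:"*[System[(EventID=4202 or EventID=4204)]]" /c:10 /f:text /rd:true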

    This article provides detailed guidance about the staging folder:

    Staging Folder Guidelines for DFS Replication
    http://blogs.technet.com/b/filecab/archive/2006/03/20/422544.aspx

    2. For the staging folder size: when doing an initial replication, set the quota to a large value to make sure replication works smoothly. At a minimum it should be larger than the largest files to be replicated; Microsoft's guidance is to make it at least as large as the 32 largest files in the replicated folder during initial replication.
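
    A rough way to measure that from cmd, assuming PowerShell is available (the path is a placeholder; the sum is reported in bytes):

        powershell -command "Get-ChildItem 'D:\Data' -Recurse | Where-Object { -not $_.PSIsContainer } | Sort-Object Length -Descending | Select-Object -First 32 | Measure-Object -Property Length -Sum"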

    The following article mentions how to set up the staging folder size when doing an initial replication. It is also recommended to pre-stage the data instead of waiting for DFSR to do the initial replication; Robocopy or a backup-and-restore can be used for the pre-staging.

    http://blogs.technet.com/b/askds/archive/2007/10/05/top-10-common-causes-of-slow-replication-with-dfsr.aspx
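
    A minimal Robocopy pre-staging sketch (paths and the server name are placeholders): /B and /COPYALL matter because DFSR's file hash includes the security descriptor, so ACL mismatches between the copies would force re-replication, and the DfsrPrivate folder must be excluded:

        robocopy D:\Data \\BRANCHSRV\D$\Data /E /B /COPYALL /R:1 /W:1 /XD DfsrPrivate /LOG:C:\preseed.log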

     


    Shaon Shan |TechNet Subscriber Support in forum |If you have any feedback on our support, please contact tngfb@microsoft.com
    Monday, June 13, 2011 8:42 AM
    Moderator
  • Thanks for your reply, but neither of those particularly answers my questions.  Let me rephrase:

    Are the staging file quota size and the time between processing inbound updates related?  How come the recipient sometimes downloads for 21 hours without processing any files at all, while at other times, with the same quota, it receives and processes a single file immediately?  Can I control this?

    I have the hard drive capacity to create a massive staging file, but if the system is going to wait 21 hours to process it, that won't do at all.  If I set a small staging file, will it process updates more often (even if it's less efficient overall)?  Or am I just seeing a connection that doesn't really exist?

    Monday, June 13, 2011 2:58 PM
  • Hi,

    DFSR saves and compresses changed files into the staging folder, then sends them to the staging folder on the target server. When the target server has received a file, it decompresses the data and installs it into the replicated folder. So it can happen that replication is working but files have not yet appeared, because they are still sitting in the staging folder.
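
    One way to confirm this on the receiving server is to check whether staged data is accumulating; the DfsrPrivate folder is hidden, hence the /a switch (the path is a placeholder for the replicated folder):

        dir /a /s "D:\Data\DfsrPrivate\Staging"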

    Here is an article related to the staging folder size:

    Staging Folder Guidelines for DFS Replication

    http://blogs.technet.com/b/filecab/archive/2006/03/20/422544.aspx


    Shaon Shan |TechNet Subscriber Support in forum |If you have any feedback on our support, please contact tngfb@microsoft.com
    Sunday, June 19, 2011 9:01 AM
    Moderator