This member is waiting for initial replication for replicated folder

  • Question

  • Hi there, sorry for posting this as a duplicate of microsoft.public.windows.server.dfs_frs on news.microsoft.com

    I have the following problem:

    I set up DFS Replication for several folders containing large amounts of data on some Windows Server 2003 R2 branch servers, all of which replicate to a central Windows Server 2008 machine.
    I used the wizard to set up replication groups for data collection, as I intend to back up the data on the Windows Server 2008 machine.

    All the DFS replications started, but then stopped!

    When running a Replication Health Report I see the following:

     

    Warnings:
       I've masked the foldername as <folder>

    On branch servers:
    Premature purging of staging files is impacting performance on replicated folder
    - Staging files are being purged prematurely because the staging quota for replicated folder <folder> is too small. This purging can cause excessive disk I/O and CPU usage. To avoid this problem, increase the quota of the staging folder. This has occurred 80 times in the past 72 hours. Event ID: 4202

    On the central Windows 2008 Server:
    This member is waiting for initial replication for replicated folder
     - This member is waiting for initial replication for replicated folder <folder> and is not currently participating in replication. This delay can occur because the member is waiting for the DFS Replication service to retrieve replication settings from Active Directory. After the member detects that it is part of replication group, the member will begin initial replication.

    Premature purging of staging files is impacting performance on replicated folder
      - Staging files are being purged prematurely because the staging quota for replicated folder <folder> is too small. This purging can cause excessive disk I/O and CPU usage. To avoid this problem, increase the quota of the staging folder. This has occurred 58 times in the past 72 hours. Event ID: 4202


    Well, what do I do? I've tried increasing the staging quota to 16GB, but that does not seem to help.

    • Edited by HAL07 Thursday, January 29, 2009 7:39 AM
    Wednesday, January 28, 2009 8:25 AM

All replies

  • Hi HAL07,

    You may encounter the issue when no primary member is specified on the DFS Replication group. Please check David's suggestion in the following thread, which may be helpful to you.

    DFS Warnings - Win2k8
    http://social.technet.microsoft.com/Forums/en-US/winserverfiles/thread/4c9910e8-4088-4812-8af3-e36f98359abb/

    Thanks and regards,
    Scorprio


    MCTS, MCITP:Enterprise Admin
    • Proposed as answer by David Shen Thursday, January 29, 2009 7:23 AM
    • Unproposed as answer by HAL07 Thursday, January 29, 2009 7:39 AM
    Wednesday, January 28, 2009 9:13 AM
  • I have now set the staging quota to 32GB and there is still no change.

    I also tried the steps:
    dfsradmin Membership Set /RGName:<replication group name> /RFName:<replicated folder name> /MemName:<member you want to be primary> /IsPrimary:True

    Then Dfsrdiag Pollad /Member:<member name>

    I made sure AD was synced before running pollad on both members. I have now waited 15 hours and there is still no change.

    I still get the event logs on DFS Replication:

    First
    Event Type:    Warning
    Event Source:    DFSR
    Event ID:    4202
    User:        N/A
    Description:
    The DFS Replication service has detected that the staging space in use for the replicated folder at local path <folder> is above the high watermark. The service will attempt to delete the oldest staging files. Performance may be affected.
     
    Additional Information:
    Staging Folder: <folder>
    Configured Size: 32096 MB
    Space in Use: 28886 MB
    High Watermark: 90%
     
    Low Watermark: 60%
     
    Replicated Folder Name: <folder>
    Replicated Folder ID: 72366CAE-09CD-4E7E-AD20-FE9A4B5C4634
    Replication Group Name: <rgname>
    Replication Group ID: 7559DEE4-31B0-4AED-8519-364F3F022A25
    Member ID: 5F974C92-BB5A-4060-96A9-711DF1B0AF0D


    and then

    Event Type:    Information
    Event Source:    DFSR
    Event ID:    4204
    Description:
    The DFS Replication service has successfully deleted old staging files for the replicated folder at local path <folder>. The staging space is now below the high watermark.
     
    Additional Information:
    Staging Folder: <folder>
    Configured Size: 32096 MB
    Space in Use: 19257 MB
    High Watermark: 90%
     
    Low Watermark: 60%
     
    Replicated Folder Name: <folder>
    Replicated Folder ID: 72366CAE-09CD-4E7E-AD20-FE9A4B5C4634
    Replication Group Name: <rgname>
    Replication Group ID: 7559DEE4-31B0-4AED-8519-364F3F022A25
    Member ID: 5F974C92-BB5A-4060-96A9-711DF1B0AF0D

    This gets repeated every 4 hours or so.
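As a rough cross-check of the two events above, the following minimal Python sketch pulls the numeric fields out of the event text and computes how full the staging folder was. The function names and the trimmed event strings are illustrative assumptions, not part of any DFSR tooling; only the field labels and numbers come from the events quoted in this post.

```python
import re

# Field labels as they appear in the "Additional Information" part of the events.
PATTERNS = {
    "configured_mb": r"Configured Size:\s*(\d+)\s*MB",
    "in_use_mb": r"Space in Use:\s*(\d+)\s*MB",
    "high_pct": r"High Watermark:\s*(\d+)%",
    "low_pct": r"Low Watermark:\s*(\d+)%",
}


def parse_staging_event(text):
    """Return the numeric staging fields found in a 4202/4204 event description."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"missing field: {name}")
        fields[name] = int(match.group(1))
    return fields


def usage_percent(fields):
    """Staging space in use as a percentage of the configured quota."""
    return 100.0 * fields["in_use_mb"] / fields["configured_mb"]


# Trimmed copies of the two events quoted above (hypothetical variable names).
EVENT_4202 = """Configured Size: 32096 MB
Space in Use: 28886 MB
High Watermark: 90%
Low Watermark: 60%"""

EVENT_4204 = """Configured Size: 32096 MB
Space in Use: 19257 MB
High Watermark: 90%
Low Watermark: 60%"""

if __name__ == "__main__":
    for label, text in (("4202", EVENT_4202), ("4204", EVENT_4204)):
        print(label, f"{usage_percent(parse_staging_event(text)):.1f}%")
```

With the values from the events, the usage works out to about 90% before cleanup and about 60% after, so each 4202/4204 pair corresponds to one full purge from the high watermark down to the low watermark.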

    Thursday, January 29, 2009 7:39 AM
  • Hi HAL07,

     

    According to the description, it appears that you are trying to replicate a large number of files between Windows Server 2003 R2-based computers in different sites via DFS Replication.

     

    To narrow down the root cause, would you please refer to the following steps to troubleshoot the issue?

     

    1. As the servers may be located at different sites, replication may fail if there is a black hole router between these sites. Please verify that the network connection between the sites works normally. You may use ICMP to ping between the sites to check that they can reach each other properly.

     

    How to Troubleshoot Black Hole Router Issues

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;314825

     

    2. Please apply the hotfixes in the following KB articles on the problematic Windows Server 2003 R2-based computers, and then reboot them to see if this fixes the issue.

     

    List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2003 R2

    http://support.microsoft.com/kb/958802

     

    Replicated files are copied over the network when you use the Distributed File System (DFS) Replication feature on a Windows Server 2003 R2-based computer

    http://support.microsoft.com/?id=931685

     

    3. Event ID 4202 shows that staging files might be purged prematurely because the replicated folder contains files that are larger than the configured staging quota, or because the configured maximum staging size has been exceeded. DFS Replication will stage any file for which disk space is available, which means that the user-defined staging quota can be exceeded for a single file as long as disk space is available.

     

    Event ID 4202 reports that this behavior has been triggered. If you see this event often, or if you are replicating several large files between sites, you may need to increase the staging quota to avoid excessive disk and CPU activity. Slow replication can occur when the DFSR staging directory is too small for the amount of data being modified.

     

    It is recommended that you set the staging folder size as twice the size of the data being replicated via the sites. Afterwards, you may also restart the "DFS Replication" and "DFS Namespace" services on the problematic server to test.

     

    Top 10 Common Causes of Slow Replication with DFSR

    http://blogs.technet.com/askds/archive/2007/10/05/top-10-common-causes-of-slow-replication-with-dfsr.aspx

     

    4. If the data still does not replicate, I would suggest that you use the Robocopy.exe or Xcopy.exe utility to prestage the data on the Windows Server 2003 R2-based DFS member server.

     

    For more information, please refer to:

     

    How to use the Backup program to prestage data before DSFR synchronization in Windows Server 2003 R2

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;947726

     

    Hope it helps.


    David Shen - MSFT
    • Marked as answer by David Shen Tuesday, February 3, 2009 1:56 AM
    • Unmarked as answer by HAL07 Thursday, February 5, 2009 7:49 AM
    Thursday, January 29, 2009 9:36 AM
  • Ping -f -l 1472 works fine.

     

     

    I have not yet added the hotfixes.


    > It is recommended that you set the staging folder size as twice the size of the data being replicated via the sites.

    What? I have 600GB of data replicating here. I cannot make the staging twice as large.

     

     

    So what you are saying is that we can simply copy the data from one server to another and then set up the sync. Is that better?


    Thursday, January 29, 2009 12:49 PM
  • HAL07 said:

    Ping -f -l 1472 works fine.

     

     

    I have not yet added the hotfixes.


    > It is recommended that you set the staging folder size as twice the size of the data being replicated via the sites.

    What? I have 600GB of data replicating here. I cannot make the staging twice as large.

     

     So what you are saying is that we can simply copy the data from one server to another and then set up the sync. Is that better?




    Hi HAL07,

     

    As you have 600GB of data to replicate via DFS Replication, it might not be appropriate to configure the staging folder size as 1200GB. In this case, it is recommended that you copy the data from one server to the other via Robocopy or Xcopy; afterwards you may sync the target folders via DFS Replication. This may decrease the time needed to bring the data into a consistent state among the DFS member servers.

     

    Hope it helps.


    David Shen - MSFT
    • Marked as answer by David Shen Tuesday, February 3, 2009 1:56 AM
    • Unmarked as answer by HAL07 Thursday, February 5, 2009 12:37 PM
    Friday, January 30, 2009 9:22 AM
  • Can I do this whilst the DFS service is already syncing, or should I stop the DFS somehow? Not sure how I can stop DFS running or pause it without stopping all DFS'es on the same central server (windows 2008).

    I tried copying whilst the DFS service is running, and I get a really low data-rate when copying data...

    • Marked as answer by HAL07 Thursday, February 5, 2009 12:36 PM
    • Unmarked as answer by HAL07 Thursday, February 5, 2009 12:38 PM
    Friday, January 30, 2009 9:33 AM
  • HAL07 said:

    Can I do this whilst the DFS service is already syncing, or should I stop the DFS somehow? Not sure how I can stop DFS running or pause it without stopping all DFS'es on the same central server (windows 2008)

    You may temporarily pause replication to the selected member by disabling that member's connections from the sending members in the specific DFS replication group. Disabling the connection stops data from replicating to and from the selected replicated folder on the selected member.
     
    Steps:
     
    1. Open the DFS Management console.
    2. Expand the Replication node and locate the replication group for which you want to pause replication.
    3. In the middle pane of the console, click the Connections tab, right-click the member server under Sending Member, and select Disable.
    4. Run "dfsrdiag pollad" on all the DFS member servers.
    5. Prestage the data among the DFS member servers.
    6. After it completes, enable the connection again to resume replication.
     
    Hope it helps. 

    David Shen - MSFT
    • Marked as answer by David Shen Tuesday, February 3, 2009 1:55 AM
    • Unmarked as answer by HAL07 Wednesday, February 4, 2009 5:43 PM
    Friday, January 30, 2009 9:51 AM
  • I've now finally managed to copy all data but it still waits for initial data replication at the central server (tried for over 2 days).

    I wonder if I should clear the DfsrPrivate folders on both servers. I tried stopping both servers' DFS Service but I am unable to delete the folder contents on the central server. It just gives me file not found when I try deleting it...

    The files I am not able to delete are in the folders
    DfsrPrivate\Staging\ContentSet{72366CAE-09CD-4E7E-AD20-FE9A4B5C4634}-{5F974C92-BB5A-4060-96A9-711DF1B0AF0D}\
    DfsrPrivate\Staging\ContentSet{72366CAE-09CD-4E7E-AD20-FE9A4B5C4634}-{9ED1A4B0-0BDA-4595-A095-BD77A8A2BCCD}\
    Where they also have a lot of subfolders.
    Wednesday, February 4, 2009 5:42 PM
  • HAL07 said:

    I've now finally managed to copy all data but it still waits for initial data replication at the central server (tried for over 2 days).

    I wonder if I should clear the DfsrPrivate folders on both servers. I tried stopping both servers' DFS Service but I am unable to delete the folder contents on the central server. It just gives me file not found when I try deleting it...

    The files I am not able to delete are in the folders
    DfsrPrivate\Staging\ContentSet{72366CAE-09CD-4E7E-AD20-FE9A4B5C4634}-{5F974C92-BB5A-4060-96A9-711DF1B0AF0D}\
    DfsrPrivate\Staging\ContentSet{72366CAE-09CD-4E7E-AD20-FE9A4B5C4634}-{9ED1A4B0-0BDA-4595-A095-BD77A8A2BCCD}\
    Where they also have a lot of subfolders.



    Hi HAL07,

     

    Here is the information about staging folders which may be helpful to you.

    Staging folders

    DFS Replication uses staging folders to act as caches for new and changed files to be replicated from sending members to receiving members. The sending member begins staging a file when it receives a request from the receiving member. The process involves reading the file from the replicated folder and building a compressed representation of the file in the staging folder. This is the staged file. After being constructed, the staged file is sent to the receiving member; if remote differential compression [RDC] is used, only a fraction of the staging file might be replicated. The receiving member downloads the data and builds the file in its staging folder. After the file has completed downloading on the receiving member, DFS Replication decompresses the file and installs it into the replicated folder.

    Each replicated folder has its own staging folder, which by default is located under the local path of the replicated folder in the DfsrPrivate\Staging folder. The default size of each staging folder is 4,096 MB. This is not a hard limit, however. It is only a quota that is used to govern cleanup and excessive usage based on high and low watermarks (90 percent and 60 percent of staging folder size, respectively). For example, when the staging folder reaches 90 percent of the configured quota the oldest staged files are purged until the staging folder reaches 60 percent of the configured quota.
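The watermark behavior described above can be sketched as a small simulation. This is only a rough model under stated assumptions: the function name, the tuple layout, and the use of a "last used" number to decide which file is oldest are all hypothetical, and the real service's bookkeeping is not shown in this thread.

```python
def purge_staging(files, quota_mb, high=0.90, low=0.60):
    """Simulate one staging cleanup pass.

    files: list of (name, size_mb, last_used) tuples; a smaller last_used
    value means the file was used longer ago (assumed ordering).
    Returns (kept, purged). Nothing is purged while usage is below the
    high watermark.
    """
    used = sum(size for _, size, _ in files)
    if used < high * quota_mb:
        return list(files), []

    kept = sorted(files, key=lambda f: f[2])  # oldest first
    purged = []
    # Drop the oldest staged files until usage is at or below the low watermark.
    while kept and used > low * quota_mb:
        victim = kept.pop(0)
        purged.append(victim)
        used -= victim[1]
    return kept, purged


if __name__ == "__main__":
    # Default 4,096 MB quota: cleanup starts at 3,686.4 MB and stops at 2,457.6 MB.
    staged = [("a.dat", 1500, 1), ("b.dat", 1200, 2), ("c.dat", 1000, 3)]
    kept, purged = purge_staging(staged, quota_mb=4096)
    print("purged:", [name for name, _, _ in purged])
    print("kept:", [name for name, _, _ in kept])
```

In this made-up example the three files total 3,700 MB, which is above the high watermark, so the oldest file is purged and usage drops to 2,200 MB, below the low watermark.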

    It is important to note that the staging folder quota does not determine the largest file that can be replicated. In other words, it is possible to replicate a file that is larger than the configured quota of a staging folder. The large file is placed in the staging folder, and the staging folder cleanup process is triggered when the file is finished staging and space usage is at or above the high watermark. If the cleanup fails because the large file is still in the process of being replicated to receiving members, the cleanup process will be retried later and eventually the large file will be purged from the staging folder.

    Although you can adjust the size of each staging folder, you may need to take the following factors into account while doing so:

    • For good operational performance, increasing the quota size of a staging folder is recommended when you have multiple large files that change frequently. We also recommend that you increase the staging folder quota on hub members that have many replication partners.
    • The size of each staging folder on a member is cumulative per volume. For example, if you have three replicated folders on a member on the same volume, it is possible for DFS Replication to use 12 GB or more for staging purposes. However, staging space is not preallocated; disk space is only used when staged files are present.

    Note

    In this example, staging cleanup might not be triggered when the staging files occupy 10.8 GB (0.9*4 GB*3), because staging space is distributed across the three replicated folders. If free disk space is a concern, you might need to configure the staging quota to be lower than the default quota when several replicated folders share staging space on the same volume. This ensures that staging cleanup is triggered.

    • If the size of a staging folder is below 90 percent of configured capacity (the high watermark), then staged files are kept in the folder and can be used in case new members are added.
    • Staged files are purged (based on a "least recently used" algorithm) when a staging folder reaches 90 percent of the configured quota (the high watermark). Files are purged until the staging size falls below the low watermark (default is 60 percent) of the configured quota.
    • If a staging folder quota is configured to be too small, DFS Replication might consume additional CPU and disk resources to regenerate the staged files. Replication might also slow down because the lack of staging space can effectively limit the number of concurrent transfers with partners.
    • For the initial replication of existing data on the primary member, it is important that you size the staging folder quota large enough so that if multiple large files are blocked in staging due to partners not being able to download the files, the remaining files can continue replicating. To properly size the staging folder for initial replication, you may take into account the size of the files to be replicated. At a minimum, the staging folder quota should at least be two times the size of the largest file in the replicated folder. For increased performance, the staging folder quota should be increased to the size of the four largest files in the replicated folder on spoke members and to the size of the sixteen largest files in the replicated folder on hub members.
    • During normal operation, if the event that indicates the staging quota is over its configured size (event ID 4208 in the DFS Replication event log) is logged multiple times in an hour, increase the staging quota by 20 percent.
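The sizing guidance in the list above can be turned into a small calculation. The sketch below is a minimal illustration, not an official tool: the function names and the "spoke"/"hub" role labels are hypothetical, and taking the larger of the two figures as the recommendation is an assumption made here so that the result never drops below the stated minimum.

```python
import heapq

# Number of largest files to sum per role, as given in the guidance above.
LARGEST_FILE_COUNT = {"spoke": 4, "hub": 16}


def staging_quota_mb(file_sizes_mb, role="spoke"):
    """Return (minimum_mb, recommended_mb) for a replicated folder.

    minimum: two times the size of the largest file.
    recommended: combined size of the 4 (spoke) or 16 (hub) largest files,
    never less than the minimum (assumption made in this sketch).
    """
    if not file_sizes_mb:
        raise ValueError("no file sizes given")
    largest = heapq.nlargest(LARGEST_FILE_COUNT[role], file_sizes_mb)
    minimum = 2 * largest[0]
    recommended = max(minimum, sum(largest))
    return minimum, recommended


def increase_by_20_percent(quota_mb):
    """Apply the 20 percent increase, rounded up to a whole MB."""
    return (quota_mb * 12 + 9) // 10


if __name__ == "__main__":
    sizes = [10000, 8000, 6000, 4000, 2000]  # made-up file sizes in MB
    print("spoke:", staging_quota_mb(sizes, "spoke"))
    print("hub:", staging_quota_mb(sizes, "hub"))
    print("4096 MB + 20%:", increase_by_20_percent(4096))
```

Note that the quota depends on the largest files in the folder, not on the total amount of data, so a 600GB folder of moderately sized files may need far less staging space than its total size suggests.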

     

    Hope it helps.


    David Shen - MSFT
    • Marked as answer by David Shen Monday, February 9, 2009 2:27 AM
    Friday, February 6, 2009 5:03 AM
  • Okay. I think I finally resolved it.

    I deleted the entire replication group, then synced the domain and made sure both servers had picked up the change (ran pollad on both). Then I deleted the DfsrPrivate folder on both sides, but did not erase any content.

    Then I re-created the replication group.

    After two days, the DFS is finally in sync.

    Mr Shen, you were of invaluable help to me and I sure learned a lot during this problem.

    It seems to me that Microsoft needs to experiment more with initial replication, as this seems to be the problem area, and make a smarter DFS that can detect this rather strange "waiting for initial replication" state. Set up a test lab and use the default DFS settings on a normal 2003 R2 server, syncing 800GB of data. Be sure to add some latency and limited bandwidth to the virtual lab to make it a "real environment".
    • Marked as answer by HAL07 Monday, February 9, 2009 7:43 AM
    Monday, February 9, 2009 7:43 AM