Files backlogged in several DFS Replication folders

    Question

  • Hey guys,

    I have a DFS setup between two sites. The sites are linked through a 1 Mbps dedicated symmetrical link, and so far there are no apparent connectivity issues. The dfs server is running Windows Server 2008 R2 and ejenal2 is running Windows Storage Server 2008 R2. Both servers were fully patched as of one week ago. The total size of the replicated folders between sites is roughly 1.7 TB.

    A couple of months ago, several of the replicated folders seem to have fallen out of sync. I have run a couple of dfsrdiag backlog commands and got mixed results:

    dfsrdiag.exe backlog /smem:ejenal2 /rmem:dfs /rgname:ejenal-dfs /rfname:dv    over 14K backlogged files
    dfsrdiag.exe backlog /smem:ejenal2 /rmem:dfs /rgname:ejenal-dfs /rfname:di    over 256K backlogged files
    dfsrdiag.exe backlog /smem:ejenal2 /rmem:dfs /rgname:ejenal-dfs /rfname:dsi   over 1452 backlogged files

    However, there are some folders that still seem to be functioning:

    dfsrdiag.exe backlog /smem:ejenal2 /rmem:dfs /rgname:ejenal-dfs /rfname:dt           20 backlogged files
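    Checking every replicated folder by hand gets tedious, so here is a minimal sketch that builds the same dfsrdiag backlog command for each folder. It only uses the switches shown above; the helper name and the folder list are illustrative assumptions, and the commands are printed rather than executed.

```python
# Build one "dfsrdiag backlog" command per replicated folder.
# The switches (/smem, /rmem, /rgname, /rfname) are the ones used in the post;
# the function name and FOLDERS list are hypothetical, for illustration only.

def backlog_command(sending, receiving, group, folder):
    """Return the dfsrdiag backlog command as an argument list."""
    return [
        "dfsrdiag.exe",
        "backlog",
        f"/smem:{sending}",
        f"/rmem:{receiving}",
        f"/rgname:{group}",
        f"/rfname:{folder}",
    ]

FOLDERS = ["dv", "di", "dsi", "dt"]

commands = [backlog_command("ejenal2", "dfs", "ejenal-dfs", f) for f in FOLDERS]

for command in commands:
    # Print only; on the Windows server the list could be handed to subprocess.run.
    print(" ".join(command))
```

    Swapping the sending and receiving members gives the backlog in the opposite direction, which is worth checking too.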

    Some of the steps I have tried:

    On the most critical folders I have increased the quota from 40 GiB to 400 GiB, turned off bandwidth throttling, and updated the firmware and drivers on the ejenal2 server (dfs is a VM).

    I am really out of options other than deleting and recreating the DFSR groups and reseeding. Does anybody have any additional ideas?

    TIA

    Javier

    - - - - - Follow up: - - - - - -

    As of this morning (April 10), I reran the above queries and there was notable progress:

    dfsrdiag.exe backlog /smem:ejenal2 /rmem:dfs /rgname:ejenal-dfs /rfname:dv    over 120 backlogged files
    dfsrdiag.exe backlog /smem:ejenal2 /rmem:dfs /rgname:ejenal-dfs /rfname:di    over 49K backlogged files
    dfsrdiag.exe backlog /smem:ejenal2 /rmem:dfs /rgname:ejenal-dfs /rfname:dsi   fully synced (no backlogged files)

    I fiddled a bit with the quota settings on DV, putting them from 40GiB to 400GiB. I also updated the firmware on the Ejenal2 server. That I did just moments before I posted my original post three days ago. BTW the 400 GiB setting I pulled out of my <explicit>. The 32 largest files on DV only amount to 24 GiB.



    • Edited by Havs123 Wednesday, April 10, 2013 16:05
    Wednesday, April 3, 2013 20:32

All Replies

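  • If your data were to change at a rate of 1% per day (about 17 GB of your 1.7 TB), then even in almost perfect conditions, and before any savings from compression or RDC, it would take well over a day (roughly 38 hours) to push each day's changes across a 1 Mbps link. Have you checked the performance of the link to see whether it is at maximum usage?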
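
    A rough way to sanity-check an estimate like this is to compute the raw transfer time directly. The sketch below assumes decimal units (1 TB = 10^12 bytes), the full 1 Mbps available to replication, and no compression, RDC savings, or protocol overhead; real figures depend heavily on those factors and on the actual change rate.

```python
# Back-of-the-envelope link time needed to replicate one day's changes.
# Assumptions (not from the thread): decimal units, no compression or RDC,
# no protocol overhead, and the whole link available to replication.

def transfer_hours(total_bytes, daily_change_fraction, link_bits_per_second):
    """Hours of link time needed to send one day's worth of changed data."""
    changed_bits = total_bytes * daily_change_fraction * 8
    return changed_bits / link_bits_per_second / 3600

hours = transfer_hours(
    total_bytes=1.7e12,              # roughly 1.7 TB replicated
    daily_change_fraction=0.01,      # 1% of the data changes per day
    link_bits_per_second=1_000_000,  # 1 Mbps link
)
print(f"{hours:.1f} hours of link time per day of changes")
```

    Anything above 24 hours means the link cannot keep up with that change rate on raw throughput alone, so the real daily churn is the number worth measuring.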

    Adam

    Wednesday, April 3, 2013 21:11
  • Hi there,

    Can you look at your event log and post any event IDs related to DFS-R? I am looking to see if you have any of the following staging area events:

    Event ID: 4202 
    Severity: Warning

    The DFS Replication service has detected that the staging space in use for the replicated folder at local path (path) is above the high watermark. The service will attempt to delete the oldest staging files. Performance may be affected.

    Event ID: 4204 
    Severity: Informational

    The DFS Replication service has successfully deleted old staging files for the replicated folder at local path (path). The staging space is now below the high watermark.

    Event ID: 4206 
    Severity: Warning

    The DFS Replication service failed to clean up old staging files for the replicated folder at local path (path). The service might fail to replicate some large files and the replicated folder might get out of sync. The service will automatically retry staging space cleanup in (x) minutes. The service may start cleanup earlier if it detects some staging files have been unlocked.

    Event ID: 4208 
    Severity: Warning

    The DFS Replication service detected that the staging space usage is above the staging quota for the replicated folder at local path (path). The service might fail to replicate some large files and the replicated folder might get out of sync. The service will attempt to clean up staging space automatically.

    Event ID: 4212 
    Severity: Error
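
    To pull just these events instead of scrolling through Event Viewer, the sketch below builds a wevtutil query for the listed IDs over the last 7 days. It assumes the log is named "DFS Replication" (as it appears under Applications and Services Logs) and only prints the command; the function names are illustrative.

```python
# Build a wevtutil query for the staging-related DFSR event IDs listed above.
# Assumption: the event log channel is called "DFS Replication".
# The command is only constructed and printed here, not executed.

STAGING_EVENT_IDS = [4202, 4204, 4206, 4208, 4212]

def build_xpath(event_ids, days):
    """XPath filter matching the given event IDs within the last `days` days."""
    id_clause = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    window_ms = days * 24 * 60 * 60 * 1000  # timediff() works in milliseconds
    return (
        f"*[System[({id_clause}) and "
        f"TimeCreated[timediff(@SystemTime) <= {window_ms}]]]"
    )

def build_wevtutil_command(event_ids, days=7, log_name="DFS Replication"):
    """Argument list for 'wevtutil qe', newest events first, as plain text."""
    return [
        "wevtutil", "qe", log_name,
        f"/q:{build_xpath(event_ids, days)}",
        "/f:text",
        "/rd:true",
    ]

command = build_wevtutil_command(STAGING_EVENT_IDS)
print(command)
```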


    Isaac Oben MCITP:EA, MCSE,MCC View my MCP Certifications

    Wednesday, April 3, 2013 21:27
  • Sorry for the delay.

    Yes, there are a couple of instances of each of those errors on the Ejenal2 server. The Ejenal2 server is the one used at the local office; the DFS server is in the data center and is used more as a data collector than an actual share.

    Here is a breakdown of them:

    5014 80 instances in the last 7 days
    4202 19 instances in the last 7 days
    4304 1 instance in the last 7 days

    In the past I have tried to get rid of them, but have not been successful.

    Friday, April 5, 2013 19:55
  • There are some spikes, but the connection is nowhere near 100% utilization.
    Friday, April 5, 2013 19:56
  • Hello,

    Have you looked at your staging quota size? I think it might be too small and you may need to increase it. Take a look at this article about the errors:

    http://blogs.technet.com/b/askds/archive/2007/10/05/top-10-common-causes-of-slow-replication-with-dfsr.aspx

    and this one to calculate the minimum staging size:

    http://blogs.technet.com/b/askds/archive/2011/07/13/how-to-determine-the-minimum-staging-area-dfsr-needs-for-a-replicated-folder.aspx


    Isaac Oben MCITP:EA, MCSE,MCC View my MCP Certifications

    Monday, April 8, 2013 21:24
  • Hey Isaac,

    I have run some numbers on the shares with the largest backlogs. In all cases, the quota is well above the combined size of the 32 largest files; in some of them I might have gone overboard and assigned too generous a quota (400 GB quota versus 24 GB for the 32 largest files).
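
    For anyone wanting to repeat the check, here is a minimal sketch of the "32 largest files" rule of thumb mentioned above: walk the replicated folder, take the 32 largest files, and add up their sizes. The function names and the example path are hypothetical, and files that cannot be read are simply skipped.

```python
# Estimate the minimum staging quota as the combined size of the
# 32 largest files under a replicated folder (the rule of thumb used above).
import heapq
import os

def largest_file_sizes(root, count=32):
    """Sizes in bytes of the `count` largest files under `root`."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                continue  # skip files that vanish or cannot be read
    return heapq.nlargest(count, sizes)

def minimum_staging_bytes(root, count=32):
    """Combined size in bytes of the `count` largest files under `root`."""
    return sum(largest_file_sizes(root, count))

# Example (hypothetical path):
# print(minimum_staging_bytes(r"D:\Shares\dv") / 2**30, "GiB")
```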

    I have come to the conclusion that if I can't fix it by this next week, I'll just erase everything and reseed DFS with a copy of the current data.

    Thanks


    • Edited by Havs123 Monday, April 8, 2013 22:07
    Monday, April 8, 2013 22:07
  • Hi,

    Before recreating the DFS replication group, you could try recreating the DFSR database:

    1.       Stop and ALSO disable the DFSR service on <ServerA> server (don't just simply stop it)

    2.       In Windows Explorer open the specific drive

    3.       Right click on the "System Volume Information" directory and select Properties\Security

    Note: You might need to select the option for "Show hidden files, folders or drives" and also uncheck "Hide protected operating system files" in the folders view options to be able to even see the "System Volume Information" directory.

    4.       Grant your user account that you're logged in with (if a member of Administrators group this will also suffice) "Full Control" to the "System Volume Information" directory.

    Note: You may get an error on setting security on some files - this is expected.

    5.       Open an elevated/Administrative command prompt. Switch to the "<drive letter>:\System Volume Information" directory

    6.       Type the command "rmdir DFSR /s"

    7.       Enable and re-start the DFSR service on <ServerA> server

    8.       Then set the <ServerA> server as the primary member with the dfsradmin.exe utility:

    Dfsradmin Membership Set /RGName:<RG Name> /RFName:<RF Name> /MemName:<Member Name> /IsPrimary:True

    Note: Files will be replicated from ServerA to all other targets. If there are any newer files on the other target servers, back them up before starting replication.
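
    The command-line parts of the steps above can be laid out as a dry-run checklist. This sketch only prints the commands in order and executes nothing; the drive letter is a hypothetical parameter, the `sc config` lines are one assumed way to disable and re-enable the service, and the permission changes in steps 2 to 4 still have to be done by hand.

```python
# Dry-run checklist for the DFSR database reset described above.
# Nothing is executed: the commands are returned as strings for review.
# Steps 2-4 (granting access to "System Volume Information") are manual
# and are not covered here. The drive letter is a hypothetical parameter.

def dfsr_database_reset_plan(drive, group, folder, member):
    """Ordered list of commands for the stop, delete, restart, set-primary steps."""
    return [
        "net stop DFSR",                    # step 1: stop the service...
        "sc config DFSR start= disabled",   # ...and also disable it
        f'cd /d "{drive}:\\System Volume Information"',  # step 5
        "rmdir DFSR /s",                    # step 6: remove the database folder
        "sc config DFSR start= auto",       # step 7: re-enable...
        "net start DFSR",                   # ...and restart the service
        "dfsradmin Membership Set "
        f"/RGName:{group} /RFName:{folder} /MemName:{member} /IsPrimary:True",
    ]

plan = dfsr_database_reset_plan("D", "<RG Name>", "<RF Name>", "<Member Name>")
for number, command in enumerate(plan, start=1):
    print(f"{number}. {command}")
```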


    TechNet Subscriber Support in forum |If you have any feedback on our support, please contact tnmff@microsoft.com.

    Tuesday, April 9, 2013 05:02
    Moderator
  • Thanks for the reply Shanon. Just a couple of questions:

    1. By <Server A> I guess you mean the server at the regional office, not the one in the data center. If that is the case, would it be Ejenal2?

    2. The problematic servers I described in my original post are actually part of a larger array. We have four remote offices, all of which replicate to our data center. Will removing the database affect any of the other locations?

    Thanks

    Wednesday, April 10, 2013 15:04
  • Just a follow up: I rechecked the backlogs on the servers today and there was some substantial progress. I documented it in my original post, so you might want to recheck it.
    Wednesday, April 10, 2013 15:34
  • Hi,

    Sorry for not explaining in more detail.

    As I mentioned at the end of my reply, <ServerA> will be set as the primary member, so it should be the server that has the most up-to-date data.

    Note: As DFSR stopped working for a while, there may be updated data on more than one target server. If that is the case, first use robocopy to copy the updated files from the other servers to <ServerA>.
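
    Before copying anything, it can help to see which files are actually newer on another server. The sketch below is a read-only comparison by modification time between two folder trees; the function name and the example UNC paths are hypothetical, and it only lists candidates, leaving the copy itself to robocopy as suggested above.

```python
# List files that are missing from, or newer than, their copy on <ServerA>.
# Read-only: nothing is copied or modified. Comparison is by modification time.
import os

def newer_on_source(source_root, dest_root):
    """Relative paths under source_root that are absent or older in dest_root."""
    newer = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            source_path = os.path.join(dirpath, name)
            relative = os.path.relpath(source_path, source_root)
            dest_path = os.path.join(dest_root, relative)
            if (not os.path.exists(dest_path)
                    or os.path.getmtime(source_path) > os.path.getmtime(dest_path)):
                newer.append(relative)
    return sorted(newer)

# Example (hypothetical UNC paths):
# for path in newer_on_source(r"\\dfs\dv", r"\\ejenal2\dv"):
#     print(path)
```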


    TechNet Subscriber Support in forum |If you have any feedback on our support, please contact tnmff@microsoft.com.

    Monday, April 15, 2013 08:36
    Moderator