DPM 2010 - Windows Storage Server 2008 Replica Constantly Inconsistent

  • Question

  • Hi,

    I hope someone can help me, as this is driving me mad and starting to worry me. We have recently moved all our student and staff user areas over to 2 new storage servers (Dell NX3000) with replicated volumes to DR. This is working brilliantly; however, we are having a problem backing this up using DPM.

    All the shares and areas are on the same partition (about 9TB), and I have added the top-level shares on that drive to be backed up. This worked perfectly before on our original file servers (Win 2003R2, 2TB). The problem is I have only managed to back up twice in about a month. The drive is either constantly inconsistent or a recovery fails, sometimes even locking the entire backup server and forcing me to do a hard reboot.

    At the moment I’m trying to do a consistency check, but according to the transferred data amount it’s still zero and has been running for 2 days now.

    I’m trying to back up about 1.4TB in total, which I hope should be manageable.

    Please could anyone advise what I need to check to find out what’s going on?

     

    Your help is much appreciated,

    Rob
    Great Baddow High School

    Monday, May 16, 2011 10:17 AM

Answers

  • Hello,

    The DPM error logs are located:

    DPM Installation         C:\DPMLogs
    Client Side Activity     %ProgramFiles%\Microsoft Data Protection Manager\DPM\Temp
    DPM Server Activity      %ProgramFiles%\Microsoft DPM\DPM\Temp

    Some of the logs, however, are not easy to read unless you have experience with them.
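As a starting point for reading those logs, here is a minimal Python sketch that lists the most recently modified files in each of the folders named above, so you know which files to open first. The helper name `newest_files` is hypothetical, and `%ProgramFiles%` is assumed to be the environment variable the paths refer to. It does not parse the log contents.

```python
import os
from pathlib import Path


def newest_files(directory, limit=5):
    """Return up to `limit` files in `directory`, most recently modified first."""
    folder = Path(directory)
    if not folder.is_dir():
        return []  # a missing folder simply yields no candidates
    files = [p for p in folder.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    # Folder names follow the list above; the fallback path is an assumption.
    program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
    candidates = [
        r"C:\DPMLogs",
        os.path.join(program_files, "Microsoft Data Protection Manager", "DPM", "Temp"),
        os.path.join(program_files, "Microsoft DPM", "DPM", "Temp"),
    ]
    for folder in candidates:
        for path in newest_files(folder):
            print(path)
```

Run it on the DPM server and on the protected server shortly after a failed job, and the files at the top of each list are the ones most likely to cover that job.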

    So to reiterate:

    1.) Chimney and RSS were disabled on both sides for testing.
    2.) NIC drivers were updated.
    3.) This article was followed: http://technet.microsoft.com/en-us/library/ff399439.aspx

    Still no data was transferred. Correct?
    If you go to the Monitoring tab and create a custom filter for that server covering all events in the past 7 days, do you see:
    1.) Scheduled jobs at least
    2.) Completed or failed jobs. If you see failed jobs do they all show a transfer of "0" or did at least some data transfer?

    Can you check that this DPM server has been brought up to the latest binaries (7707)?  http://support.microsoft.com/kb/2465832

    Thanks,
    Shane

    • Marked as answer by Rob Fuller Tuesday, June 21, 2011 9:24 AM
    Thursday, June 2, 2011 1:20 PM
    Moderator

All replies

  • Can you provide more information on the failures, such as the error code?

    The consistency check shows only the data transferred, which can be much less than the data verified.

     

    /Arun


    Tuesday, May 17, 2011 3:35 PM
  • I've been doing some more digging on the server being backed up. We also have a problem on the server where our DFS shares will randomly stop replicating without error, and running DFS tests brings back nothing conclusive. It is very similar to the DPM problem: no errors are reported, but it basically doesn't transfer any data and eventually gives up.

    You know DPM is working because you can watch it in Performance Monitor as the agent scans the disks and copies the files over the network, exactly like DFSRS, but at other times this just completely stops, and even restarting the services doesn’t help. I have to reboot the server to get everything working again.

    Looking in the system event log I can see one error which looks interesting

    Event ID: 2012

    Source: srv

    "While transmitting or receiving data, the server encountered a network error. Occasional errors are expected, but large amounts of these indicate a possible error in your network configuration.  The error status code is contained within the returned data (formatted as Words) and may point you towards the problem."

    I'm not sure if it's related, but it's as if a network glitch of some sort causes the services to fail and not recover. I’ve checked the switch and cannot see any errors on the ports.

    I could try upgrading the NIC firmware and drivers (Broadcoms)

    Tuesday, May 17, 2011 9:19 PM
  • Hello,


    The 2012 srv error is usually network related. I would start by updating the NIC drivers on both the DPM server and the protected server.

     

    Thanks
    Shane

    Wednesday, May 18, 2011 12:15 PM
    Moderator
  • We have seen these network-related issues with old NIC drivers or when TCP Chimney offloading is enabled.

    If the issue is not resolved after updating the drivers, you can try disabling TCP Chimney on both the production server and the DPM server.

    To learn about TCP Chimney offloading and how to disable it, check out these links:

    http://support.microsoft.com/kb/951037

    http://portal.sivarajan.com/2010/02/replica-creation-is-in-progress-dpm.html
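For reference, the `netsh` commands described in KB951037 for Windows Server 2008 can be wrapped in a small Python sketch like the one below. The function name `apply_offload_settings` and the dry-run default are illustrative assumptions. Run it from an elevated prompt on each server, and check the KB for your OS version before applying anything.

```python
import subprocess

# netsh commands for Windows Server 2008, as described in KB951037.
SHOW_SETTINGS = ["netsh", "int", "tcp", "show", "global"]
DISABLE_CHIMNEY = ["netsh", "int", "tcp", "set", "global", "chimney=disabled"]
DISABLE_RSS = ["netsh", "int", "tcp", "set", "global", "rss=disabled"]  # optional


def apply_offload_settings(commands, dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    planned = []
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if netsh reports a failure
            subprocess.run(command, check=True)
        planned.append(command)
    return planned


if __name__ == "__main__":
    # Dry run by default: only prints what would be executed.
    apply_offload_settings([SHOW_SETTINGS, DISABLE_CHIMNEY, SHOW_SETTINGS])
```

Running the "show" command before and after the change makes it easy to confirm the Chimney Offload State on both machines.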


    Thanks, Surendra Singh [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, May 18, 2011 8:20 PM
  • I shall update the drivers and firmware tonight and report back.

    Many thanks,
    Rob

    Thursday, May 19, 2011 10:12 AM
  • Updating the drivers and firmware has not helped, though only one server needed an update. I could try completely uninstalling, rebooting and then reinstalling.

    I have disabled TCP Chimney on the storage servers first and will see how that goes.

     

    Rob

    Wednesday, May 25, 2011 9:49 PM
  • If these are user documents, then it may be that they are open during the process which would prevent access.  If DPM 2010 hits more than 100 failed files (corrupt, locked, quarantined from AV) it will stop the synchronization / consistency check.  You need to verify your own errors in the DPM logs of the Primary DPM server.
    Wednesday, June 1, 2011 9:27 PM
  • Sorry, but could you tell me the best logs to look in? Are they located in the DPM program files folder? I have looked through some but am finding it difficult to tell whether they're useful.

     

    Thanks,
    Rob

    Wednesday, June 1, 2011 10:45 PM
  • In Event Viewer on the DPM server there is a DPM Alerts log under Applications and Services Logs (Server 2008 R2).  I had to look it up, but it is Event 3106, DPM-EM, which references ID 32538, which is mostly explained here:

    http://social.technet.microsoft.com/Forums/en-US/dpmfilebackup/thread/27c48816-dabb-41a0-b800-6e1ac2fc1ed5

    Also, if these are file shares, DPM replication times can be significant when handling a large number of files, out of proportion to the size of the data.  So do not be surprised if it takes a significant amount of time to back up file shares with thousands of files.

    Wednesday, June 1, 2011 11:23 PM
  • Sorry for the delay in replying. It has taken me a while to test each suggestion, but I think I may have a breakthrough. I will monitor over the coming week to check.

    Even though DPM was reporting that all agents were up to date, on closer inspection the server being backed up had an older agent than the other servers! I have now manually updated this and it is looking good so far. I will report back once I am 100% happy that all is working again.

     

    Thank you for your help and suggestions,

    Rob

     

    Friday, June 10, 2011 6:17 PM
  • Hello,

    Just so that I am understanding you correctly: the DPM GUI showed the protected server as having the proper agent version with NO "update available", but in reality the protected server had an older version. Is this correct?

     

    Thanks,
    Shane


    Friday, June 10, 2011 9:57 PM
    Moderator
  • Yes, that’s right Shane. I'm not sure if the original update got corrupted or if a file was locked for whatever reason. Still, the reinstall worked.

    Many thanks for your help
    Rob

    Tuesday, June 21, 2011 9:24 AM
  • Hi All,

    I installed DPM 2010 in my production environment last week. It's working fine.

    I have a doubt here: I have set a retention range of 3 days, and a replica will be created on a daily basis. The storage pool size is 55 GB and the data size is 35 GB. So how does data grooming happen?

       
    Storage pool details on DPM (in GB), Jun 22

    Protection Group         CGRDOMKOR01- USERDATA & APPLICATION GROUP
    Replica                  D:\
    Replica Allocated        50.73
    ShadowCopy Allocated     3.59
    Used by Replica          27.52
    Used by Shadow Copy      0.75
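To put those figures in perspective, the short Python sketch below works out how much of each allocation is in use. The numbers are copied from the post, the variable names are illustrative, and nothing here queries DPM itself.

```python
# Figures in GB, copied from the storage pool details in the post.
replica_allocated = 50.73
shadow_copy_allocated = 3.59
used_by_replica = 27.52
used_by_shadow_copy = 0.75
storage_pool_size = 55.0  # pool size stated in the post

# Space carved out of the pool for this data source.
total_allocated = replica_allocated + shadow_copy_allocated
unallocated = storage_pool_size - total_allocated

# How full each allocation currently is.
replica_used_pct = 100 * used_by_replica / replica_allocated
shadow_copy_used_pct = 100 * used_by_shadow_copy / shadow_copy_allocated

print(f"Allocated from pool: {total_allocated:.2f} GB of {storage_pool_size:.2f} GB")
print(f"Unallocated in pool: {unallocated:.2f} GB")
print(f"Replica volume used: {replica_used_pct:.1f}%")
print(f"Shadow copy volume used: {shadow_copy_used_pct:.1f}%")
```

On these numbers almost the whole pool is already allocated, while the replica and shadow copy volumes themselves are far from full.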

    Please advise me.

    Regards

    Ganga

    Thursday, June 23, 2011 2:07 PM
  • Hi All,

    I found the answer on how data grooming works in DPM 2010. The details are below.

    Data grooming depends on the retention range you assign. Once the retention range is exceeded, the old data is groomed. For example:

    If you assign a retention range of 3 days and DPM 2010 takes backups continuously up to the 6th day, then on the morning of the 7th day the oldest 3 days of data are groomed automatically. So the retention range helps us save disk space.
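The rule described above can be illustrated with a small Python sketch. It assumes a recovery point becomes eligible for grooming once its age in days exceeds the retention range. The function name `split_by_retention` and the dates are made up for illustration, and this does not model how DPM's own pruning job is scheduled.

```python
from datetime import date, timedelta


def split_by_retention(recovery_points, today, retention_days):
    """Split recovery point dates into (kept, expired) by age in days."""
    kept, expired = [], []
    for point in sorted(recovery_points):
        age_days = (today - point).days
        if age_days > retention_days:
            expired.append(point)  # older than the retention range
        else:
            kept.append(point)
    return kept, expired


first_day = date(2011, 6, 1)  # arbitrary start date for illustration
daily_points = [first_day + timedelta(days=n) for n in range(6)]  # days 1 to 6
morning_of_day_7 = first_day + timedelta(days=6)

kept, expired = split_by_retention(daily_points, morning_of_day_7, retention_days=3)
print("expired:", [p.isoformat() for p in expired])
print("kept:   ", [p.isoformat() for p in kept])
```

With a 3-day range and daily backups on days 1 to 6, the sketch marks the recovery points from days 1 to 3 as expired on day 7 and keeps days 4 to 6, which matches the example in the post.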

    I hope this will help you.

    Regards

    Ganga

    • Proposed as answer by Gangaiyan Friday, July 1, 2011 6:37 AM
    Friday, July 1, 2011 6:37 AM