DPM 2010 Slow Incremental Tape Backup

  • Question

  • The issue is one that has been raised here before but the answers that have worked for others haven't worked for us.

    Basically we have a file server that has a 690GB shared drive.  Yes we know that lots of small files = slow backup compared with non-file servers but we have had another backup software backing up this server previously and the time differences are too large to ignore.  The other backup software is EMC Networker and the approx times are shown below.

    To do an incremental backup takes on average 2 hours on Networker, whereas on DPM it takes about 9 hours.  To do a full backup on DPM takes 14 & 3/4 hours.


    Remedies that we have already tried are
    1) Add exceptions in Forefront Client Security to the DPM program folder on the file server
    2) Increased the page file size on the file server from 2-4GB to 6-8GB (DPM server page file 8GB)
    3) Turned on On-The-Wire compression
    4) Removed Forefront Client Security on the file server (and then reinstalled it when it made no difference) - the DPM server does not have antivirus installed.
       

    In the event viewer, because we have audit logon/logoff events on via group policy for the file server (and others) we noticed that the DPM server computer account is logging on to the file server once every few seconds for the duration of the backup.  Some examples are as follows.

    Event Type:    Success Audit
    Event Source:    Security
    Event Category:    Logon/Logoff
    Event ID:    540
    Date:        26/09/2011
    Time:        22:00:01
    User:        <DOMAIN>\<DPM SERVER>$
    Computer:    <FILE SERVER>
    Description:
    Successful Network Logon:
         User Name:    <DPM SERVER>$
         Domain:        <DOMAIN>
         Logon ID:        (0x0,0x8E2F0C4)
         Logon Type:    3
         Logon Process:    Kerberos
         Authentication Package:    Kerberos
         Workstation Name:   
         Logon GUID:    {bf8cf819-66cf-24a9-f811-901e0b07ffce}
         Caller User Name:    -
         Caller Domain:    -
         Caller Logon ID:    -
         Caller Process ID: -
         Transited Services: -
         Source Network Address:    <DPM SERVER IP ADDRESS>
         Source Port:    52465
    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Event Type:    Success Audit
    Event Source:    Security
    Event Category:    Logon/Logoff
    Event ID:    552
    Date:        26/09/2011
    Time:        22:00:01
    User:        NT AUTHORITY\SYSTEM
    Computer:    <FILE SERVER>
    Description:
    Logon attempt using explicit credentials:
     Logged on user:
         User Name:    <FILE SERVER>$
         Domain:        <DOMAIN>
         Logon ID:        (0x0,0x3E7)
         Logon GUID:    {b4d07fe8-a703-84c8-4089-d029d7dc9fdb}
     User whose credentials were used:
         Target User Name:    <FILE SERVER>$
         Target Domain:    <DOMAIN>.<DOMAIN>.<DOMAIN>
         Target Logon GUID: {8c380c57-8355-90f9-73a9-d49ab1f7269b}

     Target Server Name:    <DPM SERVER>.<DOMAIN>.<DOMAIN>.<DOMAIN>
     Target Server Info:    HOST/<DPM SERVER>.<DOMAIN>.<DOMAIN>.<DOMAIN>
     Caller Process ID:    468
     Source Network Address:    -
     Source Port:    -
    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.


    Turned off auditing of the logon and logoff events for the file server (note: the DPM server does not have logon/logoff auditing) but it made no difference to the backup times.


    DPM Server set up

    DPM Installation - DPM 2010 Build 3.0.7706.0
    O/S - Windows 2008 R2 Std SP1
    Processor - Intel Xeon E3113 @3GHz
    Memory - 8GB
    Disc - 278GB (formatted), 35GB used
    DPM Database - SQL 2008 Database Server
    Tape Library - SAS to a Dell TL2000 LTO4 single tape library
    Backup method - Short term tape backup (once a week full, other days incremental) - There is no disc backup
    Servers being backed up - Initially it was set up for approx 12 (mainly physical with the odd virtual) but when the issue was found this was reduced to 2 physical servers (one of which is the file server) plus the DPM databases.

    I cannot understand why the differences between the DPM tape backup and EMC Networker are so big, considering that it is all done over the same network and Networker is backing up to LTO2 (the DPM/Networker/File server are all connected to the same switch).
    • Edited by AndyHeywood Tuesday, October 4, 2011 2:33 PM
    Thursday, September 29, 2011 10:39 AM

All replies

  • Hi,

    What is the average file size on that volume?  If there are tens of thousands of small files (100K to a few MB in size), then this is going to be expected behavior.

    We had another customer observe the following:

    For the volumes where the tape backup is very slow (12MB/s - 19 MB/s) the average file size varies from (288KB – 683KB)

    Volume   Number of files   Avg size per file   Avg backup speed to tape
    H:       191k+             683 KB              14.8 MB/s
    I:       764k+             602 KB              12.1 MB/s
    P:       789k+             288 KB              13.4 MB/s
    X:       800k+             582 KB              18.8 MB/s
    V:       40k+              15.6 MB             41.8 MB/s   <---  This seemed to be acceptable backup speed, but still not optimal.

    (Avg size per file = data size / number of files)

    DPM writes to tape using the Microsoft Tape Format (MTF), the same format that ntbackup.exe used, and the Windows CreateFile API is called to write a file to tape.  That is the same call used to write a file to disk, and in fact if you were to back up the same file set to disk, you would see similar backup times.  The difference is that a disk backup can back up multiple files simultaneously, whereas tape backup is sequential in nature, so times would be longer.  I suspect that EMC Networker is writing to tape in a proprietary format that does not use the Windows CreateFile API.

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Friday, September 30, 2011 4:05 AM
    Moderator
  • Thanks for the reply Mike - However, when it has to do an incremental backup it is only backing up 1.65GB of files but it takes nearly 9 hours.  Are you seriously telling me a write to disk would take the same time?

    It doesn't just seem to be the case that Networker can write to LTO2 faster than DPM can write to LTO4, but also that it can interrogate the file server quicker to find the incremental files it needs to back up.


    • Edited by AndyHeywood Friday, September 30, 2011 9:14 AM
    Friday, September 30, 2011 9:12 AM
  • Hi,

    <snip>
    To do an incremental backup takes on average 2 hours on Networker, whereas on DPM it takes about 9 hours. 
    >snip<

    So are you saying both of the times above are only for 1.65GB?  It sounds like you do indeed have bigger problems.

    About the disk numbers, I was referring to initial replication, when you first protect the volume and we need to transfer all files from the protected server (PS) to the DPM replica.  The incremental synchronizations would be fast since those are only block-level changes.  You would not see any degradation if doing DPM disk-to-disk backups.

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Friday, September 30, 2011 12:54 PM
    Moderator
  • Thanks again for your reply

    The last backup time from Networker to do an incremental backup was faster than I previously thought (35 minutes to backup 1.25GB)

    As stated previously we are not doing any disk to disk backups on this system, the reason being it is a stop-gap while the servers are being virtualised.  Once they are virtualised they will be disk to disk to tape (using another DPM server).  We are still having to back up the servers with Networker as well (including this file server) until DPM is able to back up direct to tape in a fashion that stops the backups running into our operating hours.  We have made sure that the Networker backups have finished before we start the DPM backups.  If DPM is able to back up quick enough we can then decommission our Networker server and LTO2 tape library, and use DPM as our sole backup solution.  At the moment though, it is not proving viable for tape only backup.

    The figures you requested are:

    1,407,876 files at an average of 490KB.  It takes 14 ¾ hours to do a complete full backup to tape at an average speed of 12.6MB/sec.

    I don’t have the statistics for a full backup with Networker as it only displays the results of the last backup (which in this case was incremental) but I’m pretty sure it’s a lot quicker than 14 ¾ hours.

    Just so you know I’m not anti-DPM, we have had it since DPM 2007 and have seen great improvements in functionality and usability with the 2010 release, and the disk to disk to tape backup in that setup works perfectly fine.

    Sunday, October 2, 2011 1:52 PM
  • Here is some additional info that may assist in locating where the problem may lie.

    I have compared the backup times from the additional physical server that is also being protected by our tape only DPM server (labelled Reference Server below) to see what the difference is and here are the results

    File server - 1,407,876 files at an average of 490KB (approx 680GB total)

    Reference server – 36409 files at an average of 1513KB (approx 55GB total)

    Full backup times

    File server – 885 minutes to complete, 676GB transferred, equates to 780MB/min or 13MB/sec

    Reference server – 38 minutes to complete, 41GB transferred, equates to 1.4GB/min or 23MB/sec

    Incremental backup times

    File server (quickest) – 525 minutes to complete, 1.65GB transferred, equates to 3.2MB/min or 55KB/sec

    File server (slowest) – 660 minutes to complete, 300MB transferred, equates to 450KB/min or 8KB/sec

    File server (Networker) - 35 minutes to complete, 1.25GB transferred, equates to 36MB/min or 610KB/sec

    Reference server (quickest) – 16 minutes to complete, 320MB transferred, equates to 20MB/min or 333KB/sec

    Reference server (slowest) – 25 minutes to complete, 7GB transferred, equates to 286MB/min or 5MB/sec

    Reference server (Networker) – 25 minutes to complete, 6.6GB transferred, equates to 270MB/min or 4.5MB/sec

    Obviously the write speeds of the incremental backups are representative rather than average write speeds
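
    The rates quoted above can be re-derived from the sizes and durations with a few lines of Python. This is only a sketch of the arithmetic; it assumes 1GB = 1024MB, and the function name `throughput` is made up for illustration.

```python
def throughput(size_mb, minutes):
    """Return (MB per minute, KB per second) for a transfer of size_mb taking `minutes`."""
    mb_per_min = size_mb / minutes
    kb_per_sec = mb_per_min * 1024 / 60
    return mb_per_min, kb_per_sec


# (label, size in MB, duration in minutes), taken from the incremental figures in this post
runs = [
    ("File server (DPM, quickest)", 1.65 * 1024, 525),
    ("File server (DPM, slowest)", 300, 660),
    ("File server (Networker)", 1.25 * 1024, 35),
    ("Reference server (DPM, quickest)", 320, 16),
    ("Reference server (DPM, slowest)", 7 * 1024, 25),
    ("Reference server (Networker)", 6.6 * 1024, 25),
]

for label, size_mb, minutes in runs:
    mb_min, kb_sec = throughput(size_mb, minutes)
    print(f"{label}: {mb_min:.1f} MB/min, {kb_sec:.0f} KB/sec")
```

    Laid out this way, the file server incrementals stand out because their duration barely changes with the amount of data transferred.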

    As a result I believe that the time it takes to do full tape backups is acceptable given the data that has come off from our test, relative to average file size, and comparing it against the results from the other customer that you posted.  However, the difference in time in the reference server is minimal between DPM and Networker when incrementally backing up similar amounts of data, so it appears that DPM only really falls short when incrementally backing up file shares with lots of small files.  Any ideas?

    Additionally, how does DPM look for changed files that it has to back up when doing an incremental tape backup?  Does it search the entire drive comparing the files with ones it has backed up and looking for files that have changed since the last backup?

    As it looks like the issue relates to the incremental backups only, I have changed the title of the posting to reflect this.

    Tuesday, October 4, 2011 2:35 PM
  • Hi,

     

    Very interesting results, and to be honest, I'm shocked to see incrementals taking that long - I guess I will need to test this myself in a lab.

    DPM should be leveraging the NTFS USN Journal to see what files have been added / changed on the volume between the last backup time and the current backup time.   This "should" be a very quick and efficient way of just backing up files that are new or changed, so I'm at a total loss as to why it is taking so long to perform the incremental backups.

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Tuesday, October 4, 2011 3:25 PM
    Moderator
  • Thanks for the reply. Is there any way of confirming that the NTFS USN Journal is operating and configured correctly on the file server, or is this something that just "works" by default? Just in case this has any bearing on the matter, we have Volume Shadow Copy enabled on the file share.
    Wednesday, October 5, 2011 1:16 PM
  • Hi,

    DPM will create a 300MB USN journal on all volumes added to protection.  You can verify its existence and size using the following command:

     

    C:\Windows\system32>fsutil usn queryjournal c:
    Usn Journal ID   : 0x01ca5cc3a50814f5
    First Usn        : 0x0000000281380000
    Next Usn         : 0x0000000294e16930
    Lowest Valid Usn : 0x0000000000000000
    Max Usn          : 0x7fffffffffff0000
    Maximum Size     : 0x0000000012c00000  <--- equals 314,572,800 decimal
    Allocation Delta : 0x0000000001e00000

    As changes occur on the volume, the "Next Usn" value will increase.
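
    If reading the hex values by eye is awkward, a short Python sketch like the one below converts them. It only parses text in the layout shown above (pasted in as `SAMPLE`); the helper name `parse_usn_journal` is made up for illustration and is not part of fsutil or DPM.

```python
SAMPLE = """\
Usn Journal ID   : 0x01ca5cc3a50814f5
First Usn        : 0x0000000281380000
Next Usn         : 0x0000000294e16930
Lowest Valid Usn : 0x0000000000000000
Max Usn          : 0x7fffffffffff0000
Maximum Size     : 0x0000000012c00000
Allocation Delta : 0x0000000001e00000
"""


def parse_usn_journal(text):
    """Map each 'Name : 0x...' line of fsutil output to an integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        tokens = value.split()
        # Keep only lines whose value starts with a hex literal; ignore trailing notes.
        if sep and tokens and tokens[0].lower().startswith("0x"):
            fields[name.strip()] = int(tokens[0], 16)
    return fields


info = parse_usn_journal(SAMPLE)
MB = 1024 * 1024
print("Maximum Size     :", info["Maximum Size"] // MB, "MB")
print("Allocation Delta :", info["Allocation Delta"] // MB, "MB")
```

    Pasting the output from another volume into `SAMPLE` gives its journal size in MB for comparison against the 300MB figure.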


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, October 5, 2011 3:44 PM
    Moderator
  • Hi

     

    Results from our file server

     

    C:\Windows\system32>fsutil usn queryjournal d:
    Usn Journal ID   : 0x01cc706f57c47228
    First Usn        : 0x00000000076c0000
    Next Usn         : 0x000000000bb14720
    Lowest Valid Usn : 0x0000000000000000
    Max Usn          : 0x00000fffffff0000
    Maximum Size     : 0x0000000004000000
    Allocation Delta : 0x0000000000640000

    Wednesday, October 5, 2011 3:55 PM
  • Hi,

    I'm having discussions with the product group, and they have told me that we actually check each file's modified date, so that means walking the directory structure for every file - ouch.   I'm going to double-verify their remarks and confirm that the USN Journal is not used for Disk to Tape protection and only for Disk to Disk.
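
    To give a feel for why that approach scales with the number of files rather than the amount of changed data, here is a rough Python sketch of a directory walk that checks every file's modified time. It illustrates the general technique only and is not DPM's actual code; the function name and the 24-hour cutoff are assumptions for the example.

```python
import os
import time


def files_modified_since(root, last_backup_ts):
    """Yield paths under root whose modification time is newer than last_backup_ts."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime > last_backup_ts:
                yield path


if __name__ == "__main__":
    # Hypothetical usage: list files under the current directory changed in the last 24 hours.
    cutoff = time.time() - 24 * 60 * 60
    for changed in files_modified_since(".", cutoff):
        print(changed)
```

    Every file has to be stat'ed on every run, so a volume with 1.4 million files pays for 1.4 million lookups even when only a handful have changed.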

    That's what I love about my job, I learn something new every day. 


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, October 5, 2011 4:09 PM
    Moderator
  • I await your double-verification with interest

    I’m guessing it would use either Date Created or Date Modified otherwise you could unpack a zip file of old files to the file server and they would be missed on an incremental backup as their modified date would be older, even though the created date is newer.  I think I’ll test that just to be sure….

    Test file

    Created: 01 February 2010, 13:39:18

    Modified: 02 April 2003, 21:07:40

    Accessed: 06 October 2011, 10:54:33

    Test file copied to file server

    Created: 06 October 2011, 10:56:08

    Modified: 02 April 2003, 21:07:40

    Accessed: 06 October 2011, 10:56:08
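
    The concern can be written down as a small Python check using the timestamps above. This is only a sketch of the reasoning, not how DPM decides; `needs_backup` and the `last_backup` time are assumptions made up for the example.

```python
from datetime import datetime


def needs_backup(created, modified, last_backup):
    """Treat a file as changed if it was created OR modified after the last backup."""
    return max(created, modified) > last_backup


# Hypothetical time of the previous incremental backup.
last_backup = datetime(2011, 10, 5, 22, 0, 0)

# Timestamps of the test file after it was copied to the file server.
created = datetime(2011, 10, 6, 10, 56, 8)
modified = datetime(2003, 4, 2, 21, 7, 40)

print("Modified date only  :", modified > last_backup)
print("Created or modified :", needs_backup(created, modified, last_backup))
```

    A check on the modified date alone would skip this file, while a check that also considers the created date would pick it up.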

    Will post the result tomorrow after an incremental backup

    Thursday, October 6, 2011 10:40 AM
  • Update: File backed up on the incremental backup last night so there is no issue with that

    Friday, October 7, 2011 11:21 AM
  • Hi - Just wondering if you had any update on the verification you were undertaking?
    Thursday, October 13, 2011 1:54 PM
  • Hi Andy,

    Sorry, this fell off my plate, I'm waiting on the product group to confirm our design.  I'll ping them again. 


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Thursday, October 13, 2011 2:40 PM
    Moderator
  • Hi,

    I received confirmation that our incremental tape backup [D-T only support] is not based on the USN journal. To improve the performance for small files we need to redesign some of the flows in the product.  Unfortunately there is no workaround for this issue. Thanks for reporting your experience; I will work with the product group to see if we can make our product better.


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Thursday, October 13, 2011 7:22 PM
    Moderator
  • Hi,

    Thanks for the update.  Is there any timescale on this (are we talking weeks, months, etc.)?

    Also, is this likely to be a patch/service pack for existing installations, or will it only be available in a new release?

    Sunday, October 16, 2011 9:30 AM
  • Hi,

    This will most likely be a Post-RTM DPM2012 enhancement.


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    • Marked as answer by AndyHeywood Wednesday, October 19, 2011 11:26 AM
    Monday, October 17, 2011 2:53 PM
    Moderator
  • Thanks for letting us know Mike, however until this is implemented we will have to use another backup product.

     When this functionality is introduced into DPM, where will the information be published/how will users be notified?

    Wednesday, October 19, 2011 11:26 AM