Deduplication of the DPM 2012 disks

  • Question

  • Apologies, but the inner workings of DPM 2012 are pretty new to me, as I'm coming from a Backup Exec background - and can't wait to get rid of it!

    I understand the basic mechanics of how deduplication works in Backup Exec. Each time a file is to be backed up, it's broken down into 64k blocks. A hash is calculated for each 64k block and this hash is looked up in the deduplication database. If the block already exists, a reference to the existing block is made; otherwise the new 64k block is written to the deduplication disk store. Or something a bit like that ;-)
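
    The mechanics described above can be sketched roughly as follows. This is only a toy illustration of fixed-size block deduplication, not how Backup Exec is actually implemented: the `DedupStore` class, the in-memory dictionary standing in for the deduplication database, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64k blocks, as described in the post


class DedupStore:
    """Toy in-memory block store keyed by content hash (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # hash -> block bytes; stands in for the dedup database

    def backup(self, data):
        """Store data and return the block hashes needed to rebuild it."""
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # New block: write it to the store.
                self.blocks[digest] = block
            # Known or new, the file only keeps a reference to the block.
            refs.append(digest)
        return refs

    def restore(self, refs):
        """Rebuild the original data from its list of block references."""
        return b"".join(self.blocks[digest] for digest in refs)


store = DedupStore()
file_a = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
file_b = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # shares its first block with file_a
refs_a = store.backup(file_a)
refs_b = store.backup(file_b)
print(len(store.blocks))  # 3 blocks stored for 4 blocks backed up
```

    Two files of two blocks each end up as three stored blocks, because the shared first block is written once and referenced twice.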

    I'm aware that whilst DPM 2012 can back up Windows Server 2012 volumes where deduplication is enabled, it doesn't actually use deduplication itself, and it's recommended to look at 3rd party tools to deduplicate the backup store. We happen to use StarWind's excellent SAN product, which has deduplicated disks, so there was a thought to use those. With BE, we'd be running a mile because the BE deduplication engine is pretty dire performance-wise and takes days & days to back up to iSCSI. But DPM 2012 has a very different architecture: with its synchronisation feature, long backup windows overnight or at the weekend become a thing of the past. So one can consider slower iSCSI storage and even RAID-5 (our BE system has to use RAID-10 for any kind of reasonable speed).

    But the question (yes there was one) is - if the volume being backed up is itself deduplicated (Windows Server 2012 dedup), then is having dedupe of the DPM backup disk a waste of time - as the data is already deduplicated?

    Answering this question, I guess, requires a deep understanding of both the Windows Server 2012 dedup architecture and how DPM 2012 backs up deduplicated files. I have said deep understanding of neither ;-)

    Cheers, Rob.

    Tuesday, April 9, 2013 8:34 PM

All replies

  • Hi,

    The answers to your questions can be found here:

    Architecture info: http://www.techrepublic.com/blog/datacenter/windows-server-8-data-deduplication-what-you-need-to-know/4887

    Protecting deduplicated volumes
    http://technet.microsoft.com/en-us/library/jj656644.aspx

    Install and Configure Data Deduplication
    http://technet.microsoft.com/en-us/library/hh831434


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Wednesday, April 10, 2013 2:47 PM
    Moderator
  • Hi Mike,

    Thanks for the links. The first describes the way dedupe works in a compact and understandable way. There is one sentence in the 2nd link which I don't understand:

    If the entire volume is protected, then DPM provides only optimal protection. If only partial deduplicated volume is protected, then DPM will provide normal backup.

    As far as I know, you can only turn on DPM at an entire volume level, so I don't understand what "partial deduplicated volume" means. Maybe if I understood what "Optimal protection" means I might understand the rest ;-)

    Although it doesn't say it directly, I suspect that when DPM synchronises the volume, it synchronises the chunk store as well as the file sparse/reparse metadata; specifically, it's *not* writing the unoptimised file to the DPM store. This means that the data written to DPM is already deduplicated and therefore adding deduplication via StarWind/3rd parties to the DPM backup disk is not going to add any benefit. In fact, deduplicating already deduplicated data might even add an overhead!
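
    The reasoning above can be illustrated with a small sketch. It assumes both passes use the same fixed 64k block size and the same alignment, which is a simplification: real products may chunk, align and compress data differently, so actual savings on a second pass could differ. The `unique_blocks` helper and the sample data are made up for the example.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed fixed block size for both dedup passes


def unique_blocks(data):
    """Map block hash -> block for every distinct fixed-size block in data."""
    store = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        store.setdefault(hashlib.sha256(block).digest(), block)
    return store


# Source volume: the same 4 distinct blocks repeated 10 times (40 blocks).
distinct = [bytes([i]) * BLOCK_SIZE for i in range(4)]
source = b"".join(distinct) * 10

# First pass: dedup on the protected volume leaves only the distinct blocks.
first_pass = unique_blocks(source)
chunk_store = b"".join(first_pass.values())

# Second pass: dedup on the backup disk, applied to the already deduped data.
second_pass = unique_blocks(chunk_store)

saved_first = len(source) - len(chunk_store)
saved_second = len(chunk_store) - sum(len(b) for b in second_pass.values())
print(saved_first // BLOCK_SIZE, saved_second // BLOCK_SIZE)  # 36 0
```

    Under these assumptions the first pass removes 36 of 40 blocks and the second pass finds nothing left to remove, while still paying the cost of hashing every block again.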

    Most of our documents are typically Office documents like PowerPoint files. If a user saves an existing deduplicated document, does it get written back to disk in an unoptimised format, and then is deduplicated again (say) five days later? I guess it's a joint question about how PPT saves (does it write to a temporary file, delete and then rename, or simply update blocks in the PPT file) and how dedupe then handles an update to an existing file.

    Cheers, Rob.

    Thursday, April 11, 2013 8:49 AM
  • Hi,

    <snip>
    If the entire volume is protected, then DPM provides only optimal protection. If only partial deduplicated volume is protected, then DPM will provide normal backup.
    >snip<

    What this is saying is that if you choose to protect the entire volume (e.g. D:) when adding it to a protection group, then DPM will protect it in a deduped state on the DPM Server. If, however, you choose to protect only some shares or subfolders (e.g. D:\Userdata and D:\Important), then DPM will protect it in a non-deduped state on the DPM Server.
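
    As a rough sketch of that rule only: the function names below are hypothetical and this is not a DPM API, just the whole-volume versus partial-selection decision written out. It deliberately does not model file/folder exclusions, which the explanation above does not cover.

```python
from pathlib import PureWindowsPath


def is_whole_volume(selection):
    """True if the selection is a bare volume such as D: (hypothetical helper)."""
    path = PureWindowsPath(selection)
    return bool(path.drive) and len(path.parts) == 1


def protection_mode(selections):
    """Return 'optimized' only when every selected item is an entire volume."""
    if not selections:
        raise ValueError("nothing selected")
    if all(is_whole_volume(s) for s in selections):
        return "optimized"  # data kept in its deduped state on the DPM server
    return "normal"  # data stored in a non-deduped state on the DPM server


print(protection_mode(["D:"]))                             # optimized
print(protection_mode([r"D:\Userdata", r"D:\Important"]))  # normal
```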


    Regards, Mike J. [MSFT]

    Thursday, April 11, 2013 3:24 PM
    Moderator
  • Thanks Mike - understanding improving now.

    Cheers, Rob.

    Thursday, April 11, 2013 3:31 PM
  • Apologies for bumping an old thread.

    What this is saying, if you choose to protect the entire volume (IE D:) when adding it to a protection group, then DPM will protect it in a deduped state on the DPM Server.  If however, you choose to protect only some shares, or subfolders (IE: D:\Userdata and D:\Important) - then DPM will protect it in a non-deduped state on the DPM Server.

    How do file/folder exclusions work? Will data be backed up in optimized form if the entire volume is selected but some files/folders are excluded? This is different from selectively choosing only some files/folders within the volume, in which case I understand the backup will be unoptimized.

    Thanks

    Monday, July 22, 2013 9:30 AM
  • 1 more bump
    Monday, July 29, 2013 5:14 AM