Data Deduplication is a role service in Windows Server 2012, 2012 R2, and 2016. It identifies and removes duplicate data without compromising data integrity.

Purpose: To store more data and use less physical disk space. 

Improvements in Windows Server 2016:

  • Support for volume sizes up to 64 TB. Data Deduplication in Windows Server 2012 R2 does not perform well on volumes larger than 10 TB;
  • Support for file sizes up to 1 TB. In Windows Server 2012 R2, very large files are not good candidates for Data Deduplication.

Data Deduplication volume requirements:

After you install the role service, you can enable Data Deduplication on a per-volume basis. Each volume must meet the following requirements:

  •  Volumes must not be a boot or system volume;
  •  Volumes can be partitioned using the master boot record (MBR) or GUID partition table (GPT) format and must be formatted using the NTFS or ReFS file system;
  •  Volumes must be attached to the Windows Server and must appear as non-removable drives;
  •  Volumes can be on shared storage, such as a Fibre Channel or iSCSI SAN, or a SAS array;
  •  Files with extended attributes, encrypted files, files smaller than 32 KB, and reparse point files are not processed by Data Deduplication;
  •  Data Deduplication is not available for Windows client operating systems.
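
The install-then-enable sequence described above can be sketched in PowerShell. This is a minimal sketch, assuming Windows Server 2016 and an elevated session; the function name Enable-ExampleDedupVolume and the E: drive letter are hypothetical and used only for illustration.

```powershell
# Hypothetical helper; the function name and drive letter are illustrative only.
function Enable-ExampleDedupVolume {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Volume   # e.g. 'E:'; must be a data volume, not the boot or system volume
    )

    # Install the Data Deduplication role service (no change if it is already installed).
    Install-WindowsFeature -Name FS-Data-Deduplication | Out-Null

    # Turn deduplication on for the volume, using the general-purpose usage type.
    Enable-DedupVolume -Volume $Volume -UsageType Default | Out-Null

    # Return the resulting per-volume deduplication settings.
    Get-DedupVolume -Volume $Volume
}
```

Calling Enable-ExampleDedupVolume -Volume 'E:' would then install the role service if needed and enable deduplication on that volume. The Default usage type is an assumption here; other workloads may call for a different value.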

In Windows Server 2016, Data Deduplication removes duplicate data transparently, without changing access semantics.

Planning a Data Deduplication deployment:

  •  Target deployments. Data Deduplication is designed to be applied to primary data volumes, not to logically extended ones, without adding dedicated hardware;
  •  Determine which volumes are candidates for deduplication. Deduplication can be very effective at optimizing storage and reducing the amount of disk space consumed, saving 50 to 90 percent of a system's storage space when applied to the right data.
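
To put the 50 to 90 percent range in concrete terms, the arithmetic can be sketched as below. The 10 TB data set and the function name Get-ExampleDedupFootprint are assumptions made up for illustration; real savings depend entirely on the data.

```powershell
# Hypothetical helper: estimates on-disk size for a given savings percentage.
function Get-ExampleDedupFootprint {
    param(
        [Parameter(Mandatory)]
        [double]$LogicalSizeTB,      # size of the data before deduplication

        [Parameter(Mandatory)]
        [ValidateRange(0, 100)]
        [double]$SavingsPercent      # expected savings, e.g. 50 or 90
    )

    # Remaining footprint = logical size minus the saved share, rounded to 2 decimals.
    [math]::Round($LogicalSizeTB * (100 - $SavingsPercent) / 100, 2)
}

# Illustrative 10 TB data set at both ends of the quoted range.
foreach ($percent in 50, 90) {
    $size = Get-ExampleDedupFootprint -LogicalSizeTB 10 -SavingsPercent $percent
    "{0}% savings: about {1} TB on disk" -f $percent, $size
}
```

Under these assumptions, the same 10 TB of data would occupy roughly 5 TB at 50 percent savings and roughly 1 TB at 90 percent.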

Useful commands:

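  • Start-DedupJob -Type Optimization : Starts an optimization job, which deduplicates the data on the volume;
  • Start-DedupJob -Type Scrubbing : Starts a scrubbing job, which checks the integrity of the deduplicated data and attempts to repair any corruption it finds;
  • Start-DedupJob -Type GarbageCollection : Starts a garbage collection job, which reclaims space from data that is no longer referenced;
  • Start-DedupJob -Type Unoptimization : Starts an unoptimization job, which reverses deduplication and restores the files on the volume to their original state;
  • Get-DedupStatus : The most commonly used cmdlet. It returns the deduplication status of volumes that have Data Deduplication metadata;
  • Get-DedupVolume : This cmdlet returns the deduplication settings and savings of volumes that have Data Deduplication metadata;
  • Get-DedupJob : This cmdlet returns the status and details of deduplication jobs that are running or queued.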
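
The commands above can be combined into a short sequence: queue a job, then check on it. This is a minimal sketch, assuming the Data Deduplication role service is installed and the volume is already enabled for deduplication; the function name Invoke-ExampleDedupJob is hypothetical.

```powershell
# Hypothetical helper; only the Dedup cmdlets inside it are real.
function Invoke-ExampleDedupJob {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Volume,   # e.g. 'E:'

        # The four job types listed above.
        [ValidateSet('Optimization', 'GarbageCollection', 'Scrubbing', 'Unoptimization')]
        [string]$Type = 'Optimization'
    )

    # Queue a job of the requested type on the volume.
    Start-DedupJob -Volume $Volume -Type $Type | Out-Null

    # List jobs that are running or queued for the volume.
    Get-DedupJob -Volume $Volume

    # Report the volume's current deduplication status.
    Get-DedupStatus -Volume $Volume
}
```

For example, Invoke-ExampleDedupJob -Volume 'E:' -Type GarbageCollection would queue a garbage collection job and then print the job queue and the volume status. Passing a type outside the four listed values is rejected before any job is started.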

Video of the live setup: