PowerShell: How to find duplicate files in MOSS 2007?

Answers

  • I tried to get the PowerShell code working on MOSS 2007.

    I had two issues translating it for MOSS 2007.

    1. The MD5 hash values did not work well, not even on SP2010. When an Office 2007 document is uploaded to SharePoint, property promotion/demotion writes certain properties into the document, and this changes its MD5 hash. I have now used a very simple criterion instead: the file name, to check whether a document is unique.
    2. I do not know why, but the arrays did not work well either.
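To illustrate point 1, here is a minimal PowerShell sketch (not part of the original script) that computes MD5 hashes with the .NET `System.Security.Cryptography.MD5` class. The short byte strings are hypothetical stand-ins for a document before and after SharePoint rewrites its properties: changing a single byte yields a completely different hash, so the uploaded copy no longer matches the original.

```powershell
# Sketch: why MD5-based duplicate detection breaks after upload.
# Property demotion rewrites metadata inside the Office file, so the stored
# bytes differ from the original and the MD5 hash changes with them.
# The byte strings below are hypothetical stand-ins for file contents.
function Get-Md5Hex([byte[]]$Bytes)
{
    $md5 = [System.Security.Cryptography.MD5]::Create()
    try {
        # Hex-encode the 16-byte digest
        ($md5.ComputeHash($Bytes) | ForEach-Object { $_.ToString("x2") }) -join ""
    }
    finally {
        $md5.Clear()   # release the hash object's resources
    }
}

$original = [System.Text.Encoding]::ASCII.GetBytes("abc")
$modified = [System.Text.Encoding]::ASCII.GetBytes("abd")   # one byte differs

$originalHash = Get-Md5Hex $original   # 900150983cd24fb0d6963f7d28e17f72
$modifiedHash = Get-Md5Hex $modified

"Same content, same hash: $($originalHash -eq (Get-Md5Hex $original))"
"One byte changed, different hash: $($originalHash -ne $modifiedHash)"
```

Hash comparison only finds copies whose stored bytes are identical, which is why the script falls back to comparing file names.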

    The code might not be good, but it works for MOSS 2007 ;-)

    # SharePoint 2010: Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    # MOSS 2007 has no SharePoint snap-in, so load the object model assembly directly
    [void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint")

    function Get-DuplicateFiles ($RootSiteUrl)
    {
        # SharePoint 2010: $spSite = Get-SPSite -Identity $RootSiteUrl
        $spSite = New-Object Microsoft.SharePoint.SPSite($RootSiteUrl)
        $Items = @()

        foreach ($spWeb in $spSite.AllWebs)
        {
            Write-Host "Checking $($spWeb.Title) for duplicate documents"

            foreach ($list in $spWeb.Lists)
            {
                # User document libraries only: skip system libraries (URLs starting with "_") and SitePages
                if ($list.BaseType -eq "DocumentLibrary" -and $list.RootFolder.Url -notlike "_*" -and $list.RootFolder.Url -notlike "SitePages*")
                {
                    foreach ($item in $list.Items)
                    {
                        # Skip items without a file and zero-byte files
                        if ($item.File.Length -gt 0)
                        {
                            $record = New-Object -TypeName System.Object
                            $record | Add-Member NoteProperty FileName ($item.File.Name)
                            $record | Add-Member NoteProperty FullPath ($spWeb.Url + "/" + $item.Url)
                            $Items += $record
                        }
                    }
                }
            }

            # Every SPWeb returned by AllWebs must be disposed explicitly
            $spWeb.Dispose()
        }
        $spSite.Dispose()

        # Group once, after all webs are scanned: every file whose name occurs more than once is a duplicate
        $Duplicates = $Items | Group-Object FileName | Where-Object { $_.Count -gt 1 } | ForEach-Object { $_.Group }

        $Duplicates | Out-GridView
    }

    Get-DuplicateFiles "http://sp2007"


    • Edited by Sven W Sunday, March 4, 2012 6:10 PM
    • Marked as answer by Sven W Saturday, March 10, 2012 1:35 PM
    Sunday, March 4, 2012 6:06 PM
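The duplicate test in the script is the `Group-Object` step. The following sketch (an illustration, not part of the original answer) runs that step on hypothetical in-memory records so it can be tried without a SharePoint server; the helper `New-Record`, the sub-site paths and the file names are made up for the example.

```powershell
# Sketch of the file-name duplicate test, run on hypothetical in-memory
# records instead of SharePoint list items (no SharePoint assembly needed).
function New-Record($FileName, $FullPath)
{
    # Hypothetical helper: builds the same kind of record the script collects
    $record = New-Object -TypeName System.Object
    $record | Add-Member NoteProperty FileName $FileName
    $record | Add-Member NoteProperty FullPath $FullPath
    $record
}

$Items = @(
    (New-Record "Budget.xlsx" "http://sp2007/finance/Shared Documents/Budget.xlsx"),
    (New-Record "Budget.xlsx" "http://sp2007/hr/Shared Documents/Budget.xlsx"),
    (New-Record "Policy.docx" "http://sp2007/hr/Shared Documents/Policy.docx")
)

# A file name that occurs more than once marks every copy as a duplicate
$Duplicates = @($Items | Group-Object FileName | Where-Object { $_.Count -gt 1 } | ForEach-Object { $_.Group })

$Duplicates | Format-Table FileName, FullPath -AutoSize
```

Only the two `Budget.xlsx` records are reported. Note that `Group-Object` compares strings case-insensitively by default, so `Budget.xlsx` and `budget.xlsx` would also count as duplicates; add `-CaseSensitive` if that is not wanted.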

All replies

  • This is what the output looks like:


    Sunday, March 4, 2012 6:09 PM