PowerShell - searching for empty folders and writing them into an Excel spreadsheet

    Question

  • I haven't really used PowerShell before, so please bear with me ...

    I would like to write a PowerShell script to search through a complex folder structure several tiers deep and find all of the folders which include a "PDF" folder that is empty.  The results need to go into a usable file such as an Excel spreadsheet or .txt document.

    E.G. M:\DO\Dies\*\*\PDF\

    I expect there to potentially be hundreds, if not thousands of results ... I am not interested in other empty folders ... just ones with the name PDF.  It is expected that the PDF folders will otherwise include a file of type .PDF if they are not empty.

    Can anyone help with this?  Thanks in advance  :)

    Monday, February 27, 2012 1:33 AM

Answers

  • I try to avoid get-childitem on big directory structures. 

    Try this:

    ((cmd /c dir M:\DO\Dies\ /s /ad /b) -match '\\pdf$')|
    where {-not (test-path "$_\*.pdf")} |
    out-file emptypdf.txt


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Monday, February 27, 2012 1:59 AM
    Moderator

All replies

  • This is how I'd do it in pure Powershell:

    $rootpath = 'C:\Scripts'
    $emptyFolders = @()
    $folders = Get-ChildItem -Path $rootpath -Filter "PDF" -Recurse| Where-Object {$_.PSIsContainer} 
    foreach ($folder in $folders) {
        if (@(Get-ChildItem $folder.pspath).count -eq 0) {
            $emptyFolders += "$($folder.PSPath | split-path -noqual) is Empty"}
        else { "$($folder.PSPath | split-path -noqual) has files" }
        }
    Out-File -FilePath 'Empty PDF Folders.txt' -InputObject $emptyFolders
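    Since the question asked for something Excel can open, here is a minimal sketch of the same search that returns objects suitable for a CSV file. The function name Get-EmptyPdfFolder and the output file name are made up for illustration; the root path in the usage comment is the one from the question. It treats a PDF folder as empty only when it holds neither files nor subfolders.

```powershell
# Hypothetical helper (not from the thread): returns one object per empty
# folder named PDF found anywhere under $Root.
function Get-EmptyPdfFolder {
    param([Parameter(Mandatory = $true)][string]$Root)

    Get-ChildItem -LiteralPath $Root -Recurse -Force |
        Where-Object {
            # Keep only directories named PDF (-eq is case-insensitive)
            # that contain no files and no subfolders.
            $_.PSIsContainer -and
            $_.Name -eq 'PDF' -and
            $_.GetFileSystemInfos().Length -eq 0
        } |
        Select-Object FullName, LastWriteTime
}

# Example call; the output file name is an assumption:
# Get-EmptyPdfFolder -Root 'M:\DO\Dies' |
#     Export-Csv -Path 'EmptyPdfFolders.csv' -NoTypeInformation
```

    Export-Csv with -NoTypeInformation writes a plain header row followed by one line per folder, which Excel opens directly.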


    Grant Ward, a.k.a. Bigteddy

    What's new in Powershell 3.0 (Technet Wiki)

    Monday, February 27, 2012 7:40 AM
  • Triggered by what mjolinor said about avoiding get-childitem on big directory structures, I ran a comparison to see how big the difference between dir and get-childitem is. I ran both against folder structures on the same storage device. For get-childitem I ran two different queries: one piped to select fullname to match the output generated by dir, and one plain run without any pipelining.

    Here are the results and the commands I ran:

    ~640k files/folders
    [int]$a=0;1..2 | foreach {$a+=(measure-command {(cmd /c dir $testpath /b /s)}).totalmilliseconds};$a/2
    200424 
    [int]$a=0;1..2 | foreach {$a+=(measure-command {(get-childitem $testpath -recurse -force | select fullname)}).totalmilliseconds};$a/2
    575880.5
    [int]$a=0;1..2 | foreach {$a+=(measure-command {(get-childitem $testpath -recurse -force)}).totalmilliseconds};$a/2
    298176.5




    ~13k files/folders
    [int]$a=0;1..10 | foreach {$a+=(measure-command {(cmd /c dir $testpath /b /s)}).totalmilliseconds};$a/10
    2619.9
    [int]$a=0;1..10 | foreach {$a+=(measure-command {(get-childitem $testpath -recurse -force | select fullname)}).totalmilliseconds};$a/10
    5105.4
    [int]$a=0;1..10 | foreach {$a+=(measure-command {(get-childitem $testpath -recurse -force)}).totalmilliseconds};$a/10
    4219.7


    ~1k files/folders
    [int]$a=0;1..10 | foreach {$a+=(measure-command {(cmd /c dir $testpath /b /s)}).totalmilliseconds};$a/10
    982.9
    [int]$a=0;1..10 | foreach {$a+=(measure-command {(get-childitem $testpath -recurse -force | select fullname)}).totalmilliseconds};$a/10
    1644.4
    [int]$a=0;1..10 | foreach {$a+=(measure-command {(get-childitem $testpath -recurse -force)}).totalmilliseconds};$a/10
    1587
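    The repeated one-liners above can be wrapped in a small helper. This is only a sketch: the name Measure-AverageMs and its parameters are made up for illustration. It keeps the running total as a double, whereas the [int]$a variable in the commands above rounds the total after every addition.

```powershell
# Hypothetical helper (not from the thread): runs $ScriptBlock $Count times
# and returns the mean elapsed time in milliseconds.
function Measure-AverageMs {
    param(
        [Parameter(Mandatory = $true)][scriptblock]$ScriptBlock,
        [int]$Count = 10
    )

    $total = 0.0
    for ($i = 0; $i -lt $Count; $i++) {
        # Measure-Command discards the script block's output and
        # returns a TimeSpan for the run.
        $total += (Measure-Command -Expression $ScriptBlock).TotalMilliseconds
    }
    $total / $Count
}

# Example calls, assuming $testpath is set as in the commands above:
# Measure-AverageMs { cmd /c dir $testpath /b /s } -Count 10
# Measure-AverageMs { Get-ChildItem $testpath -Recurse -Force } -Count 10
```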

    Monday, February 27, 2012 10:17 AM
    Moderator
  • So it seems gci takes just under twice as long.  We are talking milliseconds here: your example for 13,000 files was 5.1 sec vs. 2.6 sec.

    So the user would have to wait an extra 2.5 seconds to get his results over 13,000 files.  That's not too bad.  As you show, for about 1,000 files, the difference is less than a second.

    I don't think it's worth writing a "mixed" script for such a slight increase in performance.


    Grant Ward, a.k.a. Bigteddy

    Monday, February 27, 2012 10:38 AM
  • At about 300,000 files the performance of get-childitem goes "hockey stick":

     

    http://blogs.msdn.com/b/powershell/archive/2009/11/04/why-is-get-childitem-so-slow.aspx

     


    Monday, February 27, 2012 11:25 AM
    Moderator
  • I would say the real test will be when/if the OP compares the two methods on his file structure.  The OP mentioned "hundreds, if not thousands" of files, not hundreds of thousands.  The time difference may be significant, or it may not.

    Grant Ward, a.k.a. Bigteddy

    Monday, February 27, 2012 11:57 AM
  • He said he expects the results to be "hundreds, if not thousands". Given that the results he's after are just the empty PDF folders, I inferred (maybe incorrectly) that the initial file structure he's dealing with could get into hundreds of thousands of directory items.


    Monday, February 27, 2012 12:11 PM
    Moderator
  • Yes, I suppose we can assume that there will be lots and lots of files to search through.

    I know that gci isn't as good as cmd for large directory searches.  I'm just a bit of a purist, and like to use "pure" PS where possible, for the sake of consistency in syntax and understandability/maintainability.


    Grant Ward, a.k.a. Bigteddy

    Monday, February 27, 2012 12:21 PM
  • I tried to avoid searching the files by using test-path instead of gci to find out if the directory is empty. GCI is going to return all the items in the directory and then you have to test the count to get a true/false on it being empty.  Test-path will return True as soon as it finds the first one.
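    The same "stop at the first item" idea can be sketched in pure PowerShell. The helper name Test-FolderEmpty is made up for illustration, and the early exit is an assumption about the host: on PowerShell 3.0 or later, Select-Object -First 1 stops the upstream Get-ChildItem once one item has arrived. On 2.0 the result is the same, but the whole folder is still listed.

```powershell
# Hypothetical helper (not from the thread): returns $true when the folder
# contains no files and no subfolders, hidden items included.
function Test-FolderEmpty {
    param([Parameter(Mandatory = $true)][string]$Path)

    # The first child (if any) is enough to decide; a $null result
    # converts to $false, so -not turns "no child" into $true.
    -not (Get-ChildItem -LiteralPath $Path -Force | Select-Object -First 1)
}

# Example call, using the folder pattern from the question:
# Test-FolderEmpty -Path 'M:\DO\Dies\SomeDie\SomePart\PDF'
```

    Unlike test-path $_\*.pdf, this counts any child item, not just .pdf files, so it answers "is the folder empty" rather than "does it lack a PDF".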


    Monday, February 27, 2012 12:38 PM
    Moderator
  • O.T. : What will be the policy for the Scripting Games 2012?  Will they allow us to use cmd /c ?  Or will it be restricted to 'pure' Powershell?  What is 'pure' Powershell?  Does ipconfig count?

    Grant Ward, a.k.a. Bigteddy

    Monday, February 27, 2012 12:41 PM
  • I imagine they'll let you use anything that would be available to your script in a Windows environment. The only reason I used cmd /c for that is because Get-ChildItem is aliased to DIR.  I could do without the cmd /c by just removing that alias.


    Monday, February 27, 2012 12:45 PM
    Moderator
  • "I tried to avoid searching the files by using test-path instead of gci to find out if the directory is empty. GCI is going to return all the items in the directory and then you have to test the count to get a true/false on it being empty.  Test-path will return True as soon as it finds the first one."

    I think that's a really cool tip.  Thanks!

    Grant Ward, a.k.a. Bigteddy

    Monday, February 27, 2012 12:51 PM
  • Here is my new, optimised 'pure' PowerShell version:

    filter ListEmptyFolders {if ($_.PSIsContainer -and !(Test-Path (Join-Path $_.FullName '*'))){$_.FullName}}
    Get-ChildItem -Recurse | ListEmptyFolders | Out-File -FilePath 'EmptyFolders.txt'


    Grant Ward, a.k.a. Bigteddy

    Tuesday, February 28, 2012 7:08 AM
  • Testing these two methods, surprisingly, $_.getfiles() is faster than Test-Path:

    Measure-Command {
    filter ListEmptyFolders {if ($_.PSIsContainer -and !(Test-Path "$_\*")){$_.fullname}}
    Get-ChildItem C:\Users\Grant\Documents -Recurse | ListEmptyFolders | Out-Null
    }

    TotalMilliseconds : 2644.0623

    Measure-Command {
    filter ListEmptyFolders {if ($_.PSIsContainer -and !($_.getfiles())){$_.fullname}}
    Get-ChildItem C:\Users\Grant\Documents -Recurse | ListEmptyFolders | Out-Null
    }

    TotalMilliseconds : 1496.3132
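    One caveat when comparing the two timings: the filters do not test quite the same thing. Test-Path "$_\*" matches any child item, while GetFiles() returns only files, so a folder that holds nothing but subfolders counts as empty in the second version. A possible variant, sketched here under the made-up name ListTrulyEmptyFolders, uses GetFileSystemInfos(), which returns files and subfolders alike.

```powershell
# Hypothetical variant of the filter above (not from the thread): emits the
# full path of each folder that contains neither files nor subfolders.
filter ListTrulyEmptyFolders {
    if ($_.PSIsContainer -and $_.GetFileSystemInfos().Length -eq 0) {
        $_.FullName
    }
}

# Example call, mirroring the commands above:
# Get-ChildItem C:\Users\Grant\Documents -Recurse | ListTrulyEmptyFolders | Out-Null
```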

    Grant Ward, a.k.a. Bigteddy

    Tuesday, February 28, 2012 9:43 AM