Best Performance for looping over Files - ForEachObject vs Looping over Variable

  • Question

  • Hello 

    I need a fast way to loop over files, and I thought the ForEach-Object cmdlet was the best way because I do not have to save the files in a variable first.

    But when I run the script below, the version that first saves the files in a variable is faster. Is there an explanation for that? And is this the best approach, or does anybody have suggestions for improving the performance of my script?

    Thanks

    $Global:WorkSpace = "C:\temp"
    
    $DirectLoopResult = Measure-Command { 
    Get-Childitem -Path $Global:WorkSpace -File -Recurse -ErrorAction SilentlyContinue | ForEach-Object {
        $Position = $_.Name.indexof("_")
        $ObjectType = $_.Name.Substring(0,$Position)
        Add-Member -InputObject $_ -MemberType "NoteProperty" -Name "ObjectType" -Value $ObjectType
    }
    }
    
    
    
    $VariableLoopResult = Measure-Command {
    $Files = Get-Childitem -Path $Global:WorkSpace -File -Recurse -ErrorAction SilentlyContinue
    
    foreach ($File in $Files) {
        $Position = $File.Name.indexof("_")
        $ObjectType = $File.Name.Substring(0,$Position)
        Add-Member -InputObject $File -MemberType "NoteProperty" -Name "ObjectType" -Value $ObjectType
    }
    }
    
    Write-Host "DirectLoop: $($DirectLoopResult.TotalMilliseconds)"
    Write-Host "VariableLoop: $($VariableLoopResult.TotalMilliseconds)"
    
    # -OutVariable expects the variable name without the leading '$'
    $Files | Select-Object -Property {$_.Directory.Name},ObjectType,Name,FullName,LastWriteTime | Out-GridView -PassThru -OutVariable SelectedFiles
    Write-Host $SelectedFiles
    

    Result: 

    DirectLoop: 1543.0484
    VariableLoop: 1328.9268
    Monday, July 1, 2019 8:19 AM

All replies

  • Writing a correct pipeline would be faster than either of your examples. For most things, a pipeline is faster when it is used correctly.
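The reply does not say what a "correct pipeline" would look like here. One possible reading, offered only as a sketch, is a single streaming pipeline that computes the extra column with a calculated property in Select-Object instead of calling Add-Member per file. The function names `Get-ObjectTypePrefix` and `Get-TypedFile` are hypothetical and not from the thread, and whether this is actually faster on a given machine would still need to be measured.

```powershell
# Hypothetical helper (not from the thread): returns the text before the first
# underscore, or $null when the name has no underscore or starts with one.
function Get-ObjectTypePrefix {
    param([string]$Name)
    $position = $Name.IndexOf('_')
    if ($position -gt 0) { $Name.Substring(0, $position) } else { $null }
}

# Hypothetical wrapper: one streaming pipeline, with no intermediate variable
# and no Add-Member call. Select-Object emits new objects that carry only the
# listed properties.
function Get-TypedFile {
    param([string]$Path)
    Get-ChildItem -Path $Path -File -Recurse -ErrorAction SilentlyContinue |
        Select-Object -Property @{ Name = 'Folder'; Expression = { $_.Directory.Name } },
            @{ Name = 'ObjectType'; Expression = { Get-ObjectTypePrefix -Name $_.Name } },
            Name, FullName, LastWriteTime
}
```

With the path from the question, this could be used as `Get-TypedFile -Path 'C:\temp' | Out-GridView -PassThru -OutVariable SelectedFiles`. Unlike the original script, the helper does not throw when a file name contains no underscore.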


    \_(ツ)_/


    • Edited by jrv Monday, July 1, 2019 8:28 AM
    Monday, July 1, 2019 8:28 AM
  • While Lee's links are good information, the exact issue of loop performance is far more complex. In PowerShell V1, performance in many situations favored the "foreach()" construct. Later versions have favored "ForEach-Object", assuming that you are feeding a new collection into the pipeline. Storing a collection in memory first will almost always kill performance for any kind of loop, and larger collections mean worse performance.
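Which construct wins on a particular machine and PowerShell version is easiest to settle by measuring. The sketch below is one way to make the comparison from the question somewhat fairer: it does a warm-up enumeration first, so neither variant pays for a cold file-system cache, and it averages several runs. `Measure-Average` is a hypothetical helper, not a built-in cmdlet, and the loop bodies are reduced to the IndexOf call for brevity.

```powershell
# Hypothetical helper: average elapsed milliseconds of a script block over
# several runs, using the built-in Measure-Command for each run.
function Measure-Average {
    param(
        [scriptblock]$ScriptBlock,
        [int]$Count = 5
    )
    $total = 0.0
    for ($i = 0; $i -lt $Count; $i++) {
        $total += (Measure-Command -Expression $ScriptBlock).TotalMilliseconds
    }
    $total / $Count
}

$workSpace = 'C:\temp'   # path taken from the question

# Warm-up pass so the first measured variant is not penalized by disk reads.
$null = Get-ChildItem -Path $workSpace -File -Recurse -ErrorAction SilentlyContinue

# Variant 1: stream files through the pipeline into ForEach-Object.
$pipeline = Measure-Average {
    Get-ChildItem -Path $workSpace -File -Recurse -ErrorAction SilentlyContinue |
        ForEach-Object { $_.Name.IndexOf('_') }
}

# Variant 2: collect all files first, then use the foreach statement.
$statement = Measure-Average {
    $files = Get-ChildItem -Path $workSpace -File -Recurse -ErrorAction SilentlyContinue
    foreach ($file in $files) { $file.Name.IndexOf('_') }
}

"Pipeline (ForEach-Object): $pipeline ms"
"Variable (foreach):        $statement ms"
```

The numbers will vary with the file count, the PowerShell version, and the storage; a difference of a few hundred milliseconds in a single run, as in the question, may be within that noise.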

    In programming we always write for clarity and simplicity first. Optimization comes only when we need to meet some performance criterion. In scripting this is seldom an issue. Correct use of the resources of the overall scripting environment is much more important than simple loop optimization. If you are using loops when other facilities are available to perform the task, then you likely do not understand the overall capabilities of the OS or the task of programming.

    New users of PowerShell who have no programming experience should not be concerned with technical issues like loops. First learn the basics of programming. Applying the basics and understanding the technology come first. The rest will become clear once you have had sufficient experience.


    \_(ツ)_/

    • Proposed as answer by BOfH-666 Tuesday, July 2, 2019 5:52 AM
    Tuesday, July 2, 2019 4:07 AM
  • Hi,

    Was your issue resolved?

    If you resolved it using our solution, please "mark it as answer" to help other community members find the helpful reply quickly.

    If you resolve it using your own solution, please share your experience and solution here. It will be very beneficial for other community members who have similar questions.

    If not, please reply and tell us the current situation so that we can provide further help.

    Best Regards,

    Lee


    Just do it.

    Wednesday, July 31, 2019 8:40 AM