Parallel Processing

  • Question

  • Hi,

    I'm working on something that takes a line-separated list of computers and copies some files to a directory on each computer's C:\ drive. I'm using ForEach-Object, which means it processes the computers one by one.

    I've read a bit on parallel processing but I'm not sure how best to tackle this one. I want only the computers to be processed in parallel, not the files, and I need each file copy operation to be timed.

    Being able to set the "batch" size (e.g. 200 computers in the list, but processed in batches of 25) would be nice, but it's not a necessity.

    $computerlist = Get-Content "computerlist.txt"
    $filelist = Get-ChildItem -Path "Source" -File
    $timer = New-Object -TypeName System.Diagnostics.Stopwatch
    
    Write-Host ""
    
    $computerlist | ForEach-Object {
    
        $computer = $_
        $destination = ("\\" + $computer + "\C$\Somedirectory")
    
        if (-Not (Test-Path -Path $destination)) {
        
            New-Item -ItemType Directory -Path $destination | Out-Null
    
        }
            
        $filelist | ForEach-Object {
    
            $timer.Start()
            Copy-Item $_.FullName -Destination $destination -Force
            $timer.Stop()
            Write-Host ("TRANSFER RATE: {0:N2} MB/sec" -f (($_.Length / $timer.Elapsed.TotalSeconds) / 1MB))
            $timer.Reset()
    
        }
    
    }
    
    Write-Host ""

    Thanks

    Saturday, December 21, 2019 10:58 PM

Answers

  • Help about_workflows

    workflow CopyFiles{
        foreach -Parallel($computer in $computers){
            # code
        }
    }
    


    \_(ツ)_/

    Saturday, December 21, 2019 11:58 PM
  • Here's a basic example of copying a file to multiple destinations using jobs (or threadjobs):


    'file2.txt','file3.txt','file4.txt' |
      foreach { start-job { cd $using:pwd
        copy file1.txt $using:_ -whatif } } |
      receive-job -wait -auto

    What if: Performing the operation "Copy File" on target "Item: /Users/js/foo/file1.txt Destination: /Users/js/foo/file2.txt".
    What if: Performing the operation "Copy File" on target "Item: /Users/js/foo/file1.txt Destination: /Users/js/foo/file3.txt".
    What if: Performing the operation "Copy File" on target "Item: /Users/js/foo/file1.txt Destination: /Users/js/foo/file4.txt".

    Sunday, December 22, 2019 10:40 PM

All replies

  • Help about_workflows

    workflow CopyFiles{
        foreach -Parallel($computer in $computers){
            # code
        }
    }


    \_(ツ)_/

    Can the progress bar from a Workflow be suppressed? It's quite annoying as it cycles through all the commands it's doing.
    Tuesday, December 24, 2019 11:15 PM
  • Thanks, I'll give this one a go.

    This is my current workflow block. Can this be replicated in a Job, specifically the Write-Log part (which is just a function that wraps Write-Host)?

    Workflow CopyFiles {
    
        param([object]$Files, [string]$Destination, [int]$Parallel)
    
        ForEach -Parallel -ThrottleLimit $Parallel ($file in $Files) {
    
            Sequence {
    
                [datetime]$datetime = (Get-Date -Format "dd MMMM yyyy, HH:mm:ss")
                [string]$name = $file.Name
                [string]$path = $file.FullName
                [string]$size = Format-FileSize -Size $file.Length
    
                $timer = Measure-Command {Copy-Item -Path $path -Destination $Destination -Force -ErrorAction SilentlyContinue}
                Write-Log -Text (" -> DATE: {0} | FILE: {1} | SIZE: {2} | TIME TAKEN: {3:mm\:ss\.fff} | TRANSFER RATE: {4:N2} MB/sec" -f $datetime, $path, $size, ([TimeSpan] $timer), (($file.Length / $timer.TotalSeconds) / 1MB)) -Screen $true -File $true -Color White
    
            }
    
        }
    
    }

    Thanks

    PS: Yeah I know Write-Host is a no-no, but Write-Output can't set the font color and this doesn't work when called from a workflow:

    Function Write-Log([string]$Text, [boolean]$Screen, [boolean]$File, [string]$Color = ("White")) {
    
        $fgcolor = $host.UI.RawUI.ForegroundColor
        $host.UI.RawUI.ForegroundColor = $Color
        
        if ($Screen -eq $true) {Write-Host $Text -ForegroundColor $Color}
        if ($File -eq $true) {Add-Content -Path ((Get-Date -Format "yyyy-MM-dd") + "-Log.log") -Value $Text -ErrorAction SilentlyContinue}
    
        $host.UI.RawUI.ForegroundColor = $fgcolor
    
    }




    • Edited by Lanky Doodle Tuesday, December 24, 2019 11:20 PM
    Tuesday, December 24, 2019 11:17 PM
  • Workflow does not allow any direct console commands such as Write-Host. The throttle limit should not be changed unless you understand what it does and how to use it; the system will pick an optimum limit based on the architecture.

    Type the following:

    help CopyFiles -full

    Read it all carefully until you understand what is available.

    CopyFiles … params … -AsJob

    $ProgressPreference = 'SilentlyContinue'
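
    For example, a rough sketch of an invocation (the destination path and -Parallel value are placeholders, and $filelist is the variable from your original script):

    $ProgressPreference = 'SilentlyContinue'    # suppress the workflow progress bar in this session
    $job = CopyFiles -Files $filelist -Destination '\\server\C$\Somedirectory' -Parallel 4 -AsJob
    Receive-Job -Job $job -Wait                 # collect the workflow output when it is ready
    Remove-Job -Job $job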


    \_(ツ)_/

    Tuesday, December 24, 2019 11:31 PM
  • Thanks.

    I'm playing with Start-Job and have something rough working. What I'd like to do is have each job output to screen as soon as it's finished, rather than waiting for all jobs to finish.

    Max 2 jobs running at the same time.

    $files = Get-ChildItem -Path C:\Resource\STH\Sample -File | Where {$_.Length -ge 1073741824 -And $_.Length -le 1073741824}
    $dest = "C:\Temp\Test"
    
    $CopyFile = {
    
        param([object]$file, [string]$dest) 
        $timer = Measure-Command {Copy-Item -Path $file -Destination $dest -Force -ErrorAction SilentlyContinue}
        Write-Output ("{0} | {1} | {2:mm\:ss\.fff}" -f (Get-Date -Format "dd MMMM yyyy, HH:mm:ss"), $file, [Timespan] $timer)
    
    }
    
    ForEach ($file in ($files)) {
    
        While($(Get-Job -State Running).Count -ge 2) {
    
            Start-Sleep -Milliseconds 5
    
        }
    
        Start-Job -Name $file.Name -ScriptBlock $CopyFile -ArgumentList $file.FullName, $dest | Out-Null
    
    }
    
    While ((Get-Job -State Running).Count -gt 0) {
    
        Start-Sleep -Milliseconds 5
    
    }
    
    Get-Job | ForEach {
    
        Receive-Job -Id ($_.Id)
        Remove-Job -Id ($_.Id)
    
    }
    
    Get-ChildItem -Path $dest -File | Remove-Item

    Obviously the Write-Output bit from the $CopyFile script block doesn't appear until all jobs are finished. I know why; I just can't quite figure out where to put the Receive-Job bit so it's invoked when each job finishes.

    I have read up on Register-ObjectEvent, but the examples I've seen break the parallelism I'm trying to achieve.

    Merry Christmas!


    • Edited by Lanky Doodle Wednesday, December 25, 2019 8:57 AM
    Wednesday, December 25, 2019 8:48 AM
  • [Screenshot posted as an image, showing the expected on-screen output]

    Wednesday, December 25, 2019 8:49 AM
  • Workflows will do this but jobs cannot. Jobs must be received to get output.

    I recommend taking the time to learn basic workflow and job usage in PowerShell. Once you understand what these are and how they work, you will be able to correctly design a solution.

    help Receive-Job -online

    help about_jobs -showwindow
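
    A rough polling pattern, reusing the $files, $dest and $CopyFile pieces from your post (a sketch only - adjust the throttle value and sleep interval as needed):

    $throttle = 2
    foreach ($file in $files) {
        # wait for a free slot, receiving any finished jobs while waiting
        while ((Get-Job -State Running).Count -ge $throttle) {
            Get-Job -State Completed | Receive-Job -Wait -AutoRemoveJob
            Start-Sleep -Milliseconds 100
        }
        Start-Job -Name $file.Name -ScriptBlock $CopyFile -ArgumentList $file.FullName, $dest | Out-Null
    }
    # drain anything still running at the end
    Get-Job | Receive-Job -Wait -AutoRemoveJob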


    \_(ツ)_/

    Wednesday, December 25, 2019 8:51 AM
  • Please do not post pictures of code. It is not helpful.

    Read the following:
    An image of code is not helpful


    \_(ツ)_/

    Wednesday, December 25, 2019 8:52 AM
  • Here's a nice way to wait for jobs. The output seems to come out as the jobs finish.


    1..5 | foreach-object {
      start-job { param ($num) sleep $num; $num } -args $_
    } | receive-job -wait -AutoRemoveJob

    1
    2
    3
    4
    5


    This is the more typical PowerShell way to send output, as an object. Then you can pipe it to Where-Object, Sort-Object, etc.


    [pscustomobject]@{Date = '1/1'
      File = 'file.txt'
      Time = '1:00 pm'}

    Date File     Time
    ---- ----     ----
    1/1  file.txt 1:00 pm
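
    Putting the two together for the copy scenario - just a sketch, with -WhatIf so nothing is actually copied, and assuming the $files and $dest variables you already have:

    $jobs = foreach ($f in $files) {
      start-job {
        param($path, $dest)
        # time the copy and emit one object per file
        $timer = measure-command { copy-item $path $dest -whatif }
        [pscustomobject]@{
          Date = Get-Date -Format 'dd MMMM yyyy, HH:mm:ss'
          File = $path
          Time = '{0:mm\:ss\.fff}' -f $timer
        }
      } -ArgumentList $f.FullName, $dest
    }
    $jobs | receive-job -wait -AutoRemoveJob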

    • Edited by JS2010 Wednesday, December 25, 2019 5:16 PM
    Wednesday, December 25, 2019 2:47 PM
  • Thanks. In my testing from before, and even after refactoring your example into my code, the -Wait parameter makes it wait for each job to complete.

    This is evident from having File Explorer open to the destination folder; with -Wait on, files land there one by one, and without it they land in parallel.

    So it seems Workflows are the only way. But that means I have to use Write-Host to get colored text... and Write-Host is now supposedly a no-no! And Workflows have the drawback of not being able to access variables defined outside of themselves!

    So in summary, PowerShell needs fixing. I can't be the first and only one to want real-time status of jobs displayed on-screen.

    Thanks for both your help, especially over the holiday period :)


    Friday, December 27, 2019 12:32 PM
  • Thanks. In my testing from before, and even after refactoring your example into my code, the -Wait parameter makes it wait for each job to complete.

    This is evident from having File Explorer open to the destination folder; with -Wait on, files land there one by one, and without it they land in parallel.

    So it seems Workflows are the only way. But that means I have to use Write-Host to get colored text... and Write-Host is now supposedly a no-no! And Workflows have the drawback of not being able to access variables defined outside of themselves!

    So in summary, PowerShell needs fixing. I can't be the first and only one to want real-time status of jobs displayed on-screen.

    Thanks for both your help, especially over the holiday period :)


    No. PowerShell does not need fixing but you need to learn PowerShell.  Every one of these things is addressable once you learn PowerShell and how computing works.

    A workflow is a fundamental capability of computers and has been around in various forms since the first computer. PS workflow is a way to access the Windows Workflow engine, which is a stock Windows service.

    In Windows and all other computer systems direct console access is not possible between processes. Both workflows and jobs run in separate processes which is why Write-Host and direct output are not possible.  It is a limitation of the computer system and not PowerShell.  Learning PowerShell and gaining a more complete understanding of computer systems would make it  easier to move forwards.  You must learn enough to know when what you are asking is just not doable with a computer.

    Workflows can access variables if they are passed in as arguments and handled correctly. This would also become obvious if you were to carefully read the workflow documentation and examples. Of course, understanding the documentation completely will guarantee that you learn PowerShell correctly.

    I guess it is time to stop guessing and get down to learning PS.

    help about_workflow

    Learning to script properly with PowerShell


    \_(ツ)_/

    Friday, December 27, 2019 12:47 PM
  • Another thing to keep in mind: PowerShell accesses a number of external systems - workflows, jobs and "Desired State Configuration". All are external processes and cannot write to the console. Any output has to be sent as output and collected as the workflow progresses. Jobs must be polled for output.

    To use any of these advanced capabilities will require learning what they are in the computing world and how Windows and PowerShell implement access.  This is the same as needing to learn NTFS in order to use the file system CmdLets.  PS is just a container and a language syntax that allows us to access these systems.  Each one is external to PowerShell in some way although many are system resources and others are system processes. Processes are a barrier that can only be crossed in specific ways since processes run in isolation.

    When attempting to solve complex problems you will need to do some research to understand the components of your request and to discover what Windows has to offer in the way of support. Learning each subsystem of Windows is what learning how to be a Windows tech is all about.  The good thing about Windows is that the GUI handles most of this for you so you can be a GUI only tech and do most things that are required.  Moving to a programming environment removes all of the help provided by the GUI and requires a deeper understanding of Windows and computer technologies.

    Windows is built mostly from industry-standard technologies and is a wrapper and presentation method that gives access to capabilities available on nearly all modern computers. Learning the underlying computer engineering technology is a prerequisite for managing a computer from the non-GUI side - the API. PowerShell provides CmdLets that expose common tasks in a simple way, assuming you understand the technology of the subsystem they address. For simple things like listing files, the default behavior of the CmdLets does just that. For other tasks and subsystems more knowledge may be required. The Microsoft documentation site provides deep and rich information on all subsystems but assumes that you have a considerable amount of technical background. There are also many books written on the subsystems within Windows which are invaluable in gaining sufficient technical knowledge of each subsystem.

    You cannot learn Windows technologies by asking questions in forums.  You can get excellent help in forums once you understand the technology and can ask good questions.


    \_(ツ)_/

    Friday, December 27, 2019 1:05 PM
  • Here is a way to output text from a workflow, using the verbose stream.

    workflow CopyFiles{
        param(
            [string[]]$computers
        )
        foreach -Parallel($computer in $Computers){
           Write-Verbose "$computer test message"
        }
    }
    CopyFiles  1,2,3,4,5,6 -Verbose


    \_(ツ)_/

    Friday, December 27, 2019 1:11 PM
  • The -Wait works for me. Run sequentially this would take 100 seconds, but it only takes 33 seconds. There's some overhead, though, because Start-Job starts a new PowerShell process for each job. By the way, you can use Write-Host with colored text in a job.


    measure-command { 1..10 |
      foreach { start-job { sleep 10; write-host hi -fore green  } } |
      receive-job -wait -auto } | % seconds

    hi
    hi
    hi
    hi
    hi
    hi
    hi
    hi
    hi
    hi

    33

    • Edited by JS2010 Friday, December 27, 2019 3:16 PM
    Friday, December 27, 2019 3:14 PM
  • Yes, but a job will only release that output when it completes, unless you poll the job. Job output will be interleaved with the Write-Host output in strange ways. It is useful if you can figure out how to design the job code to give you what you want.


    \_(ツ)_/

    Friday, December 27, 2019 3:17 PM

  • No. PowerShell does not need fixing but you need to learn PowerShell.

    Sorry, I think there was some confusion with my statement.

    Workflows do not allow some things, and one of those things is something I need to be able to do (use color on screen) without resorting to what is seemingly a bad practice... Write-Host.

    The now-recommended approach is Write-Output instead, but that doesn't support coloring, and changing the color via $host is not supported from a Workflow.

    My requirement is testing LAN/storage performance, and the best practices out there for this are all about parallel file copies and not sequential/serial. So parallelism is my absolute top priority above all else.

    The on-screen feedback provides: informational (white), success (green), warning (yellow), error (red) and summary (cyan). This is the second priority.

    I showed a screenshot before because it shows the output I am expecting. I cannot use the other Write-* options because they don't provide the aesthetics I am looking for in that feedback.

    I think most people would consider the 'traffic light' coloring above essential for effective feedback in this type of requirement. The fact that I am restricted to a bad practice to provide it is where my statement came from.

    All they need to do is add coloring to Write-Output, or make it possible for Workflows to use $host coloring. I don't believe this is missing because of a "limitation of the computer system and not PowerShell".



    Friday, December 27, 2019 4:41 PM
  • By the way, if you're willing to try the PowerShell 7 RC, it has ForEach-Object -Parallel. It also uses threads instead of processes. And workflows will be going away.


    1..5 | foreach-object -Parallel { write-host hi -ForegroundColor red }

    hi
    hi
    hi
    hi
    hi
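
    Applied to the original copy scenario, a rough sketch (PowerShell 7 only; the paths and the throttle value are just placeholders):

    $computerlist = Get-Content 'computerlist.txt'
    $filelist = Get-ChildItem -Path 'Source' -File

    $computerlist | ForEach-Object -Parallel {
        $computer = $_
        $destination = "\\$computer\C$\Somedirectory"
        if (-not (Test-Path -Path $destination)) {
            New-Item -ItemType Directory -Path $destination | Out-Null
        }
        foreach ($file in $using:filelist) {
            # time each copy and emit an object instead of writing to the host
            $timer = Measure-Command { Copy-Item -Path $file.FullName -Destination $destination -Force }
            [pscustomobject]@{
                Computer = $computer
                File     = $file.Name
                'MB/sec' = [math]::Round(($file.Length / $timer.TotalSeconds) / 1MB, 2)
            }
        }
    } -ThrottleLimit 25    # behaves like the 'batch' size from the original question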

    • Edited by JS2010 Saturday, December 28, 2019 8:08 PM
    Friday, December 27, 2019 4:51 PM
  • Yes - the latest version of PS7 does separate the command from the console Write-Host, but it will still be interleaved incorrectly. I am sure it only works with that CmdLet, which is not a workflow CmdLet but a simple way to execute some things in parallel. It operates on objects passed in. It also will not work with remoting, last I checked.

    Also PS 7 does not yet support workflow.


    \_(ツ)_/

    Friday, December 27, 2019 5:09 PM