Putting multiple commands/lines in a foreach-object loop

  • Question

  • I have this PowerShell code that doesn't work:

    $Test | foreach-object -ThrottleLimit 15 -Parallel {
        $a = $_.'name1'
        $b = $_.'name2'
        $key = "$a" + "-" + "$b"
        $hash.Add($key,$b)
        $hash
    } 
    I want to concatenate the values of column1 and column2, and the result should be the key of a new hashtable. What's wrong with the syntax? I want to use -Parallel because the file is huge. The error I get is on the $hash.Add line and says "You cannot call a method on a null-valued expression". I've tried Write-Host on each of the other values and they are what they should be.



    • Edited by badsector82 Wednesday, October 7, 2020 8:46 PM
    Wednesday, October 7, 2020 8:40 PM

All replies

  • Obviously you have null values in your file.


    \_(ツ)_/

    Wednesday, October 7, 2020 9:23 PM
  • If I Write-Host $key and $b, they show the correct values.
    Wednesday, October 7, 2020 9:51 PM
  • How do you create your hash?


    \_(ツ)_/

    Wednesday, October 7, 2020 10:32 PM
  • This would be the correct way to do this:

    $hash = @{}
    Import-Csv $your_file |
        ForEach-Object -ThrottleLimit 15 -Parallel {
            # $using: is needed to reach the caller's $hash from a parallel runspace
            ($using:hash).Add($_.name1, $_.name2)
        }
    $hash
    The issue you will run into is that the hashtable object may not be thread safe.
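
    If that becomes a problem, one thread-safe sketch (assuming PowerShell 7+, which -Parallel requires anyway) is a ConcurrentDictionary:

    $hash = [System.Collections.Concurrent.ConcurrentDictionary[string,string]]::new()
    Import-Csv $your_file |
        ForEach-Object -ThrottleLimit 15 -Parallel {
            # TryAdd is thread safe and returns $false on a duplicate key instead of throwing
            [void]($using:hash).TryAdd($_.name1, $_.name2)
        }
    $hash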


    \_(ツ)_/


    • Edited by jrv Wednesday, October 7, 2020 10:36 PM
    Wednesday, October 7, 2020 10:35 PM
  • The point is that the key must contain both values from column A and column B delimited with a dash.
    Friday, October 9, 2020 7:22 AM
  • There is nothing stopping you from doing that.


    \_(ツ)_/

    Friday, October 9, 2020 7:46 AM
  • I guess so, but I can't figure out the syntax. Actually I did it without the -Parallel parameter and it took only 17 seconds. Still, I'm trying to figure out how to use -Parallel for other tasks.
    Monday, October 12, 2020 7:03 AM
  • I guess so, but I can't figure out the syntax. Actually I did it without the -Parallel parameter and it took only 17 seconds. Still, I'm trying to figure out how to use -Parallel for other tasks.
    $hashes = $Test | ForEach-Object -ThrottleLimit 15 -Parallel {
        # emits one single-entry hashtable per input object
        @{ "$($_.name1)-$($_.name2)" = $_.name2 }
    } 
    The same thing can be done more easily with Select-Object, without ForEach-Object.
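
    A sketch of that Select-Object variant (the Key/Value property names here are just illustrative):

    $pairs = $Test | Select-Object @{ Name = 'Key';   Expression = { '{0}-{1}' -f $_.name1, $_.name2 } },
                                   @{ Name = 'Value'; Expression = { $_.name2 } }

    Each element of $pairs is then an object with a Key and a Value property.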

    The opinion expressed by me is not an official position of Microsoft

    • Edited by Vector BCO Monday, October 12, 2020 8:00 AM
    Monday, October 12, 2020 7:59 AM
  • Warning: a collection of hashtables is not the same as a single hashtable of keys. A collection is not useful as a dictionary.


    \_(ツ)_/

    Monday, October 12, 2020 8:23 AM
  • Warning: this use of parallel processing will be less efficient than a correctly designed loop.

    $hash = @{}
    Import-Csv $your_file | 
        ForEach-Object{
            $hash.Add($_.name1 + '-' + $_.name2,$_.name2)
        } 

    Learning PowerShell would prevent all of this vagueness and confusion. Adding hundreds of quotes does not accomplish anything useful. Learning how quotes work in PowerShell is critical. You are not writing a novel; quotes in programming are not like quotes in stories.

    help about_quoting
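
    A quick illustration of the difference (a toy example, not from the help topic):

    $name = 'world'
    $row  = [pscustomobject]@{ name1 = 'a'; name2 = 'b' }
    'hello $name'          # single quotes are literal -> hello $name
    "hello $name"          # double quotes expand      -> hello world
    "key: $($row.name1)"   # property access inside a string needs a $( ) subexpression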


    \_(ツ)_/


    • Edited by jrv Monday, October 12, 2020 8:31 AM
    Monday, October 12, 2020 8:29 AM
  • Warning: this use of parallel processing will be less efficient than a correctly designed loop.

    $hash = @{}
    Import-Csv $your_file | 
        ForEach-Object{
            $hash.Add($_.name1 + '-' + $_.name2,$_.name2)
        } 
    That's how I finally did it, only using foreach() instead of ForEach-Object. The question about -Parallel arose in the first place because I thought creation of the hashtable would take a lot of time. It turned out to be less than 20 seconds for 2.8M records. But searching through it takes hours. What I want to do is take all keys from the hashtable that have a certain value. For each value there are between 5 and 15 keys. There are approximately 65K values in total and the hashtable has about 7M records. That means I want to extract the keys from the hashtable whose values match the 65K records. So far the best-performing scenario I've found is this:
    $valuesForHashTable = Import-Csv -Path "some path here"
    $valuesToExtractFromHashTable = Import-Csv -Path "some other path here"
    
    # creating the hash table
    $hashTable = @{}
    foreach ($item in $valuesForHashTable) {
        $hashTable.Add($item.'some id' + '-' + $item.'some other id', $item.'some other id')
    }
    
    # extract all the keys that have $_.'some other id' in them
    $matchedValues = foreach ($value in $valuesToExtractFromHashTable) { $hashTable.Keys -match "$($value.'some other id')" }
    Probably I can split the 7M records into smaller chunks, but I guess there's some faster method that I'm failing to find.
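
    A side note on that last line: -match treats its pattern as a regular expression, so if the IDs can contain regex metacharacters, a literal-safe sketch would escape them first:

    $matchedValues = foreach ($value in $valuesToExtractFromHashTable) {
        $hashTable.Keys -match [regex]::Escape($value.'some other id')
    }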

    Wednesday, October 14, 2020 12:19 PM
  • "ForEach-Object" when used correctly is always master than your use of foreach().


    \_(ツ)_/

    Wednesday, October 14, 2020 5:36 PM
  • If you're really interested in fast access to the keys that contain a certain value, build the hash using the values as the key and your "key" as the value.

    I don't have millions of records to test its execution speed, but this is the concept (using the System.Collections.ArrayList class to avoid rebuilding a potentially big array every time a new "key" is added to the "value").

    $hash = @{}
    
    @"
    Key,Val
    1,1
    2,1
    3,5
    4,7
    5,5
    "@ | ConvertFrom-CSV |
    ForEach-Object{
        if ($hash.ContainsKey($_.Val)){
            [void]$hash.($_.Val).Add($_.Key)
        }
        else{
            $hash.($_.Val) = [System.Collections.ArrayList]::new()
            [void]$hash.($_.Val).Add($_.Key)
        }
    }
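
    Against the sample data above, a quick sanity check of the result (ConvertFrom-Csv yields strings, so the hash keys are strings):

    $hash['1']   # -> 1, 2   (the Keys whose Val is 1)
    $hash['5']   # -> 3, 5
    $hash['7']   # -> 4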


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Wednesday, October 14, 2020 7:20 PM
  • foreach() is faster than ForEach-Object, but it needs to load all the items into the computer's RAM. That's why I chop the big list into smaller pieces. Otherwise the PowerShell process takes all the free RAM, and since my machine has only 16GB it gets stuck.
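
    A minimal sketch of that chopping, with $bigList standing in for the full array:

    $chunkSize = 500000
    for ($i = 0; $i -lt $bigList.Count; $i += $chunkSize) {
        $end   = [Math]::Min($i + $chunkSize, $bigList.Count) - 1
        $chunk = $bigList[$i..$end]
        # ... build or search the hashtable for this chunk only ...
    }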

    As for the suggestion to "build the hash using the values as the key and your 'key' as the value", I can't do that because the list has duplicate values in both columns. It looks something like this:

    ID1       ID2
    ---       ---
    value1    anotherValue7
    value2    anotherValue1
    value3    anotherValue4
    value2    anotherValue2
    value5    anotherValue2
    value1    anotherValue3

    Anyway the code I posted in the previous post did the job. So I'm done with this topic. Thanks to all the participants!


    • Edited by badsector82 Friday, October 16, 2020 7:42 AM
    Friday, October 16, 2020 7:40 AM
  • I do not know why you insist that foreach() is faster.  It is demonstrably slower than using a pipeline.  This has been noted and documented by the designers of PowerShell.

    I only post this for others, because those who start out with this misinformed view seldom change or take the time to learn why the pipeline is faster.

    The issue of performance is very hard to measure and the code being measured can have an unexpected influence on measurement.  I will not say more as it would take too long to explain.


    \_(ツ)_/

    Friday, October 16, 2020 8:03 AM
  • The duplicate problem isn't a problem... the "value" part of the hash is an ArrayList. The values added need not be unique.

    In retrospect, I'd change the way I created the ArrayList (that class has been deprecated) and use a System.Collections.Generic.List[object] in its place.
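
    A sketch of that substitution, reusing the same assumed Key/Val columns ($your_file is a stand-in path):

    $hash = @{}
    Import-Csv $your_file |
        ForEach-Object {
            if (-not $hash.ContainsKey($_.Val)) {
                $hash.($_.Val) = [System.Collections.Generic.List[string]]::new()
            }
            $hash.($_.Val).Add($_.Key)   # List[string].Add returns void, so no [void] cast is needed
        }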


    --- Rich Matheisen MCSE&I, Exchange Ex-MVP (16 years)

    Friday, October 16, 2020 3:23 PM
  • I do not know why you insist that foreach() is faster.  It is demonstrably slower than using a pipeline.  This has been noted and documented by the designers of PowerShell...

    Could you please provide a link to that documentation? Because I've read otherwise. ForEach-Object processes the elements one by one as they arrive in the pipeline, because there's no way to know all the elements at the beginning; each object is handled individually. With foreach() we know all the elements and process them at once, which is why we must have enough RAM to store them up front. I'd really appreciate more info on the matter, especially from the source.
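
    For what it's worth, a quick way to compare the two on any machine (a toy sketch; results vary a lot with the real workload):

    $data = 1..1000000
    (Measure-Command { foreach ($i in $data) { $null = $i } }).TotalMilliseconds
    (Measure-Command { $data | ForEach-Object { $null = $_ } }).TotalMilliseconds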
    Monday, October 19, 2020 11:33 AM
  • The site is having issues and will not allow posting of more than a few words without throwing an exception.


    \_(ツ)_/

    Monday, October 19, 2020 12:04 PM