Ways to use a Filter

  • General discussion

  • I have just recently come across the "Filter" construct, which is similar to "Function".  I have rarely seen this used (only once ever), and the documentation is thin.  This is what the help has to say:

    Filters
          A filter is a type of function that runs on each object in the pipeline.
          A filter resembles a function with all its statements in a Process block.

          The syntax of a filter is as follows:

              filter [<scope:>]<name> {<statement list>}

          The following filter takes log entries from the pipeline and then
          displays either the whole entry or only the message portion of the entry:

              filter ErrorLog ([switch]$message)
              {
                  if ($message) { out-host -inputobject $_.Message }
                  else { $_ }  
              }

    This seems like a very cool and powerful tool.  Can anyone think of other creative ways we could make use of this?


    Grant Ward, a.k.a. Bigteddy

    What's new in Powershell 3.0 (Technet Wiki)

    Tuesday, February 28, 2012 5:39 AM

All replies

  • You can download the second edition of the Sapien PowerShell V.1.0 TFM eBook here:
    http://www.sapien.com/downloads#Free%20e-Books/Windows%20PowerShell%201.0%20TFM.pdf

    <Snipp>

    Page 259:
    Filters
    Filters are essentially the same as functions. The big differences are that: 1) filters are declared using the
    Filter keyword, and 2) when objects are piped to a filter, the filter executes one time for each object in
    the pipeline, rather than just one time for the entire pipeline. Basically, a filter is like a function that contains
    only a Process script block.
    ......
    Functions vs. Filters
    The differences between a function and a filter can be summarized as follows:
    • When something is piped to a function, the piped data goes into the special $input variable and
    the function is executed once.
    • When something is piped to a filter, the filter is executed one time for each object in the piped
    data. The current object is available in the special $_ variable, and there’s no $input variable.
    One thing that can make it difficult to understand these differences is that you can write functions that
    behave exactly the same way filters behave.

    </Snipp>

    In my view, filters make no sense...
    So I decided to ignore them and always use functions with a Process block!
    You can extend these later with a Begin or End block.
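
    A minimal sketch of that kind of extension, assuming a hypothetical helper named Measure-Word (it is not from the eBook or from this thread): the Process block does the per-object work, while Begin and End add setup and a summary around it.

    ```powershell
    function Measure-Word {
        begin   { $count = 0 }            # runs once, before the first piped object
        process {
            $count++                      # runs once per piped object
            '{0}: {1}' -f $_, $_.Length
        }
        end     { "total: $count" }       # runs once, after the last piped object
    }

    $result = 'one', 'three' | Measure-Word
    $result
    ```

    The same body written as a filter would have no natural place for the counter setup or the summary line.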


    Please click “Mark as Answer” if my post answers your question and click Vote as Help if my Post helps you.
    Please mark my helpful posts as Helpful, and posts that fully or partly answered your question as Answer.
    My PowerShell Blog http://www.admin-source.info
    [string](0..21|%{[char][int]([int]("{0:d}" -f 0x28)+('755964655967-86965747271757624-8796158066061').substring(($_*2),2))})-replace



    Tuesday, February 28, 2012 9:31 AM
  • I rarely see filters used at all now that we are using V2.  Here is a quick example of a filter and a function using Process that do the same thing.

    Filter Get-ProcessName_Filter {
        $_.Name
    }
    
    Get-Process | Get-ProcessName_Filter

    Function Get-ProcessName_Function {
        Process{
            $_.Name
        }
    }
    
    Get-Process | Get-ProcessName_Function

    Both of these produce exactly the same output, because they behave identically when taking pipeline input.


    Boe Prox

    Please remember to mark the best solution as the answer using Mark as Answer. If you find a solution to be helpful, please use Vote as Helpful.

    Looking for a script? Check out the Script Repository
    Need a script written for you? Submit a request at the Script Request Page

    Tuesday, February 28, 2012 12:13 PM
  • Even in PowerShell V1 there were both: filters and functions!
    And it is not only exactly the same output and the same behavior, it is exactly what I meant! ;-))


    My PowerShell Blog http://www.admin-source.info
    [string](0..21|%{[char][int]([int]("{0:d}" -f 0x28)+('755964655967-86965747271757624-8796158066061').substring(($_*2),2))})-replace

    Tuesday, February 28, 2012 1:41 PM
  • I use them quite regularly when I want to explicitly name some functionality but don't need a function per se.  The same thing can be accomplished with expressions, scriptproperties and the like.  The big plus is seeing a clear transformative step in the pipeline.  I did a very lightweight write up a while back:

    http://learningpcs.blogspot.com/2010/08/powershell-filters.html

    Tuesday, February 28, 2012 2:06 PM
  • I'll use them in cases where it seems appropriate, working from the premise that any particular piece of code will be more efficient if it doesn't have to worry about things it's never going to be asked to do.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, February 28, 2012 3:01 PM
  • Functions vs. Filters

    The differences between a function and a filter can be summarized as follows:
    • When something is piped to a function, the piped data goes into the special $input variable and
    the function is executed once.
    • When something is piped to a filter, the filter is executed one time for each object in the piped
    data. The current object is available in the special $_ variable, and there’s no $input variable.
    One thing that can make it difficult to understand these differences is that you can write functions that
    behave exactly the same way filters behave.

    The distinction between what is passed to a function ($input) and what is passed to a filter ($_) is important. Knowing it can keep you out of some funky rabbit trails before you start.
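
    A small sketch of that distinction, using two hypothetical names (Get-InputSum and Get-Double) that do not appear in the quoted book: the plain function sees the whole piped set through $input, and the filter sees one object at a time through $_.

    ```powershell
    # Function with no named block: the body runs once, after all input has
    # arrived, and the piped objects are available through $input.
    function Get-InputSum {
        $sum = 0
        foreach ($n in $input) { $sum += $n }
        $sum
    }

    # Filter: the body runs once per piped object, which is available as $_.
    filter Get-Double { $_ * 2 }

    $sumResult    = 1..3 | Get-InputSum    # one value: 6
    $doubleResult = 1..3 | Get-Double      # three values: 2, 4, 6
    ```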

    Tuesday, February 28, 2012 3:03 PM
  • Seems like that could also have some performance consequences.  If all the data is read into $input from the pipeline and then the function is executed once, then inside the pipeline a function would act like a blocking cmdlet, and a filter would act like a streaming cmdlet.
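
    One way to see the blocking versus streaming behaviour is to log the order of events. This is only a sketch; Trace-Filter, Trace-Function and the $log list are hypothetical names introduced for the illustration.

    ```powershell
    $log = New-Object System.Collections.ArrayList

    # Filter: handles each object as soon as the upstream command emits it.
    filter Trace-Filter { [void]$log.Add("filter saw $_"); $_ }

    # Plain function: body runs once, after the upstream command has finished.
    function Trace-Function {
        foreach ($item in $input) { [void]$log.Add("function saw $item"); $item }
    }

    $null = 1..3 | ForEach-Object { [void]$log.Add("emit $_"); $_ } | Trace-Filter
    $filterOrder = $log -join ', '

    $log.Clear()
    $null = 1..3 | ForEach-Object { [void]$log.Add("emit $_"); $_ } | Trace-Function
    $functionOrder = $log -join ', '

    $filterOrder      # emit and filter entries alternate
    $functionOrder    # all emit entries come first, then all function entries
    ```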


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, February 28, 2012 3:14 PM
  • It just seems like the Filter is more appropriate for some operations, because processing begins straight away; it does not wait for the $input variable to be populated.  That's what the documentation implies to me, anyway.


    Grant Ward, a.k.a. Bigteddy


    Tuesday, February 28, 2012 3:17 PM
  • Seems like that could also have some performance consequences.  If all the data is read into $input from the pipeline and then the function is executed once, then inside the pipeline a function would act like a blocking cmdlet, and a filter would act like a streaming cmdlet.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "


    Exactly what I was talking about in my previous post.  Surely the "Filter" has some merits?

    Grant Ward, a.k.a. Bigteddy


    Tuesday, February 28, 2012 3:19 PM
  • Going back to what mjolinor said earlier, I find that if I have to make a simple calculation, transformation, or analysis in flight, filters work well.  With these there is no opportunity (and so, theoretically, no need) for begin/process/end.  You know exactly what you are getting and what you want.  This is just a plugin approach to touch something with no need (or option) for rollback.  One thing I use it for is appending trailing slashes to URLs and IIS paths.  Run a quick check to see if a trailing slash exists.  If not, add it:

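    PS > filter Add-TrailingSlash {
    	# Single-quoted strings are literal, so '\' is exactly one backslash.
    	if(-not ($_.ToString().EndsWith('\'))) {
    		$_ + '\'
    	}
    	else {
    		# Already has a trailing slash: pass the value through unchanged.
    		$_
    	}
    }
    
    PS > 'C:\test' | Add-TrailingSlash
    C:\test\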
    Tuesday, February 28, 2012 3:26 PM
  • I wrote a filter to convert tab-delimited text format to csv format, because it seemed like a good idea at the time. I never used it in any situation where the volume of data processed was a problem, so it could have just as easily been written as a function.
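
    The original filter is not shown in the thread, so the following is only a guess at what such a conversion could look like. ConvertTo-CsvLine is a hypothetical name, and the sketch assumes one tab-delimited record per piped string, with every field quoted.

    ```powershell
    # Hypothetical sketch, not the code described in the post.
    filter ConvertTo-CsvLine {
        $fields = $_ -split "`t"
        $quoted = foreach ($field in $fields) {
            # Double any embedded quotes, then wrap the field in quotes.
            '"' + ($field -replace '"', '""') + '"'
        }
        $quoted -join ','
    }

    $csv = "name`tsize", "a.txt`t12" | ConvertTo-CsvLine
    $csv
    ```

    Because it is a filter, each line is converted and passed on as it arrives, so nothing has to be held in $input.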

    That said, the main reason I would write some new code as a filter is if it was something that was conceived as part of an application where it was likely to be used as a filter in a pipeline. That would remove concern over the blocking nature of a function as well as the storage impact of a huge $input variable. If I conceived of the functionality more as a function, then the design of the application would be such that these two issues were not a concern.

    As to how else to use a filter, what would be the point of developing a filter and then just calling it like a function?


    Al Dunbar

    Tuesday, February 28, 2012 4:56 PM
  • Here's one I am literally using in a script right now:

    filter Get-ParentNameFromFileInfo {
    	param(
    		[Parameter(
    			ValueFromPipeline = $true
    		)]
    		 $filepath
    	)
    	
    	$split = ((Split-Path $filepath.fullname) -Split '\\')
    	$split[$split.length - 1]
    }

    The FileInfo objects I am piping won't give me the parent directory name.  So, this returns it:

    	foreach($file in (Get-ChildItem $path | Where {$_.Extension -eq '.iso'})) {
    		Write-Output "$(wdt): $file|$(Get-ParentNameFromFileInfo $file)"
    	}

    Also, with regards to testing, here is one I found useful (there's much more you could do with this):

    function test-function{$_.name}
    function test-filter{$_.name}
    
    # 100 iterations
    (Measure-Command -Expression {1..1e2 | % {dir | % {test-function $_}}}).TotalSeconds
    2.869
    
    (Measure-Command -Expression {1..1e2 | % {dir | % {test-filter $_}}}).TotalSeconds
    3.53
    
    # 1000 iterations
    (Measure-Command -Expression {1..1e3 | % {dir | % {test-function $_}}}).TotalSeconds
    34.362
    
    (Measure-Command -Expression {1..1e3 | % {dir | % {test-filter $_}}}).TotalSeconds
    29.213

    I am sure that, as you run more iterations, the numbers will skew towards the filter at higher volume.  These are meant to be lightweight functions (by another name).

    Tuesday, February 28, 2012 5:18 PM
  • So I took what Will did and added one more piece to this by adding a Function with the Process block and got some interesting results:

    function test-function_Process{ Process {$_.name}}
    function test-function_NoProcess{ $_.name}
    function test-filter{$_.name}
    # 100 iterations
    (Measure-Command -Expression {1..1e2 | % {dir | % {Test-function_Process $_}}}).TotalSeconds
    1.3818334
    (Measure-Command -Expression {1..1e2 | % {dir | % {test-function_NoProcess $_}}}).TotalSeconds
    1.4547748
    (Measure-Command -Expression {1..1e2 | % {dir | % {test-filter $_}}}).TotalSeconds
    1.4656084
    # 1000 iterations
    (Measure-Command -Expression {1..1e3 | % {dir | % {Test-function_Process $_}}}).TotalSeconds
    13.8699915
    (Measure-Command -Expression {1..1e3 | % {dir | % {test-function_NoProcess $_}}}).TotalSeconds
    14.90194
    (Measure-Command -Expression {1..1e3 | % {dir | % {test-filter $_}}}).TotalSeconds
    14.715142

    I ran this a few times just to check for consistency, and the Function with the Process{} block was always the fastest (in the longer running measurement), followed by the Filter and finally the Function without the Process{} block.

    Boe Prox


    Tuesday, February 28, 2012 5:41 PM
  • I don't like using disk input for performance benchmarks. Caching and contention introduce too many other variables, IMHO.

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, February 28, 2012 6:03 PM
  • I don't like using disk input for performance benchmarks. Caching and contention introduce too many other variables, IMHO.

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Where is your sense of adventure? :)

    In all seriousness, I agree that running against a disk might not be the best way. I did run against processes and found the results to be similar (in favor of a function with a process block).

    It might be worth running against a few other types of queries besides disk and process to get a better handle on the measurements.


    Boe Prox


    Tuesday, February 28, 2012 6:28 PM
  • Found a flaw in our testing...

    Function Test-Filter should be Filter Test-Filter

    Think I am going to re-run some tests and see what comes back this time.


    Boe Prox


    Tuesday, February 28, 2012 6:36 PM
  • Doh.  Blasted spelling!  Why doesn't thought have spellcheck?
    Tuesday, February 28, 2012 6:39 PM
  • Still pretty close, also added a pipeline test for the Filter and Function /w Process{}. This is against the disk, but figured it was a starting point :)

    function test-function_Process{ Process {$_.name}}
    function test-function_NoProcess{ $_.name}
    Filter test-filter{$_.name}
    # 100 iterations
    (Measure-Command -Expression {1..1e2 | % {dir | % {Test-function_Process $_}}}).TotalSeconds
    1.3794607
    (Measure-Command -Expression {1..1e2 | % {dir | % {test-function_NoProcess $_}}}).TotalSeconds
    1.4521675
    (Measure-Command -Expression {1..1e2 | % {dir | % {test-filter $_}}}).TotalSeconds
    1.3925504
    # 1000 iterations
    (Measure-Command -Expression {1..1e3 | % {dir | % {Test-function_Process $_}}}).TotalSeconds
    13.7342737
    (Measure-Command -Expression {1..1e3 | % {dir | % {test-function_NoProcess $_}}}).TotalSeconds
    14.5357563
    (Measure-Command -Expression {1..1e3 | % {dir | % {test-filter $_}}}).TotalSeconds
    13.9621658
    #Pipeline test
    (Measure-Command -Expression {1..1e3 | % {dir | Test-function_Process }}).TotalSeconds
    5.3866466
    (Measure-Command -Expression {1..1e3 | % {dir | test-filter}}).TotalSeconds
    5.3580167


    Boe Prox


    Tuesday, February 28, 2012 6:43 PM
  • IMHO, that's not a very good test.  You're destroying the performance advantage of the filter and the function with the PROCESS block by using foreach-object.  They don't need that. 

    The function without the PROCESS block does need that, because if you give it a single script block without specifying that it's the PROCESS block, it will default to being the END block.
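
    A rough way to see what the ForEach-Object wrapper costs is to count how often a command is started. This is a sketch with hypothetical names (Test-Begin and $log); the Begin block is used only as a marker for a fresh invocation.

    ```powershell
    $log = New-Object System.Collections.ArrayList

    function Test-Begin {
        begin   { [void]$log.Add('begin') }   # one entry per invocation of the function
        process { $_ }
    }

    # Wrapped in ForEach-Object: a separate Test-Begin invocation for every object.
    $null = 1..3 | ForEach-Object { Test-Begin $_ }
    $wrappedStarts = $log.Count

    $log.Clear()

    # Piped directly: a single invocation whose Process block runs per object.
    $null = 1..3 | Test-Begin
    $pipedStarts = $log.Count

    "wrapped: $wrappedStarts, piped: $pipedStarts"
    ```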


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, February 28, 2012 6:45 PM
  • We could use a good article on developing good testing for Powershell professionals. : )
    Tuesday, February 28, 2012 6:46 PM
  • IMHO, that's not a very good test.  You're destroying the performance advantage of the filter and the function with the PROCESS block by using foreach-object.  They don't need that. 

    The function without the PROCESS block does need that, because if you give it a single script block without specifying that it's the PROCESS block, it will default to being the END block.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "


    I agree, hence why I added the last two tests to allow them to use the pipeline vs. ForEach to show the performance increase by taking advantage of the pipeline and allowing the Filter and Function /w Process to work as intended.

    Boe Prox


    Tuesday, February 28, 2012 6:50 PM
  • Running this:

    $counter = 1..1e5
    filter test_filter {if ($_ -match $args){$_}}
    function test_funct {process{if ($_ -match $args){$_}}}
    function test_funct2 {if ($_ -match $args){$_}}
    $regex = [regex]'.0$'
    (measure-command {$counter | test_filter $regex}).totalmilliseconds
    (measure-command {$counter | test_funct $regex}).totalmilliseconds
    (measure-command {$counter |% {test_funct2 $regex}}).totalmilliseconds

    I can't really see much performance difference between the PROCESS function and the filter, other than that the filter is easier to code.

    The foreach-object with the single script block function is just sad.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, February 28, 2012 7:13 PM
  • Running this:

    $counter = 1..1e5
    filter test_filter {if ($_ -match $args){$_}}
    function test_funct {process{if ($_ -match $args){$_}}}
    function test_funct2 {if ($_ -match $args){$_}}
    $regex = [regex]'.0$'
    (measure-command {$counter | test_filter $regex}).totalmilliseconds
    (measure-command {$counter | test_funct $regex}).totalmilliseconds
    (measure-command {$counter |% {test_funct2 $regex}}).totalmilliseconds

    I can't really see much performance difference between the PROCESS function and the filter, other than that the filter is easier to code.

    The foreach-object with the single script block function is just sad.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "


    Yeah, that agrees with what I was seeing as well. Using a foreach prior to the function just kills it, and as long as you are using the Filter and function w/ Process{} as intended, you see a great performance increase. How many times did you run that?  I noticed that the lead would sometimes sway between the Filter and the Function w/ Process{}. But yes, it is a small difference in performance.

    Boe Prox


    Tuesday, February 28, 2012 7:34 PM
  • I think I found the wiki for you to create ;)
     

    Justin Rich
    http://jrich523.wordpress.com
    PowerShell V3 Guide (Technet)
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Tuesday, February 28, 2012 7:43 PM
  • Go get 'em! Wikiman.
    Tuesday, February 28, 2012 7:44 PM
  • Couldn't resist this one last time. If anything, this shows that proper use of the Filter and/or Function /w Process{} can really save you on performance.

    function test-function_Process{ Process {$_}}
    function test-function_NoProcess{ $_}
    Filter test-filter{$_}
    # 100 iterations
    (Measure-Command -Expression {1..1e2 | % {1..1e2 | % {Test-function_Process $_}}}).TotalSeconds
    2.5179856
    (Measure-Command -Expression {1..1e2 | % {1..1e2 | % {test-function_NoProcess $_}}}).TotalSeconds
    2.5524584
    (Measure-Command -Expression {1..1e2 | % {1..1e2 | % {test-filter $_}}}).TotalSeconds
    2.4964097
    #Pipeline test 100 iterations
    (Measure-Command -Expression {1..1e2 | % {1..1e2 | Test-function_Process }}).TotalSeconds
    0.1221473
    (Measure-Command -Expression {1..1e2 | % {1..1e2 | test-filter}}).TotalSeconds
    0.1064076
    # 1000 iterations
    (Measure-Command -Expression {1..1e3 | % {1..1e3 | % {Test-function_Process $_}}}).TotalSeconds
    258.524941
    (Measure-Command -Expression {1..1e3 | % {1..1e3 | % {test-function_NoProcess $_}}}).TotalSeconds
    256.5735826
    (Measure-Command -Expression {1..1e3 | % {1..1e3 | % {test-filter $_}}}).TotalSeconds
    249.916555
    #Pipeline test 1000 iterations
    (Measure-Command -Expression {1..1e3 | % {1..1e3 | Test-function_Process }}).TotalSeconds
    10.1089049
    (Measure-Command -Expression {1..1e3 | % {1..1e3 | test-filter}}).TotalSeconds
    10.9791761


    Boe Prox


    Tuesday, February 28, 2012 7:47 PM
  • I ran that at least a dozen times. Sometimes the filter was faster, sometimes the (PROCESS) function.  But there was never really any significant time difference between them.

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, February 28, 2012 7:51 PM
  • I wonder if my newest swiss army knife would do any good helping get insight into this one.
    Tuesday, February 28, 2012 8:19 PM
  • Have you tried doing a trace-command on trace-command yet?

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, February 28, 2012 8:35 PM
  • Yeah.  I did Start-Transcript in the inner command. : )  Seriously...no.  That just sounds like dynamite.  Trying now...  <grinning>

    Boo.... Underwhelming.

    Trace-Command -Name ETS -Expression {Trace-Command -Name ETS -Expression {dir} -PSHost} -PSHost

    • Edited by Will Steele Tuesday, February 28, 2012 8:39 PM extra comment
    Tuesday, February 28, 2012 8:37 PM
  • The thing I like about filter is that it implies (to me) that it should be used in the middle or at the end of the pipeline, and not at the beginning.

    I tend to use filters as helpers inside scripts or inside other functions, just like I tend to use "traditional" functions as helpers. Specifying Begin, Process, End, ValueFromPipeline, Mandatory, or the name for the parameter that takes pipelined input (which is going to be named $InputObject anyway) is just too much for a helper function.
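
    To make the difference in ceremony concrete, here is a sketch of the same helper written both ways. ConvertTo-Upper and ConvertTo-UpperAdvanced are hypothetical names chosen for the comparison.

    ```powershell
    # Helper as a filter: one line, and the name alone says "use me mid-pipeline".
    filter ConvertTo-Upper { $_.ToString().ToUpper() }

    # The same helper as an advanced function with an explicit pipeline parameter.
    function ConvertTo-UpperAdvanced {
        [CmdletBinding()]
        param(
            [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
            $InputObject
        )
        process { $InputObject.ToString().ToUpper() }
    }

    $fromFilter   = 'a', 'b' | ConvertTo-Upper
    $fromFunction = 'a', 'b' | ConvertTo-UpperAdvanced
    ```

    Both produce the same two strings; the advanced form mainly buys parameter validation and the option of calling it with a named argument.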

    To me, filter says a lot about how a function should be used with very few words.

    I don't know the intended purpose of filter in PowerShell, but sometimes I wish PowerShell made the use of a filter in the beginning of the pipeline an error or a warning. I think that would make filter much more useful. But then again, I don't know how they intended for filter to be used in the first place.

     

    Tuesday, February 28, 2012 10:09 PM
  • Something else to throw in the pot:

    $counter = 1..1e5
    filter test_filter {if ($_ -match $args){$_}}
    $regex = [regex]'.0$'

    (measure-command {$counter | test_filter $regex}).totalmilliseconds
    (measure-command {$counter |? {$_ -match $regex}}).totalmilliseconds
    1810.6949
    6000.846


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Tuesday, February 28, 2012 10:26 PM
  • Now you have me wondering whether they intended for filter to be used as a fast where, a fast foreach, a function that defaults to a process block, something else, or all of the above.

    I've always looked at filter as a function that defaults to a process block, which is different to "function" which is a function that defaults to an end block. Given its name and the responses here, I reconsidered my stance. But then I tried to make filter behave like a function, and it worked.

        ---

        filter a
        {
            begin { 'hello' }
            process { 'how are you' }
            end { 'bye' }
        }

        1..5 | a

        # Output
        # hello
        # how are you
        # how are you
        # how are you
        # how are you
        # how are you
        # bye

        ---

    Now, I really don't know how they want us to think about filter.

    Wednesday, February 29, 2012 1:30 AM
  • There's a similarity between the filter/process-only script block and where in that they are single process.

    Functions and foreach-object are similar in both having begin/process/end capabilities.

    The function provider will let you specify arguments to get passed to the script blocks, whereas the cmdlets do not.

    You can only have one argument list, but that argument list will get passed to any/all of the begin, process and end blocks.  Those script blocks also appear to share a common scope.  If you invoked those script blocks separately outside the function, variables that were set in the Begin block would not be visible to Process or End blocks. 

    Foreach-object will execute those script blocks in the current scope, whereas normal invocation of the function will run them in their own scope; within that scope, the Begin, Process, and End blocks all run in the function's local scope.
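
    A short sketch of that scoping behaviour, assuming the variable names used here ($functionTotal and $loopTotal, both hypothetical) are not already defined in the calling scope:

    ```powershell
    function Get-RunningTotal {
        begin   { $functionTotal = 0 }          # created in the function's own scope
        process { $functionTotal += $_ }        # same scope, so the value carries over
        end     { $functionTotal }
    }

    $sum = 1..3 | Get-RunningTotal
    $visibleAfterFunction = Test-Path Variable:\functionTotal   # variable did not leak out

    # ForEach-Object runs its script blocks in the caller's scope instead.
    1..3 | ForEach-Object -Begin { $loopTotal = 0 } -Process { $loopTotal += $_ }
    $visibleAfterLoop = Test-Path Variable:\loopTotal           # variable is still here

    "sum=$sum function-var-visible=$visibleAfterFunction loop-var-visible=$visibleAfterLoop loopTotal=$loopTotal"
    ```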


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "



    • Edited by mjolinor Wednesday, February 29, 2012 1:42 PM
    Wednesday, February 29, 2012 2:06 AM
  • I think two language constructs with the same behavior and the same results are unnecessary!
    They confuse PowerShell beginners, and they make learning PowerShell and reading other developers' scripts harder.
    The PowerShell syntax is ugly enough even without such constructs!
    So I stand by my statement: $Filter=Ignore


    My PowerShell Blog http://www.admin-source.info
    [string](0..21|%{[char][int]([int]("{0:d}" -f 0x28)+('755964655967-86965747271757624-8796158066061').substring(($_*2),2))})-replace

    Wednesday, February 29, 2012 6:26 AM
  • I'm going to add to the confusion some more, and say that exactly what a "filter" is, is ambiguous.

    There is a named filter, which is managed by the function provider and is created using the Filter statement.

    Then there is an anonymous filter, which is a script block that has its IsFilter property set to $true.  This appears to bind it to the pipeline, enabling the use of $_, and causing it to run once for every object that goes through the pipeline:

    $sb = {if ($_ % 9){$_}}
    $sb.isfilter = $true
    
    filter test {if ($_ % 9){$_}}
    $count = 1..1e5
    
    (measure-command {$count | test}).totalmilliseconds
    
    (measure-command {$count | &$sb}).totalmilliseconds
    
    The anonymous filter appears to run marginally faster than the named filter.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Wednesday, February 29, 2012 12:06 PM
  • do you think this applies elsewhere or do you think there is some magic
    going on with regex?
     
    I thought maybe the regex might be compiled so I tried that, which didn’t
    work, then I thought maybe the script block was compiled... so I tossed the
    Where in a function, that didn’t help AT ALL
     
    $counter = 1..1e5
    filter test_filter {if ($_ -match $args){$_}}
    function test_where {process{$_ | ?{$_ -match $args}}}
    function test_function {process{if($_ -match $args){$_}}}
    $regex = [regex]'.0$'
    $regex_opts = [text.regularexpressions.regexoptions]::Compiled
    $compregex = new-object text.regularexpressions.regex ('.+0.$',$regex_opts)
     
    (measure-command {$counter | test_filter $regex}).totalmilliseconds
    (measure-command {$counter | test_function $regex}).totalmilliseconds
    (measure-command {$counter | test_where $regex}).totalmilliseconds
    (measure-command {$counter |? {$_ -match $regex}}).totalmilliseconds
    (measure-command {$counter |? {$_ -match $compregex}}).totalmilliseconds
     
    1174.1703
    1184.5488
    14353.8124
    3165.1233
    3111.1516
     
    not much difference... side note, rob you need a new pc :)
     
    apparently where is just bad in large cases
     
     

    Justin Rich
    Wednesday, February 29, 2012 1:19 PM
  • I had a fair amount of other stuff running when I did my tests, but you got relatively close to the same results I did - the filter and function are about 3x faster than where-object.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "


    • Edited by mjolinor Wednesday, February 29, 2012 1:28 PM
    Wednesday, February 29, 2012 1:28 PM
  • I wonder if the difference is that a function unspecified runs in the End
    and a filter if unspecified runs in the Process block?
     
    Help About_Functions
    "
    If no Begin, Process, or End keywords are
         used, all the statements are treated like an End statement list.
    "
     
    there is no help on filters, they are just mentioned in the functions
    section and no indication of how it processes..
     
     

    Justin Rich
    Wednesday, February 29, 2012 1:28 PM
  • That appears to be about the only difference between the Filter statement and the Function statement - what the behaviour of an unspecified script block will be.  In a Function statement it will be End block behaviour.  In a Filter statement it will be Process block behaviour.
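
    That default can be checked by counting how many times each body runs for three piped objects. The four names below are hypothetical and exist only for this sketch.

    ```powershell
    function Test-DefaultFunction { 'ran' }             # unnamed body in a function
    function Test-EndFunction     { end     { 'ran' } } # explicit End block
    filter   Test-DefaultFilter   { 'ran' }             # unnamed body in a filter
    function Test-ProcessFunction { process { 'ran' } } # explicit Process block

    # @() forces an array so that .Count works even for a single result.
    $defaultFunctionRuns = @(1..3 | Test-DefaultFunction).Count
    $endFunctionRuns     = @(1..3 | Test-EndFunction).Count
    $defaultFilterRuns   = @(1..3 | Test-DefaultFilter).Count
    $processFunctionRuns = @(1..3 | Test-ProcessFunction).Count

    "$defaultFunctionRuns $endFunctionRuns $defaultFilterRuns $processFunctionRuns"
    ```

    The first two counts match each other, and so do the last two, which is the pairing described above.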

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Wednesday, February 29, 2012 1:34 PM
  • Coming back to using a filter vs. where-object, I have found a filter is consistently faster (by a smallish margin):

    filter BigFiles {if ($_.length -gt 1mb){$_}}
    Measure-Command {gci -Recurse -Force -ea silentlycontinue| bigfiles}
    Measure-Command {gci -Recurse -Force -ea silentlycontinue| ? { $_.length -gt 1mb} }


    Grant Ward, a.k.a. Bigteddy


    Wednesday, February 29, 2012 1:43 PM
  • which would explain why a function that specifies Process runs about the
    same speed..
     
    also, bigteddy, never use disk for perf tests :)
     
     

    Justin Rich
    http://jrich523.wordpress.com
    PowerShell V3 Guide (Technet)
    Wednesday, February 29, 2012 1:46 PM
  • The vast majority of the time in either test is going to be taken up in disk I/O.  That's going to make it appear that there's relatively little difference between the two different filtering methods.  To find out what the actual difference between the two methods is, you want to eliminate, as much as possible, any processing overhead that is not part of what you're testing. 

    For this to be a fair comparison to where-object, that filter has to at least do some kind of boolean test on the pipeline objects.  My using regex matches was adding additional overhead too. 

     I think this might present a more realistic picture of exactly what the process time differences are between the two methods:

    $count = 1..1e6
    filter test {if ($_){$_}}
    (measure-command {$count | test}).totalmilliseconds
    (measure-command {$count |?{$_}}).totalmilliseconds
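
A rough sketch of how the same comparison could be extended to include a function with an explicit Process block, since that variant came up earlier in the thread. The names Test-Filter and Test-ProcessFunction are hypothetical, and the item count is reduced from 1e6 only to keep the run short. No timings are shown here because they depend on the machine and the PowerShell version:

```powershell
$items = 1..100000   # assumption: smaller than the 1e6 used above, to keep the run short

filter   Test-Filter          { if ($_) { $_ } }
function Test-ProcessFunction { process { if ($_) { $_ } } }

# TotalMilliseconds gives the whole elapsed time as a single number.
$timings = @{
    Filter          = (Measure-Command { $items | Test-Filter }).TotalMilliseconds
    ProcessFunction = (Measure-Command { $items | Test-ProcessFunction }).TotalMilliseconds
    WhereObject     = (Measure-Command { $items | Where-Object { $_ } }).TotalMilliseconds
}

# Fastest first.
$timings.GetEnumerator() | Sort-Object Value
```

All three variants apply the same boolean test to in-memory integers, so what gets measured is mostly the per-object invocation overhead of each construct.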

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "


    • Edited by mjolinor Wednesday, February 29, 2012 2:08 PM
    Wednesday, February 29, 2012 2:06 PM
    So perhaps the heart of it is that it's no big deal, because the percentage of
    overhead added by filter/function/where is negligible compared to the work... I
    mean, hell, we tested it with 1e5 items; that's a LOT, and if that were actual
    work then the 3-6 seconds isn't a factor..
     
     

    Justin Rich
    http://jrich523.wordpress.com
    PowerShell V3 Guide (Technet)
    Wednesday, February 29, 2012 2:14 PM
    I agree, mj, and having run these tests, the results speak for themselves:

    PS C:\Windows> $count = 1..1e6
    filter test {if ($_){$_}}
    (measure-command {$count | test}).totalmilliseconds
    (measure-command {$count |?{$_}}).totalmilliseconds

    3317.3036
    30143.8225

    That's roughly a nine-fold difference!  So, there is merit in using Filters.


    Grant Ward, a.k.a. Bigteddy

    What's new in Powershell 3.0 (Technet Wiki)

    Wednesday, February 29, 2012 2:15 PM
  • Here's what distinguishes functions from filters to me.
     
    Functions have access to the total pipeline contents and can reprocess that pipeline if needed.
    Because of this, functions block the pipeline, requiring the previous element in the pipeline to
    run to completion.
     
    Filters have access to only one item in the pipeline.
    Filters do not block the pipeline, and take input as soon as it is produced by the previous element
    in the pipeline.
      - Larry
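
A small sketch of the blocking behaviour described above, assuming the function has no Process block and reads from $input. The names Use-Filter and Use-Function and the two log variables are made up for illustration. Each pipeline records when an item is produced and when it is consumed:

```powershell
$streamLog = New-Object System.Collections.ArrayList
$blockLog  = New-Object System.Collections.ArrayList

# The filter body runs once per incoming object.
filter Use-Filter { [void]$streamLog.Add("consumed $_") }

# The function body runs once, after the upstream command has finished,
# with everything it received available in $input.
function Use-Function { foreach ($item in $input) { [void]$blockLog.Add("consumed $item") } }

1..3 | ForEach-Object { [void]$streamLog.Add("produced $_"); $_ } | Use-Filter
1..3 | ForEach-Object { [void]$blockLog.Add("produced $_"); $_ } | Use-Function

$streamLog -join ', '
# → produced 1, consumed 1, produced 2, consumed 2, produced 3, consumed 3
$blockLog -join ', '
# → produced 1, produced 2, produced 3, consumed 1, consumed 2, consumed 3
```

A function that declares an explicit Process block streams in the same way the filter does, so the blocking applies only to the plain End-block form.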
     
    Wednesday, February 29, 2012 2:22 PM
  • And if we leave out the pipeline altogether, we get the fastest result of all:

    $count = 1..1e6
    filter test {if ($_){$_}}
    (measure-command {$count | test}).totalmilliseconds
    (measure-command {$count |?{$_}}).totalmilliseconds
    (Measure-Command {foreach($x in $count) {if ($x) {$x}}}).totalmilliseconds
    


    Grant Ward, a.k.a. Bigteddy

    What's new in Powershell 3.0 (Technet Wiki)

    Wednesday, February 29, 2012 2:24 PM
  • I think I smell another wiki coming on....
     

    Justin Rich
    http://jrich523.wordpress.com
    PowerShell V3 Guide (Technet)
    Wednesday, February 29, 2012 2:33 PM
  • Whether that's a lot or not depends on the context.   That's a lot if you're talking about files in a directory.  It can be relatively trivial if you're talking about characters in a file.

    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "

    Wednesday, February 29, 2012 2:37 PM
  • Yes we do, but then we're back to the issue of needing to have everything in memory before we can start to process.

    Whether to use the pipeline or not is a different question than what to put in that pipeline once you've decided to use it.


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "


    • Edited by mjolinor Wednesday, February 29, 2012 2:43 PM
    Wednesday, February 29, 2012 2:43 PM