Switch vs If/Else vs Hashtable of scriptblocks

  • General discussion

  • OK, from another thread (that was getting way too long, and OT) I came up with this idea.  The only time it would be useful is if you're handling really big data sets and need to do switching based on a straight comparison of values.  Normally you'd use a Switch or a stack of If/Else statements.  This is a kind of Frankenstein creation that uses a hash table of script blocks in place of the Switch or If/Else statements.  The script blocks in the tests are empty to keep the time being measured down to just how long it takes to select the right script block, and in the limited testing I've done it seems to have a clear performance advantage over the other two methods.

     

     
    $x = 0..10 | get-random
    
    "`nSwitch"
    (measure-command {
    switch ($x){
    0  {}
    1  {}
    2  {}
    3  {}
    4  {}
    5  {}
    6  {}
    7  {}
    8  {}
    9  {}
    default {}
    }
    }).totalmilliseconds
    
    "`nIf/Else"
    (measure-command {
    
    if ($x -eq 0){}
    elseif ($x -eq 1){}
    elseif ($x -eq 2){}
    elseif ($x -eq 3){}
    elseif ($x -eq 4){}
    elseif ($x -eq 5){}
    elseif ($x -eq 6){}
    elseif ($x -eq 7){}
    elseif ($x -eq 8){}
    elseif ($x -eq 9){}
    else {}
    }).totalmilliseconds
    
    
    $hash = @{
    0 = {}
    1 = {}
    2 = {}
    3 = {}
    4 = {}
    5 = {}
    6 = {}
    7 = {}
    8 = {}
    9 = {}
    default = {}
    }
    
    "`nHash"
    (measure-command {
    if ($hash.containskey($x)){.$hash.$x} else{.$hash.default}
    }).totalmilliseconds
    
    
    
    
    • Edited by mjolinor Thursday, February 2, 2012 2:54 PM
    Wednesday, February 1, 2012 7:16 PM

All replies

  • I don't think using a hash of ScriptBlocks is Frankenstein-ish at all; it's a great PowerShell adaptation of the practice of using dictionaries of delegates or anonymous functions in static languages like C#. It also adds something incredibly powerful: the ability to cleanly add and remove conditions and actions.

    Imagine some script which assigns groups of certain customers to call center personnel. You can build and test a lot of base logic, but leave the specific switch values and even response actions to be added--and freely changed--later. Since PS is dynamic, you could even have the text of script blocks in a database. Heck, let's go one step further than is possible with a switch statement, and replace the switch values (or patterns if -regex) with arbitrary predicate-like scriptblocks themselves! Now who's Dr. Frankenstein?

    function Get-ActionTable {
        $sourcetable = {a function that queries a db} -ConnectionString "..." `
                       -Command "SELECT PredicateBlock, ActionBlock, ZOrder FROM dbo.adb.AssignmentActions"
        $actiontable = ($sourcetable.Rows | foreach { New-Object PSObject -Property @{
                ZOrder     = $_.ZOrder;
                Condition = [ScriptBlock]::Create($_.PredicateBlock);
                Action    = [ScriptBlock]::Create($_.ActionBlock); 
            }} | Sort-Object ZOrder)
        Write-Output $actiontable
    }
    
    function Apply-Action {
        param([object[]]$InputArgs, $ActionTable)
        foreach($actionrow in $ActionTable) {
            [boolean]$result = $actionrow.Condition.Invoke($inputArgs)
            if($result) { 
                $actionrow.Action.Invoke()
                return           
            }
        }
    }
    
    # bunch o' stuff
    
    $assignments = Get-ActionTable
    foreach($customer in $customerlist) {
        Apply-Action -InputArgs $customer -ActionTable $assignments
    }
    
    

     


    jmh
    Wednesday, February 1, 2012 7:58 PM
  • Switch supports passing the whole array at once; if and hashtable don't... Make sure the right thing is being executed.

    I tried it out like this:
    - 10,000 items
    - items have random numerical values 1 to N
    - Comparing (all using empty block of code)
    -- switch statement with 1 entry per possible value
    -- switch statement with 1 entry per possible value, with "continue" to short-circuit after match
    -- if/elseif chain with 1 block per possible value
    -- Hashtable with 1 scriptblock entry per possible value, invoked with .Invoke()

    With N = 10 possible values, results were (in seconds)
    Time (switch): 2.1879905
    Time (switch w/ continue): 11.3153017   # this is due to unfortunate design of how continue works :(
    Time (if/elseif): 0.393717
    Time (hash): 2.7821701

    With N = 20 possible values, results were (in seconds)
    Time (switch): 4.0056764
    Time (switch w/ continue): 12.2493424
    Time (if/elseif): 0.8128013
    Time (hash): 2.788083

    So we see switch and if/elseif both double execution time if number of cases doubles.  Hashtable has more overhead due to need for special scriptblock.Invoke(), but as predicted runtime remains constant even with double the # of cases.  Switch with "continue" is just sad :)

    Note, big improvements in V3.  "Continue" appears to be fixed in switch, and there is secret sauce that speeds up algorithmic code in general.  The overall trends don't change, though.


    N = 10
    Time (switch): 0.515829
    Time (switch w/ continue): 0.2616038
    Time (if/elseif): 0.2789372
    Time (hash): 1.1142537

    N=20
    Time (switch): 1.0582802
    Time (switch w/ continue): 0.5733133
    Time (if/elseif): 0.5456855
    Time (hash): 1.1806393

    Moral: HashTable of script blocks will save perf only if you have a lot of cases to consider (perhaps 50+ cases), otherwise the overhead will dominate and if/elseif is better.  In v3, break/continue in switches no longer causes perf issues (yay).
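    The harness itself isn't shown above; a minimal sketch of the methodology described (names and structure are my own, not the original test code) might look like this, comparing two of the four approaches:

    ```powershell
    # Sketch: 10,000 items with random values 1..N, empty action blocks.
    $N = 10
    $data = 1..10000 | ForEach-Object { Get-Random -Minimum 1 -Maximum ($N + 1) }

    # Hashtable with one empty scriptblock entry per possible value.
    $hash = @{}
    1..$N | ForEach-Object { $hash[$_] = {} }

    $tSwitch = (Measure-Command {
        foreach ($x in $data) {
            switch ($x) {
                1 {} 2 {} 3 {} 4 {} 5 {}
                6 {} 7 {} 8 {} 9 {} 10 {}
            }
        }
    }).TotalSeconds

    $tHash = (Measure-Command {
        foreach ($x in $data) { $hash[$x].Invoke() }
    }).TotalSeconds

    "Time (switch): $tSwitch"
    "Time (hash): $tHash"
    ```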

    Wednesday, February 1, 2012 9:30 PM
  • Fair enough.

     Question, though - why the .Invoke()?  That will cause the creation and return of a collection, even on an empty script block.  The normal behavior for PowerShell is to use .InvokeReturnAsIs(), which I think requires less overhead.  I just used dot-sourcing to invoke the script block in the current scope (.$hash.$x)
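    For reference, a quick illustrative sketch of the invocation styles being compared here:

    ```powershell
    $sb = { 'hello' }

    $sb.Invoke()            # wraps results in a Collection[PSObject], even for one value
    $sb.InvokeReturnAsIs()  # returns the raw result without the collection wrapper
    & $sb                   # call operator: runs the block in a child scope
    . $sb                   # dot-sourcing: runs the block in the current scope
    ```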


    [string](0..33|%{[char][int](46+("686552495351636652556262185355647068516270555358646562655775 0645570").substring(($_*2),2))})-replace " "
    • Edited by mjolinor Wednesday, February 1, 2012 9:50 PM
    Wednesday, February 1, 2012 9:42 PM
  • Good point, using InvokeReturnAsIs seems to have a bit less overhead.  Thanks for showing me that.

    I tried dot sourcing and invoking with &, and both were much slower than calling .Invoke*

    Wednesday, February 1, 2012 10:38 PM
  • Thanks for that tip.  I'll test with that here. 

     One other thing that may be a consideration is what happens in cases where most of the data is going to be handled by the default action (e.g. you're only going to match a few of the input objects).  In that kind of scenario, the advantage would probably be skewed toward the hash table, because you can't put the default at the top of the stack in the switch or the if/else.

    I hadn't tested with passing an entire array, on the assumption that if you're doing this you're probably dealing with a data set that's more than you want to try to get into memory, and are just passing objects through the pipeline and discarding them at the end.
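    To simulate that mostly-default scenario, one could widen the random range so most values miss every case (my own sketch, not from the original posts):

    ```powershell
    # Most values fall outside 0-9, so the switch/if chain tests every condition
    # before giving up, while the hashtable fails a single ContainsKey() check.
    $data = 1..10000 | ForEach-Object { Get-Random -Minimum 0 -Maximum 100 }

    $hash = @{}
    0..9 | ForEach-Object { $hash[$_] = {} }
    $hash['default'] = {}

    (Measure-Command {
        foreach ($x in $data) {
            if ($hash.ContainsKey($x)) { . $hash.$x } else { . $hash.default }
        }
    }).TotalMilliseconds
    ```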


    Wednesday, February 1, 2012 10:48 PM
  • First time I got these results: 16.1, .05, .02; next time (in the same session): .09, .05, .02.

    The second time switch was much closer to the time of the other two, likely because it benefited from some caching that was going on under the surface. Tomorrow I'll try this again with the three blocks in the reverse order to see if perhaps hash and if/else benefited in some way from the switch that ran before.

    Wednesday, February 1, 2012 10:48 PM
  • Your second result is closer to what I see as the average - the If/Else comes in about twice as fast as the Switch, and the hash table about twice as fast as the If/Else.


    Wednesday, February 1, 2012 10:51 PM
  • Testing with a single value won't give you representative data.  The random overhead and upfront cost of each approach will overwhelm the actual algorithmic complexity.  Use 1000 or more values to force the algorithmic complexity to outweigh the overhead.

    It's like asking who is a better hitter in baseball.  You can't look at a single at-bat, you need to look at dozens or hundreds of at-bats to see the real trend.

    Wednesday, February 1, 2012 11:16 PM
  • Not quite a baseball analogy in my mind.  To me it is more like asking a golfer to sink the same putt over and over from the same spot.  The early putts will tend to miss the hole more than the last ones as the golfer gets to know the lay of the land better and better.  The computer will have more of the resources cached in the last iterations than the first ones.
     
    Both the baseball and the golfer analogies have something to offer, but really, what's going on in the computer can be easily understood on its own merits.
     
    Wednesday, February 1, 2012 11:34 PM
  • Maybe I'm doing something wrong, then.  Using this test:

    $count = 1..100000
    $Switch_time = 0
    $IfElse_time = 0
    $Hash_time = 0
    
    
    $hash = @{
    0 = {}
    1 = {}
    2 = {}
    3 = {}
    4 = {}
    5 = {}
    6 = {}
    7 = {}
    8 = {}
    9 = {}
    default = {}
    }
    
    foreach ($i in $count){
    
    
    $x = 0..10 | get-random
    
    #Switch
    $Switch_time += (measure-command {
    switch ($x){
    0  {}
    1  {}
    2  {}
    3  {}
    4  {}
    5  {}
    6  {}
    7  {}
    8  {}
    9  {}
    default {}
    }
    }).totalmilliseconds
    
    #If/Else
    $IfElse_time += (measure-command {
    
    if ($x -eq 0){}
    elseif ($x -eq 1){}
    elseif ($x -eq 2){}
    elseif ($x -eq 3){}
    elseif ($x -eq 4){}
    elseif ($x -eq 5){}
    elseif ($x -eq 6){}
    elseif ($x -eq 7){}
    elseif ($x -eq 8){}
    elseif ($x -eq 9){}
    else {}
    }).totalmilliseconds
    
    #Hash
    $Hash_time += (measure-command {
    if ($hash.containskey($x)){.$hash.$x} else{.$hash.default}
    }).totalmilliseconds
    }
    
    

     

    I get these results on three passes:


    Switch time was 2670.89669999835
    If/Else time was 1694.83859999989
    Hash time was 1060.25020000139


    Switch time was 2554.29329999959
    If/Else time was 1656.70169999984
    Hash time was 998.208900001384


    Switch time was 2615.99469999905
    If/Else time was 1691.86409999991
    Hash time was 1055.77390000144

    That's with just 10 conditional tests plus a default per method, through 100,000 iterations.

     

    UPDATE: Disclaimer - this is on a machine with V3 installed.

    I will also test on a V2 machine.

     Same test on a VM running V2:

    Switch time was 6443.49329999985
    If/Else time was 3823.0860000002
    Hash time was 2814.46939999985

    Switch time was 6545.26159999965
    If/Else time was 3841.04629999999
    Hash time was 2805.49799999984

    Switch time was 6552.35759999961
    If/Else time was 3875.11650000039
    Hash time was 2845.69570000016


     




    • Edited by mjolinor Thursday, February 2, 2012 2:55 PM
    Wednesday, February 1, 2012 11:54 PM
  • In PSv2 there is no internal caching or optimization engine.  Any kind of "warm up" one experiences is due to the PowerShell engine being built on .NET.  That means JIT compilation and garbage collection can add noise to the performance, especially during the first execution or when accessing resources that have been left dormant for a long time.

    These effects are quite small, but all the more reason to use large data sets to test with.  A random delay of 10ms means nothing when your test takes 10 seconds, but means a lot when your test takes 20ms.

    High-performance data serving systems (think SQL or Google) do indeed have sophisticated caching logic which can have major effects on runtime.  Powershell just runs your code, in order, as you wrote it, and returns the result.   You get a bit of .NET warmup to consider, but it's not all that much if you are doing any nontrivial amount of work.

    So no, powershell v2 does not learn the lay of the land to any degree beyond what .NET does generally for all applications. 
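    The warm-up effect is easy to see directly (a small sketch of my own; exact numbers will vary by machine):

    ```powershell
    $sb = { 1..1000 | ForEach-Object { $_ * 2 } | Out-Null }

    # The first call pays the JIT/compilation cost; repeat calls are typically faster.
    (Measure-Command { & $sb }).TotalMilliseconds
    (Measure-Command { & $sb }).TotalMilliseconds
    ```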

    In all the time I've spent working with computers, I don't think I have once thought that "what's going on can be easily understood."  If you do, please tell me when you start your company so I can invest ;-)

    Thursday, February 2, 2012 12:21 AM
  • First off, I mistakenly said my tests were with 10K, in fact it was with 100K, like yours.

    Second - Your hash section has a bug.  You have too many braces, so all that's happening within Measure-Command is that you are creating a scriptblock.  No wonder it's fast!

    Besides that, your data lines up with mine in terms of relative speed.  My test script is inside-out of yours - I create a random list of data up front, then measure each approach by itself once with the whole data list.

    Now try increasing from 10 to 20 cases and see how each reacts...

    And btw, any v3 machine is a v2 machine.  Just start Powershell.exe -Version 2 from cmd.  It uses the exact same v2 bits - it's not the v3 engine with new features disabled.
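    For example (the benchmark file name here is hypothetical):

    ```powershell
    # From cmd.exe, start the v2 engine and confirm which version is running:
    powershell.exe -Version 2 -Command $PSVersionTable.PSVersion
    powershell.exe -Version 2 -File .\switch-benchmark.ps1
    ```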

    Thursday, February 2, 2012 12:56 AM
  • Then why did you bring up the baseball batter analogy?
     
    On 2/1/2012 6:21 PM, Lincoln Atkinson [MSFT] wrote:
    > ... So no, powershell v2 does not learn the lay of the land to any degree beyond what .NET does
    > generally for all applications.
    >
     
    Thursday, February 2, 2012 1:15 AM
  • It is most likely the underlying virtual memory system, which builds a working set of pages that don't have to be unloaded and reloaded after the first time, that contributes most to the expense of the first iteration and makes subsequent iterations much faster.
     
    Thursday, February 2, 2012 1:19 AM
  • Doh!

    Okay, now I get results that line up with yours. The break-even point seems to be about 40-45 conditions if I keep the input data within the range of the tests.  If I increase the range of numbers on the get-random so that more of it fails all the tests (except default) the advantage starts to go to the hash.


    Thursday, February 2, 2012 1:42 AM
  • > Doh!
    >
    > Okay, now I get results that line up with yours. The break-even point seems to be about 40-45 conditions if I keep the input data within the range of the tests.  If I increase the range of numbers on the get-random so that more of it fails all the tests (except default) the advantage starts to go to the hash.

    Would that be sufficient evidence to justify making a corresponding rule of thumb? It seems to me that other factors could come into play, including the "costs" of evaluating expressions (which could involve calls to functions that depend on factors outside the control of the script). Unless these tests were to show orders-of-magnitude differences in time, it makes more sense to me to choose the algorithm that most naturally suits what is being modeled.
    Thursday, February 2, 2012 5:23 AM
  • I think which method to prefer based purely on performance is going to depend on the scale of the data it's being applied to.  Saving 1ms per object for 10 objects probably isn't worth worrying about.  Saving 1ms per object for 10 million objects......

    Syntactically, I find both the hash table and the switch preferable to the if/else.  The hash table has the advantage of making it easy to move all the condition/action pairs out of the middle of the surrounding code, which may lend itself to ease of maintenance and readability of the script. 

    I could also see employing it as a means of simplifying dynamic programming.  You could selectively replace any of the script blocks in the hash table within your script pretty easily using [scriptblock]::Create(), referencing the entry by its table key.  I think you'd be hard pressed to come up with a clean way to do that with either of the other two methods.
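    For example, replacing one action at runtime by key (a sketch; the replacement text could just as easily come from a file or database):

    ```powershell
    $hash = @{
        3         = { 'original action' }
        'default' = { 'no match' }
    }

    # Swap out the action for key 3 without touching the surrounding code.
    $hash[3] = [scriptblock]::Create("'replaced at runtime'")
    & $hash[3]
    ```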


    • Edited by mjolinor Thursday, February 2, 2012 10:30 AM
    Thursday, February 2, 2012 10:06 AM
  • mjolinor,

    It would be nice if you could update the OP so that it doesn't have the non-executing hash bug. Doing my own testing I discovered this too, not having read down far enough through the thread to Lincoln's comment. Updating the OP might save others some frustration later.

    Thanks,

    Tim

    Thursday, February 2, 2012 2:16 PM
  • I agree that if/elseif/else blocks become less intuitive and useful and more unwieldy when there are more than a couple of elseif clauses.

    I suspect that, since the switch statement's use and format are specified by the language, it might not require as much in the way of internal documentation to explain the intent as compared to the hash approach.

    Thursday, February 2, 2012 2:24 PM
  • Done.  Sorry, I should have thought of that!
    Thursday, February 2, 2012 2:56 PM
  • I wanted to do some performance testing on array population, so I used the below:

    Use two different kinds of arrays to start with:

    $array1 = @()
    [string[]]$array3 = new-object String[] 10000

     

    $s = 0  #a value for the Start-Sleep function
    $L = 10000  #an integer of repetitions to start with

    $i = 0

    ##t1 uses 'for', looks to $L every loop, uses the unsized array $array1 and has the Start-Sleep albeit at a value of '0'
    $t1 = (measure-command{
        for($i = 0; $i -lt $L; $i++)
        {
            $array1 += ,@($i.ToString())
            Start-Sleep -m $s
        }
    }).totalmilliseconds
    #43833.3219  10000

    $i = 0
    $array1 = $null
    $array1 = @()

    #t2 removes the $L and goes for a hard value of 10000
    $t2 = (measure-command{
        for($i = 0; $i -lt 10000; $i++)
        {
            $array1 += ,@($i.ToString())
            Start-Sleep -m $s
        }
    }).totalmilliseconds

    $i = 0
    $array1 = $null
    $array1 = @()

    #$t3 further removes the call to Start-Sleep
    $t3 = (measure-command{
        for($i = 0; $i -lt 10000; $i++)
        {
            $array1 += ,@($i.ToString())
            #Start-Sleep -m $s
        }
    }).totalmilliseconds
    #14513.3572  10000

    $i = 0
    $array1 = $null
    $array1 = @()

    #$t4 further changes from a 'for' to a 'while'
    $t4 = (measure-command{
        while($i -lt 10000)
        {
            $array1 += ,@($i.ToString())
            #Start-Sleep -m $s
            $i += 1
        }
    }).totalmilliseconds
    # 14338.8104 with 1000


    $i = 0

    #$t5 finally uses the size array
    $t5 = (measure-command{
        while($i -lt 10000)
        {
            $array3[$i] = $i.ToString()
            $i += 1
            #Start-Sleep -m $s
        }
    }).totalmilliseconds
    #339.727 with 10000


    't1',$t1
    '   '
    't2',$t2
    '  '
    't3',$t3
    '   '
    't4',$t4
    '   '
    't5',$t5

     

    There will be some surprises when you run this!
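    A likely explanation for the biggest gap: `+=` on a plain array allocates a new array and copies the old contents on every iteration, while assigning into a pre-sized array writes in place. When the final size isn't known up front, a growable list avoids the per-iteration copy (my own sketch, not from the original post):

    ```powershell
    # A generic List grows with amortized O(1) appends - no full array copy per Add().
    $list = New-Object System.Collections.Generic.List[string]
    for ($i = 0; $i -lt 10000; $i++) {
        $list.Add($i.ToString())
    }
    $list.Count
    ```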

    J_L

    "Make your scripts fast, my son."


    Joe Leibowitz
    Thursday, February 2, 2012 6:57 PM