BinaryFormatter and byte[] - bug or feature?

  • Question

  • Hi,

    I ran into an issue with BinaryFormatter in PowerShell that took me a couple of hours to understand, as I had never seen it happen in C#.

    I am not sure whether this is the expected behavior, so I am trying to figure out if it is a feature or a bug.

    Here it is...

    For this example, I used a text file called aaa, a little over 13 KB in size (13542 bytes). And yes, in this example, size does matter!

    Also, serializing the file is not the goal; it is just an easy way to get a sizeable array of data in different data types.

    PS C:\temp> dir aaa
    
    
        Directory: C:\temp
    
    
    Mode                LastWriteTime         Length Name
    ----                -------------         ------ ----
    -a----       2017-10-20     16:40          13542 aaa
    

    Loading and serializing the file looks trivial:

    PS C:\temp> $x=gc .\aaa
    PS C:\temp> $x.GetType()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     Object[]                                 System.Array
    
    
    PS C:\temp> $x[0].GetType()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     String                                   System.Object
    
    PS C:\temp> $x.Length
    33
    PS C:\temp> $ms=New-Object System.IO.MemoryStream; try{ date; (New-Object System.Runtime.Serialization.Formatters.Binary.BinaryFormatter).Serialize( $ms, $x); date; $ms.Length }Finally{ $ms.Close() }
    
    Monday, Nov 20, 2017 15:01:45
    Monday, Nov 20, 2017 15:01:45
    93333

    So, the serialization of the Object[] of strings completes in an instant; no issues so far.

    However, when I import the file as bytes (note that $x is still reported as an Object[]):

    PS C:\temp> $x=gc .\aaa -Encoding Byte
    PS C:\temp> $x.Length
    13542
    PS C:\temp> $x.GetType()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     Object[]                                 System.Array
    
    PS C:\temp> $ms=New-Object System.IO.MemoryStream; try{ date; (New-Object System.Runtime.Serialization.Formatters.Binary.BinaryFormatter).Serialize( $ms, $x); date; $ms.Length }Finally{ $ms.Close() }
    
    Monday, Nov 20, 2017 15:03:01
    Monday, Nov 20, 2017 15:06:57
    35557612
    

    Not only is the serialized image 381 times bigger than the text one (35557612 vs. 93333 bytes), it also takes almost 4 minutes to serialize!

    If, however, I force the cast to byte[]:

    PS C:\temp> $x=[byte[]]$x
    PS C:\temp> $x.GetType()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     Byte[]                                   System.Array
    
    PS C:\temp> $ms=New-Object System.IO.MemoryStream; try{ date; (New-Object System.Runtime.Serialization.Formatters.Binary.BinaryFormatter).Serialize( $ms, $x); date; $ms.Length }Finally{ $ms.Close() }
    
    Monday, Nov 20, 2017 15:07:58
    Monday, Nov 20, 2017 15:07:58
    13570

    This gives a much more reasonable size (13570 bytes) and completes in an instant!

    I thought it might be due to boxing: in the first byte load, $x was an Object[], so each byte is probably being boxed in an object. So I decided to box an int[] instead of a byte[], expecting similar results. However...
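
    A minimal sketch of an alternative to the cast, assuming the goal is simply to end up with a true Byte[] holding the file contents: [System.IO.File]::ReadAllBytes returns a byte[] directly, so no Object[] is created along the way. The temporary file and its 10 bytes are placeholders standing in for aaa.

```powershell
# Create a small temporary file to stand in for C:\temp\aaa (placeholder data).
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($path, [byte[]](1..10))

# ReadAllBytes returns a real Byte[]; the PowerShell pipeline is not involved.
$bytes = [System.IO.File]::ReadAllBytes($path)
$bytes.GetType().FullName   # System.Byte[]
$bytes.Length               # 10

Remove-Item -LiteralPath $path
```

    With a relative path such as .\aaa, resolve it first, for example with (Resolve-Path .\aaa).Path, because the .NET current directory can differ from the PowerShell location.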

    PS C:\temp> $x=[object[]][int[]]$x
    PS C:\temp> $x.GetType()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     Object[]                                 System.Array
    
    PS C:\temp> $x[0].GetType()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     Int32                                    System.ValueType
    
    PS C:\temp> $ms=New-Object System.IO.MemoryStream; try{ date; (New-Object System.Runtime.Serialization.Formatters.Binary.BinaryFormatter).Serialize( $ms, $x); date; $ms.Length }Finally{ $ms.Close() }
    
    Monday, Nov 20, 2017 15:08:38
    Monday, Nov 20, 2017 15:08:38
    81279
    

    This one also looks fine!

    I ran many more experiments, and it looks like when the data is bytes presented as an Object[], the system runs wild and the serialization takes forever.

    I definitely need to force the cast to byte[] to get normal behavior. However, any other element type (an array of any value type other than byte, or of a class) works as expected.
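
    The one-liner repeated in the experiments above could be wrapped in a small helper so that each variant is measured the same way. This is only a sketch: the function name Measure-BinarySerialization is made up for illustration, and it assumes Windows PowerShell on .NET Framework, where BinaryFormatter is available.

```powershell
# Hypothetical helper (not from the original post): serializes a value with
# BinaryFormatter and reports the resulting stream size and the elapsed time.
# Assumes Windows PowerShell on .NET Framework, where BinaryFormatter exists.
function Measure-BinarySerialization {
    param(
        [Parameter(Mandatory = $true)]
        [object] $InputObject
    )

    $formatter = New-Object System.Runtime.Serialization.Formatters.Binary.BinaryFormatter
    $stream    = New-Object System.IO.MemoryStream
    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    try {
        $formatter.Serialize($stream, $InputObject)
        $stopwatch.Stop()
        # Emit one object per measurement so results can be compared in a table.
        [pscustomobject]@{
            Type    = $InputObject.GetType().Name
            Bytes   = $stream.Length
            Elapsed = $stopwatch.Elapsed
        }
    }
    finally {
        $stream.Dispose()
    }
}
```

    Calling Measure-BinarySerialization -InputObject $x before and after $x = [byte[]]$x would then give the two measurements side by side.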

    Am I missing something?

    Best regards,

    José


    Monday, November 20, 2017 2:32 PM

All replies

  • PowerShell is not C# and cannot be made to work like C#.

    To read a file as bytes we would do this:

    $a = Get-Content <filename> -Raw

    Now you have bytes and not an array of strings.


    \_(ツ)_/

    Monday, November 20, 2017 3:06 PM
  • I don't pretend to completely understand your smart posts, but I wonder if PowerShell Core would work better for you. They just put out a release candidate: https://github.com/PowerShell/PowerShell/releases




    Monday, November 20, 2017 3:44 PM
  • Hi jrv,

    I know PowerShell and C# are different, but this is about the way the language uses the .NET Framework. Also, the issue is not about reading the file; as I stated before, that was only used to help reproduce the issue.

    I found this accidentally when serializing an abstract object: sometimes it would take no time at all, other times several minutes.

    I found out that if you have an object that happens to hold bytes but is presented as an Object[], this will happen; however, with an array of any other value type (int, short, double, whatever) it works correctly.

    Also, the size of the MemoryStream in the different scenarios makes me believe that in this particular case, for a reason I do not know, it is boxing the values. But that is just a wild guess.
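
    One way to probe the boxing guess is to look at what each element of the Object[] carries. The sketch below rests on an assumption that is not confirmed anywhere in this thread: that the difference comes from the extra note properties Get-Content attaches to every object it emits (such as PSPath and ReadCount), which the elements of a freshly cast byte[] do not have. The temporary file and the variable names are illustrative.

```powershell
# Placeholder file standing in for aaa.
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($path, [byte[]](1..10))

# Windows PowerShell uses -Encoding Byte; PowerShell 6+ uses -AsByteStream instead.
$byteMode = if ($PSVersionTable.PSVersion.Major -ge 6) { @{ AsByteStream = $true } } else { @{ Encoding = 'Byte' } }

$fromPipeline = Get-Content -LiteralPath $path @byteMode   # Object[] of bytes
$fromCast     = [byte[]]$fromPipeline                      # true Byte[]

# Collect the note properties attached to the first element of each array.
$pipelineProps = @($fromPipeline[0].psobject.Properties | Where-Object { $_.MemberType -eq 'NoteProperty' })
$castProps     = @($fromCast[0].psobject.Properties     | Where-Object { $_.MemberType -eq 'NoteProperty' })

"Element from Get-Content: {0} note properties ({1})" -f $pipelineProps.Count, ($pipelineProps.Name -join ', ')
"Element after the cast:   {0} note properties" -f $castProps.Count

Remove-Item -LiteralPath $path
```

    If the first count is non-zero, BinaryFormatter is being handed decorated objects rather than plain boxed bytes, which would be consistent with the size difference seen above. This remains a hypothesis to verify, not a confirmed explanation.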

    I am just checking whether someone knows the reason for this strange behavior.

    As for the -Raw argument, it does not return a byte[]; it returns a single string, which cannot be cast to byte[]:

    PS C:\temp> (gc .\aaa -Raw).gettype()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     String                                   System.Object
    
    PS C:\temp> ([byte[]](gc .\aaa -Raw)).gettype()
    Cannot convert value "
    ...

    while

    PS C:\temp> (gc .\aaa -Encoding Byte).gettype()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     Object[]                                 System.Array
    
    PS C:\temp> ([byte[]](gc .\aaa -Encoding Byte)).gettype()
    
    IsPublic IsSerial Name                                     BaseType
    -------- -------- ----                                     --------
    True     True     Byte[]                                   System.Array

    Regards,

    José


    Monday, November 20, 2017 5:50 PM
  • Hi JS2010,

    I will take a look; however, it is an RC, so for production purposes it is still baking :)

    Regards,

    José

    Monday, November 20, 2017 5:51 PM
  • Hi,

    This is a quick note to let you know that I am currently performing research on this issue and will get back to you as soon as possible. I appreciate your patience.

    If you have any updates during this process, please feel free to let me know.

    Best Regards,
    Albert Ling


    Tuesday, November 21, 2017 12:25 PM
  • Hi Albert,

    Thanks for your update; I'll wait for the results (for the time being, I bypassed it with a workaround).

    Regards,

    José



    Tuesday, November 21, 2017 6:08 PM
  • Hi Albert,

    Did you get any feedback on this?

    Regards,

    José



    Monday, December 4, 2017 5:33 PM