Binary.Decompress() and zip files

    Question

  • Hi all,

    I've been playing around with Binary.Decompress(). I can use it to open a csv file stored in a .gzip file using the Compression.GZip option, and I can use it to compress and then decompress a file using Compression.Deflate. However, when I try to open a .csv file stored in a .zip file, it doesn't work. Is this something that I should be able to do?

    I got an error when I first tried using Binary.Decompress() and Compression.Deflate that, when I searched on the internet, suggested that the first two bytes in the binary might be the problem. I got as far as this query:

    let
        Source = File.Contents("C:\Users\Chris\Documents\Power BI Dashboard Designer\CompressedData.zip"),
        MyBinaryFormat = BinaryFormat.Record([FirstTwoBytes=BinaryFormat.Binary(2), TheRest=BinaryFormat.Binary()]),
        GetDataToDecompress = MyBinaryFormat(Source)[TheRest],
        DecompressData = Binary.Decompress(GetDataToDecompress, Compression.Deflate)
    in
        DecompressData

    ...but while the error has gone, I'm not extracting my csv file either. Does anyone have any suggestions?

    Thanks,

    Chris



    Friday, November 20, 2015 10:21 PM

Answers

  • Here is my general solution.

    I've written up the solution on my web site, but because my account has not been verified I cannot post the link to the detailed explanation. However, here is the code:

    let
        Source = File.Contents("C:\CompressedData.zip"),

        // Walks the archive one local file header at a time, starting at byte offset Position.
        // Pass "" as FileToExtract to return every file; DataSoFar holds the rows found so far.
        DecompressFiles = (ZIPFile, Position, FileToExtract, DataSoFar) =>
        let
            // First pass: read the size fields from the 30-byte local file header.
            MyBinaryFormat = try BinaryFormat.Record([
                DataToSkip = BinaryFormat.Binary(Position),
                MiscHeader = BinaryFormat.Binary(18),
                FileSize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
                UnCompressedFileSize = BinaryFormat.Binary(4),
                FileNameLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
                ExtrasLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
                TheRest = BinaryFormat.Binary()]) otherwise null,

            // FileSize is the compressed size; the extra byte is subtracted again in NewPosition.
            MyCompressedFileSize = try MyBinaryFormat(ZIPFile)[FileSize]+1 otherwise null,
            MyFileNameLen = try MyBinaryFormat(ZIPFile)[FileNameLen] otherwise null,
            MyExtrasLen = try MyBinaryFormat(ZIPFile)[ExtrasLen] otherwise null,

            // Second pass: split out the file name and the compressed data.
            MyBinaryFormat2 = try BinaryFormat.Record([
                DataToSkip = BinaryFormat.Binary(Position),
                Header = BinaryFormat.Binary(30),
                Filename = BinaryFormat.Text(MyFileNameLen),
                Extras = BinaryFormat.Binary(MyExtrasLen),
                Data = BinaryFormat.Binary(MyCompressedFileSize),
                TheRest = BinaryFormat.Binary()]) otherwise null,

            MyFileName = try MyBinaryFormat2(ZIPFile)[Filename] otherwise null,
            GetDataToDecompress = try MyBinaryFormat2(ZIPFile)[Data] otherwise null,
            DecompressData = try Binary.Decompress(GetDataToDecompress, Compression.Deflate) otherwise null,

            // Offset of the next local file header.
            NewPosition = try Position + 30 + MyFileNameLen + MyExtrasLen + MyCompressedFileSize - 1 otherwise null,

            AsATable = Table.FromRecords({[Filename = MyFileName, Content=DecompressData]}),

            // Keep DataSoFar when nothing could be decompressed; otherwise return or append this file.
            #"Appended Query" =
                if DecompressData = null then DataSoFar
                else if (MyFileName = FileToExtract) then AsATable
                else if (FileToExtract = "") and Position <> 0 then Table.Combine({DataSoFar, AsATable})
                else AsATable
        in
            // Finish when the requested file is found or no further file was added; otherwise recurse.
            if (MyFileName = FileToExtract) or (#"Appended Query" = DataSoFar) then
                #"Appended Query"
            else
                @DecompressFiles(ZIPFile, NewPosition, FileToExtract, #"Appended Query"),

        MyData = DecompressFiles(Source, 0, "", null)
    in
        MyData

    As described in my earlier reply, you parse the binary zip file, extract the information showing exactly where the compressed data is, and pass that data to Binary.Decompress.

    It has worked with most of the zip files that I have come across.
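
    Once the query returns its table of Filename and Content columns, a single file can be picked out and parsed. The following is only a minimal sketch: the entry name data.csv and the sample CSV text are made up for illustration, and MyData here is a hand-built stand-in for the table returned above so that the snippet runs on its own.

    ```powerquery
    let
        // Hypothetical stand-in for the table returned by DecompressFiles:
        // one row per extracted file, with the decompressed bytes in Content.
        SampleCsv = Text.ToBinary("id,name#(cr,lf)1,alpha#(cr,lf)2,beta"),
        MyData = Table.FromRecords({[Filename = "data.csv", Content = SampleCsv]}),

        // Look up the row for one file name and take its binary content.
        CsvBinary = MyData{[Filename = "data.csv"]}[Content],

        // Parse the binary as UTF-8 CSV and use the first row as column headers.
        Parsed = Table.PromoteHeaders(Csv.Document(CsvBinary, [Delimiter = ",", Encoding = 65001]))
    in
        Parsed
    ```

    The row lookup by Filename raises an error if no row, or more than one row, matches, so it may be worth wrapping in try ... otherwise when the archive contents are not known in advance.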

    KenR


    Sunday, February 14, 2016 9:15 AM
  • Hi Chris,

    GZIP and ZIP are not the same format. GZIP compresses a single binary stream, whereas ZIP is an archive format that packages a directory of files, each compressed individually.
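
    One way to see the difference is to look at the first bytes of the file. A minimal sketch, relying on the standard signatures (a GZIP stream starts with the bytes 1F 8B, and a ZIP archive normally starts with a local file header whose signature is 50 4B 03 04). The file path is hypothetical.

    ```powerquery
    let
        // Hypothetical path, for illustration only
        Source = File.Contents("C:\CompressedData.zip"),

        // Read the first four bytes and render them as lower-case hex text
        Signature = Text.Lower(Binary.ToText(BinaryFormat.Binary(4)(Source), BinaryEncoding.Hex)),

        Format =
            if Text.StartsWith(Signature, "1f8b") then "GZIP stream"
            else if Signature = "504b0304" then "ZIP archive (starts with a local file header)"
            else "Unknown"
    in
        Format
    ```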

    The issue with adding ZIP support is that it's quite a mess in practice. There's a specification for it, but different operating systems and tools produce slightly different variations on the format. As I recall, character encoding of directory paths was one complicated issue. There are others, such as varying compression algorithms. Common unzip tools such as 7-Zip have a lot of built-in knowledge about what edge cases to expect.
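
    As an illustration of the compression-algorithm point, each entry's local file header records which method was used. The sketch below reads that field for the first entry only; it assumes the file begins with a local file header, and the path and step names are hypothetical. In the ZIP specification, method 0 means stored (no compression) and method 8 means Deflate.

    ```powerquery
    let
        // Hypothetical path, for illustration only
        Source = File.Contents("C:\CompressedData.zip"),
        UInt16 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),

        // Start of the local file header: signature, version needed, flags, method
        HeaderStart = BinaryFormat.Record([
            Signature = BinaryFormat.Binary(4),
            VersionNeeded = UInt16,
            Flags = UInt16,
            Method = UInt16
        ]),

        Method = HeaderStart(Source)[Method],
        Description =
            if Method = 0 then "Stored (no compression)"
            else if Method = 8 then "Deflate"
            else "Other method: " & Number.ToText(Method)
    in
        Description
    ```

    Only Deflate entries can be passed to Binary.Decompress with Compression.Deflate; stored entries would need to be used as-is, and other methods are not covered here.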

    If we ever add support for it, it will likely be through a third-party library, at least internally. Feel free to post a request for it on PBI user voice if this is something you would like to see. :-)

    Tristan

    Tuesday, November 24, 2015 12:44 AM
    Moderator

All replies

  • I’ve had success today using Power Query to unzip a .zip file, in my case one containing a compressed XML file.
    This is the code that I used:

    let
        Source = File.Contents("C:\Sheet1.zip"),

        // First pass: read the size fields from the 30-byte local file header.
        // FileSize is the compressed size, stored at byte offset 18.
        MyBinaryFormat = BinaryFormat.Record([
            MiscHeader = BinaryFormat.Binary(18),
            FileSize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            UnCompressedFileSize = BinaryFormat.Binary(4),
            FileNameLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            ExtrasLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            TheRest = BinaryFormat.Binary()]),

        MyCompressedFileSize = MyBinaryFormat(Source)[FileSize]+1,
        MyFileNameLen = MyBinaryFormat(Source)[FileNameLen],
        MyExtrasLen = MyBinaryFormat(Source)[ExtrasLen],

        // Second pass: skip the header, file name and extra field, then take the compressed data.
        MyBinaryFormat2 = BinaryFormat.Record([
            Header = BinaryFormat.Binary(30),
            Filename = BinaryFormat.Binary(MyFileNameLen),
            Extras = BinaryFormat.Binary(MyExtrasLen),
            Data = BinaryFormat.Binary(MyCompressedFileSize),
            TheRest = BinaryFormat.Binary()]),

        GetDataToDecompress = MyBinaryFormat2(Source)[Data],
        DecompressData = Binary.Decompress(GetDataToDecompress, Compression.Deflate),
        #"Imported XML" = Xml.Tables(DecompressData)
    in
        #"Imported XML"


    Basically, I parse the file once to grab the metadata that is needed (compressed file size, file name length, and extra field length).

    Then I parse the file a second time to extract the compressed data, discarding anything after it.

    I’m not sure how broadly this would work; obviously it handles deflated files only. It could fairly easily be adapted to extract files from .zip files that contain multiple files.
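
    The two passes described above can probably be folded into one with BinaryFormat.Choice, which lets a second format depend on values read by the first. This is only a sketch, under several assumptions: the archive starts with a single Deflate-compressed entry, the sizes are present in the local file header, and reading exactly the stored compressed size is sufficient (the query above reads one extra byte). The field names are illustrative.

    ```powerquery
    let
        // Hypothetical path, for illustration only
        Source = File.Contents("C:\Sheet1.zip"),
        UInt16 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
        UInt32 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),

        // Fixed 30-byte local file header: 18 bytes skipped, then the size fields
        Header = BinaryFormat.Record([
            MiscHeader = BinaryFormat.Binary(18),
            CompressedSize = UInt32,
            UncompressedSize = UInt32,
            FileNameLen = UInt16,
            ExtrasLen = UInt16
        ]),

        // The variable-length part is sized from the header values just read
        Entry = BinaryFormat.Choice(
            Header,
            (h) => BinaryFormat.Record([
                FileName = BinaryFormat.Text(h[FileNameLen]),
                Extras = BinaryFormat.Binary(h[ExtrasLen]),
                Data = BinaryFormat.Binary(h[CompressedSize])
            ])
        ),

        Parsed = Entry(Source),
        Decompressed = Binary.Decompress(Parsed[Data], Compression.Deflate)
    in
        [FileName = Parsed[FileName], Content = Decompressed]
    ```

    The result is a record holding the entry's file name and its decompressed content, which can then be handed to Xml.Tables or Csv.Document as appropriate.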

    Ken




    Sunday, February 07, 2016 2:08 AM