none
Difference between Binary.Combine and Table.Combine RRS feed

  • Question

  • Hi,

    What is difference between Binary.Combine and Table.Combine? When I look at them, both of them are working in the way of appending data (Union all in SQL). Any difference between them?

    Is Binary.Combine faster then Table.Combine? Or the other way around?

    Thanks,

    Kuan

    Tuesday, November 3, 2015 3:31 PM

Answers

  • I'm guessing those two methods have pretty similar perf characteristics, since in both cases the data can be streamed and doesn't have to be fully read in order to start returning table rows. (This wouldn't necessarily be the case for non-CSV formats.)

    Feel free to try them both and post your findings here.

    Ehren


    Tuesday, November 3, 2015 7:15 PM
    Owner
  • Below is the test result:

    Binary.Combine:

    let

        Source = Folder.Files("C:\......"),
        #"Filtered Rows" = Table.SelectRows(Source, each [Name] = "File1.csv" or [Name] = "File2.csv" or [Name] = "File3.csv" or [Name] = "File4.csv"),  
        #"Combined Binaries" = Binary.Combine(#"Filtered Rows"[Content]),
        #"Imported CSV" = Csv.Document(#"Combined Binaries",[Delimiter=",",Encoding=65001]),
        #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV"),
        #"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([P_PERIOD] <> "P_PERIOD"))
    in
        #"Filtered Rows1"

    Total 667287 rows. Takes around 15s to refresh the query.

    ----------------------------------------------------------------------------------------------------------------------

    Table.Combine:

    let
        ReadFile1="...\File1.csv",
        ReadFile2="...\File2.csv",
        ReadFile3="...\File3.csv",
        ReadFile4="...\File4.csv",
        CSV1=Table.PromoteHeaders(Csv.Document(File.Contents(ReadFile1),[Delimiter=",",Encoding=65001])),
        CSV2=Table.PromoteHeaders(Csv.Document(File.Contents(ReadFile2),[Delimiter=",",Encoding=65001])),
        CSV3=Table.PromoteHeaders(Csv.Document(File.Contents(ReadFile3),[Delimiter=",",Encoding=65001])),
        CSV4=Table.PromoteHeaders(Csv.Document(File.Contents(ReadFile4),[Delimiter=",",Encoding=65001])),
        Result1=Table.Combine({CSV1,CSV2}),
        Result2=Table.Combine({Result1,CSV3}),
        Result3=Table.Combine({Result2,CSV4})
      
    in
        Result3

    Total 667287 rows. Also takes about 15s to refresh the query.


    Tuesday, November 3, 2015 7:42 PM

All replies

  • Binary.Combine combines multiple binary values into a single binary value. Table.Combine combines tables.

    Binary.Combine is great when the files you're combining don't have characteristics that would make the combined result "invalid" in some way. For example, XML files require a single root element, so combining them results in a value that can't be processed by the M XML functions. CSV files with headers can be combined, but this results in header rows in the middle of the dataset which have to be filtered out.

    If the files can't be combined to produce a valid result, it's probably easier to convert each file to a table first, and then combine them using Table.Combine.

    Ehren

    Tuesday, November 3, 2015 6:01 PM
    Owner
  • Hi Ehren,

    If I am about to combine come CSV files (same format, same columns).

    Method 1: Use Binary.Combine to combine them and load to excel (data model)

    Method 2: Convert each CSV file to a table, and then append using Table.Combine.

    Which method will be faster?  

    Thanks

    Tuesday, November 3, 2015 7:10 PM
  • I'm guessing those two methods have pretty similar perf characteristics, since in both cases the data can be streamed and doesn't have to be fully read in order to start returning table rows. (This wouldn't necessarily be the case for non-CSV formats.)

    Feel free to try them both and post your findings here.

    Ehren


    Tuesday, November 3, 2015 7:15 PM
    Owner
  • Below is the test result:

    Binary.Combine:

    let

        Source = Folder.Files("C:\......"),
        #"Filtered Rows" = Table.SelectRows(Source, each [Name] = "File1.csv" or [Name] = "File2.csv" or [Name] = "File3.csv" or [Name] = "File4.csv"),  
        #"Combined Binaries" = Binary.Combine(#"Filtered Rows"[Content]),
        #"Imported CSV" = Csv.Document(#"Combined Binaries",[Delimiter=",",Encoding=65001]),
        #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV"),
        #"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([P_PERIOD] <> "P_PERIOD"))
    in
        #"Filtered Rows1"

    Total 667287 rows. Takes around 15s to refresh the query.

    ----------------------------------------------------------------------------------------------------------------------

    Table.Combine:

    let
        ReadFile1="...\File1.csv",
        ReadFile2="...\File2.csv",
        ReadFile3="...\File3.csv",
        ReadFile4="...\File4.csv",
        CSV1=Table.PromoteHeaders(Csv.Document(File.Contents(ReadFile1),[Delimiter=",",Encoding=65001])),
        CSV2=Table.PromoteHeaders(Csv.Document(File.Contents(ReadFile2),[Delimiter=",",Encoding=65001])),
        CSV3=Table.PromoteHeaders(Csv.Document(File.Contents(ReadFile3),[Delimiter=",",Encoding=65001])),
        CSV4=Table.PromoteHeaders(Csv.Document(File.Contents(ReadFile4),[Delimiter=",",Encoding=65001])),
        Result1=Table.Combine({CSV1,CSV2}),
        Result2=Table.Combine({Result1,CSV3}),
        Result3=Table.Combine({Result2,CSV4})
      
    in
        Result3

    Total 667287 rows. Also takes about 15s to refresh the query.


    Tuesday, November 3, 2015 7:42 PM