locked
Merge multiple files when some of them may not exist RRS feed

  • Question

  • Hi,

    I have these language files in the same folder.
    en.csv
    fr.csv
    ja.csv
    etc.

    I have pulled them into Power Query and now each of them is a separate query/

    I want to merge them such that the ID from EN becomes the ID for the final table. Then I will merge FR into EN matching LEFTOUTER, then expanding the merge. I then merge JA into the resulting query matching LEFTOUTER nd expanding the merge again. I will repeat this until there are no more csv files to merge.

    The final output should be like this:
    ID   ENValue FRValue JAValue ...

    ..        ..            ..         ..       ...

    However, I have a problem. Some of the language file may be in that folder at some point in time. In other times, they maybe missing. I want return an empty XXValue when the language file XX is missing.

    I tried to merge JA query when that table is empty, but PQ gives me error.

    I have also looked into combining all files in a folder. But this will only allow me to combine, not merge.

    Any ideas to accomplish my goal?

    Thanks!

    Thursday, June 6, 2019 10:52 AM

Answers

  • OK, so I was able to do this using an n-sized list of tables, with the only requirement being your master table is the first table in the list and I'm assuming the ID and Value fields have the same name in each table. Here's my sample tables standing in for the CSVs

    Here's those tables loaded into Power Query in a query called LanguageTables

    And this is the code I put in an empty query that is processing the LanguageTables

    let
        Source = 
    List.Accumulate(
     //This is the list of tables we will iterate over, which is every table except the first one
     Table.ToRecords(Table.RemoveFirstN(LanguageTables,1)),
    
     //This is the first table in the list, that we will join against
     Table.RenameColumns(LanguageTables{0}[Data], {"Value", LanguageTables{0}[Language]&"Value"}),
    
     //This does a join starting with he first table in the list against each other table in the list and expands the results of the join into a column named for the language
     (state,current)=>
     Table.ExpandTableColumn(
      Table.NestedJoin(state,{"ID"},current[Data],{"ID"},"JoinResults",JoinKind.LeftOuter)
     , "JoinResults", {"Value"}, {current[Language]&"Value"})
    )
    in
        Source
    Which results in this output


    There's some obnoxious datatype conversions going on here because the only way I know of to do something like this in PQ involves using the List.Accumulate function, which requires the data it iterates over to exist in a List. There may be other ways to do it, but this works other than the fact it won't list anything if a table is "missing". That could probably be handled by having a master list of languages and a try/otherwise on the join.

    The best explanation of how List.Accumulate works that I've seen is here:
    https://radacad.com/list-accumulate-hidden-gem-of-power-query-list-functions-in-power-bi




    • Edited by pdelia Thursday, June 6, 2019 7:51 PM
    • Marked as answer by duclepham Saturday, June 8, 2019 3:11 PM
    Thursday, June 6, 2019 7:45 PM
  • Hi,

    Suppose, you have following data:

    

    Then you may use following pattern for join:

    let
        join1 = try [a = Table.NestedJoin(Table_en,{"ID"},Table_fr,{"ID"},"Table_fr",JoinKind.LeftOuter),
                     b = Table.ExpandTableColumn(a, "Table_fr", {"data"}, {"Table_fr.data"})][b] otherwise Table_en,
        join2 = try [a = Table.NestedJoin(join1,{"ID"},Table_ja,{"ID"},"Table_ja",JoinKind.LeftOuter),
                     b = Table.ExpandTableColumn(a, "Table_ja", {"data"}, {"Table_ja.data"})][b] otherwise join1
    in
        join2


    But I guess, combine files in the folder and then transform data (with Table.Pivot or Table.Group functions, perhaps) would be more elegant solution.

    Thursday, June 6, 2019 2:17 PM

All replies

  • Hi,

    Suppose, you have following data:

    

    Then you may use following pattern for join:

    let
        join1 = try [a = Table.NestedJoin(Table_en,{"ID"},Table_fr,{"ID"},"Table_fr",JoinKind.LeftOuter),
                     b = Table.ExpandTableColumn(a, "Table_fr", {"data"}, {"Table_fr.data"})][b] otherwise Table_en,
        join2 = try [a = Table.NestedJoin(join1,{"ID"},Table_ja,{"ID"},"Table_ja",JoinKind.LeftOuter),
                     b = Table.ExpandTableColumn(a, "Table_ja", {"data"}, {"Table_ja.data"})][b] otherwise join1
    in
        join2


    But I guess, combine files in the folder and then transform data (with Table.Pivot or Table.Group functions, perhaps) would be more elegant solution.

    Thursday, June 6, 2019 2:17 PM
  • OK, so I was able to do this using an n-sized list of tables, with the only requirement being your master table is the first table in the list and I'm assuming the ID and Value fields have the same name in each table. Here's my sample tables standing in for the CSVs

    Here's those tables loaded into Power Query in a query called LanguageTables

    And this is the code I put in an empty query that is processing the LanguageTables

    let
        Source = 
    List.Accumulate(
     //This is the list of tables we will iterate over, which is every table except the first one
     Table.ToRecords(Table.RemoveFirstN(LanguageTables,1)),
    
     //This is the first table in the list, that we will join against
     Table.RenameColumns(LanguageTables{0}[Data], {"Value", LanguageTables{0}[Language]&"Value"}),
    
     //This does a join starting with he first table in the list against each other table in the list and expands the results of the join into a column named for the language
     (state,current)=>
     Table.ExpandTableColumn(
      Table.NestedJoin(state,{"ID"},current[Data],{"ID"},"JoinResults",JoinKind.LeftOuter)
     , "JoinResults", {"Value"}, {current[Language]&"Value"})
    )
    in
        Source
    Which results in this output


    There's some obnoxious datatype conversions going on here because the only way I know of to do something like this in PQ involves using the List.Accumulate function, which requires the data it iterates over to exist in a List. There may be other ways to do it, but this works other than the fact it won't list anything if a table is "missing". That could probably be handled by having a master list of languages and a try/otherwise on the join.

    The best explanation of how List.Accumulate works that I've seen is here:
    https://radacad.com/list-accumulate-hidden-gem-of-power-query-list-functions-in-power-bi




    • Edited by pdelia Thursday, June 6, 2019 7:51 PM
    • Marked as answer by duclepham Saturday, June 8, 2019 3:11 PM
    Thursday, June 6, 2019 7:45 PM
  • Thanks @pdelia

    This method looks promising. However, my Language Table has Data type Binary, not Table. So how can I modify the code to use your method?


    Right now PQ throws me the error:

    Expression.Error: We cannot convert a value of type Binary to type Table.
    Details:
        Value=Binary
        Type=Type


    • Edited by duclepham Friday, June 7, 2019 10:31 AM
    Friday, June 7, 2019 10:29 AM
  • The .csv files need to be converted into tables first. If you add a custom column and put this in it, it should do the trick, though I don't know the specifics of those .csv files so the settings you need might be different.

    Table.PromoteHeaders(Csv.Document([Content],[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]))

    Friday, June 7, 2019 11:35 PM
  • Thanks! I  applied your new code and it worked!

    Saturday, June 8, 2019 3:10 PM