none
Join on first row of group RRS feed

  • Question

  • I'm doing multiple left outer joins, always on the same ID field.  Unfortunately, with each join the rows are multiplying exponentially.

    Is there a way to create on join on the ID field, but only on the first row of each ID field?


    • Edited by Ranga Wal Thursday, August 9, 2018 4:45 AM
    Thursday, August 9, 2018 4:43 AM

Answers

  • Here is the code which you will need to put into a blank query:

    /*SecondaryTablesPrefix:
      You can add prefixes to the new columns of the secondary tables, this is especially useful if
      there are conflicting column names. If you do not want to add prefixes to a particular table then
      make values within the list, null or "". 
      If you do not want prefixes at all then e.g. make {"Beta","Gamma"} into {} */  
    let
        PrimaryTable = AlphaTable 
       ,SecondaryTables = {BetaTable, GammaTable}
       ,SecondaryTablesPrefix = List.Transform({"Beta","Gamma"}, Text.From) 
       ,CommonKey = "ID"
       ,IsEmpty = (X) => (X = "" or X = null) //function for simplifying code 
       ,IDTbl = Table.Group(PrimaryTable,CommonKey,{"Primary", each _})
       ,TempTblColNames = List.Transform(List.Zip({{1..List.Count(SecondaryTables)},SecondaryTablesPrefix})
                                        ,each Text.From(_{0}) & 
                                         (if IsEmpty(_{1}) then  
                                          "" else
                                         "("&_{1}&")" ))
       ,AllTogether = List.Accumulate(List.Zip({SecondaryTables,TempTblColNames,SecondaryTablesPrefix})
                                     ,IDTbl
                                     ,(X,Y)=> 
                                         let
                                             Join = Table.NestedJoin(X,CommonKey,Y{0},CommonKey,Y{1},JoinKind.LeftOuter)
                                            ,fTransTbl = (Table as table) =>
                                                         let    
                                                             RemoveColumn = Table.RemoveColumns(Table,CommonKey)
                                                            ,Prefix = if IsEmpty(Y{2}) then
                                                                      RemoveColumn else
                                                                      Table.PrefixColumns(RemoveColumn,Y{2})
                                                         in
                                                             Prefix
                                            ,TransTbl = Table.TransformColumns(Join, {Y{1}, fTransTbl}) //Transform Nested table just joined
                                         in
                                             TransTbl)
       ,RemoveID = Table.Buffer(Table.RemoveColumns(AllTogether,CommonKey))
       ,LstColNames = List.Buffer(
                      List.Combine(
                      List.Transform(Record.ToList(RemoveID{0}),Table.ColumnNames)))
       ,NewColumn = Table.AddColumn(RemoveID
                                   ,"AllTablesSideBySide"
                                   ,(Rec as record)=> Table.FromColumns(
                                                      List.Combine(
                                                      List.Transform(Record.ToList(Rec),Table.ToColumns)),LstColNames))
       ,SelectCol = Table.SelectColumns(NewColumn,"AllTablesSideBySide") //To show what is happening when going through the steps
       ,Expand = Table.Combine(SelectCol[AllTablesSideBySide])
    in
        Expand

    Some things to note:

    • You will need to edit the first four variables: PrimaryTable, SecondaryTables, SecondaryTablesPrefix & CommonKey to suit your needs. 
    • This resulting table is just a way of displaying the data for a human to read.
    • You may want to fill down first on the ID column, if you want to filter or order on ID. This can easily be done using the power query graphical user interface. 
    • Although the code looks complicated, stepping though the applied steps will make it clear what each step does. 


    Saturday, August 11, 2018 9:12 AM
  • I recently had a similar task (I believe) and came up with this solution: 

    https://www.thebiccountant.com/2018/08/12/how-to-do-a-real-vlookup-false-in-power-query-or-power-bi/

    Not as sophisticated like Cams, as it doesn't cater for multiple tables, but could be useful for other purposes as well.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!


    Sunday, August 12, 2018 11:33 AM
    Moderator

All replies

  • Hi Ranga, 

    You can use the Table.Distinct function first on the tables you are join. Something like Table.Distinct(SomeTable, {"SomeIDField"}). This will keep only first row for each "SomeIDField", even if the rows are different. Keep in mind, that any sorting before hand might be ignored by Table.Distinct, so you might want use Table.Buffer after your sort and before your Table.Distinct. 

    Another option (and not one you asked for) is that you could have each new column as separate lists. For example:

    So in this case: Van, Car and Truck are all 2 column tables, with one column "Date" as their ID field. We can put all those tables together so it looks like above. If this is the type of thing you want, say so. 

    • Edited by CamWally Thursday, August 9, 2018 6:57 AM
    Thursday, August 9, 2018 6:50 AM
  • Hi Ranga, 

    You can use the Table.Distinct function first on the tables you are join. Something like Table.Distinct(SomeTable, {"SomeIDField"}). This will keep only first row for each "SomeIDField", even if the rows are different. Keep in mind, that any sorting before hand might be ignored by Table.Distinct, so you might want use Table.Buffer after your sort and before your Table.Distinct. 

    Another option (and not one you asked for) is that you could have each new column as separate lists. For example:

    So in this case: Van, Car and Truck are all 2 column tables, with one column "Date" as their ID field. We can put all those tables together so it looks like above. If this is the type of thing you want, say so. 

    Hi CamWally,

    I'm not sure the first solution would work, as I need every row from each table I join.

    The second idea looks interesting - can you explain more?

    Thursday, August 9, 2018 8:31 AM
  • It's best you show us what your data looks like (the tables you are trying to join). And (if you know) what you want the output to look like. 

    Thursday, August 9, 2018 11:44 AM
  • I can't upload anything, so I'll try and describe the tables:

    1. Main table, contains project metadata, 1 row per project, with a unique Project ID.

    2. Multiple other tables with differing amounts of columns, with some one-to-one, others one-to-many, all left outer joins.  All other tables have a Project ID which is the linking field.  All of these tables are in their simplest format, and require no manipulation.

    Ultimately, I'm simply trying to get as few rows as possible.  I suspect you were eluding to another method of merging the tables?

    Thursday, August 9, 2018 12:04 PM
  • Here is the code which you will need to put into a blank query:

    /*SecondaryTablesPrefix:
      You can add prefixes to the new columns of the secondary tables, this is especially useful if
      there are conflicting column names. If you do not want to add prefixes to a particular table then
      make values within the list, null or "". 
      If you do not want prefixes at all then e.g. make {"Beta","Gamma"} into {} */  
    let
        PrimaryTable = AlphaTable 
       ,SecondaryTables = {BetaTable, GammaTable}
       ,SecondaryTablesPrefix = List.Transform({"Beta","Gamma"}, Text.From) 
       ,CommonKey = "ID"
       ,IsEmpty = (X) => (X = "" or X = null) //function for simplifying code 
       ,IDTbl = Table.Group(PrimaryTable,CommonKey,{"Primary", each _})
       ,TempTblColNames = List.Transform(List.Zip({{1..List.Count(SecondaryTables)},SecondaryTablesPrefix})
                                        ,each Text.From(_{0}) & 
                                         (if IsEmpty(_{1}) then  
                                          "" else
                                         "("&_{1}&")" ))
       ,AllTogether = List.Accumulate(List.Zip({SecondaryTables,TempTblColNames,SecondaryTablesPrefix})
                                     ,IDTbl
                                     ,(X,Y)=> 
                                         let
                                             Join = Table.NestedJoin(X,CommonKey,Y{0},CommonKey,Y{1},JoinKind.LeftOuter)
                                            ,fTransTbl = (Table as table) =>
                                                         let    
                                                             RemoveColumn = Table.RemoveColumns(Table,CommonKey)
                                                            ,Prefix = if IsEmpty(Y{2}) then
                                                                      RemoveColumn else
                                                                      Table.PrefixColumns(RemoveColumn,Y{2})
                                                         in
                                                             Prefix
                                            ,TransTbl = Table.TransformColumns(Join, {Y{1}, fTransTbl}) //Transform Nested table just joined
                                         in
                                             TransTbl)
       ,RemoveID = Table.Buffer(Table.RemoveColumns(AllTogether,CommonKey))
       ,LstColNames = List.Buffer(
                      List.Combine(
                      List.Transform(Record.ToList(RemoveID{0}),Table.ColumnNames)))
       ,NewColumn = Table.AddColumn(RemoveID
                                   ,"AllTablesSideBySide"
                                   ,(Rec as record)=> Table.FromColumns(
                                                      List.Combine(
                                                      List.Transform(Record.ToList(Rec),Table.ToColumns)),LstColNames))
       ,SelectCol = Table.SelectColumns(NewColumn,"AllTablesSideBySide") //To show what is happening when going through the steps
       ,Expand = Table.Combine(SelectCol[AllTablesSideBySide])
    in
        Expand

    Some things to note:

    • You will need to edit the first four variables: PrimaryTable, SecondaryTables, SecondaryTablesPrefix & CommonKey to suit your needs. 
    • This resulting table is just a way of displaying the data for a human to read.
    • You may want to fill down first on the ID column, if you want to filter or order on ID. This can easily be done using the power query graphical user interface. 
    • Although the code looks complicated, stepping though the applied steps will make it clear what each step does. 


    Saturday, August 11, 2018 9:12 AM
  • I recently had a similar task (I believe) and came up with this solution: 

    https://www.thebiccountant.com/2018/08/12/how-to-do-a-real-vlookup-false-in-power-query-or-power-bi/

    Not as sophisticated like Cams, as it doesn't cater for multiple tables, but could be useful for other purposes as well.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!


    Sunday, August 12, 2018 11:33 AM
    Moderator
  • The function that I wrote above is a bit different. It splits each table by their ID. And those tables are then placed side by side. It can generate a fair few nulls as some tables will have more rows than the other. It is useful for seeing the full range of values for each ID. 
    • Edited by CamWally Tuesday, August 14, 2018 1:06 PM
    Tuesday, August 14, 2018 1:02 PM