none
Create Groupings Based On Name Repetitions RRS feed

  • Question

  • I have a table containing two columns of categorical variables.

    I want to group all the variables that are linked to each other, like in the picture below, based on how the names are repeated.

    I really welcome any help!

    Friday, November 4, 2016 7:48 PM

Answers

  • Below the code with double recursion.

    let
        Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],
        Typed = Table.TransformColumnTypes(Source,{{"COL1", type text}, {"COL2", type text}}),
        Group = Table.AddColumn(Typed, "Group", each null),
        
        fnCreateGroup = (GroupID as number, TableSoFar as table) as any =>
        let
            FirstOfNextGroup = Table.Skip(TableSoFar, each [Group] <> null),
            InitialGroupMembers = List.Distinct({FirstOfNextGroup{0}[COL1],FirstOfNextGroup{0}[COL2]}),
            AllGroupMembers = fnAddRowsToGroup(GroupID, InitialGroupMembers, TableSoFar),
            TableWithGroup = if List.Contains(AllGroupMembers[Group],null) then @fnCreateGroup(GroupID + 1, AllGroupMembers) else AllGroupMembers
        in
            TableWithGroup,
       fnAddRowsToGroup = (Id as number, Members as list, TableSoFar) as table =>
       let
            AdditionalMembers = Table.AddColumn(TableSoFar, "Group1", each if [Group] <> null then [Group] else if List.ContainsAny(Members,{[COL1],[COL2]}) then Id else null),
            Filtered = Table.Buffer(Table.SelectRows(AdditionalMembers, each ([Group1] = Id))),
            NewMembers = List.Distinct(List.Combine({Members,Filtered[COL1],Filtered[COL2]})),
            Removed = Table.RemoveColumns(AdditionalMembers,{"Group"}),
            NewTable = Table.RenameColumns(Removed,{{"Group1", "Group"}}),
            TableWithGroupMembers = if List.Count(NewMembers) > List.Count(Members) then @fnAddRowsToGroup(Id, NewMembers, NewTable) else NewTable
            
        in
            TableWithGroupMembers,
        AllGroups = fnCreateGroup(1, Group),
        Prefixed = Table.TransformColumns(AllGroups, {{"Group", each "Group " & Text.From(_, "en-US"), type text}})
        
    in
        Prefixed




    Sunday, November 20, 2016 8:14 AM

All replies

  • You need a readable criteria for your group borders somewhere. Formatting signs cannot be read by Power Query.

    Imke Feldmann TheBIccountant.com

    Friday, November 4, 2016 8:32 PM
    Moderator
  • Oh, I'm sorry, I've used formatting only for your benefit, so you could quickly understand what I need.

    The table from the left side is actually a table that I get at some step in Power Query, and what I want is to insert the extra steps required to create those groupings.

    Friday, November 4, 2016 8:37 PM
  • OK, but how does one know that "X" and "Y" belong to Group2?


    Imke Feldmann TheBIccountant.com

    Friday, November 4, 2016 8:45 PM
    Moderator
  • Please let me know if the below makes sense:

    A B -> Group 1 (we always start at Group 1)

    B C -> B is also found in Group 1, therefore this row is also Group 1

    B D -> B is also found in Group 1, therefore this row is also Group 1

    D E -> D is also found in Group 1(previous row), therefore this row is also Group 1 

    X Y -> non found in Group 1, therefore this row is Group 2

    Z Q -> non found in Group 1 or Group 2, therefore this is Group 3

    Q W -> Q is also found in Group 3, therefore this row is also Group 3


    Friday, November 4, 2016 8:49 PM
  • Makes sense. So it's always just these 2 columns?

    Imke Feldmann TheBIccountant.com

    Friday, November 4, 2016 8:51 PM
    Moderator
  • For the sake of this exercise, let's say a convincing "yes, just these 2 columns".

    Technically I have a tool (based on VBA and PowerQuery), where I provide a bunch of different metrics and I calculate the correlations between each 2 of them (kind of like http://www.tylervigen.com/spurious-correlations)

    Therefore, there is one other column with the correlation coefficient, but it's not important for what I need here, that's why I haven't mentioned it.

    Friday, November 4, 2016 9:01 PM
  • Hi James :-)

    This is first shot only ( based on your example)

    let
        Source = Excel.CurrentWorkbook(){[Name="Tabela1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Col1", type text}, {"Col2", type text}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
        ListCol1 = #"Added Index"[Col1],
        #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each if List.IsEmpty(List.Intersect({ListCol1,{[Col2]}})) then [Index] else null),
        #"Filled Up" = Table.FillUp(#"Added Custom",{"Custom"}),
        #"Grouped Rows" = Table.Group(#"Filled Up", {"Custom"}, {{"tbl", each List.Distinct(_[Col1] & _[Col2]), type list}}),
        #"Added Index1" = Table.AddIndexColumn(#"Grouped Rows", "Group", 1, 1),
        #"Added Prefix" = Table.TransformColumns(#"Added Index1", {{"Group", each "Group " & Text.From(_, "en-US"), type text}}),
        #"Removed Columns" = Table.RemoveColumns(#"Added Prefix",{"Custom"}),
        #"Expanded {0}" = Table.ExpandListColumn(#"Removed Columns", "tbl"),
        #"Renamed Columns" = Table.RenameColumns(#"Expanded {0}",{{"tbl", "Items"}})
    in
        #"Renamed Columns"

    Regards

    Friday, November 4, 2016 9:30 PM
  • Hi Bill :)

    Thank you for the code - it works when using the data I provided, but if I make a change in the last row from

    Q W to X W, then it doesn't work correctly, as there should be a Group 2 containing: X, Y, Q.

    Friday, November 4, 2016 9:50 PM
  • Hi James,

    Please, show an example of possible cases, and the results you expect.

    I do not want to guess.;-)

    Regards

    Friday, November 4, 2016 11:51 PM
  • Of course, so the picture below shows the logic of the groupings, and I've highlighted the element that is common to the group.

    Now, for the final output I'd like it to be like in my original message:

    A GROUP 1

    B GROUP 1

    C GROUP 1

    X GROUP 2

    Y GROUP 2


    Saturday, November 5, 2016 12:03 AM
  • James,

    Based on your original table, is it the case that the values in COL2 are always unique?

    Saturday, November 5, 2016 3:18 PM
  • Hi Colin,

    No, it's not the case.

    I've attached below examples that never happen.


    Saturday, November 5, 2016 3:51 PM
  • My proposal would be a solution where rows are selected - and added to the final list - level by level.

    Ideally this would require a recursive function, but for now that's a bit beyond my capabilities.

    Anyhow, I created the following code that works fine for the 3-level-original-example-data.

    let
        Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"COL 1", type text}, {"COL 2", type text}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
        #"Added Custom" = Table.AddColumn(#"Added Index", "LowerLevel?", each List.Contains(#"Added Index"[COL 2],[COL 1])),
        #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([#"LowerLevel?"] = false)),
        #"Added Index1" = Table.AddIndexColumn(#"Filtered Rows", "Index.1", 1, 1),
        #"Added Prefix" = Table.TransformColumns(#"Added Index1", {{"Index.1", each "Group " & Text.From(_, "nl-NL"), type text}}),
        Level1Rows = Table.RenameColumns(#"Added Prefix",{{"Index.1", "Group"}}),
    // Level 1 done
        RemainingRows1 = Table.SelectRows(#"Added Custom", each ([#"LowerLevel?"] = true)),
        #"Added Custom1" = Table.AddColumn(RemainingRows1, "Group", each Level1Rows[Group]{List.PositionOf(Level1Rows[COL 2],[COL 1])}),
        #"Replaced Errors1" = Table.ReplaceErrorValues(#"Added Custom1", {{"Group", null}}),
        Level2Rows = Table.SelectRows(#"Replaced Errors1", each ([Group] <> null)),
        Level1and2Rows = Table.Combine({Level1Rows,Level2Rows}),
    // Level 1 and 2 done
        RemainingRows2 = Table.SelectRows(#"Replaced Errors1", each ([Group] = null)),
        #"Removed Columns" = Table.RemoveColumns(RemainingRows2,{"Group"}),
        #"Added Custom2" = Table.AddColumn(#"Removed Columns", "Group", each Level1and2Rows[Group]{List.PositionOf(Level1and2Rows[COL 2],[COL 1])}),
        #"Replaced Errors2" = Table.ReplaceErrorValues(#"Added Custom2", {{"Group", null}}),
        Level3Rows = Table.SelectRows(#"Replaced Errors2", each ([Group] <> null)),
        #"Level1-3Rows" = Table.Combine({Level1and2Rows,Level3Rows}),
    // Level 1-3 done, finishing touches:
        #"Sorted Rows" = Table.Sort(#"Level1-3Rows",{{"Index", Order.Ascending}}),
        #"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"Index", "LowerLevel?"}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Removed Columns1",{{"Group", type text}})
    in
        #"Changed Type1"
    

    • Edited by MarcelBeug Friday, November 11, 2016 2:45 PM
    Friday, November 11, 2016 2:44 PM
  • Learning fast, so now I can present my solution including recursive function fnNextLevel:

    let
     fnNextLevel = (TableSoFar as table, Remaining as table) as table =>
    let
     RemoveGroup = Table.RemoveColumns(Remaining,{"Group"}),
     AddGroup = Table.AddColumn(RemoveGroup, "Group", each TableSoFar[Group]{List.PositionOf(TableSoFar[COL 2],[COL 1])}),
     ReplaceErrors = Table.ReplaceErrorValues(AddGroup, {{"Group", null}}),
     ThisLevelRows = Table.SelectRows(ReplaceErrors, each ([Group] <> null)),
     NewTable = Table.Combine({TableSoFar,ThisLevelRows}),
     RemainingRows = Table.SelectRows(ReplaceErrors, each ([Group] = null)),
     Result = if Table.RowCount(RemainingRows) > 0 then @fnNextLevel(NewTable, RemainingRows) else NewTable
    in
     Result
    in
     fnNextLevel


    And the query is now:

    let
        Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"COL 1", type text}, {"COL 2", type text}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
        #"Added Custom" = Table.AddColumn(#"Added Index", "LowerLevel?", each List.Contains(#"Added Index"[COL 2],[COL 1])),
        #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([#"LowerLevel?"] = false)),
        #"Added Index1" = Table.AddIndexColumn(#"Filtered Rows", "Index.1", 1, 1),
        #"Added Prefix" = Table.TransformColumns(#"Added Index1", {{"Index.1", each "Group " & Text.From(_, "nl-NL"), type text}}),
        Level1Rows = Table.RenameColumns(#"Added Prefix",{{"Index.1", "Group"}}),
    // Level 1 done
        RemainingRows1 = Table.SelectRows(#"Added Custom", each ([#"LowerLevel?"] = true)),
        #"Added Custom1" = Table.AddColumn(RemainingRows1, "Group", each Level1Rows[Group]{List.PositionOf(Level1Rows[COL 2],[COL 1])}),
        #"Replaced Errors1" = Table.ReplaceErrorValues(#"Added Custom1", {{"Group", null}}),
        Level2Rows = Table.SelectRows(#"Replaced Errors1", each ([Group] <> null)),
        Level1and2Rows = Table.Combine({Level1Rows,Level2Rows}),
    // Level 1 and 2 done
        RemainingRows2 = Table.SelectRows(#"Replaced Errors1", each ([Group] = null)),
        #"All Levels" = if Table.RowCount(RemainingRows2) > 0 then fnNextLevel(Level1and2Rows, RemainingRows2) else Level1and2Rows,
    // Finishing touches:
        #"Sorted Rows" = Table.Sort(#"All Levels",{{"Index", Order.Ascending}}),
        #"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"LowerLevel?"}),
        #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns1",{"COL 1", "COL 2", "Group", "Index"}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Group", type text}})
    in
        #"Changed Type1"
    
    
    
    

    Remark: I left the Index column in the result, otherwise the original sort order got lost.

    • Edited by MarcelBeug Friday, November 11, 2016 3:47 PM
    Friday, November 11, 2016 3:41 PM
  • James, do any of these solutions provide what you're looking for?

    Ehren

    Tuesday, November 15, 2016 9:05 PM
    Owner
  • Unfortunately, they are not working as they should :(

    Thank you Marcel for the help, your code does work well for the sample I provided, but it doesn't work well with all the variants I presented.

    Friday, November 18, 2016 6:02 PM
  • Looks to me like you need 2 loops, in pseudo code:

    Outer loop:
       the first value pair without group ID will get the next group ID which be the current group ID
       those values are the initial current group values
       Inner loop:
          all pairs with one or two of the current group values will get the current group ID
          any additional values from those pairs are added to the current group values
       Until no more additional value pairs exist with current group values
    Until all value pairs have a group ID

    As far as I know, looping in Power Query can only be done with either List.Generate or recursive functions.

    I can take a further look first tomorrow (Sunday).


    • Edited by MarcelBeug Saturday, November 19, 2016 4:35 AM
    Saturday, November 19, 2016 4:34 AM
  • Below the code with double recursion.

    let
        Source = Excel.CurrentWorkbook(){[Name="Input"]}[Content],
        Typed = Table.TransformColumnTypes(Source,{{"COL1", type text}, {"COL2", type text}}),
        Group = Table.AddColumn(Typed, "Group", each null),
        
        fnCreateGroup = (GroupID as number, TableSoFar as table) as any =>
        let
            FirstOfNextGroup = Table.Skip(TableSoFar, each [Group] <> null),
            InitialGroupMembers = List.Distinct({FirstOfNextGroup{0}[COL1],FirstOfNextGroup{0}[COL2]}),
            AllGroupMembers = fnAddRowsToGroup(GroupID, InitialGroupMembers, TableSoFar),
            TableWithGroup = if List.Contains(AllGroupMembers[Group],null) then @fnCreateGroup(GroupID + 1, AllGroupMembers) else AllGroupMembers
        in
            TableWithGroup,
       fnAddRowsToGroup = (Id as number, Members as list, TableSoFar) as table =>
       let
            AdditionalMembers = Table.AddColumn(TableSoFar, "Group1", each if [Group] <> null then [Group] else if List.ContainsAny(Members,{[COL1],[COL2]}) then Id else null),
            Filtered = Table.Buffer(Table.SelectRows(AdditionalMembers, each ([Group1] = Id))),
            NewMembers = List.Distinct(List.Combine({Members,Filtered[COL1],Filtered[COL2]})),
            Removed = Table.RemoveColumns(AdditionalMembers,{"Group"}),
            NewTable = Table.RenameColumns(Removed,{{"Group1", "Group"}}),
            TableWithGroupMembers = if List.Count(NewMembers) > List.Count(Members) then @fnAddRowsToGroup(Id, NewMembers, NewTable) else NewTable
            
        in
            TableWithGroupMembers,
        AllGroups = fnCreateGroup(1, Group),
        Prefixed = Table.TransformColumns(AllGroups, {{"Group", each "Group " & Text.From(_, "en-US"), type text}})
        
    in
        Prefixed




    Sunday, November 20, 2016 8:14 AM
  • Hi Marcel,

    It works great, and it's brilliantly written, thank you so much.

    I was certainly unaware that you can even do these kind of loops through functions and have an inner and on outer loop.

    Really big thanks! I'm going to dissect the code because there are definitely parts which I can incorporate in further work! 

    Monday, November 21, 2016 4:49 PM