locked
How is the sort order in List.Union determined? RRS feed

  • Question

  • I'm trying to understand how the duplicate-removal-process in List.Union works and what that means for the resulting sort-order. This is my "mental model" and it matches all observed behaviours so far, except the sort-order:

    In red are the 2 items who come out differently than I expected. So I'd like to know what's the logic behind the sort-order in List.Union.

    Sample: List.Union({ {1, 1, 2}, {1,2,3,2}, {4,3,3,2} })

    Thx!


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Monday, August 7, 2017 10:22 AM

Answers

  • Hi Imke,

    The results you see are consistent with a multiset union operation. Looking at the three lists, the following are the quantities of each number that must be in the result list:

    The number 1 - 2
    The number 2 - 2
    The number 3 - 2
    The number 4 - 1

    The next step is to place the numbers in the result list based on the order of the items in the lists, starting from the left list:

    The left starts with 2 x 1, so the result list begins with {1,1..}
    The next item in the list is a 2, so that is placed next {1,1,2..
    The first item in the second list is a 1, but we already have the maximum number of ones in the result list, so it is ignored.
    The next item in the second list is 2. Since we have only one 2 in the result list so far, another 2 is added {1,1,2,2...}

    Following the preceding logic to its natural conclusion, and you end up with {1,1,2,2,3,4,3}. If you switched the order of the first list to {1,2,1} and follow the logic, the result would be {1,2,1,2,3,4,3}. 

    Monday, August 7, 2017 8:14 PM

All replies

  • Dear Imke,

    Of course I don't know.

    But, after some analysis, my conclusion is that this should be considered as a 2-step approach:

    1. Determine how many items of each value are required.
    2. Start picking values from each list until the required number of items of each value is reached.

    I created a function to analyse what is happening: basically the function adds metadata to all items, consisting of a record with Comment with the list and position number.

    With that function, I noticed that from: {{1,1,2},  (1,2,3,2}, {2,2,2} } all 2's are taken from the first 2 lists, while - if the 3rd list wouldn't be there, only one 2 would have been taken from the second list.

    = fnAnalyseUnion({{1,1,2}, {1,2,3,2}, {2,2,2}})

    Function fnAnalyseUnion:

    (Lists as list) as table =>
    let
    //    Lists =   {{1, 1, 2}, {1,2,3,2}, {4,3,3,2} },
        Custom1 = Table.FromColumns({Lists}),
        #"Added Index" = Table.AddIndexColumn(Custom1, "List", 0, 1),
        #"Expanded Column1" = Table.Buffer(Table.ExpandListColumn(#"Added Index", "Column1")),
        #"Grouped Rows" = Table.Group(#"Expanded Column1", {"List"}, {{"AllData", each Table.AddIndexColumn(_,"Position",0,1), type table}}),
        #"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Column1", "Position"}, {"Column1", "Position"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Expanded AllData",{{"Position", type text}, {"List", type text}}),
        #"Added Custom" = Table.Buffer(Table.AddColumn(#"Changed Type", "ItemsWithMetadata", each [Column1] meta [Comment = "L"&[List]&"P"&[Position]])),
        #"Grouped Rows1" = Table.Group(#"Added Custom", {"List"}, {{"AllData", each _[ItemsWithMetadata], type table}}),
        ListsWithMetadata = #"Grouped Rows1"[AllData],
        ListUnion = List.Union(ListsWithMetadata),
        Custom2 = Table.FromColumns({ListUnion}, {"Value"}),
        #"Added Custom1" = Table.AddColumn(Custom2, "ListAndPosition", each Value.Metadata([Value])[Comment])
    in
        #"Added Custom1"



    • Edited by MarcelBeug Monday, August 7, 2017 2:42 PM
    Monday, August 7, 2017 2:37 PM
  • Kudos Marcel, what a clever approach!

    Would be great though to receive a confirmation from the product team about this logic.

    Thx.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Monday, August 7, 2017 3:17 PM
  • Hi Imke,

    The results you see are consistent with a multiset union operation. Looking at the three lists, the following are the quantities of each number that must be in the result list:

    The number 1 - 2
    The number 2 - 2
    The number 3 - 2
    The number 4 - 1

    The next step is to place the numbers in the result list based on the order of the items in the lists, starting from the left list:

    The left starts with 2 x 1, so the result list begins with {1,1..}
    The next item in the list is a 2, so that is placed next {1,1,2..
    The first item in the second list is a 1, but we already have the maximum number of ones in the result list, so it is ignored.
    The next item in the second list is 2. Since we have only one 2 in the result list so far, another 2 is added {1,1,2,2...}

    Following the preceding logic to its natural conclusion, and you end up with {1,1,2,2,3,4,3}. If you switched the order of the first list to {1,2,1} and follow the logic, the result would be {1,2,1,2,3,4,3}. 

    Monday, August 7, 2017 8:14 PM
  • Thank you Colin,

    this looks perfect and confirms Marcels observation.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Monday, August 7, 2017 8:28 PM