none
Table.Distinct removes values completely RRS feed

  • Question

  • I created a table from queries, then deleted all columns except for the one I need. Then I used the GUI to remove duplicate entries in this single column. With the Table.Distinct function, 88 distinct values remain.

    However, if I use 'remove duplicates' in Excel or convert the column to a list and then use the List.Distinct function, 137 distinct values remain! Using Group By (and then removing the extra added column) also gives the desired result.

    Why does Table.Distinct remove ALL entries of certain values?
    If this is not a bug, can anyone tell me what the intention of the default behavior of this function is and whether I can change this default behavior in order to obtain the desired result?

    Tuesday, August 27, 2019 11:39 AM

All replies

  • Hi,

    Provide data sample, please.

    Wednesday, August 28, 2019 3:00 PM
  • Hello Aleksei!

    Sorry, I can't post my file due to sensitivity of the data. And when I try to isolate the issue by coping this single column to a separate file and load it from there in Power Query, the Table.Distinct function works as expected.

    I do have a workaround, but at this point I doubt wether the Table.Distinct function can be trusted (or at least there seems to be an unidentified factor that influences the result).

    Is there any known difference in the working of Table.Distinct and List.Distinct that might cause this behavior?

    Thursday, August 29, 2019 12:57 PM
  • The list of values returned by following formulas should be the same:

    = Table.Distinct(SingleColumnTable)
    = Table.Distinct(Table, {"Column"})
    = List.Distinct(Table[Column])

    Check, that you use the same source for both your formulas (i.e. the same table, without any additional transformations).

    Thursday, August 29, 2019 2:18 PM
  • The list of values returned by following formulas should be the same:

    = Table.Distinct(SingleColumnTable)
    = Table.Distinct(Table, {"Column"})
    = List.Distinct(Table[Column])

    Check, that you use the same source for both your formulas (i.e. the same table, without any additional transformations).

    Hello Aleksei, thank you for your support.

    I used all three variants after creating my single column table of 794 rows as:

    =Table.Distinct(#"Table Name")

    =Table.Distinct(#"Table Name", {"ColumnName"})

    =List.Distinct(#"Table Name"[ColumnName])

    They do not give the same results: Both the Table.Distinct formulas result in 88 rows, the List.Distinct formula gives 138 rows.

    Friday, August 30, 2019 6:33 AM
  • Is your source a relational database, and are the additionally-removed rows identical except for their casing with some rows that weren't removed?
    Friday, August 30, 2019 1:26 PM
  • The source is a CSV dump with SAP data.

    All letters are capital letters, so casing is identical. To simplify, I have a list like this:

    APPLE
    PEAR
    BANANA
    PEAR
    BANANA
    APPLE
    APPLE
    APPLE
    BANANA
    APPLE

    And I end up with:
    PEAR
    BANANA

    But where did APPLE go?

    It appears to me completely random (note that some 50 of 138 distinct values are completely removed in this way!). Moreover, why does List.Distinct give the correct result whereas Table.Distinct does not?!

    Tuesday, September 3, 2019 7:56 AM
  • What is the complete text of the incorrect query?
    Tuesday, September 3, 2019 1:37 PM
  • It's nothing more than the following, generated by the GUI:

    = Table.Distinct(#"Rijen gesorteerd")

    Whereas the following adjustment (correctly) gives more results:

    = List.Distinct(#"Rijen gesorteerd"[Column1])
    Wednesday, September 4, 2019 7:16 AM
  • That's not the complete text because it depends on the definition of "Rijen gesorteerd" which isn't included.
    Wednesday, September 4, 2019 3:56 PM
  • Hi YochananL. Are you able to provide the full text of your M queries?

    Thanks,
    Ehren

    Tuesday, September 24, 2019 9:14 PM
    Owner