none
Viewing duplicate records RRS feed

  • Question

  • hello,

    i am wondering if there is a way to view duplicate records before they are deleted? i'd like to make sure that i am deleting rows that need to be deleted.

    thank you

    tracy

    Saturday, January 30, 2016 9:35 PM

Answers

  • If - for some reason - sorting is not an option, you can add this custom column instead:

    let DupsCol=[DupsCol] in Table.RowCount(Table.SelectRows(Source, each ([DupsCol] = DupsCol)))

    it will show count the number of occurrences of the same expression in column "DupsCol". So every value above 1 will mean that there are dups who are doing to be deleted. You also won't need an index column for this solution.


    Imke Feldmann TheBIccountant.com

    Sunday, January 31, 2016 5:32 PM
    Moderator

All replies

  • 1) Add an index column

    2) then add a custom column with this formula:

    if #"Added Index"[Keyword]{[Index]} = #"Added Index"[Keyword]{[Index]-1} then "Duplicate" else ""

    This will produce an additional column showing the dups that will be be removed. Load this query to the worksheet and create another query that references this query and does the remove dups step.

    This way you will always be able to check which dups have been removed.


    Imke Feldmann TheBIccountant.com


    Sunday, January 31, 2016 7:07 AM
    Moderator
  • Imke, if I'm not mistaken, that solution will only work on an input sorted by the field that we expect duplicates in.

    Check out the Power BI User Group of Philadelphia.
    Our next meeting is February 2 in Malvern, PA.

    Sunday, January 31, 2016 5:15 PM
  • Gregg, you're absolutely right.

    Thanks for the clarification!


    Imke Feldmann TheBIccountant.com

    Sunday, January 31, 2016 5:16 PM
    Moderator
  • If - for some reason - sorting is not an option, you can add this custom column instead:

    let DupsCol=[DupsCol] in Table.RowCount(Table.SelectRows(Source, each ([DupsCol] = DupsCol)))

    it will show count the number of occurrences of the same expression in column "DupsCol". So every value above 1 will mean that there are dups who are doing to be deleted. You also won't need an index column for this solution.


    Imke Feldmann TheBIccountant.com

    Sunday, January 31, 2016 5:32 PM
    Moderator
  • hi imke,

    i will try your solution, as soon as possible. i don't know the language, but i'm game.
    i'll post again to let you know how i made out.

    thank you
    tracy

    Friday, February 5, 2016 5:50 PM
  • hi imke,

    i finally was able to get the code to work.

    the only way that i could get it to work was to use the data as a table on an excel worksheet - and then use the from table option on the power query tab

    i tried loading it to the data model and saving it as a connection, but got the error.

    Expression.Error: The field 'Address' of the record wasn't found.
    Details:
        Name=Products
        Data=Table
        Item=Products
        Kind=Sheet
        Hidden=FALSE

    should this code be able to work using data in the data model in excel 2013?

    thank you
    tracy


    • Edited by LadyTester Wednesday, February 10, 2016 11:48 PM added more details
    Wednesday, February 10, 2016 11:46 PM
  • hi tracy,

    if I understood your request right, you wanted to check the duplicate rows before they will be removed. Therefore you have to see them somewhere. How would you do this with Connection only or in the datamodel?

    So output to Excel seemed the only reasonable option in my eyes. Or do you want to check this in the editor only? But then you might be limited by the preview cut off.

    However, technically the query should work no matter which output you choose. Your error message makes very little sense to me - simply sounds you're referencing a column that isn't there.


    Imke Feldmann TheBIccountant.com

    Thursday, February 11, 2016 8:46 AM
    Moderator
  • hi imke,

    i "think" the reason i do the error was because the code was still in memory. when i closed excel and reopened the query, it ran without any errors.

    i didn't test it again with the data being in the data model. the table had less than 5,000 records, so it fit on a worksheet.

    with all of the problems that i have, being able to see the duplicates at all is a major win for me!

    thank you
    tracy

    Monday, February 22, 2016 7:18 PM