none
DAX Formula to mark duplicates with a sequential number (1,2,3, etc.)

    Question

  • Hello, I hope you can help :-)

    I have a list of opportunity ID's like this:

    OPPORTUNITY_ID
    1
    2
    3
    3
    3
    4
    4
    5

    and I want to add a calculated column in PowerPivot where a DAX formula is producing a sequential count of duplicates, like this:

    OPPORTUNITY_ID  COUNT
    1                          1
    2                          1
    3                          1
    3                          2
    3                          3
    4                          1
    4                          2
    5                          1

    So any unique OpportunityIDs will be marked "1", and the first duplicates will be marked "1", the second duplicates marked "2" and so on.

    I have an expression

    =calculate(COUNTROWS(Query),ALLEXCEPT(Query,Query[OpportunityID]))

    which returns a count of duplicates (so in the example above, OpportunityID 4 would return a count of "2" and "2" in both lines - but this isn't the sequential count I am trying to achieve.

    I'm close, but I can't figure out the DAX syntax, can anyone help me?

    Many thanks in advance!

    Thursday, October 31, 2013 4:05 PM

Answers

  • Hi mauledbylion,

    Here is a solution although it may require some further iterations based on your feedback.

    I have taken your example and created and loaded the data into Power Pivot. You will notice that I have added a column called 'RID' i.e. Row IDfentifier. This column is at the core of this solution and is needed to be able to determine the sequence number of any duplicates.

    RID OPPORTUNITY_ID
    1 1
    2 2
    3 3
    4 3
    5 4
    6 4
    7 5

    I have created a calculated column called 'DUPLICATE_NUMBER' on the tale in Power Pivot:

    =CALCULATE(COUNTROWS(Query), ALLEXCEPT(Query, Query[OPPORTUNITY_ID]), EARLIER(Query[RID]) >= Query[RID])

    The above calculation is an extension of what you have already defined. However, it uses the RID column in order to determine what occurance the OPPORTUNITY_ID is in the table. It uses the EARLIER function to check how many rows of the same OPPORTUNITY_ID have previously occured. It then gives the current row the corresponding sequence or duplicate number.

    I don't necessarily expect you to have a single column that uniquely identifies a row and this is where I need more detail on the table structure. Ultimately, we will need to combine the columns that uniquely identify a row in order to determine which duplicate row gets a '1' and how the sequencing continues for the remaining duplicates.

    Here is a screenshot of what the above calculated column returns based on the DAX formula above:

    Let me know your thoughts on this approach.

    Michael




    Monday, November 18, 2013 9:11 AM
    Answerer

All replies

  • Does anyone have ideas for Mauled?

    Thanks!


    Ed Price, Power BI & SQL Server Customer Program Manager (Blog, Small Basic, Wiki Ninjas, Wiki)

    Answer an interesting question? Create a wiki article about it!

    Monday, November 18, 2013 3:36 AM
    Owner
  • Hi mauledbylion,

    Here is a solution although it may require some further iterations based on your feedback.

    I have taken your example and created and loaded the data into Power Pivot. You will notice that I have added a column called 'RID' i.e. Row IDfentifier. This column is at the core of this solution and is needed to be able to determine the sequence number of any duplicates.

    RID OPPORTUNITY_ID
    1 1
    2 2
    3 3
    4 3
    5 4
    6 4
    7 5

    I have created a calculated column called 'DUPLICATE_NUMBER' on the tale in Power Pivot:

    =CALCULATE(COUNTROWS(Query), ALLEXCEPT(Query, Query[OPPORTUNITY_ID]), EARLIER(Query[RID]) >= Query[RID])

    The above calculation is an extension of what you have already defined. However, it uses the RID column in order to determine what occurance the OPPORTUNITY_ID is in the table. It uses the EARLIER function to check how many rows of the same OPPORTUNITY_ID have previously occured. It then gives the current row the corresponding sequence or duplicate number.

    I don't necessarily expect you to have a single column that uniquely identifies a row and this is where I need more detail on the table structure. Ultimately, we will need to combine the columns that uniquely identify a row in order to determine which duplicate row gets a '1' and how the sequencing continues for the remaining duplicates.

    Here is a screenshot of what the above calculated column returns based on the DAX formula above:

    Let me know your thoughts on this approach.

    Michael




    Monday, November 18, 2013 9:11 AM
    Answerer