none
Distinct count in custom column RRS feed

  • Question

  • Hi Everybody

    I am struggling how to write a syntax in Power Query to compute distinct count values in new custom column.

    This is how I have started to do it...


    I need to have distinct count of "rep_acct_nr" by "yr_camp"

    rep_acct_nr - unique customer's number

    yr_camp - campaign period

    In the other way I would like to know how many unique customers were within a campaign period.

    Anyone can help?

    Monday, June 27, 2016 9:51 AM

Answers

  • Here's an approach that you could try that uses a narrower table for the distinct customer count, which might pep things up a bit. 

    • create a second Power Query table that refers to the original
    • add a custom column that concatenates customer ID and period [rep_acct_nr] & "|" & [yr_camp]
    • remove duplicates from the custom column
    • remove all columns except customer ID and period
    • load it to Power Pivot
    • create the measure unique_customer_count:=DistinctCount([rep_acct_nr])
    • pivot table unique_customer_count by period

    The downside being that you lose the ability to slice and dice by different time frames (e.g. quarters instead of months). 

    Thursday, June 30, 2016 2:29 PM

All replies

  • Hi Bartek,

    You can right click on the column name, and click "Remove Duplicates"

    And then go to the Transform tab, click Count Rows.

    I hope this helps

    Oguz

    Monday, June 27, 2016 5:20 PM
    Moderator
  • Here's another option, load the table to Power Pivot and create a measure for the distinct count.  

    unique_customer_count:=DistinctCount([rep_acct_nr])
    Next, create a pivot table with unique_customer_count in the rows and yr_camp in the columns.  

    Monday, June 27, 2016 5:41 PM
  • My data set contains transactions data. It means each transaction, each product sold, each customer is reported here. It is huge (170 mln rows) and makes even power pivot working slowly.

    My idea is to aggregate some data. I do not need customer [rep_acct_nr] here. I want to look at product level only ( [cncpt_nr] and [billing_line_nr4] ) and time periods [yr_camp]. Before I remove customers I need to know the number of customers who made transaction in a given time. I will have to also aggregate sales and units sold.

    @Oguz

    Your idea is somehow right but I cannot remove duplicates before aggregation, because I will lost rows with data

    @ Jimmie

    This is an approach I use currently, but due to performance I have to reduce a weight of data loads to model and after this I won’t be able to compute distinct no of customers.

    Is this impossible with Power Query?


    Wednesday, June 29, 2016 4:54 PM
  • Here's an approach that you could try that uses a narrower table for the distinct customer count, which might pep things up a bit. 

    • create a second Power Query table that refers to the original
    • add a custom column that concatenates customer ID and period [rep_acct_nr] & "|" & [yr_camp]
    • remove duplicates from the custom column
    • remove all columns except customer ID and period
    • load it to Power Pivot
    • create the measure unique_customer_count:=DistinctCount([rep_acct_nr])
    • pivot table unique_customer_count by period

    The downside being that you lose the ability to slice and dice by different time frames (e.g. quarters instead of months). 

    Thursday, June 30, 2016 2:29 PM