none
Pushing more calcs to Power Query - replacing DAX RANKX RRS feed

  • Question

  • I have another Power Query calculated column challenge.

    Customer introduced a new requirement. Staff can work multiple shifts in a single business date. They would like a flag if the timesheet entries overlap. Many of their establishments close after midnight, which makes calculations a bit trickier, but there is a field for the "business date" of the shift which helps.

    You can imagine source data like this:

    Full Name	Date	Start Date	End Date
    Joe Smith	8/1/2014	8/1/2014 10:00	8/1/2014 18:00
    Joe Smith	8/1/2014	8/1/2014 22:00	8/2/2014 1:00
    Jane Doe	8/1/2014	8/1/2014 15:00	8/1/2014 18:00
    Jane Doe	8/1/2014	8/1/2014 17:00	8/1/2014 17:45
    John Brown	8/1/2014	8/1/2014 19:00	8/2/2014 1:00
    John Brown	8/1/2014	8/1/2014 14:00	8/1/2014 16:00
    John Brown	8/2/2014	8/2/2014 10:00	8/2/2014 18:00
    Mary Black	8/1/2014	8/1/2014 18:00	8/2/2014 1:00
    Mary Black	8/1/2014	8/1/2014 15:00	8/1/2014 21:00
    Bob Walker	8/1/2014	8/1/2014 11:00	8/1/2014 14:00
    Bob Walker	8/1/2014	8/1/2014 17:00	8/1/2014 22:00
    Bob Walker	8/1/2014	8/1/2014 19:00	8/2/2014 1:00


    In this case, Jane Doe, Mary Black, and Bob Walker have overlapping shifts. Joe Smith and John Brown do not.

    I did the calculation in calculated columns in Power Pivot by ranking the start time on the date, ranking the end time on the date, then brute force checking to make sure the start and end dates are in the right order using some intermediate calc columns. To do the ranking, I used the DAX RANKX function.

    Now if I want to push this sort of logic back to Power Query. How do I do this? I do not see a ranking function in the formula reference as a starting point for my calculation. 

    Thanks


    • Edited by Cathy Dumas - Friday, August 29, 2014 8:22 PM formatting
    Friday, August 29, 2014 8:20 PM

Answers

  • Cathy, the following are a couple of rank functions I created as part of a custom stat function "library." Hopefully, they are self-explanatory.

    //RankEqual Function
    (inputValue as any, inputSeries as list, optional orderDescending as nullable logical) as number =>
    let
        SortedSeries = if orderDescending or orderDescending = null then
                                    List.Sort(inputSeries,Order.Descending)
                               else
                                    List.Sort(inputSeries),
        RankEqual = List.PositionOf(SortedSeries,inputValue)+1
    in
        RankEqual
     
    //RankDense Function
    (inputValue as any, inputSeries as list, optional orderDescending as nullable logical) as number =>
    let
        SortedSeries = if orderDescending or orderDescending = null then
                                     List.Sort(inputSeries,Order.Descending)
                               else
                                     List.Sort(inputSeries),
        DistinctSeries = List.Distinct(SortedSeries),
        RankDense = List.PositionOf(DistinctSeries,inputValue)+1
    in
        RankDense

    • Marked as answer by Cathy Dumas - Saturday, August 30, 2014 12:05 AM
    Friday, August 29, 2014 9:37 PM
  • Here's one way to do it in M, which doesn't use Rank functions. Instead it creates a nested join column (matching on the [Full Name]) and then compares the dates directly. (I also added an ID column to make it easy to exclude records matching themselves.)

    First, do the self-join (this can be done via the UI using the Merge ribbon button in the Editor, although you'll have to rename the new column it creates):

    = Table.NestedJoin(Source,{"Full Name"},Source,{"Full Name"},"Matches")

    Then add a custom column with the following formula:

    = Table.RowCount(
        Table.SelectRows(
            [Matches],
            (match) => 
                [ID] <> match[ID] 
                and (
                    ([Start Date] >= match[Start Date] and [Start Date] <= match[End Date])
                    or (match[Start Date] >= [Start Date] and match[Start Date] <= [End Date])
                )
        )
    ) > 0

    Ehren


    Friday, August 29, 2014 10:16 PM
    Owner

All replies

  • Cathy, the following are a couple of rank functions I created as part of a custom stat function "library." Hopefully, they are self-explanatory.

    //RankEqual Function
    (inputValue as any, inputSeries as list, optional orderDescending as nullable logical) as number =>
    let
        SortedSeries = if orderDescending or orderDescending = null then
                                    List.Sort(inputSeries,Order.Descending)
                               else
                                    List.Sort(inputSeries),
        RankEqual = List.PositionOf(SortedSeries,inputValue)+1
    in
        RankEqual
     
    //RankDense Function
    (inputValue as any, inputSeries as list, optional orderDescending as nullable logical) as number =>
    let
        SortedSeries = if orderDescending or orderDescending = null then
                                     List.Sort(inputSeries,Order.Descending)
                               else
                                     List.Sort(inputSeries),
        DistinctSeries = List.Distinct(SortedSeries),
        RankDense = List.PositionOf(DistinctSeries,inputValue)+1
    in
        RankDense

    • Marked as answer by Cathy Dumas - Saturday, August 30, 2014 12:05 AM
    Friday, August 29, 2014 9:37 PM
  • Here's one way to do it in M, which doesn't use Rank functions. Instead it creates a nested join column (matching on the [Full Name]) and then compares the dates directly. (I also added an ID column to make it easy to exclude records matching themselves.)

    First, do the self-join (this can be done via the UI using the Merge ribbon button in the Editor, although you'll have to rename the new column it creates):

    = Table.NestedJoin(Source,{"Full Name"},Source,{"Full Name"},"Matches")

    Then add a custom column with the following formula:

    = Table.RowCount(
        Table.SelectRows(
            [Matches],
            (match) => 
                [ID] <> match[ID] 
                and (
                    ([Start Date] >= match[Start Date] and [Start Date] <= match[End Date])
                    or (match[Start Date] >= [Start Date] and match[Start Date] <= [End Date])
                )
        )
    ) > 0

    Ehren


    Friday, August 29, 2014 10:16 PM
    Owner
  • Thanks Ehren, this works with a couple of small changes. First, I did the self join on Full Name and Date, so I am only comparing the Date of interest. Then, instead of comparing [ID]<>match[ID], I used [Start Date] as I do not have an ID column. This worked correctly (although will miss the case where there are two different entries logged with exactly the same starting date by accident).

    What is very useful about your answer is the idea of the self-join on a filtered table to assist with calculations in columns. I am hung up in DAX on the row context when doing calculated columns, so I am doing all kinds of stuff with ALL() or ALLEXCEPT() or FILTER() or whatever to override the row context and do a calculation. But I see now in M, the self-joined and filtered table is the way to make my own "context" for doing calculations and this will help me solve other problems where I want a custom context in the future.

    I had to look a few times at the custom column expression to figure out what is going on. I see that the inlined function is kind of like using function lambdas in LINQ for selection. Having that analog in mind kind of helps.

    Friday, August 29, 2014 11:59 PM
  • Thanks Colin, I will hang on to these. This is handy. 

    Can you share out your general purpose stat function library? It would be really handy to refer to these in the future. 

    Saturday, August 30, 2014 12:05 AM
  • Thanks Ehren, this works with a couple of small changes. First, I did the self join on Full Name and Date, so I am only comparing the Date of interest. Then, instead of comparing [ID]<>match[ID], I used [Start Date] as I do not have an ID column. This worked correctly (although will miss the case where there are two different entries logged with exactly the same starting date by accident).

    Note that you can generate an ID column for your source table in the PQ UI: Add Column-->Insert Index Column. You might consider doing so (and using the index column instead of the Start Date column) if the accidental entry issue that you mention can occur.

    Monday, September 1, 2014 7:17 PM
  • Thanks Colin, I will hang on to these. This is handy. 

    Can you share out your general purpose stat function library? It would be really handy to refer to these in the future. 

    Will send via a PM.
    Monday, September 1, 2014 7:18 PM