none
Difference vs Previous Row & DistinctCount RRS feed

  • Question

  • I want to calculate a Delta Weeks column in Power Query WeekNum[current row] - WeekNum[previous row]

    I found a way to do it using the [Index] column, but it is painfully slow, and my table is 100k rows.

    let
        Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer", type text}, {"Product", type text}, {"WeekNum", Int64.Type}}),
    
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
        #"Added Custom" = Table.AddColumn(#"Added Index", "Delta Weeks", each try Source[WeekNum]{[Index]} - Source[WeekNum]{[Index]-1} otherwise 0)
    
    in
        #"Added Custom"

    Also, after this, I need another column who would count the distinct values from the beginning up to that row.

    Most of the weeks are consecutive, so basically the distinct count will increase when they are not. 

    (I don't know how to do this in Power Query).

    Tuesday, July 19, 2016 5:51 AM

Answers

  • Hi James. This might speed things up slightly:

    #"Added Custom" = Table.AddColumn(#"Added Index", "Delta Weeks", each try [WeekNum] - Source[WeekNum]{[Index]-1} otherwise 0)

    Doing an accumulating/running column is difficult in PQ and likely to be quite inefficient. Have you considered doing this part in Power Pivot (i.e. loading the data to the Data Model via PQ, and adding a custom measure in Power Pivot to do the distinct count)?

    Ehren


    Tuesday, July 19, 2016 11:41 PM
    Owner
  • Yes, you're right: Shifted columns perform faster than [Index]-1. At the end I found this function to perform fastest in M:

    (Sourcetable, ValueColumn as text, SortColumn as text, optional Group as text) =>
    
    let
    
    // Internal function retrieving previous row
    fnPreviousRow = 
    (Source, Value_Column as text, Sort_Column as text) =>
    let
        Source0 = Source,
        #"Sorted Rows" = Table.Buffer(Table.Sort(Source0,{{Sort_Column, Order.Ascending}})),
        AddedIndex = Table.AddIndexColumn(#"Sorted Rows", "fnIndex", 0, 1),
        #"Added Index" = Table.AddIndexColumn(AddedIndex, "fnIndex2", 1, 1),
        #"Merged Queries" = Table.NestedJoin(#"Added Index",{"fnIndex"},#"Added Index",{"fnIndex2"},"NewColumn",JoinKind.LeftOuter​),
        #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {Value_Column}, {"PreviousRow"}),
        #"Removed Columns" = Table.RemoveColumns(#"Expanded NewColumn",{"fnIndex2"})
    in
        #"Removed Columns",
    
    // Tow different code-versions depending on group yes/no
    
    // 1 Code for group 
        #"Grouped Rows" = Table.Group(Sourcetable, {Group}, {{"Content", each Table.Buffer(_), type table}}),
        ExecuteFunction = Table.AddColumn(#"Grouped Rows", "Custom", each fnPreviousRow([Content], ValueColumn, SortColumn)),
        #"Removed Columns" = Table.RemoveColumns(ExecuteFunction,{"Content"}),
        ResultGrouped = Table.ExpandTableColumn(#"Removed Columns", "Custom", List.Difference(List.Union({Table.ColumnNames(Sour​cetable), {"fnIndex", "PreviousRow"}}), {Group}), List.Difference(List.Union({Table.ColumnNames(Sour​cetable), {"fnIndex", "PreviousRow"}}), {Group})),
    
    // 2 Code without group
        ResultUngrouped = fnPreviousRow(Sourcetable, ValueColumn, SortColumn),
    
    // Select Result
        FinalResult = if Group <> null then ResultGrouped else ResultUngrouped
    in
        FinalResult

    The optional group-parameter speeds it up because it chunks up your table and the indices don't count that high then. (Sad, because I like the [Index]-1-syntax, but it is simply too slow...)

    An if this is still too slow, you have the DAX-alternative: Create your index in M and create the column in DAX (PowerPivot):

    PreviousRow = CALCULATE(SUM('YourTable'[Value]), FILTER('YourTable', YourTable[Index]=EARLIER(YourTable[Index])-1) && YourTable[Group]=EARLIER(YourTable[Group]))


    Imke Feldmann TheBIccountant.com

    Friday, July 29, 2016 8:35 AM
    Moderator

All replies

  • Hi James. This might speed things up slightly:

    #"Added Custom" = Table.AddColumn(#"Added Index", "Delta Weeks", each try [WeekNum] - Source[WeekNum]{[Index]-1} otherwise 0)

    Doing an accumulating/running column is difficult in PQ and likely to be quite inefficient. Have you considered doing this part in Power Pivot (i.e. loading the data to the Data Model via PQ, and adding a custom measure in Power Pivot to do the distinct count)?

    Ehren


    Tuesday, July 19, 2016 11:41 PM
    Owner
  • You're right, Ehren, I've just discovered that PQ wasn't designed for working with previous row context.

    What I did find works better than referencing the previous row using [Index]-1, is creating 2 index columns (one starting with: 0,1,2, and the other with 0,0,1,2, so basically an [Index]-1 floored at 0) and then joining the 2 tables, which basically puts the previous row on the same row, if that makes sense.

    However even that was too slow for me, and in the end I implemented a different approach, and I simply use a bit of VBA code where I calculate the difference via previous row, and then import the table in PQ. I think this is a more efficient (and considerably faster) approach!


    Friday, July 22, 2016 9:40 PM
  • Just wondering if it really is by design that PQ doesn't know it's row number (and we therefore have to apply it by using a costly index) or if the command for working with the "internal" row number is just missing. Are there any plans to give an easier option for this in the future?


    Imke Feldmann TheBIccountant.com

    Saturday, July 23, 2016 4:03 AM
    Moderator
  • Yes, it's currently by design that PQ doesn't know its row number.

    Ehren

    Monday, July 25, 2016 5:16 PM
    Owner
  • Hi folks,

    In this method, no join is necessary and math is done with buffered lists/tables (I think).  I can't comment on efficiency or relative performance since I don't have a benchmark; maybe the code above can serve as one.  Code improvement suggestions are welcome as always.  I think it works :).  I'm a newbie.  Many thanks.

    Is this approach even reasonable to address the running total use case??

    let
        Source = Table.Buffer( Excel.CurrentWorkbook(){[Name="InputTable"]}[Content] ),
        WeekList = List.Buffer( Source[WeekNum] ),
        WeekListCount = List.Count( WeekList ),
        WeekListWithOffset = List.Buffer( List.Combine( { {0}, List.FirstN( WeekList, WeekListCount - 1 ) } ) ),
        Index = Table.Buffer( Table.AddIndexColumn( Source, "Index", 1, 1) ),
        AddedDeltas = Table.Buffer( Table.AddColumn(Index, "Delta", each if [Index] = 1 then 0 else ( [WeekNum] - List.First( List.Range( WeekListWithOffset, [Index] - 1 ) ) ) ) ),
        NonDistinctDeltaList = List.Buffer( AddedDeltas[Delta] ),
        AddedRunningDeltas = Table.Buffer( Table.AddColumn(AddedDeltas, "RunningDeltas", each List.Count( List.Distinct( List.Range( NonDistinctDeltaList, 0, [Index] ) ) ) ) )
    in
        AddedRunningDeltas


    • Edited by Nin Sute Wednesday, July 27, 2016 9:55 PM
    Wednesday, July 27, 2016 9:34 PM
  • Yes, you're right: Shifted columns perform faster than [Index]-1. At the end I found this function to perform fastest in M:

    (Sourcetable, ValueColumn as text, SortColumn as text, optional Group as text) =>
    
    let
    
    // Internal function retrieving previous row
    fnPreviousRow = 
    (Source, Value_Column as text, Sort_Column as text) =>
    let
        Source0 = Source,
        #"Sorted Rows" = Table.Buffer(Table.Sort(Source0,{{Sort_Column, Order.Ascending}})),
        AddedIndex = Table.AddIndexColumn(#"Sorted Rows", "fnIndex", 0, 1),
        #"Added Index" = Table.AddIndexColumn(AddedIndex, "fnIndex2", 1, 1),
        #"Merged Queries" = Table.NestedJoin(#"Added Index",{"fnIndex"},#"Added Index",{"fnIndex2"},"NewColumn",JoinKind.LeftOuter​),
        #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {Value_Column}, {"PreviousRow"}),
        #"Removed Columns" = Table.RemoveColumns(#"Expanded NewColumn",{"fnIndex2"})
    in
        #"Removed Columns",
    
    // Tow different code-versions depending on group yes/no
    
    // 1 Code for group 
        #"Grouped Rows" = Table.Group(Sourcetable, {Group}, {{"Content", each Table.Buffer(_), type table}}),
        ExecuteFunction = Table.AddColumn(#"Grouped Rows", "Custom", each fnPreviousRow([Content], ValueColumn, SortColumn)),
        #"Removed Columns" = Table.RemoveColumns(ExecuteFunction,{"Content"}),
        ResultGrouped = Table.ExpandTableColumn(#"Removed Columns", "Custom", List.Difference(List.Union({Table.ColumnNames(Sour​cetable), {"fnIndex", "PreviousRow"}}), {Group}), List.Difference(List.Union({Table.ColumnNames(Sour​cetable), {"fnIndex", "PreviousRow"}}), {Group})),
    
    // 2 Code without group
        ResultUngrouped = fnPreviousRow(Sourcetable, ValueColumn, SortColumn),
    
    // Select Result
        FinalResult = if Group <> null then ResultGrouped else ResultUngrouped
    in
        FinalResult

    The optional group-parameter speeds it up because it chunks up your table and the indices don't count that high then. (Sad, because I like the [Index]-1-syntax, but it is simply too slow...)

    An if this is still too slow, you have the DAX-alternative: Create your index in M and create the column in DAX (PowerPivot):

    PreviousRow = CALCULATE(SUM('YourTable'[Value]), FILTER('YourTable', YourTable[Index]=EARLIER(YourTable[Index])-1) && YourTable[Group]=EARLIER(YourTable[Group]))


    Imke Feldmann TheBIccountant.com

    Friday, July 29, 2016 8:35 AM
    Moderator
  • Thank you, Imke, for an elegant solution.  I'll give it a go.  Tough to measure how one method performs against the other.  What tools do you use to measure performance and efficiency?!
    Wednesday, August 3, 2016 8:32 PM
  • Dear Imke,

    I tried the your code as a query. There was no syntax error. However after closing the editor, it pops up

    "An error occurred in the ‘’ query. Expression.Error: The name 'Sour​cetable' wasn't recognized.  Make sure it's spelled correctly."

    Very strange ... ... Might you have an excel file instead?

    Thank you for your time.

     

    Saturday, September 30, 2017 4:59 PM