locked
Context Transition in CALCULATE and Clarification of Correct Terminology RRS feed

  • Question

  • Dear Team,


    Question #1:

    If I create a Calculated Column with this aggregate calculation:

    SUM-Aggregate =SUM(PlayersTable[Salary])

    Is this a correct statement:

    “Aggregate functions are affected by Filter Context, but will ignore Row Context and will show the same aggregated number in every row of the Calculated Column.”


    Question #2:

    If I create a Calculated Column with this aggregate calculation inside the CALCULATE function:

    SUM-Aggregate-in-CALCULATE =CALCULATE(SUM(PlayersTable[Salary]))

    Are these correct statements (these are my interpretations of Alberto Ferrari’s & Marco Russo’s book MS Excel 2013 Building Data Models with PowerPivot):

    “An aggregate calculation inside CALCULATE will show just the row values not the aggregate values, and so if there is no relationship on this table, the Calculated Column will simply show the individual amounts for each row from the Salary column.”

    “Inside CALCULATE, there is no Row Context”

    “If you use only the first argument in CALCULATE, it will transform Row Context to Filter Context.”


    Question #3:

    If the three statements are TRUE from question #2, how do I make sense of them given that as a person new to PowerPivot I “see” CALCULATE returning individual row amount and thus think that CALCULATE is doing a Row Context”

    Question #4:

    If I create a Calculated Field:

    TotalSalary:=SUM(PlayersTable[Salary])

    And then create this Calculated Column:

    SUM-CalculatedField-Auto-Call-To-CALCULATE =[TotalSalary]

    Is this a correct statement:

    “The Calculated Field comes with an automatic CALCULATE function wrapped around it and so because it is in a Calculated Column it will show just the row values and not the aggregated value”.

    Sincerely, Mike "Struggling To Learn PowerPivot" Girvin
    Friday, August 29, 2014 5:55 PM

Answers

  • Something to note that had slipped my mind (and usually does) is that all measures include an implicit CALCULATE(). My memory was refreshed when I was reviewing this post earlier today. To test this, make a measure TotalAmountCC5:=SUM(fSales[amount]) and then make a calculated column [Calculated Column]=[TotalAmountCC5]. Make sure to remove your TotalAmountCC2/TotalAmountCC4 first, else you'll see a circular dependency error.

    Why is there an error? Well, I've been glossing over something that's not been an issue in the examples we've used thus far. CALCULATE() transforms all row context into filter context, even that row context provided by calculated columns. What does this mean? In the examples you've provided thus far, there's no issue, but if you have two CALCULATE() calculated columns (TotalAmountCC2, 4, or 5), each will depend on the value of the other to determine its row context, and thus the filter context in which to evaluate its first argument. Thus a circular dependency - neither column can resolve its value since either depends on the value of the other being resolved so it can be transformed to filter context.

    Another way that might be useful to illustrate this whole idea is to cause your TotalAmountCC column to have an error. Replace SUM(fSales[amount]) with LASTDATE(fSales[amount]). This will cause TotalAmountCC to be an error, which will cause TotalAmountCC2 to error out as well. Why? The row context from the field TotalAmountCC will be an error, and thus part of the filter context in CALCULATE() will be an error, and error values cannot be tested for equality.

    Please let me know if you have further questions on this topic. DAX can be deceptively deep and seem very convoluted, and CALCULATE() alone will probably always be beyond my full understanding.

    • Marked as answer by Charlie Liao Monday, September 8, 2014 8:47 AM
    Friday, September 5, 2014 9:49 PM

All replies

  • Question #1: If I create a Calculated Column with this aggregate calculation: SUM-Aggregate =SUM(PlayersTable[Salary]) Is this a correct statement: “Aggregate functions are affected by Filter Context, but will ignore Row Context and will show

    the same aggregated number in every row of the Calculated Column.”

    This is technically correct, but may be misleading. There is no filter context in a calculated column, unless we use a CALCULATE() to introduce filter context (implicitly by transforming the row context into filter context, and explicitly through whatever filter logic we apply).

    Question #2: If I create a Calculated Column with this aggregate calculation inside the CALCULATE function: SUM-Aggregate-in-CALCULATE =CALCULATE(SUM(PlayersTable[Salary])) Are these correct statements (these are my interpretations of Alberto Ferrari’s & Marco Russo’s

    book MS Excel 2013 Building Data Models with PowerPivot): “An aggregate calculation inside CALCULATE will show just the row values not the aggregate

    values, and so if there is no relationship on this table, the Calculated Column will simply

    show the individual amounts for each row from the Salary column.” “Inside CALCULATE, there is no Row Context” “If you use only the first argument in CALCULATE, it will transform Row Context to Filter

    Context.”


    2a: No

    2b: Row Context -> Filter Context, not row context ceases to exist

    2c: Yes

    I said no to 2a, because it will perform an aggregation over [Salary] where all row items are equal. If you do not have a primary key, duplicates are possible. If your PlayersTable has only one other column, [OtherColumn] besides [Salary], CALCULATE( SUM( PlayersTable[Salary] ) ) will create a column that is the sum of all values where [OtherColumn] and [Salary] are equal. IF you have two players where [OtherColumn] = X and [Salary] = $100,000.00, then your calculated column will indicate $200,000.00 for each of those players. With a primary key (even composite), then your 2a would be correct, but this is not a general case.

    Question #3: If the three statements are TRUE from question #2, how do I make sense of them given that as a

    person new to PowerPivot I “see” CALCULATE returning individual row amount and thus think that

    CALCULATE is doing a Row Context”

    CALCULATE() transforms row context into filter context. As explained above, it's possible with duplicate rows (which are possible and valid depending on the model - I'd imagine this is not likely with your PlayersTable, based solely on my assumptions about the data which would be represented by a table named that way) to see CALCULATE(SUM()) return a value that is not equal to the value on that row.

    Question #4: If I create a Calculated Field: TotalSalary:=SUM(PlayersTable[Salary]) And then create this Calculated Column: SUM-CalculatedField-Auto-Call-To-CALCULATE =[TotalSalary] Is this a correct statement: “The Calculated Field comes with an automatic CALCULATE function wrapped around it and so

    because it is in a Calculated Column it will show just the row values and not the aggregated

    value”.

    The thought behind this is correct, the behavior is identical to just putting CALCULATE(SUM()) in a calculated column, but the statement itself is inaccurate for the same reason discussed above for duplicate rows.

    I hope these explanations have helped, and encourage you to make a small table of dummy data to play with inserting and removing duplicate rows, and inserting/removing duplicate fields for some rows. This discussion includes a common scenario where you may want a subtotal based on grouping by a subset of columns within a table - it may be useful to see how you'd manipulate your filter context.

    Please let us know if you have further questions.

    • Proposed as answer by Michael Amadi Sunday, August 31, 2014 10:00 AM
    Friday, August 29, 2014 7:48 PM
  • Michael Amadi,

    Thank you very much for taking the time to answer my questions. I am still utterly confused, but no matter, when I study some more I may be able to post further questions.

    Thursday, September 4, 2014 12:19 AM
  • Make sure to check out the material at sqlbi.com and powerpivotpro.com. Both of these resources have a large number of high quality posts around DAX and the logic behind calculate. The most recent post on powerpivotpro.com has a discussion of CALCULATE() logic, and the post it refers to is also a good resource. Checking out all the links (especially in the comments) is a great way to dive into some of this stuff.
    Thursday, September 4, 2014 2:15 PM
  • Michael Amadi,

    Thanks for the post back!

    I visit sqlbi.com and powerpivotpro.com often. They are great sites. Often I read posts and related discussions and also watch videos; but still I get confused with trying to put the pieces together in the story of how PowerPivot works. I will keep trying, though.

    As you suggested, I tried duplicate records and:

    1) The Calculated Column with the aggregate function SUM showed the same value in every row.

    2) The Calculated Column with the aggregate function SUM inside the CALCULATE function showed the individual values, except where there were duplicate records in which case it showed the total for the duplicate records.

    Here was the table in the PowerPivot Manage area:

    date sales rep product amount TotalAmountCC TotalAmountCC2
    9/3/2014 joe p1 22 76 44
    9/4/2014 sindy p2 15 76 15
    9/5/2014 sindy p1 17 76 17
    9/3/2014 joe p1 22 76 44

     

    Here are the Calculated Columns that I created:

    TotalAmountCC =SUM(fSales[amount])

    TotalAmountCC2 =CALCULATE(SUM(fSales[amount]))

    Michael Amadi, Is this supposed to illustrate Filter Context, because it is adding duplicate records?

    I also tried it with a primary key and the =CALCULATE(SUM(fSales[amount])) Calculated Column did not show the total for the duplicate records, just as you said. For this primary key, I did not go to Table Behaviors and set it as a Unique Identifier.

    Michael Amadi, Why does a primary key change the behavior of CALCULATE?  How did CALCULATE know there was a Primary Key?

    Here was the table in the PowerPivot Manage area:

    PrimaryKey date sales rep product amount TotalAmountCC3 TotalAmountCC4
    1 9/3/2014 joe p1 22 76 22
    2 9/4/2014 sindy p2 15 76 15
    3 9/5/2014 sindy p1 17 76 17
    4 9/3/2014 joe p1 22 76 22

    Here are the Calculated Columns:

    TotalAmountCC3 =SUM(fSalesWithPrimaryKey[amount])

    TotalAmountCC4 =CALCULATE(SUM(fSalesWithPrimaryKey[amount]))

    Thank you for your continued help!

     

     




    Thursday, September 4, 2014 6:58 PM
  • First, a minor thing, but I'm Greg. Michael Amadi proposed my original response as an answer to the thread, but hasn't participated otherwise in the thread.

    I don't know how familiar you are with SQL, but I'll try explaining with that.

    TotalAmountCC =SUM(fSales[amount]) is roughly equivalent to the following being evaluated:

    SELECT 
        f1.*
        ,TotalAmountCC = (SELECT
                            SUM(f2.amount)
                          FROM fSales f2
                          )
    FROM fSales f1


    Wrapping your measure in CALCULATE() changes the filter context for evaluation. 

    TotalAmountCC2 =CALCULATE(SUM(fSales[amount])) is roughly equivalent to the following SQL:

    SELECT
      f1.*
      ,totalcc2 = (SELECT 
                     SUM(f2.amount)
                   FROM fSales f2
                   WHERE    f2.date = f1.date AND
                            f2.[sales rep] = f1.[sales rep] AND
    			f2.product = f1.product AND
                            f2.amount = f1.amount
                            )
    FROM fSales f1

    What CALCULATE() is doing is looking at every field in the row, and then it is finding every row that matches on all fields. When you introduce a primary key, it is a logical impossibility for any two rows to have matching values for every field. Power Pivot does not in this case understand that a primary key exists, it is simply looking for rows where the field which happens to be called [PrimaryKey] is equal to the value of that field on the current row.

    For a more concrete example, I will look at what is happening on the first and second rows of each table for each formula.

    TotalAmountCC

    Row 1: Just take the sum of all values in the field [amount]

    Row 2: Just take the sum of all values in the field [amount]

    TotalAmountCC2

    Row 1: Create a filter context that will only consider rows where [date] = 9/3/2014 AND [sales rep] = 'joe' AND [product] = 'p1' AND [amount] = 22. Evaluating the table with respect to this filter context gives us two rows, row 1 and row 4. We then sum the field [amount] for these rows

    Row 2: Create a filter context that will only consider rows where [date] = 9/4/2014 AND [sales rep] = 'sindy' AND [product] = 'p2' AND [amount] = 15. Evaluating the table with respect to this filter context gives us only one row, row 2. Thus we evaluate the sum of the field [amount] for this one row. Doing so returns the value since adding 15 and nothing is 15

    TotalAmountCC3

    This is exactly the same as above.

    TotalAmountCC4

    Row 1: Create a filter context that will only consider rows where [PrimaryKey] = 1 AND[date] = 9/3/2014 AND [sales rep] = 'joe' AND [product] = 'p1' AND [amount] = 22. Evaluating the table with respect to this filter context gives us only one row, row 1, because row 4 does not match on [PrimaryKey], even though it matches on all other fields. Thus, we evaluate the sum of the field [amount] for this one row, giving us 22, the value on that row.

    Row 2: Create a filter context that will only consider rows where [PrimaryKey] = 2 [date] = 9/4/2014 AND [sales rep] = 'sindy' AND [product] = 'p2' AND [amount] = 15. Evaluating the table with respect to this filter context gives us only one row, row 2. Thus we evaluate the sum of the field [amount] for this one row. Doing so returns the value since adding 15 and nothing is 15.

    It is important to note that it doesn't matter that the field you added is a primary key, or that it is named [PrimaryKey]. You could have added time of day, and so long as the values are different for each row, you'd see the same behavior. Setting the table property of the field as a primary key will not change the behavior of CALCULATE() in this situation. The only thing CALCULATE() is doing is evaluating the function (SUM() in our case) for every row where every single field is equal to the values of that field for the current row.

    • Edited by greggyb Friday, September 5, 2014 8:47 PM typos
    Friday, September 5, 2014 2:57 PM
  • Something to note that had slipped my mind (and usually does) is that all measures include an implicit CALCULATE(). My memory was refreshed when I was reviewing this post earlier today. To test this, make a measure TotalAmountCC5:=SUM(fSales[amount]) and then make a calculated column [Calculated Column]=[TotalAmountCC5]. Make sure to remove your TotalAmountCC2/TotalAmountCC4 first, else you'll see a circular dependency error.

    Why is there an error? Well, I've been glossing over something that's not been an issue in the examples we've used thus far. CALCULATE() transforms all row context into filter context, even that row context provided by calculated columns. What does this mean? In the examples you've provided thus far, there's no issue, but if you have two CALCULATE() calculated columns (TotalAmountCC2, 4, or 5), each will depend on the value of the other to determine its row context, and thus the filter context in which to evaluate its first argument. Thus a circular dependency - neither column can resolve its value since either depends on the value of the other being resolved so it can be transformed to filter context.

    Another way that might be useful to illustrate this whole idea is to cause your TotalAmountCC column to have an error. Replace SUM(fSales[amount]) with LASTDATE(fSales[amount]). This will cause TotalAmountCC to be an error, which will cause TotalAmountCC2 to error out as well. Why? The row context from the field TotalAmountCC will be an error, and thus part of the filter context in CALCULATE() will be an error, and error values cannot be tested for equality.

    Please let me know if you have further questions on this topic. DAX can be deceptively deep and seem very convoluted, and CALCULATE() alone will probably always be beyond my full understanding.

    • Marked as answer by Charlie Liao Monday, September 8, 2014 8:47 AM
    Friday, September 5, 2014 9:49 PM
  • Greg, Thank you so much for this clear explanation!!! AND, AND, AND... If they are not all TRUE in any duplicate record, then CALCULATE will not add it like SUMIFS would in Excel!
    Sunday, September 7, 2014 6:09 AM
  • Greg,

    Yes, I have encountered this a few times in my study of PowerPivot and in some of my own attempts at report solutions using PowerPivot. Your explanation of Circular Dependency helps much!

    Thank you for detailed and helpful comments!

    Sunday, September 7, 2014 6:17 AM
  • Hi Mike,

    As Greg has already mentioned, I merely proposed one of his responses as a possible answer. Greg is the one who has actually been providing you with the great responses you have received so far. If you believe that any of the responses in this thread have answered your question, please use the 'Mark as answer' link to mark it as an answer :)

     

    Regards,

    Michael Amadi

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful :)

    Website: http://www.nimblelearn.com
    Blog: http://www.nimblelearn.com/blog
    Twitter: @nimblelearn

    Sunday, September 7, 2014 7:47 PM