none
Date filtering with sparse table, getting most recent column value

    Question

  • My intent is to generate some measures in PowerPivot which dynamically update a count of rows or unique values.  The tricky bit is that the table is sparse, only recording the date that a value changed.  I want the count to include the most recent row for each customer, regardless of whether that status update date is within the query filter context or earlier.

    I have two questions, and will provide sample T-SQL (SQL2012) for a simple example data set to use in PowerPivot in Excel 2013.

    Q1: Can you help me understand if my thought processes are reasonable thus far,  or correct them appropriately, with special focus on the pseudo-code listed as steps 1-6 at the end of the post?

    Q2: Can you help me understand how I'd move forward to actually get the counts I'm expecting?  I think I'm particularly stuck on using SUMMARIZE(), since it's non-trivial to visualize dimension results when writing them in PowerPivot.

    Example data setup:

    XX_SAMPLE_DATE is a fully-populated date table with a unique DateKey.  It has a row for every date in the relevant years.  No dates exist in the dataset without this getting updated.  Typically, the user will be filtering by Month, Quarter, WeekOfYear, FiscalMonth, etc.  Sample SQL to create XX_SAMPLE_DATE table provided below.

    XX_SAMPLE is a fact table, having sparse updates to customer status.  This means that there is only a record at the date that something changes, and that status persists for that customer until another record is added for that customer.

    In PowerPivot, there is an active relationship from XX_SAMPLE[StatusDate] to XX_SAMPLE_DATE[DateKey].  Finally, XX_SAMPLE_DATE is marked as a DateTable in PowerPivot.

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[XX_SAMPLE](
    	[SurrogateKey] [int] IDENTITY(11,11) NOT NULL PRIMARY KEY,
    	[Customer_ID] [int] NOT NULL,
    	[StatusGroup] [nvarchar](20) NOT NULL,
    	[StatusDate] [date] NOT NULL,
    	);
    GO
    CREATE TABLE [dbo].[XX_SAMPLE_DATE](
    	[DateKey] [date] NOT NULL PRIMARY KEY,
    	[DateInt] [int] NOT NULL,
    	[YearMonthInt] [int] NOT NULL,
    	[Year] [int] NOT NULL,
    	[MonthOfYear] [tinyint] NOT NULL
    	-- Many others removed for brevity
    	);
    GO
    INSERT [dbo].[XX_SAMPLE] ([Customer_ID], [StatusGroup], [StatusDate])
    VALUES
    	(1, '01 FirstStatus', 'Jan 1, 2001')
    	,(1, '02 SecondStatus', 'Mar 3, 2001')
    	,(2, '02 SecondStatus', 'Feb 14, 2001')
    	,(4, '01 FirstStatus', 'April 1, 2001')
    	,(4, '02 SecondStatus', 'April 4, 2001')
    	,(1, '01 FirstStatus', 'June 4, 2001')
    	,(3, '03 FinalStatus', 'June 4, 2001')
    	,(4, '03 FinalStatus', 'June 4, 2001');
    GO
    DECLARE @D [DATE] = 'Jan 1, 2001';
    DECLARE @maxD [DATE] = 'Dec 31, 2001';
    WHILE (@D <= @maxD)
    BEGIN
    	INSERT [dbo].[XX_SAMPLE_DATE]([DateKey], [DateInt], [YearMonthInt], [Year], [MonthOfYear])
    	VALUES (@D, (YEAR(@D)*100*100) + (MONTH(@D)*100) + DAY(@D), (YEAR(@D)*100) + MONTH(@D), YEAR(@D), MONTH(@D));
    	SET @D = DATEADD(day,1,@D);
    END

    Example expected output:

    Here's an example of the output I would expect, when the Query Filter Context is based on Year / Month / Status.  I am getting the count of rows, but back-filling for customers who have no status change that month (using most recent status).

    /* Notes on why interesting */
    Query Filter Context => Measure result
    
    /* Cust 1 added */
    2001/JAN/'01 FirstStatus'  ==> 1
    2001/JAN/'02 SecondStatus' ==> 0
    2001/JAN/'03 FinalStatus'  ==> 0
    2001/JAN SUBTOTAL ==> 1
    
    /* Cust 2 added as status 02 */
    /* Note: Still need to report Cust 1 (sparse table) */
    2001/FEB/'01 FirstStatus'  ==> 1
    2001/FEB/'02 SecondStatus' ==> 1
    2001/FEB/'03 FinalStatus'  ==> 0
    2001/FEB SUBTOTAL ==> 2
    
    /* Cust 1 moved to status 02 */
    2001/MAR/'01 FirstStatus'  ==> 0
    2001/MAR/'02 SecondStatus' ==> 2
    2001/MAR/'03 FinalStatus'  ==> 0
    2001/MAR SUBTOTAL ==> 2
    
    /* Cust 4 is new with same-month update */
    /* Only most recent status is reported */
    2001/APR/'01 FirstStatus'  ==> 0
    2001/APR/'02 SecondStatus' ==> 3
    2001/APR/'03 FinalStatus'  ==> 0
    2001/APR SUBTOTAL ==> 3
    
    /* Nothing at all for this month! */
    2001/MAY/'01 FirstStatus'  ==> 0
    2001/MAY/'02 SecondStatus' ==> 3
    2001/MAY/'03 FinalStatus'  ==> 0
    2001/MAY SUBTOTAL ==> 3
    
    /* Cust 1 moves back to Stat 01 */
    /* Cust 3 newly added */
    /* Cust 4 goes to final status */
    2001/JUN/'01 FirstStatus'  ==> 1
    2001/JUN/'02 SecondStatus' ==> 1
    2001/JUN/'03 FinalStatus'  ==> 2
    2001/JUN SUBTOTAL ==> 4
    
    /* Final month with no action */
    2001/JUL/'01 FirstStatus'  ==> 1
    2001/JUL/'02 SecondStatus' ==> 1
    2001/JUL/'03 FinalStatus'  ==> 2
    2001/JUL SUBTOTAL ==> 4

    I'm coming in with some SQL knowledge, but am fairly new to DAX.  I understand the row/query/filter context distinctions.  At the most basic level, I understand the following:

    (1) This is a measure, because I want the results to change based on the query context.

    (2) I can use LASTDATE() on the XX_SAMPLE_DATE table to get the end date of the filter range.  I'll call this @DATE_MAX below...

    (3) I expected to use CALCULATE() coupled with an updated date column filter on the XX_SAMPLE table to get all the rows.  This would have lost the other filters, as I think I'd have to use ALL() when messing with the fact table.

    (4) I expected to use CALCULATE() on XX_SAMPLE_DATE, where I update the filter on the DateKey to be <= @DATE_MAX.  Due to special handling of DateTable filter updates, this should work.

    (5) I expected to be able to eventually wrap #4 in CALCULATETABLE() to get all the status updates of interest from XX_SAMPLE being less than or equal to the original query context's maximum date.

    (6) I then expected to then wrap #5 in SUMMARIZE() to group by Customer ID, and extract the "most recent" row for each CustomerID, Status, and other filters currently applied to the query context.  I hoped to then simply get a ROWCOUNT().  (Analogizing SUMMARIZE() to OVER/PARTITION BY).

    Any assistance in helping me understand how to succeed at thiese steps will be greatly appreciated. My attempted DAX should follow in my own replies.

    Sincerely,

    Henry Gabryjelski

    Disclosure: I work for Microsoft, but this is entirely my own question. Views and opinions are mine, and may differ from Microsoft's.  etc.


    10 years Windows storage drivers | Microsoft Patent Lawyer | LCA - Patents | Microsoft (not giving legal advice)



    Thursday, April 18, 2013 7:33 AM

Answers

  • LASTDATE() acutally returns a table and this table can be used as a filter in a calculate command:

    LastOrderOrderCount:=CALCULATE(
    COUNTROWS('Internet Sales'),
    CALCULATETABLE(
        LASTDATE('Internet Sales'[Order Date]),
        DATESBETWEEN('Date'[Date], BLANK(), LASTDATE('Date'[Date])))
    )

    DATESBETWEEN() returns all Dates till the last selected date
    this is used as a filter to calculate the last OrderDate for the current slice/filtercontext

    for the last OrderDate we calculate our value - in this case a simple COUNTROWS()

    I guess you can apply a similar logic for your calculation

    hth,
    gerhard


    - www.pmOne.com -

    Thursday, April 18, 2013 6:29 PM

All replies

  • DAX in Excel at each step:

    I can get this latest date in the query filter context rather easily.  The results depends on whether there was an event within the query filter context's date range.  If yes, I get the last day of the month.  If no, I get 1900-01-01.

    Step 2:=LASTDATE(XX_SAMPLE_DATE[DateKey])

    This is good, as it means I should be able to use this in Step 3.  However, each of the following returns the cryptic error: "Semantic Error: A function CALCULATE has been used in a TRUE/FALSE expression that is used as a table filter expression.  This is not allowed."

    Step3a:=LASTDATE( CALCULATETABLE( XX_SAMPLE, XX_SAMPLE[StatusDate] <= [Step 2] ) )
    
    Step3b:=CALCULATE( LASTDATE(XX_SAMPLE_DATE[DateKey]), ALL(XX_SAMPLE_DATE), XX_SAMPLE_DATE[DateKey] <= [Step 2])

    Thus, if for Step 3 I simply wish to obtain the most recent status in the spare update table (using the filter's end date, but ignore the filter's start date), how would I do so?

    Many thanks....


    10 years Windows storage drivers | Microsoft Patent Lawyer | LCA - Patents | Microsoft (not giving legal advice)

    Thursday, April 18, 2013 4:34 PM
  • LASTDATE() acutally returns a table and this table can be used as a filter in a calculate command:

    LastOrderOrderCount:=CALCULATE(
    COUNTROWS('Internet Sales'),
    CALCULATETABLE(
        LASTDATE('Internet Sales'[Order Date]),
        DATESBETWEEN('Date'[Date], BLANK(), LASTDATE('Date'[Date])))
    )

    DATESBETWEEN() returns all Dates till the last selected date
    this is used as a filter to calculate the last OrderDate for the current slice/filtercontext

    for the last OrderDate we calculate our value - in this case a simple COUNTROWS()

    I guess you can apply a similar logic for your calculation

    hth,
    gerhard


    - www.pmOne.com -

    Thursday, April 18, 2013 6:29 PM
  • Hi Gerhard,

    Thank you for the recommendation.  If I look at your command, it appears to count the number of previously ordered items, where there is at least one such order in the current query context's filtered view of time.

    Converting to my sample data, I tried the direct conversion, to count any prior status updates as follows:

    Step3d:=CALCULATE(
      COUNTROWS(XX_SAMPLE)
      ,CALCULATETABLE(
    	  LASTDATE(XX_SAMPLE[StatusDate])
    	  ,DATESBETWEEN(XX_SAMPLE_DATE[DateKey], BLANK(), LASTDATE(XX_SAMPLE_DATE[DateKey]))
    	)
    )
    
    Unfortunately, this did not work.  Although I did not get the semantic error, I only received blank results.  Perhaps I misunderstood how to translate your example?

    10 years Windows storage drivers | Microsoft Patent Lawyer | LCA - Patents | Microsoft (not giving legal advice)

    Thursday, April 18, 2013 9:34 PM
  • did you also mark XX_SAMPLE_DATE as a "Mark as Date-Table" and set the DAte-column to DateKEy?

    - www.pmOne.com -

    Friday, April 19, 2013 4:01 PM
  • I have a similar problem to be solved.

    In one table I have dates and currency values (sparsely distributed). I need to be able to calculate what is the latest currency value of a specific date. I have tried the solutions suggested above but not successfully.

    Hopefully someone can help with this.

    Thursday, May 16, 2013 8:17 AM
  • Hi Gerhard,

    Thanks for the note about DATESBETWEEN() and the ability to use BLANK() to remove the lower bound.  I did not know that was an option.  This is precisely what I was missing!  Although I have not tested this answer, it seems a perfect fit (unless some random restriction as to nesting levels, etc.).

    Sincerely,

    Henry Gabryjelski


    10 years Windows storage drivers | Microsoft Patent Lawyer | LCA - Patents | Microsoft (not giving legal advice)

    Monday, September 16, 2013 8:54 PM
  • Hi Gerhard,

    Yes, I did both of these.  Your other answer seems spot-on, however!

    Thanks,

    Henry Gabryjelski


    10 years Windows storage drivers | Microsoft Patent Lawyer | LCA - Patents | Microsoft (not giving legal advice)

    Monday, September 16, 2013 8:55 PM