MDX Performance Issue - Dimension Member Count(Filter())

    Question

  • I have a dimension that will have 700,000 members -- not ideal!  The good news is that no one is likely to present that dimension on a report.  The bad news is that I have a calculation I need to perform that I am pretty sure is going to have serious performance problems.  I've optimized it as far as I can figure out, and I'm willing to pursue any other avenues you can suggest, including transferring logic to ETL, reworking the MDX along different lines, cache warming, specific aggregations, etc.

    Without further ado:

    SCOPE ( [Customer].[Account Code].[All], [Measures].[CEMI-2] ) ;
      This = Count (
          Filter (
            NonEmpty (
              { Descendants ( [Customer].[Account Code].[All],, Leaves ) - [Customer].[Account Code].[Unknown] },
              Measures.[Outage Count]
            ),
            ( [Outage].[Interruption Flag], [Measures].[Customer Outage Count] ) > 2
          )
        ) ;
      format_string ( This ) = '#,##0;(#,##0)' ;
    END SCOPE ;

    The Customer.[Account Code] attribute is the one that is likely to have 700k members; it is the key of the customer dimension.  The NONEMPTY() function seemed to help a great deal, but I'm not sure that will still be the case once I get my full production data, because a lot of customers will have at least one outage.  I know the Filter function is my big hog here; is there a better way?

    I don't see how transferring logic to the ETL can work in this case, given the nature of the calculation.  I would have to create a measure group restricted to just one or maybe two dimensions and have everything hardcoded.

    I haven't done aggregations on my cube yet, but I also don't have the full load of data yet, so I figure the situation will get worse not better.

    Thanks so much for your assistance, I really enjoy a good MDX problem and this looks to be a fun one.

    -Ken

    Wednesday, August 14, 2013 8:05 PM

Answers

  • Hi

    Yes, we can improve the performance of this calculation.

    1. Try this; it keeps your logic but uses IIF and SUM so the engine can evaluate in block mode:

    SCOPE ( [Customer].[Account Code].[All], [Measures].[CEMI-2] ) ;
      This = SUM (
          NonEmpty (
            { Descendants ( [Customer].[Account Code].[All],, Leaves ) - [Customer].[Account Code].[Unknown] },
            Measures.[Outage Count]
          ),
          IIF( ( [Outage].[Interruption Flag], [Measures].[Customer Outage Count] ) > 2, 1, NULL )
        ) ;
      format_string ( This ) = '#,##0;(#,##0)' ;
    END SCOPE ;

    2. We can further improve performance by using another measure to handle the IIF condition from the above query.

    Try this:

    CREATE MEMBER CURRENTCUBE.[Measures].[OutageCount] AS
      IIF( ( [Outage].[Interruption Flag], [Measures].[Customer Outage Count] ) > 2, 1, NULL )
    , VISIBLE = 0;
    
    SCOPE ( [Customer].[Account Code].[All], [Measures].[CEMI-2] ) ;
      This = SUM (
          NonEmpty (
            { Descendants ( [Customer].[Account Code].[All],, Leaves ) - [Customer].[Account Code].[Unknown] },
            Measures.[Outage Count]
          ),
          [Measures].[OutageCount]
        ) ;
      format_string ( This ) = '#,##0;(#,##0)' ;
    END SCOPE ;

    3. I am not sure why you're using DESCENDANTS here, but you can try referencing the hierarchy members directly if possible.
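
    For example (a sketch, assuming [Account Code] is the key attribute of the Customer dimension, so its leaves are simply the children of [All]), the Descendants call could be replaced with:

    ```
    { [Customer].[Account Code].[All].CHILDREN - [Customer].[Account Code].[Unknown] }
    ```

    Referencing the members directly avoids the extra work of walking the hierarchy to the leaf level.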

    Let me know your results


    Prav

    • Marked as answer by Ken in Tampa Wednesday, November 06, 2013 2:19 PM
    Wednesday, August 14, 2013 10:35 PM
  • Having NonEmpty inside SUM is going to slow down performance.  Try removing the NonEmpty and see if it improves things.
    Friday, August 16, 2013 6:15 PM

All replies

  • Your answers were very helpful, and the calculation is still returning correct results with these changes.  I will have to wait until I get the full load of data to do meaningful performance testing on these changes.

    Here is what I have so far, in case you or anyone else has any other suggestions.  As mentioned, this will be a high-stress calculation with such a large dimension.  I am on SQL Server 2008 R2 Enterprise, FYI.

    [Measures].[CEMI-2 Customer Count]=IIF(([Outage].[Interruption Flag], [Measures].[Customer Outage Count] ) > 2, 1, NULL);
    SCOPE ( [Customer].[Account Code].[All], [Measures].[CEMI-2] ) ; This = Sum ( 
        NonEmpty (
          {
            [Customer].[Account Code].[All].CHILDREN - [Customer].[Account Code].[Unknown]
          },
          Measures.[Outage Count]
        ),
        [Measures].[CEMI-2 Customer Count]
     ) ; format_string ( This ) = '#,##0;(#,##0)' ; END SCOPE ;

    Thanks!!  I will mark you as the answerer once I'm sure I'm not getting any more answers from anyone else.

    -Ken

    Thursday, August 15, 2013 1:52 PM
  • Hi

    The above MDX will improve performance, but there is also another way to improve it, based on calculations at the ETL layer.

    Try this in your ETL load, and then use a regular measure with its aggregation usage set to SUM.

    This will be a calculated column from source to cube:

    CASE WHEN Measures.[Outage Count] IS NOT NULL
              AND [Outage].[Interruption Flag] = 'Y' -- with your required value
              AND Measures.[Customer Outage Count] > 2
         THEN 1
         ELSE NULL
     END AS [CEMI-2 Count]

    Let me know your results


    Prav

    Friday, August 16, 2013 12:57 AM
  • I will keep that in mind; my issue is that the results would change with that ETL fix.  For example, if you slice CEMI-2 by the Cause dimension now, you get the customers with >2 interruptions within a single cause.  The ETL fix would basically give you the count of customers under each cause that have >2 interruptions across all causes.  Depending on user feedback and performance, that may be the backup plan.

    Thanks again!


    Friday, August 16, 2013 12:26 PM
  • Hi Tha_Tyrant,

    Please also take a look at the following article:
    How do I troubleshoot the slow MDX query performance?
    http://social.technet.microsoft.com/Forums/sqlserver/en-US/f1f57e7b-eced-4009-b635-3ebb1d7fa5b0/how-do-i-troubleshoot-the-slow-mdx-query-performance

    It will give you some hints and ideas for troubleshooting MDX performance issues.


    Best Regards,


    Elvis Long
    TechNet Community Support

    Monday, August 19, 2013 6:28 AM
    Moderator
  • Thanks everyone, I will try each of these things.  Hopefully the full load of data will arrive soon so I can really see the difference in the different methods.

    -Ken

    Monday, August 19, 2013 1:16 PM
  • Hi Ken,

    How about this issue? Please let us know how things go.

    Thanks,


    Elvis Long
    TechNet Community Support

    Wednesday, August 28, 2013 2:20 AM
    Moderator
  • It's pretty slow!  We're up to about 275k members in the customer dimension, and I expect to be in the 700k-1M range before too long.  The calc, split by a simple attribute with 10 members, takes about 30-60 seconds.  I expect that will get worse once we are up to 700k dimension members.

    So the good news is that it does run and it doesn't crash.  But I'd like to see it in the 5 second range so I need more ideas on improving the performance of this calc.

    Here's another thread I read and tried to help on, with a very similar issue:

    http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/9d8e0f17-e399-4e6f-93f5-8b72c0e3aa2f/#c1afbb98-2089-418b-8b20-ee8bb5b1109e

    Monday, October 21, 2013 8:30 PM
  • Ok, let me revise that.  It must have been load on the server, and I was also using an old, less performant version of the calc.  I'm at 10 seconds at present, so I'd imagine I can expect 30+ seconds once the database is fully loaded.  So I'm still hoping for some ideas to speed it up.

    Thanks!!

    Ken

    Monday, October 21, 2013 9:12 PM
  • Hi

    Good to see feedback.

    Can you provide the queries you tried?  I think the second option from my first answer will be the winner.

    Also, let me know your flexibility to push the logic to ETL; most of this calculation happens at the leaf level, so it would be better done at the ETL layer rather than at the aggregation layer.


    Prav

    Tuesday, October 22, 2013 5:14 AM
  • I am totally flexible about moving logic to the ETL layer.  However, the nature of a distinct count means that moving it to the ETL layer will freeze which dimensions you can do a distinct count across.  If you see the other thread I linked to, it's me and another poster discussing the same issue in his cube and the feasibility of moving some portion of the work to the ETL.  We didn't come up with anything.

    Current calc is in line with your #2 suggestion:

    [Measures].[CEMI-2 Customer Count]=IIF(([Outage].[Interruption Flag], [Measures].[Customer Outage Count] ) > 2, 1, NULL);
    SCOPE ( [Customer].[Account Code].[All], [Measures].[CEMI-2] ) ; This = Sum ( 
        NonEmpty (
          {
            [Customer].[Account Code].[All].CHILDREN - [Customer].[Account Code].[Unknown]
          },
          Measures.[Outage Count]
        ),
        [Measures].[CEMI-2 Customer Count]
     ) ; format_string ( This ) = '#,##0;(#,##0)' ; END SCOPE ;

    Update: OK, wow.  I retried the suggestion of removing the NonEmpty, as mentioned by Sha, and that greatly improved performance.  After clearing the cache with XMLA, I'm getting a response time of around 1-2s now.  I guess between the two of you the problem is solved!  I'm going to keep this thread open for another week or so until I get my mother lode of data, then I'll mark you both as answers.
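
    For reference, a sketch of what the calc looks like with the NonEmpty removed (this assumes the Unknown member still needs to be excluded, and that the set subtraction by itself is cheap enough to stay in block mode):

    ```
    [Measures].[CEMI-2 Customer Count] = IIF( ( [Outage].[Interruption Flag], [Measures].[Customer Outage Count] ) > 2, 1, NULL ) ;
    SCOPE ( [Customer].[Account Code].[All], [Measures].[CEMI-2] ) ;
      This = Sum (
          { [Customer].[Account Code].[All].CHILDREN - [Customer].[Account Code].[Unknown] },
          [Measures].[CEMI-2 Customer Count]
        ) ;
      format_string ( This ) = '#,##0;(#,##0)' ;
    END SCOPE ;
    ```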

    Thanks!!

    Tuesday, October 22, 2013 12:08 PM
  • Hi

    Great, you got a solution.  Let me know if you need any more MDX help!


    Prav

    Tuesday, October 22, 2013 9:49 PM