Accessing measure from CubeB in CubeA calculated member + Multiple Cube vs. Single Cube approach

    Question

  • 1) Say an SSAS solution you design has multiple cubes within the same SSAS DB. Can you create a perspective with members from those different cubes (though deployed within the same DB)?

    2) Is it possible to reference the measure from another cube in the current cube's calculated member script?


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

     

    Monday, June 21, 2010 6:12 PM
    Moderator


All replies

  • Hello Jason,

    1) No, I think this is not possible. But maybe someone knows a hack.

    2) yes, please look here: http://msdn.microsoft.com/en-us/library/ms144720.aspx

     

    HTH

    Jörg

     

    Monday, June 21, 2010 8:57 PM
  • Hi,

    You can do both of the things you're asking for via linked measure groups: http://msdn.microsoft.com/en-us/library/ms174899.aspx (http://msdn.microsoft.com/en-us/library/ms175648.aspx)

     

    Tuesday, June 22, 2010 6:28 AM
  • Could you explain this in a bit more detail? I am seeing some light, but not everything is clear yet.
    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Tuesday, June 22, 2010 8:50 AM
    Moderator
  • Hi Jason,

    To answer your questions:

    1) No. If you need to do this, then you should be creating a single cube with multiple measure groups.

    2) Yes, you can do this with the LookUpCube function but you should never ever use it - it can lead to really bad performance. As with (1), you really need to create a single cube with multiple measure groups.
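
    Purely for illustration (not a recommendation), a calculated member in CubeA that pulls a measure from CubeB via LookUpCube might look like the sketch below; all object names are hypothetical, and it assumes both cubes share a Date dimension with the same member unique names:

        CREATE MEMBER CURRENTCUBE.[Measures].[Sales Amount From CubeB] AS
            LookupCube(
                "CubeB",
                "([Measures].[Sales Amount], " +
                MemberToStr([Date].[Calendar Year].CurrentMember) + ")"
            ),
            FORMAT_STRING = "Currency";
        // LookupCube is evaluated cell by cell against the other cube, which is
        // why it performs so much worse than referencing a measure group that
        // lives in the same cube.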

    I know we both answered a thread on this subject last week (http://social.msdn.microsoft.com/Forums/en/sqlanalysisservices/thread/24eab6fc-e17d-43a3-8f07-8746222d305d) but I wanted to pick up on your statement about the multiple cube approach being a best practice - this is only true if there really are no common dimensions and you never want to analyse the measures in the same query. Using the LookUpCube function to get measure values from other cubes is not a viable alternative to creating a single cube.

    Chris


    Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
    Tuesday, June 22, 2010 9:48 AM
    Moderator
  • Hi Chris,

     

    Thanks for the answers :)

     

    As for the second part, I completely agree with you that there should be no common measures (that question was just to check some functionality, not related to the previous question). But I was wondering about the dimension part. Suppose you have 2 fact tables sharing the same 3 dimensions. If you implement them as 2 cubes in the same database, wouldn't the performance be the same?

    Now coming to a more practical situation, let's say we have 2 facts and 4 dimensions. Fact A is related to Dims 1, 2 and 3, and Fact B is related to Dims 2, 3 and 4. In this case, wouldn't it be better to have 2 cubes with 3 dimensions each rather than 1 cube with 4 dimensions? I had actually done a small PoC last year and had found a performance gain of 10-15% in the multiple cube approach but then I admit the conditions were not strict and the data was also not that huge.

    And I don't think there would be a case where none of the dimensions is common. For example, Time would be in almost all the measure groups and hence would be a common dimension.

     

    I know this area is a bit cryptic and has never been discussed openly, so I would definitely love to have your views on it, as you have a lot of real-world experience with performance tuning.


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Tuesday, June 22, 2010 10:52 AM
    Moderator
  • It's an interesting question. In SSAS 2005 there were reports that having a large number of measure groups with very few common dimensions in a single cube could perform worse than splitting them across multiple cubes. I tested this several times and could never reproduce it, and I'm told the underlying problem was fixed in SSAS 2008.

    In your example where Dims 2 and 3 are shared between Facts A and B, I would say therefore that you're not going to see any performance degradation from putting both measure groups in the same cube, and it gives you the option of being able to query measures from both measure groups in the same query or calculation if you need to. As I said last week there are some valid reasons for using the multiple cube approach but in general I would always start off assuming you are going to build a single cube, and only move to the multiple cube approach if you need to.

    Chris


    Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
    Tuesday, June 22, 2010 11:03 AM
    Moderator
  • I agree with Chris and I believe that the following is important in his note

    " it gives you the option of being able to query measures from both measure groups in the same query or calculation if you need to "

    In most real-world scenarios, when you have common dimensions there will be a need to reference both measure groups in a query or calculation, which will drive you towards the single cube/multiple measure groups approach.

    I would suggest a multiple cube approach when you have no requirement to combine 2 measure groups AND the number of common dimensions is low.

    Tuesday, June 22, 2010 11:20 AM
    Answerer
  • Hi Jason and Chris,

     

    I had previously worked on a project which was similar to Jason's requirement, and we went for the multiple cube approach. Since the requirements kept changing, it was difficult to handle with multiple cubes. We had to get into linked measure groups and so on, and administering them was a bit difficult. However, I felt the same thing could have been achieved more easily using a single cube approach. I also tested processing and query performance for the multiple cube and single cube approaches on a test server and could not find any major difference; both were almost the same. The backend database size was around 60 GB. But I would always suggest you go for the single cube approach, as Chris suggests.

     

    HTH,

    Ram

    Tuesday, June 22, 2010 11:22 AM
  • Hi Prakash,

    My viewpoint is the same too - a multiple cube approach when you have no requirement to compare the measures of 2 measure groups AND the number of common dimensions is LOW. But if I understand Chris' viewpoint correctly, he says that the number of common dimensions should be 0 for there to be a performance gain, and I was trying to think of a valid scenario where not even one dimension would be common.

     

    Chris,

    I am trying to dig into your vast experience, so please forgive me if my questions are silly. Let's take a 3-dimension example so that we can visualize the cube - Dim1, Dim2, Dim3. Suppose each of the dimensions has 2 attributes and each attribute has 4 members. The number of cells in the cube space would then be 4 (members per attribute) ^ (2 attributes x 3 dimensions) = 4^6 = 4096 for each measure group.

    Now if we have 2 measure groups sharing all 3 dimensions, each MG would have 4096 cells and it doesn't make a difference whether you build one cube or multiple cubes, as the total number of cells would be 4096*2 = 8192.

    Now suppose we have 2 MGs, with MG1 related to Dim1 and Dim2, and MG2 related to Dim2 and Dim3. If you build a single cube, the number of cells would still be 4096*2, because those cells need to be preserved even where the dimensions are unrelated: if you want to write a scope statement over those dimensions so that they display a particular value (say, 99999 wherever the dimension is unrelated), you need the metadata for those cells.
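
    A minimal sketch of the kind of scope statement being described (all names hypothetical), assigning 99999 wherever MG1's measure meets a real member of the unrelated Dim3:

        // In the cube's MDX script: MG1 has no relationship to Dim3, so show
        // 99999 on every non-All member of Dim3's A3 attribute hierarchy.
        SCOPE([Measures].[M1], [Dim3].[A3].[A3].MEMBERS);
            THIS = 99999;
        END SCOPE;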

    But if you make 2 cubes, cube 1 will contain just 4^(2 attributes x 2 dimensions) = 4^4 = 256 cells, and cube 2 will also have just 256 cells.

     

    So if that is the case, wouldn't it imply that the cube size increases when unnecessary dimensions (or attributes) are tagged onto it? You can always argue that the unrelated dimensions would not consume space because essentially no data is stored in them. But then, wouldn't this result in a large number of empty cells? As far as I know, a large number of empty cells slows your queries down considerably, especially when used in a crossjoin with other dimensions (which is why we use NonEmpty() inside a NON EMPTY on rows when we query a sparse set resulting from a crossjoin - source: SQL Server Analysis Services Unleashed, page 172). In any case, I am unable to see why better performance would not be guaranteed when the dimensions differ.
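
    As a sketch of that pattern (cube, dimension and measure names hypothetical), the query would look something like this:

        SELECT
            [Measures].[Sales Amount] ON COLUMNS,
            NON EMPTY
                NonEmpty(
                    [Customer].[Customer].[Customer].MEMBERS *
                    [Product].[Product].[Product].MEMBERS,
                    [Measures].[Sales Amount]
                ) ON ROWS
        FROM [Sales];
        // NonEmpty() prunes the sparse crossjoin against the chosen measure up
        // front; NON EMPTY then removes any rows that are still empty in the
        // final result.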

     

    (P.S. : The number of cells that I have calculated may not be entirely right, but should be used as an indicator rather than a precise value)  

     


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

     

    Tuesday, June 22, 2010 3:09 PM
    Moderator
  • My assumptions about why the following things help performance:

    1) Reducing the number of unnecessary attributes or dimensions increases cube performance, because the cube size decreases (based on the calculations I put above) and also because there is a better chance of the right aggregations being built when the number of attributes is smaller.

    Quoting from SQL Server 2008 Analysis Services Unleashed - "Your cube is made up of the collection of points of both the theoretical and logical spaces. Each point in the cube's space is called a cell.

    Therefore a cell in the cube can fall into one of these spaces. The cell in the fact space is associated with an actual sale of a product to a customer. If the sale wasn't made (that is, a potential sale), our cell is just a theoretical point in the cube (a theoretical cell). We don't have any data in this cell. It is an empty cell with a value of NULL."

    2) Having a large number of measure groups decreases performance because, as the number of measure groups increases, the indexes used to access the MGs become less efficient and hence take more time.

     

    Please correct me wherever I am wrong as I have just started diving deep into SSAS.


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Tuesday, June 22, 2010 3:36 PM
    Moderator
  • Hi Jason,

    I think you're making a fundamental mistake here in assuming that the size of the cube is related to the theoretical number of cells that it could contain (ie the product of the number of members on the key attribute of each dimension). Analysis Services does not store empty cells - it does not behave like a multidimensional array - so all your calculations about the total number of cells in a cube are wrong.

    The only reference to this I could find online was this old newsgroup thread:

    http://www.sqlmonster.com/Uwe/Forum.aspx/sql-server-olap/4008/Do-missing-values-get-stored

    But it does a reasonably good job of explaining the issues.

    Regards,

    Chris


    Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
    Tuesday, June 22, 2010 4:49 PM
    Moderator
  • Hi Chris,

     

    Presuming that you are talking about the empty cells of the theoretical space (because you can store NULL values in the fact space - in the data store, Analysis Services adds one bit to the end of the record for each nullable field and uses this bit to indicate that the field contains a NULL value), I would like to ask a question based on a quote from the link you sent: "Analysis Services MOLAP storage is based on cells; not measures. We do not store empty cells; nor aggregates where the combination of members is empty. In your case, there are no facts or aggregates with this particular product back through time -- so we don't store it."

    What exactly do we mean by a combination of members? My interpretation was that for a particular member on an axis, if it has values for the ALL level of all the other axes put together, then the combination of members is not empty. As we know, Measures is an axis in itself.

    Let's take the example given above - MG1 related to Dim1 and Dim2, and MG2 related to Dim2 and Dim3. Now if we look at the tuple (M1, Dim1.A1, Dim2.A2, Dim3.A3), there is a value which would be equal to (M1, Dim1.A1, Dim2.A2). I am presuming that this value would be stored in both cells (unless you set the IgnoreUnrelatedDimensions property to false for the measure group). If it is indeed stored like this in SSAS, the number of cells in a single cube is going to be more than in the multiple cube approach. If it is not stored like this, I think I am starting to see the light with your help.

    I do agree that if there is no data for  (M1,Dim1.A1,Dim2.A2), then (M1,Dim1.A1,Dim2.A2,Dim3.A3) would also be empty and hence would not be stored.


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Tuesday, June 22, 2010 5:44 PM
    Moderator
  • Why isn't there an official explanation from Microsoft regarding this? As far as I know, this topic has been around for ages, but the answers always seem to stay at a high level.
    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Wednesday, June 23, 2010 5:35 AM
    Moderator
  • Let me see if I can get someone from the dev team, who really understands this area, to answer.

    My understanding is that Analysis Services will store only the following data on disk:

    • Data at the lowest possible level of granularity. This is usually the granularity of the fact table, but it's possible that the lowest level of granularity in the cube is higher than that of the fact table.
    • Aggregations, which are pre-calculated slices of data at a specific granularity (rather like the results you get from a SQL GROUP BY query), and indexes.
    • Dimension data, i.e. the members on every hierarchy of your SSAS dimensions.

    The data above is also compressed during processing; Analysis Services will also hold cached values in memory. So while the 'theoretical' space of the cube may be large - and this is the space that can be queried - the size of the theoretical space of the cube often bears little relation to the size of the cube on disk.

    In your example, although you can query for the tuple (M1, Dim1.A1, Dim2.A2, Dim3.A3) and you can place a value there with MDX calculations, the storage engine knows that that tuple cannot exist in the underlying data because there is no relationship between MG1 and Dim3.A3; it can never have an impact on the size of the data on disk because it can't ever be stored. Just because values appear when you change IgnoreUnrelatedDimensions does not mean they are physically present in the cube and affect its size.

    Chris

     


    Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
    Wednesday, June 23, 2010 8:56 AM
    Moderator
  • I would really appreciate that Chris :)

    I know I am asking a lot of stupid questions, but I think it's better to ask and be a fool for the moment rather than forever.

     

    And could you please also tell me in what scenarios a multiple cube approach would be used? Because based on the above explanations, it looks like a single cube is better in all cases.


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Wednesday, June 23, 2010 3:49 PM
    Moderator
  • As I said, in my opinion a multiple cube approach is only useful very rarely, for example in the following scenarios:

    • When you are using Standard Edition but would like to use Perspectives (which are Enterprise Edition only). In that case, you would need to make different cubes available to different groups of users.
    • Similarly, when you need to apply security to restrict access to measures in a given measure group. Dimension security can be applied to the measures dimension but it can be a pain to administer; it's easier to have separate cubes and then grant/deny access to them.

    Chris

     


    Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
    Wednesday, June 23, 2010 4:28 PM
    Moderator
  • Chris is absolutely right.

    Analysis Services does not store on disk any data that is not either the direct result of the queries it sends to the RDBMS (fact data/dimension data) or some derivative of it (aggregation data).

    You can easily see what kind of queries Analysis Services sends to the RDBMS. It does do some post-processing on top of them: eliminating duplicates, catching referential integrity issues, converting NULLs, etc. And then it builds aggregations, which are a derivative of that data. Any particular aggregation will have fewer rows stored than you have in the partition (I can think of the extreme case of distinct count, where an aggregation might have to store exactly the same number of rows as the partition).
    There is a very simple way to discover the count of rows stored by Analysis Services - you can use DISCOVER_PARTITION_STAT for that.
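
    For example, run from an MDX query window in SSMS (SSAS 2008 and later), a restricted DMV query along these lines returns the aggregations in a partition and the number of rows each one actually stores; the database, cube, measure group and partition names below are just Adventure Works sample names and are only illustrative:

        SELECT *
        FROM SYSTEMRESTRICTSCHEMA
            ($system.DISCOVER_PARTITION_STAT,
             DATABASE_NAME      = 'Adventure Works DW 2008',
             CUBE_NAME          = 'Adventure Works',
             MEASURE_GROUP_NAME = 'Internet Sales',
             PARTITION_NAME     = 'Internet_Sales_2004')

    Comparing those row counts with the theoretical size of the cube space is a quick way to see how little relation the two bear to each other.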

    As far as single cube vs. multiple cubes goes:

    Chris, Alberto and Marco have a good discussion of this in their book http://www.amazon.com/Expert-Development-Microsoft-Analysis-Services/dp/1847197221 - Chris has already mentioned the main points from it here.
    It is a great book; I would suggest you pick it up if you are looking for design advice for a particular situation, which looks like the case here.

    Edward Melomed


    http://www.sqljunkies.com/WebLog/edwardm/
    Wednesday, June 23, 2010 10:24 PM
    Owner
  • I think the key issue being discussed here is: should I have one cube with lots of measure groups or lots of cubes with fewer measure groups.

    The answer is to be smart about the decision. As http://technet.microsoft.com/en-us/library/cc966399.aspx mentions:

    "Avoid including unrelated measure groups in the same cube

    Having many measure groups in a cube can adversely affect the query performance of the cube, even for queries that only use a single measure group.  If your DSV contains many potential fact tables and they will not usually be used together, consider creating multiple smaller cubes containing topic areas that are used together.  Then, you can enable occasional cross-topic querying by creating linked measure groups that join together all measure groups in a special cross-topic cube."

    Really what this note is saying is that if you keep adding lots of dimensions to a cube, then the virtual space of the cube grows -- it does not add to the storage cost, but it does hurt formula engine performance in some scenarios because cell coordinates are based on the number of attributes in the cube space. Increasing the number of attributes in the cube space will start costing performance in lots of small ways and result in performance regressions. Adding lots of unrelated measure groups would result in you adding lots of unrelated dimensions to the cube space and cause a performance slowdown -- if you had 10 measure groups and they all shared lots of common dimensions, then one cube makes the most sense.

    This also doesn't mean that it's a good idea to create 10 small cubes and then do LookupCube -- because LookupCube is also slow. The recommendation says that you should create 10 small cubes and use those small cubes for the basic scenarios, and then create an 11th cube that has linked measure groups to some of the 10 small cubes; that cube will give you global access across the cube data. This cube will pay the performance price -- but at least most of your other queries will be fast if you can have your users/reports use the smaller cubes most of the time.

    If it turns out that you don't need to query the 10 small cubes independently, then you should just create one big cube.

    Notice that none of this is about the storage cost. As Chris has described earlier, the storage cost is not dependent on one big cube versus lots of smaller cubes. It is dependent on the granularity of the measure group data.

    HTH,

    Akshai


    -- This posting is provided "AS IS" with no warranties, and confers no rights
    Wednesday, June 23, 2010 11:42 PM
  • Great, I do have a copy of that book and I am trying to find the page number. So my final question, just to have a straight answer - 

    if you have n MGs sharing Dim1 and Dim2, and m MGs sharing Dim2 and Dim3, would it be better to have one cube with Dim1, Dim2, Dim3 and all m+n MGs, or 2 cubes - one with the n MGs and Dim1, Dim2, the other with the m MGs and Dim2, Dim3?

    And my goal is just a high-performing system; in the end, users should be able to get their query results as fast as possible. Storage is not an issue nowadays (but I shouldn't end up with double the size if I go with multiple cubes). And could you also validate Akshai's reply when you respond?

     

    And thanks a lot, guys - this discussion has been a huge learning experience for me. I really appreciate all of your time.

     

     

     


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Thursday, June 24, 2010 3:10 AM
    Moderator
  • Take a look at pages 94-95 for the discussion of this problem.

    For your specific example, I would say definitely go with the single cube approach.

    Chris


    Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
    Thursday, June 24, 2010 8:36 AM
    Moderator
  • To implement this you first need to create a linked measure group. For example, say you have two cubes, Cube A and Cube B, and you want to include a measure from Cube B in Cube A. Go through the New Linked Object wizard from the Cube Structure tab of BIDS in Cube A and include the measure from Cube B. You will then be able to create a calculated measure or perspective using the measures of Cube B.
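
    Once the measure group is linked, the remote measure behaves just like a local one in calculations; a minimal sketch with hypothetical measure names:

        // In Cube A's MDX script, after Cube B's measure group has been linked in:
        CREATE MEMBER CURRENTCUBE.[Measures].[Combined Sales] AS
            [Measures].[Cube A Sales Amount] + [Measures].[Cube B Sales Amount],
            FORMAT_STRING = "Currency";

    Unlike LookUpCube, this does not perform a per-cell lookup against the other cube, although the maintenance caveats discussed elsewhere in this thread still apply.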

    - Pushkar Bhushan

    Thursday, June 24, 2010 9:11 AM
  • Since a few people have mentioned linked measure groups as a possible strategy, I thought I'd mention a few reasons why I don't like using them (see also pages 101-102 of "Expert Cube Development", and Ram's post above):

    • They can be a real pain from a maintenance point of view. If you ever alter the underlying measure group, you need to delete and recreate the linked measure group.
    • If you use linked measure groups you end up duplicating MDX calculations between the original measure group and the cube that uses the linked measure group, which is another maintenance headache if you need to alter these calculations.
    • There's a slight query performance overhead with using linked measure groups.

    For these reasons I prefer the single cube approach over the use of multiple cubes and linked measure groups.

    Chris


    Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
    Thursday, June 24, 2010 10:04 AM
    Moderator
  • Having worked with the multiple cube approach, I would always suggest you go for a single cube, as Chris pointed out. I don't see anything alarming in performance with respect to the single cube vs. multiple cube approach. The only advantage I see in using linked MGs is that your fact gets processed only once even when it is linked into other cubes. Other than that, maintenance is a pain - this is the major disadvantage. For your query I would go with Chris: the single cube approach.

     

    HTH,

    Ram

    Thursday, June 24, 2010 10:31 AM
  • A big thanks to all of you wonderful guys at this forum :)
    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Thursday, June 24, 2010 2:03 PM
    Moderator
  • "if you have n MG sharing Dim1, Dim2 and m MGs sharing Dim2 and Dim3, would it be better to have one cube having all Dim1, Dim2, Dim3 and m+n MGs or 2 cubes with n MGs and Dim1, Dim2 and m MGs with Dim2,Dim3?"

    The answer is: it depends on whether you need to analyze the data from the m and n measure groups together. If you do, then put them in one cube. But if you very definitely don't, then DON'T put them in one cube.

     

    I've seen cases where folks blindly follow the "one universal UDM to hold all my data" rule, without considering that simpler, smaller cubes are going to perform better and also be easier for end users. That's why the advice is there to split the models into multiple cubes where appropriate.

     

    The fact of the matter is that expanding the dimensionality of a cube will hurt performance to some degree. Like you mentioned yourself:

    "I had actually done a small PoC last year and had found a performance gain of 10-15% in the multiple cube approach but then I admit the conditions were not strict and the data was also not that huge."

     

    And this is exactly what you can expect (although the range of performance degradation will definitely vary) -- the multiple cube approach can give you better performance, but it makes analyzing the data from the different cubes more challenging, as Chris has mentioned. You have to pick what is acceptable and required in your scenario:

    - Do I need to analyze the data from the different measure groups together? No -- okay, build separate cubes

    - If yes, then how often do my reports go against the different measure groups together? Is it worth having separate cubes and managing them so that I can get better performance because that's my most common usage pattern? If yes, then do the separate cubes + linked measure groups. Otherwise, go with one big cube...

     

    Thanks,

    Akshai


    -- This posting is provided "AS IS" with no warranties, and confers no rights
    Thursday, June 24, 2010 4:00 PM
  • Akshai, you are raising the controversy again :)

    Do you have any proof of why it would be faster? Because the evidence put forward by Chris and Edward is pretty compelling that a single cube approach is going to be faster even IF you don't need to analyse them together.


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Thursday, June 24, 2010 4:43 PM
    Moderator
  • As I already explained in my earlier post, it affects the sizes of data structures inside the formula engine that are based on the number of attributes in the cube space. When those data structures get larger, there is an incremental cost that can add up (depending on your calculations and query patterns).

    For example, you see the Query Subcube Verbose events in Profiler -- they show you the subcubes that are used for accessing measure groups. There are similar subcubes that are used for calculating formulas and cell coordinates -- all those subcubes get wider and wider as you start adding more attributes into the cube. The cost of accessing and indexing those data structures is what we're talking about here. If adding new measure groups doesn't require adding new attributes/dimensions, then there is no problem...

    We had measured the difference before 2005 shipped for some real customer cubes and found that there was a noticeable performance improvement from splitting up into multiple cubes - which you can do if you don't need to analyze those cubes together. If you do need to analyze the data from the different measure groups together, then yes -- you're better off having just one cube (or using the linked measure group option) than doing LookupCube.

    It boils down to:

    - What are my modeling needs? Do I truly need to analyze these fact tables together?

    - What are my performance needs? Is the performance of a single large cube acceptable? If yes, then you could forget this discussion and just go with the simpler one cube option...

    - What are my manageability needs? Like Chris mentioned, there is a cost to doing linked measure groups -- but there is also a cost to doing a single big cube. E.g. if you need to alter one cube, then it will cause user sessions connected to that cube to potentially get invalidated. Other cubes would not be affected if you had a multiple cube solution.

    Taking it all together -- it's not a black and white answer.

     

    Thanks,

    Akshai


    -- This posting is provided "AS IS" with no warranties, and confers no rights
    Thursday, June 24, 2010 9:07 PM
  • - What are my modeling needs? Do I truly need to analyze these fact tables together?

    Jason: NO!

    - What are my performance needs? Is the performance of a single large cube acceptable? If yes, then you could forget this discussion and just go with the simpler one cube option...

    Jason: I need the best-performing system possible; the client is very particular about it.

    - What are my manageability needs? Like Chris mentioned, there is a cost to doing linked measure groups -- but there is also a cost to doing a single big cube. E.g. if you need to alter one cube, then it will cause user sessions connected to that cube to potentially get invalidated. Other cubes would not be affected if you had a multiple cube solution.

    Jason: I don't need to analyze those facts together once I split them, so there is no need for this.

     

    If the above points are true (and this is exactly what I have been trying to say in this thread), you are saying that multiple cubes will give better performance (which is also what my PoC seemed to indicate), and that also makes complete sense to me, which is why I am unmarking the answers and waiting for some more responses.

     


    Cheers,
    Jason
    P.S. : Please click the 'Mark as Answer' button if a post solves your problem! :)

    Some Random Thoughts

    Friday, June 25, 2010 2:11 AM
    Moderator
  • The performance of the cube will never be impacted if you create a single cube, provided you are using a good hardware and infrastructure design. Good UDM design also suggests a single cube. A single cube provides better flexibility than creating multiple cubes, so try to change your dimensional model to merge your cubes; you can achieve everything in a single cube. SSAS is a very powerful tool and can handle terabytes of data and more than 100 measures without any performance impact. So it is incorrect to say a single cube will degrade performance.

    You should go for multiple cubes only when a fact and its related dimensions have no relationship with any other fact or dimension in the dimensional model.

    There is no option other than linked measure groups in your scenario.

    Thanks,

    Pushkar

    Friday, June 25, 2010 4:15 AM
  • "SSAS is a very powerfull tool and handle terabyte of data and more than 100 of measures without any performance impact."

    Pushkar - the ability to handle a design doesn't mean that it will perform as well or be as successful as a different design on the same platform. If one huge cube is going to perform worse than 5 small cubes -- even if it is only by 10% -- then that's what this discussion is about.

    If you attempt to test this out, you might find evidence one way or the other to back up your statements -- note that calculations are where you will typically notice the most difference. And I do have some small knowledge of the internals of the Analysis Services engine to give a theoretical validation to my claims...

    By the way, Chris' book does describe this topic -- and as he has mentioned on pages 94-95, Analysis Services 2008 should reduce the impact of the increase in dimensionality on performance as compared with Analysis Services 2005. I just don't agree that block computation will lower the price to zero -- in my opinion, performance will be better with AS 2008, but there will still be a price to be paid in many scenarios.

    And as Chris has mentioned, this is definitely a somewhat controversial topic -- but I personally do like the credo that it is better to build smaller, simpler cubes than universal, complex cubes. Even if the performance gains were negligible, I would prefer to build separate cubes for better manageability and independence.

     

    Thanks,

    Akshai


    -- This posting is provided "AS IS" with no warranties, and confers no rights
    Friday, June 25, 2010 5:51 AM
  • Although I don't want to stoke this controversy too much more, and obviously I can't dispute anything Akshai has said from a technical standpoint, there's one further comment I'd like to make. When you have two fact tables with a number of common dimensions and you ask your customer whether they will ever need to query measures from those two fact tables in the same query, or write calculations that span the two fact tables, should you believe them if they say no?

    SSAS projects have a tendency to grow in complexity over time, and cubes that start out simple in a first release often grow lots of new functionality as time goes on - and the more successful the project, the quicker things get complicated. As soon as your users see what's possible with SSAS they will start to have new, more ambitious ideas about the kind of analysis they want to do with their data, and it's very likely that they will change their mind about cross-measure-group queries and calculations and realise they do want to do this. If you started out on the multiple cube approach and then this happens you will have no choice but to use linked measure groups, and as we've seen this can make maintenance difficult; using the single-cube approach from the start means you won't have this problem. So, really you need to look at the data in each fact table, consider your current and future requirements, listen to the customer, weigh up the risks and make your own decision about which approach to use and not leave it in the customer's hands.

    Equally, when you ask your customer about performance they will always answer that they want the best possible performance! What you need to do is deliver query performance that is safely within their acceptable boundaries, and design your cube so that even if you have new calculations and increased data volumes you are still likely to deliver acceptable performance. I would say that even if the multiple cube approach performs 10% faster than a single cube with your initial design, so long as single-cube performance is safely within your query performance boundaries then it might still be preferable for the other reasons detailed in this thread.

    Ultimately there are good arguments on both sides, and it comes down to a matter of personal preference.

    Chris


    Blog: http://cwebbbi.spaces.live.com Consultancy: http://www.crossjoin.co.uk/
    Friday, June 25, 2010 9:36 AM
    Moderator
  • And Chris is absolutely right about the risks here -- it is very true that requirements change over time. I also mentioned this point in my earlier posting:

    "- What are my performance needs? Is the performance of a single large cube acceptable? If yes, then you could forget this discussion and just go with the simpler one cube option..."

    All said and done, the goal here is to be aware of the performance and manageability issues and make an educated decision based on your business needs and risk awareness. Don't blindly go with the one monster cube just because you can, and don't go splitting everything up into small cubes because they will perform "better". I like intentional decision making based on education and analysis -- and in the end, there is no substitute for testing :)

     

    Thanks,

    Akshai


    -- This posting is provided "AS IS" with no warranties, and confers no rights
    Friday, June 25, 2010 4:45 PM
  • I deal with terabyte-sized SSAS databases and always use the multiple cube approach. We have a database with ten measure groups, so we go with ten different cubes instead of a super cube.

    Say nine months down the road a new column is added to a fact table that just one of those measure groups references. The business wants that new dimension added to a cube. In the super cube the dimension would be added (thus invalidating all the data in the cube) and all ten measure groups would need to be reprocessed. In our environment that would mean reprocessing billions of rows of data, getting the DBA to throw the data back into a database (we have to archive data every 90 days), etc. It's not exactly a quick process. Using the multiple cube approach means we don't have to reprocess those nine other measure groups.

    I'm curious as to how AdCenter deals with this situation.

     

     

    Wednesday, October 06, 2010 6:15 PM