How to anonymize all data in a cube - UPDATE CUBE?

Question
I have an Analysis Services cube that contains sensitive data. For performance tuning I'd like to take a copy of the cube and replace all data with random values (leaving empty cells empty). This way I could take the cube anywhere and not worry about live data being revealed to anyone, but in all other respects it would behave the same way as the original.
In a SQL Server relational database this would be easy - I could just run something like
UPDATE fact SET datavalue = RAND()
and ALL existing records would have their value randomised, whilst empty cells would stay empty because they wouldn't have a record in the fact table. All in a one line update command.
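(One small wrinkle: SQL Server evaluates a bare RAND() only once per statement, so every row would end up with the same "random" value. To give each row its own value I would seed it per row, roughly like the sketch below, where fact and datavalue just stand in for the real table and column.)
UPDATE fact
SET datavalue = RAND(CHECKSUM(NEWID()))  -- CHECKSUM(NEWID()) seeds RAND differently for each row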
Is there a simple way to do a similar thing in Analysis Services? Even better would be to update all leaf values and set aggregations so that totals still add up, but I would be quite happy with a cube even where the totals are broken, because I just want to use the cube for performance testing.
Any ideas anyone, please?
Thanks in advance.
Tuesday, November 5, 2013 5:15 PM
All replies
You cannot update Analysis Services Databases / Cubes, whether it be a measure or a dimension attribute, directly.
Data always comes from the DataSource via processing. There is no other way I can think of, except literally hacking the DataFiles.
Therefore the way to go is:
- Create a copy of your DataSource (SQL Server Database?)
- Create an unprocessed copy of your Analysis Services Database and Cubes
- Change the Datasource’s data/anonymize it
- Point the Copy of the AS DB/Cubes to the new Database
- Re-Process, effectively loading the anonymized data
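If the DataSource is a SQL Server database, the anonymization step itself could be as simple as a few UPDATE statements against the copied tables. A rough sketch, where FactSales, SalesAmount, DimCustomer, CustomerKey and CustomerName are only placeholder names for your own schema:
-- Randomize every measure value in the copied fact table (a different value per row)
UPDATE FactSales
SET SalesAmount = RAND(CHECKSUM(NEWID())) * 1000
-- Replace identifying dimension attributes with neutral values
UPDATE DimCustomer
SET CustomerName = 'Customer ' + CAST(CustomerKey AS varchar(20))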
Andreas Wolter | Microsoft Certified Master SQL Server
Blog: www.insidesql.org/blogs/andreaswolter
Web: www.andreas-wolter.com | www.SarpedonQualityLab.com
Proposed as answer by Andreas.Wolter (Microsoft employee) Wednesday, November 20, 2013 9:22 AM
Tuesday, November 5, 2013 11:29 PM
"You cannot update Analysis Services Databases / Cubes, whether it be a measure or a dimension attribute, directly."
I had read about the UPDATE CUBE command allowing "write-backs", and even supporting the setting of child cells when a parent cell is set (e.g., http://technet.microsoft.com/en-us/library/ms145568.aspx), so I was hoping I could do something like that for the whole cube, but I wasn't sure of the syntax and how to craft an UPDATE CUBE command that updates the whole cube. Is there some limitation on UPDATE CUBE that makes it unsuitable for what I'm trying to do?
"Data always comes from the DataSource via processing. There is no other way I can think of, except literally hacking the DataFiles."
Does Analysis Services build up its own "fact table" from the underlying data source? If so, I could perhaps replace all of the data in that. If not, then my only option, it would seem, is to replace all of the data in the underlying system, which might be harder to arrange.
Thanks again for your help, and apologies if these are silly questions!
Wednesday, November 6, 2013 9:38 AM
The UPDATE CUBE command is for "write-enabled cubes", which is a completely different concept.
You can read more about it here: http://technet.microsoft.com/en-us/library/aa216376%28v=sql.80%29.aspx
A short quote will make it clear: "If you write-enable a cube, client applications can record changes to the cube's data. These changes, known as writeback data, are stored in a table separate from the cube and its underlying data, but they are incorporated into query results as if they are part of the cube data. Write-enabled cubes allow end users to explore scenarios by changing cell values and analyzing the effects of the changes on cube data."
It is a totally different concept and does not anonymize the original data at all.
I don't completely understand the second question. There are one or more fact tables per cube in the DataSource, and those are specified by you.
To anonymize the data you really have to use the approach from my first answer: anonymize the data warehouse/data mart first and then re-load/process.
I am sorry, but that's the only secure way.
Andreas Wolter | Microsoft Certified Master SQL Server
Blog: www.insidesql.org/blogs/andreaswolter
Web: www.andreas-wolter.com | www.SarpedonQualityLab.com
Marked as answer by Elvis Long Wednesday, November 20, 2013 9:17 AM
Wednesday, November 6, 2013 9:56 AM
Many thanks, Andreas. That's very clear.
We have a cube that is standalone (we don't currently have access to the underlying relational database, which uses Cartesis), so I had assumed it must have all of its data cached somewhere, and I was (naively) hoping there might be a 'fact' table we could access directly in this disconnected cube. If so, we could simply overwrite all of the data in there. But I guess accessing this data cache, wherever it is stored, would probably not be simple, because I gather it is all in a proprietary format split across multiple files/tables.
I'm not sure how this "disconnected cube" works without a valid DataSource; maybe it only needs direct access to the data source when it is rebuilt. Is there a way of updating the DataSource on a live cube and somehow "pushing" the data from the cube back into a suitably configured empty relational database?
Another suggestion someone came up with is to use SSIS to copy the whole cube into a new cube somewhere else. It might then be a reasonable step to add a (possibly custom) data transformation into the middle of the SSIS pipeline to simply replace all data values with random data. Do you think this could work?
Friday, November 8, 2013 10:53 AM
The data storage depends on the storage mode (ROLAP, MOLAP, HOLAP).
Read more about it here: http://technet.microsoft.com/en-us/library/ms174915.aspx
As I said: "Data always comes from the DataSource via processing. There is no other way I can think of, except literally hacking the DataFiles."
If you are in ROLAP, you could indeed change the data in the source database and the cube would reflect that. But there is nothing special about it.
For MOLAP, again: not possible.
SSIS just copies the structure, or the structure and data; the result is the very same data. You really have to go with my first answer, which contains a step-by-step description.
Please read through the links about how Analysis Services works and what the architecture is, and it will be very clear, trust me. It's all documented and quite straightforward.
Andreas Wolter | Microsoft Certified Master SQL Server
Blog: www.insidesql.org/blogs/andreaswolter
Web: www.andreas-wolter.com | www.SarpedonQualityLab.com
Friday, November 8, 2013 11:16 AM
Hello,
In SSAS, we can implement dynamic security to hide sensitive data from users. For more detailed information, please refer to the following article:
Setup dynamic dimension security in Analysis Services: http://bifuture.blogspot.in/2011/09/ssas-setup-dynamic-security-in-analysis.html
"For performance tuning I'd like to take a copy of the cube and replace all data with random values (leaving empty cells empty)."
I don't fully understand why you want to take these actions to tune SSAS performance. Here are some good articles about performance tuning in SSAS for your reference:
Best Practices and Performance Optimization: http://www.mssqltips.com/sqlservertip/2565/ssas--best-practices-and-performance-optimization--part-1-of-4/
Analysis Services Performance Guide: http://download.microsoft.com/download/6/5/6/6567C845-FC8D-4D62-920F-C027A349C889/SSASPerfGuide2008R2.pdf
Regards, Elvis Long
TechNet Community Support
Tuesday, November 12, 2013 8:56 AM
Hi Elvis,
Thanks for your reply.
The reason we need to anonymise the data is that the cube contains sensitive data and cannot be delivered to the developer team for optimisation whilst it still contains live data. We want to try to get a cube that is as similar as possible to the live cube, in terms of structure and setup, etc., so that performance will be the same, but which does not contain sensitive data values. The actual data values are irrelevant as far as performance goes, and the live data values cannot be released for testing.
It's a real shame: replacing the data values within a (disconnected) cube seems like such a simple thing to do, but SSAS does not let us do it. Instead we need to replace the values in the underlying database and rebuild the cube from those.
Tuesday, November 12, 2013 11:19 AM