SQL11UPD06-TSCRIPT-02
This wiki is a transcript of a previously recorded video.
Welcome to this demonstration, Developing a SQL Server 2012 Tabular Project with SQL Server Data Tools. I'm Peter Myers. I'm a SQL Server MVP and a Mentor with SolidQ.
The demonstration is going to create a tabular project by using SQL Server Data Tools (SSDT). Let me go ahead and launch SQL Server Data Tools. Before I create the project I want to point out that under Tools | Options we now have the ability to define some defaults that are relevant to tabular projects developed in this environment. First of all, under Data Modeling, we can specify the instance that will host the default workspace database. Be aware that all of the modeling work that you do, loading data and managing the model, is also being done on an instance of Analysis Services, so it is recommended that you have an instance available – perhaps the Developer Edition installed on the developer's workstation itself. In this case it will be the localhost. The second point of configuration is the Deployment option. By default, when a new project is created, it will inherit the server instance that you specify here. There is no need in this demonstration to update either, so now I go ahead and create a new project – here under the Analysis Services template I have the Analysis Services Tabular Project. I'm going to name this project Sales Analysis, and this project will produce a model to allow us to analyze the sales of the Adventure Works bicycle company.
What we discover in Solution Explorer is that a tabular project consists of a single model and that's it. There's a one-to-one relationship – a single model per project – and when the project is deployed to the server, it results in the creation of a database. For those that are familiar with PowerPivot, there are many, many similarities – in fact, they're based on the same technology and the same development approach. However, with PowerPivot being an add-in to Excel, you may well appreciate that it uses the familiar ribbon interface, which is not a feature of Visual Studio. So while we have very much the same functionality you have in the PowerPivot window, it's expressed in terms of the menus that you have in Visual Studio. The starting point to develop the model is to come to the Model menu and to import from a data source. All the data sources supported by PowerPivot are also available to us as developers here inside SQL Server Data Tools. The Adventure Works database is based on SQL Server, so I select it. I define the location of the server and, using Windows authentication, I select the database, which happens to be the AdventureWorksDWDenali database. I always like to update the friendly name as well so it consists of just the name of the database.
The second step in the Table Import Wizard is to define the impersonation information. This is different from PowerPivot, where this step doesn't exist because PowerPivot uses your credentials to connect to data stores. Once the project is deployed to the server, of course, we don't necessarily want it to use the server's service account. The best practice would be to specify a dedicated domain account that applies the principle of least privilege to read the data in the corresponding data source. For convenience and brevity in this demonstration, I'm going to use the service account. I click Next. Then we have the ability, because this is a relational data store, to import from the list of tables and views or to define a query to import from. The first phase will be to import from existing tables in the database. So the first table that I choose will be the DimEmployee table. Here, for the friendly name, I provide the friendly name of Salesperson. Under Preview and Filter we have the ability to explore the data and also filter by columns and by rows. This is important – you don't want to introduce unnecessary data into the model because it will consume memory resources. So I deselect all columns; of interest to this model development are the EmployeeKey and the ParentEmployeeKey. Be aware there's a self-referencing relationship at work here that supports the definition of the organizational hierarchy, in which sales representatives report to managers, directors, and so on.
In addition to these two keys we need to be able to describe the salesperson, so I introduce the FirstName and the LastName. Then I apply a filter: these are all the employees at the Adventure Works company and, as this is a sales analysis, I'm only interested in those employees where the SalespersonFlag is True. With these selected columns and this filter, that defines my first table in the model, named Salesperson. The second table that I'll import at the same time is the FactResellerSales table, which I'll rename to become simply Sales. Likewise, I click Preview and Filter. My intention isn't to bring in all of the columns, so I'm selective, introducing just the ProductKey, OrderDateKey, ShipDateKey, then the ResellerKey – I don't even need the reseller, I'm going to keep this really simple – the EmployeeKey, and then a couple of measures: the TotalProductCost and the SalesAmount. In this single Table Import Wizard process I'm importing two tables. Now I click Finish. The wizard creates the tables and, according to my filters, loads the data into them. We can see statistics, the number of rows that have been loaded, in addition to details about whether relationships were identified. In this case, because the self-referencing relationship is not supported natively in tabular models, it has not been imported. However, the relationship between the DimEmployee table and the FactResellerSales table has been identified and has been created as a relationship in the model.
So we can see here in this designer for the model that we have two tables in the bottom left corner, Salesperson and Sales. Just appreciate that the data that has been loaded in here is completely read-only. What I can do is filter the data, so let's take a look at all the rows where the EmployeeKey is 272, and let's sort them in descending order of SalesAmount. (This is lightning fast, and I like to point this out, although it's difficult to show with such a small amount of data.) We presently have about 60-odd thousand rows of data and this filtered set is 792 rows. It could be that I have 60 million rows of data and, provided we have adequate resources, i.e. memory, the filtering and sorting would be just as instantaneous as you just saw. Do appreciate that this technology, using the VertiPaq engine (the in-memory query mode), is incredibly efficient when it comes to storing the data and then allowing us to filter and aggregate the data at query time. Let me go ahead and remove those filters and reset that sort. A new feature in the model designer, also available in PowerPivot, is the Diagram View, which is helpful for database developers building these models in that they might choose to work in this visual diagram view rather than the grid data view.

What I'm going to do now is import some more data to make this model a little more interesting. On the Model menu I go to Existing Connections and, to relaunch the Table Import Wizard, I simply select the data connection I defined earlier and open it. This time, in the Table Import Wizard, I select the option to write a query, and the purpose of this query is to introduce the product data. Now, I could write a valid SELECT statement here, or of course use my favorite tool and copy and paste it into this box. For convenience I simply click Design. This allows me to construct a query by drag and drop or multi-selection. For example, I bring in from the DimProduct table the ProductKey, the EnglishProductName, and the Color. Because I know there are relationships to other tables, I can introduce columns from the DimProductSubcategory table and also the DimProductCategory table. To test this I can run the query – this all looks great, and so it's this query result that will be used to create the next table in my model. I like to take it a step further as well and rename columns; I can do that in the query itself. That's completely up to you. You can also rename at the model level, but I'm choosing to alias these columns at this point. Now, when I click Finish, this introduces a third table based on the same data connection that I defined for the other two tables. Here in Diagram View we see the addition of the Product table.
We don't see a relationship between Sales and Product; however, there is indeed a relationship: the ProductKey here relates to the ProductKey in this table. What I can do is simply drag and drop to establish that relationship in my model. What's very cool to appreciate is that this product data could have come from a different data source, even a different data format, but once introduced into this tabular model it's like any other data and can be easily integrated simply by defining these relationships.
I need to introduce one more table, and that's to allow me to analyze by date. In this demonstration the date data I need lives in an OData data feed, which I'm going to reference by copying its URL. Switching back to my project, this time from the Model menu I import from a data source, and when I scroll right down to the bottom we see that data feeds are an available type. So I paste in the URL of the data service and I call this the Adventure Works Data Service. The next step in the Table Import Wizard is to define the credentials that will be used to connect to that service and, again, for convenience in this demonstration, I use the service account. I click Next. It interrogates the service and discovers there's a single feed named CorporateDate. For the purpose of my model I rename that to Date, and then in Preview and Filter I make a selection of the columns that I want to include in my table. I'm interested in analyzing by fiscal date, so I bring in the DateKey that will be used in relationships, I bring in the MonthKey and the MonthLabel, and when I scroll across we see that we also have the FiscalQuarterLabel and the FiscalYearLabel. I click Finish, and that results in the creation of my fourth table for date and the import of the fiscal data. Now, to create relationships between the Sales table and the Date table, it's the same thing. But what we notice is that I actually have two date keys, the OrderDateKey and the ShipDateKey. So I create the first relationship by dragging the OrderDateKey column to the DateKey column, and a new capability for tabular model development in SQL Server 2012 is that we can establish multiple relationships between two tables – in other words, multiple relationship paths.
What you'll note is that the second relationship I created is indicated by a dashed line. What it means is that the first relationship is active and any other relationships between the two tables must be configured as inactive. Inactive relationships are not used by default, but they can be used explicitly in calculations, as sketched below. If you stay tuned for the following demonstration we'll explore the new features in Data Analysis Expressions (DAX), and we'll take a look at how we can create explicit calculations that navigate this type of inactive relationship.
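As a preview of what that looks like, here is a minimal DAX sketch, assuming the table and column names used in this demonstration; the measure name Sales Shipped is hypothetical. CALCULATE applies the otherwise inactive ShipDateKey relationship by naming it with USERELATIONSHIP:

    Sales Shipped :=
    CALCULATE (
        SUM ( Sales[SalesAmount] ),                               -- aggregate the sales column
        USERELATIONSHIP ( Sales[ShipDateKey], 'Date'[DateKey] )   -- activate the inactive relationship for this calculation
    )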
So I'm at the point where I have my fundamental model. What I refer to as the next phase – model enrichment – is about optimizing the model for the end user experience. What I do is right-click my Date table and then select Go To, which switches me across to the data grid view. In data grid view I'm going to optimize the Date table, first of all by coming into the Table menu, where we see the option to mark this as a date table. In fact, I have made a mistake in my demonstration already, and that is why the option is disabled. The reason is that a requirement for marking a date table is that the table must have a column typed as date, which I omitted by mistake. It's easily fixed, however. So here on the Table menu I open up the Table Properties, reconfigure the filter selection, and introduce that Date column. Now that I have the Date column appended to this table, I can come back to the Table menu and mark this as a date table. It's that column, typed as date, that provides the intelligence and metadata about how dates are defined within our model. This does a couple of things. It supports the time intelligence functions that are part of DAX. The second thing is that, if you're using Excel and a PivotTable to filter this, Excel will know that this is a date table and will provide you with date-related filter options.
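To illustrate the first of those two points, here is a minimal sketch of a DAX time intelligence measure that marking the date table enables; the measure name is hypothetical, and it assumes the Sales[SalesAmount] column and the Date column that was just added:

    Sales YTD :=
    TOTALYTD ( SUM ( Sales[SalesAmount] ), 'Date'[Date] )   -- running year-to-date total over the marked date column

TOTALYTD also accepts an optional year-end date argument if the year boundary should follow the fiscal year rather than the calendar year.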
The next thing I'm going to do is point out that the month labels here, if we go to filter them, have their distinct values sorted alphabetically and not, as we would expect, chronologically. Another great new feature in SQL Server 2012 is the ability to sort a column by the values in another column. This is why I introduced the MonthKey column. It's pretty straightforward to configure: here on the Column menu I select Sort by Column, so that the MonthLabel sorts by the values in the MonthKey. Now, when we attempt to filter the MonthLabel, we see that those distinct values are indeed sorted chronologically. So, a bit of a clean-up at this point. These two columns – the DateKey, which is used to support the relationship to the Sales table, and the MonthKey – I right-click and hide from client tools. It's not my intention that end users should even know they exist. The Date column that was added in my second take, I go ahead and hide also. That leaves me the three columns of MonthLabel, FiscalQuarterLabel, and FiscalYearLabel. The last optimization I have for this table is to provide more user-friendly names, so I right-click and rename each of these columns so that they are named Month, Quarter, and Year.
There is one more thing I need to do. As you might appreciate, it is very common for people to navigate from a year, to a quarter, to a month. This ability is now available to us in tabular model development: we can define a hierarchy to support navigation and aggregation at different levels. So I switch across to the Diagram View – let me just point out that here in the bottom right corner we also have the ability to switch to Diagram View. What I can do is come to the Date table and, here in its right corner, maximize it so I have a full view. What we see is that this table consists of six columns, three of which are visible. What I'm going to do is multi-select these three, right-click the selection, and create a hierarchy from the selection. What we see is the addition of a hierarchy, which I rename to Fiscal, consisting of the levels Year, Quarter, and Month. You might ask yourself, while scratching your head at the same time, how did it know to sequence those levels Year, Quarter, Month? The fact is, it uses built-in intelligence to inspect the values and the cardinality of data between the columns. If you have sufficient data, it may well (and hopefully will) detect the correct sequence. If it doesn't, you can simply drag and drop the levels to sequence them correctly. Now that I have this defined, I multi-select these columns and hide them. So all that will be visible to client tools using the Date table in this model will be a Fiscal hierarchy consisting of Year, Quarter, and Month levels.
Next I move on to the optimization of the Product table. Let me go ahead and maximize the view here. I have already renamed these columns – I was a little bit tricky when I created the SELECT statement: I simply aliased the columns, and that saved me the task of renaming them here. So, to create that hierarchy, let me multi-select the three columns and then create a hierarchy based on them. Again, we see that the sequence of the levels is correct; there was sufficient data for it to detect that Category is the first level and Subcategory is the second level. I provide a more intuitive name for the hierarchy and then I hide all columns with the exception of Color. That completes the optimization of my Product table.
Now, for Salesperson, what we have are the two keys and the FirstName and LastName, but what I would like is a single column that brings them together. Calculations are not something that can be performed in Diagram View, so I right-click the table and select Go To to switch across to the Grid View. Then, to add a calculated column, I simply come in here to the formula bar, enter equals, click the FirstName column, concatenate a space, and then click the LastName column, as shown below. Then I rename this column to become Salesperson. The last optimization will be to multi-select all the columns except the new one and hide them from client tools. Now, switching back to Diagram View, we can see that the Salesperson table consists of one single visible column named Salesperson.
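For reference, the calculated column formula those clicks build up in the formula bar is a simple DAX concatenation over two columns of the same table, roughly:

    = [FirstName] & " " & [LastName]   -- first name, a space, then last name

Renaming the column to Salesperson happens on the column itself; the formula does not change.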
That brings our attention to the final table for optimization, the Sales table. The Sales table, being based on a fact table, really consists of dimension keys and measures, the two measures being TotalProductCost and SalesAmount. So, what I'm going to do is switch across to the Grid View, because I just mentioned that calculations need to be done in the Grid View. Calculations come in two forms. In tabular model development you have calculated columns, which add new columns to a table. Then, down here in this area of the designer, is what is known as the Calculation Area (or the Measure Grid). This is where we define a second type of calculation known as a measure. Measures are very, very different because their evaluation is driven by the filter context of a query. Measures almost always involve aggregation. So, when it comes to analyzing sales and cost, we typically want the sum of sales and the sum of cost by a certain time period, or product, or salesperson. To facilitate the creation of these measures, I can multi-select columns and then use this ability on the toolbar to create the sum of them. What you'll see is that the calculations are added into this Measure Grid. Selecting this cell shows us, in the formula bar, that the measure is named Sum of TotalProductCost, then colon, equals, and a DAX expression, which in this case is the sum of the TotalProductCost column. What I'm going to do is rename this to become simply Cost. Let me point out that I then open up the Properties window by pressing F4. So what's happening is I've renamed the measure Cost and pressed Enter; you can see here on the status bar that it's attempting to commit that to the workspace database, but there's some delay going on inside my virtual machine. OK, it's caught up – that's an unusual delay. What we now find is this measure named Cost, and with the Properties window open here in the bottom right corner we can update the measure name, and we see properties like Format; because this is currency, we have Currency Symbol and Decimal Places. For the second measure, Sum of SalesAmount, I simply rename it to Sales. We see that it's also formatted as currency. Why are these both currency? Well, they're based on columns that use the currency data type.
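Written out as DAX, the two renamed measures look roughly like this (a sketch using the table and column names from this demonstration):

    Cost := SUM ( Sales[TotalProductCost] )   -- renamed from Sum of TotalProductCost
    Sales := SUM ( Sales[SalesAmount] )       -- renamed from Sum of SalesAmount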
What I'm going to do now is introduce some additional measures. For example, what I'd like here is Profit. So: Profit, colon, equals, the Sales measure minus the Cost measure. That's the addition of a third measure. Then I come across to the Properties window and update its format to currency. One further measure will be Profitability. Profitability is computed by dividing the Profit measure by Sales, and this time the Format property for the measure is set to Percentage. Now, what's very cool is that in the measure grid you actually get to see values. So, for example, if I filter by EmployeeKey and say show me employee 272, we get to test, in a limited way, what the results of these measures will be. I say limited because it's limited to filtering by columns in the same table, not by other tables that relate to this table. We will explore that later by analyzing in Excel. Let me go ahead and reset that filter. There's one last thing I need to do to enhance that measure, and that is to create a KPI based on the Profitability measure. So the KPI is based on the measure. You can compare it to another measure in the model, but I'm going to use the value itself. I'm going to create some thresholds here that say: if Profitability is 0 or less, it's well off track – it's in the red zone; anything between 0 and 0.05, in other words five percent, is slightly off track; and anything above is on track. I can also change the icon style here. When I click OK, we see that this measure includes an icon to show that it is, in fact, also a KPI; it allows us to compare to a goal and to express a status. So now I can select all the columns in this table and hide them – they're not relevant. The Sales table will now express four measures, one of which has been extended to become a KPI. We see that here in the Diagram View.
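In DAX terms, the two additional measures follow directly from the descriptions above (a minimal sketch; the currency and percentage formatting is applied separately through the Format property):

    Profit := [Sales] - [Cost]
    Profitability := [Profit] / [Sales]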
So how do we test this? I think this is a great feature of this designer, although when it comes to testing it does require that you have Excel installed – but what developer doesn't have Excel installed? So we go ahead, here on the Model menu, and select Analyze in Excel. We have the ability to analyze using a different security context. Perspectives are a new feature that has been added to support narrowing the visibility of a large model down into smaller subject-area perspectives. I don't have either in this model design, so I'm not going to change this configuration. I click OK. What this does is open an Office Data Connection (ODC) file that points to the workspace database on our Analysis Services instance, which means Excel will create a PivotTable in a worksheet that connects to it. Now, when I look at Excel – here is the PivotTable, with the PivotTable Field List on the right-hand side – I can go ahead and create a report. For example: show me Sales, Profit, and Profitability. Then, from my KPIs, I'd like to know what the status of that Profitability is. It's so simple.
Now, what's interesting about a tabular model is that it exposes a multi-dimensional interface. Excel does not even know that it is communicating with a tabular model. The fact that a tabular model exposes this multi-dimensional interface, consisting of measures, KPIs, dimensions, and hierarchies, means that existing OLAP client tools continue to work as they have been designed to work against cubes. To provide some more context here, we can filter. Here in my Date dimension – or so the PivotTable thinks it is a dimension – I can take the Fiscal hierarchy and drag it into the Report Filter drop zone. Then I can filter by a period of time: for example, I filter the Fiscal hierarchy to the 2004 fiscal year and then show what was going on in the month of October. Then, from our Products hierarchy, I drag that down to the rows. Now we can see that bikes are off track, and we can explore, for this time period, what was going on for all the different products.
Just to complete the demonstration, I want to point out that there are other capabilities that I don't have time to explore in this recording. We do have the ability in a tabular project to define roles, so we can limit visibility for different groups of users based on static, or even dynamic, DAX expressions; for example, you can only see the data for the sales territory that you are from. In addition, we can define partitions. This is a great scalability feature that allows us to create partitions at the table level, and therefore refresh or process data at a lower level than the entire table itself. So it's there for manageability.
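To give a sense of a dynamic role, here is a minimal sketch of the kind of DAX row filter such a role might place on the Salesperson table. The LoginID column is hypothetical – it is not one of the columns imported in this demonstration – but USERNAME() is the standard DAX function these dynamic filters rely on; the expression keeps only the rows that belong to the connected user:

    = Salesperson[LoginID] = USERNAME ()   -- hypothetical column; TRUE only for the current user's row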
Return to SQL Server 2012 Developer Training Kit BOM (en-US)