"Master data is information that is key to the operation of a business. It is the primary focus of the Information Technology (IT) discipline of Master Data Management (MDM), and can include reference data. This key business information may include data
about customers, products, employees, materials, suppliers, and the like. While it is often non-transactional in nature, it is not limited to non-transactional data, and often supports transactional processes and operations. For example, analysis and reporting
is greatly dependent on an organization's master data. Because master data may not be stored and referenced centrally, but is often used by several functional groups and stored in different data systems across an organization, master data may be duplicated
and inconsistent (and if so, inaccurate)." Wikipedia
In this article, we will be talking about master data services in the cloud. This article is being presented in the following order:
Fictional company "Westwind" is operating business in 98 countries around the world with 50,000 employees and more than 2800 enterprise client.
The company wants a consistent record of employees, client, contacts and subcontractors across the entire enterprise. These data will be use by more than in 500 different departments in different delivery centers around the globe.
These department level user are known as application team or "app team" in short.
An example of this is the "mylearning" team - a team similar to Microsft's mylearning team. They are know as app team because they build a specific application for the entire "westwind" enterprise. "Westwind" has an existing data warehouse solution and they
will make use of Master Data Service and Data Quality Service (DQS) for master data management.
Reference tables, dimensions and look-ups such as employees, client, and accounts will be supplied to department level application consumer as a "data as a service" Some department such as sales, finance and human resource supplies data to the central data
warehouse. They are referred to as "inbound" applications becuase they supply data to master database. While the data from these departments are being loaded, the warehouse is almost unavailable to the user.
However, "Westwind" requires that the data be available in the soonest possible time and they require an uptime of 99.9999 percent. "Westwind" requires that there should be one source of truth. In the current setting the only source of truth is the data
warehouse. However, the data warehouse is use for reporting and application team that may need that data are only allowed to pull the data from the warehouse using integration packages. This will later results to 1000 copies of the employees table alone and
1000 SSIS package being maintained and monitored by the integration team. There are at least 7 reference table in the company which will be used by 500 different department. Each department having different application.
This scenario will result to deployment of integration team around the world for 24/7 operation coverage Westwind would like to conserve hardisk space and insists to limit the number of copies of employee tables, accounts, and other reference table.
Another problem that "Westwind" encounters is that when inbound application changes their metadata. The change should be coordinated to the outbound or downstream application. The downstream application may or may not need the data.
The real headache comes in for example the 7 reference table changes their metadata each reference with at least 3 fields and each 7 reference table having at least 500 to 1000 downsteam application. With the above metioned problems, the integration team
also faces an extreme challenge commencing from requirements gathering, build deadlines, testing and port openings due to changes in the server or server migration. Westwind are also facing problems with the environment aside from port opening. These are patching,
VM backup retention of 3 days only, harddisk capacity planning and expansion and high availability solution. "Westwind" stakeholders are well aware of the cloud however they are extra paranoid about putting their data on the public cloud. "Westwind" want to
simplify all of these complexity and reduce their total cost of ownership.
Westwind implemented master data services to solve many of their problems above. Master data services provided some sort of governance over the data that goes in to the dimension table. To resolve downtime issue, "Westwind" initially implemented 3 MDS database
in the cloud. While the main MDS server is being populated the rest of the MDS database are kept alive supplying data to the enterprise. There are 3 MDS implementation, one each for EMEA, APAC and ASIA. EMEA servers are populated during EMEA off peak hours and
APAC are replicated during APAC off peak hours and so it is in ASIA. Synchronization of data in every region does not affect each other. With cloud implementation "Westwind" business insight team are now empowered to scale-up their cloud VM roles by adding compute
power or scale-out master data implementation by provisioning new servers. To resolve the problems on schema changes from the inbound applications, Westwind has eliminated all the reference tables in the downstream application and forces all of them to use
the MDS reference table that is geographically nearest to them. EMEA user for example uses the EMEA-MDS to minimize bandwidth. To rapidly respond to changing inbound schema changes, Westwind consolidate this changes to a monthly release. To facilitate faster
testing westwind provisions new VMRoles for a particular release leaving the current MDS online. When the release passes all the testing requirements The server are then tag as the new production servers and the current servers are archieved and decommisioned from
the cloud. All applications using the MDS table repoints to the new production server by changing their configuration. While most of the work of the integration team concerning integration were all gone, the team now focusses on integration that are not involving
master data. The problems with requirements gathering, port opening, metadata familiarity and sample data are still present. To resolve this problem "Westwind" business insight team a implemented metadata store in which all participating applications/department
can query and explore. Application team are required to connect their database to the metadata store before requesting any integration activity. This metadata allows "Westwind" business insight team to explore enterprise database and architecture even before
an official integration request. With the new "Westwind" cloud enabled BI. Most of the environment and configurations were gone. Cloud environment delivery team can expand or shrink computing resources such as CPU, hard disk storage and test environment provisioning
were all done in matter of minutes. "Westwind" stakeholder paranoia concerning their data on the cloud were secured by implementing some sort of DMZ in the cloud. Their servers can be only accessed onsite. This has been implemented via the VMrole internal
and virtual IP.
Below is the westwind proposed architecture.
To gain more understanding the new BI and Cloud architecture please take your time to watch the video below:
Cloud-Ready Data Services
To setup a master data service in the cloud you should create a windows server 2012 64 bit VMrole. Master data services works only on 64 bit operating system.
The first step is to provision a cloud "VM role". For this step please refer to the first demo in the
Cloud-Ready Data Services video published above.
Launch the MDS installer on the windows server 2012 VM role.
Figure 2. Showing MDS tiles in Windows Server 2012
To be able to configure Master Data services you should configure IIS. The following links show how to
Configure IIS in Windows server 2012.
For a complete list of
web application requirements, check here.
Once you have completed the IIS requirements you will then configure the MDS database. Click on the MDS Icon to begin installation.
Next step is to configure the MDS web site
Before proceeding lets review some of the basic terminology of MDS.
Entities are objects that are contained in Master Data Services models. Each entity contains members, which are the rows of master data that you manage. Entities is very much like a table
in Microsoft RDBMS terminology.
Attributes are objects that are contained in Master Data Services entities. Attribute values describe the members of the entity. An attribute can be used to describe a leaf member, a consolidated
member, or a collection.
One of the main objective of MDS is to manage warehouse dimension. In these videos you will learn how to manage warehouse dimension using MDS
Managing Data Warehouse Dimensions with MDS (part 2)
For more information on this feature you can consult the following videos below: Using MDS Excel Add-In to Model and Manage your Master Data
SQL 2012 Master Data Services - Web User Interface Explorer overview
This Technet video will give you an overview of DQS and MDS working together
DQS Integration with MDS: Data Matching using MDS
Although you can manage data entry with excel add-in you will be using SSIS to "inbound" the data to MDS database especially if you have an OLTP system as source. In the video link below, you will be shown how SSIS, DQS and MDS integrate with each other.
Enterprise Information Management (EIM): Bringing Together SSIS, DQS, and MDS