I have blogged in the past about SQL Database to SQL Database synchronization using Data Sync Service for SQL Database (Introduction to Data Sync Service for Windows Azure SQL Database). However what if Data Sync Service doesn’t do exactly what you want? It stands to reason that the Service will be adding more features as it matures, however if you need a custom synchronization you can easily program your own using the Sync Framework V2.1 Software Development Kit and a Windows Azure worker role. In this wiki article I am going to show how to synchronize between two SQL Databases using the Sync Framework SDK and a Windows Azure worker role.
In Data Sync Service for SQL Database, the source database is called the Hub; the destination databases are called the members. There is only one Hub, and there can be many members. We are going to maintain this terminology in the example.
First thing to do is download and install the Microsoft Sync Framework 2.1 Software Development Kit (SDK). This will install the assemblies on your local machine that you need to reference in order to make your console application run.
Next step is to create the cloud application with a worker role in Visual Studio 2010 (for more information about creating a cloud application look here). Then reference the Sync Framework assemblies, which are:
Found in: C:\Program Files\Microsoft SDKs\Microsoft Sync Framework\2.1\Runtime\x64\ and the Synchronization Framework providers:
Found in: C:\Program Files\Microsoft SDKs\Microsoft Sync Framework\2.1\Runtime\ADO.NET\V3.1\x64
Windows Azure requires that you use the x64 version of the Sync Framework assemblies, if you are running a 32-bit development environment, these will not work in the development fabric, you will have to have the x86 version. These assemblies do not natively exist in Windows Azure so you will need to mark them with local copy in order to deploy them to Windows Azure.
Besides the managed assemblies you will need to include Synchronization21.dll, which is a native DLL that requires a manifest. This can become a little tricky, for more information see How to: Deploy Sync Framework to Windows Azure. Without this native DLL your worker role will not deploy onto Windows Azure (the Windows Azure role instance will never initialize properly), however the worker role will work within your Development Fabric on a local box with the Sync Framework SDK installed.
For this example, I am going to use the Adventure Works database (which can be downloaded here) and synchronize just a subset of the tables: the Customer and Product tables. Once you have it download, install it on a new SQL Database under your server using the command line utility that comes with the download.
I also need to create a member database on the SQL Database Server that is empty database; the cloud application, using the Sync Framwork, will do the initial provisioning to setup the table schema on member SQL Database.
Once you have created the database on the SQL Database server you will have to create the SalesLT schema that the Product and Customer tables are in. You can do this by attaching to SQL Database with SQL Server Management Studio and executing this Transact-SQL statement:
There are really two steps to creating a relationship between two databases in the synchronization framework, the setup which provisions the databases and the sync which moves the data. The setup only needs to be run once, however our clever code checks for the provisioning before it applies the provisioning, so it can be safely be run multiple times.
A Windows Azure worker role has several entry points, including Run(). Within Run() we are going to call the setup once, when the worker role starts. Our example code will check to make sure that both the hub and the member databases are provisioned before trying to move the data. Then we are going to loop, calling sync every 10 seconds. With a Windows Azure worker role, you pay for the time that the worker role is running, so why not synchronize every 10 seconds? The Sync framework does have to query the databases every time you try to sync to check for changes, if the worker role is running in the same data center as the SQL Databases there is no charge, however if you are calling databases outside the data center, you will have a minor bandwidth charge every 10 seconds.
The setup code is the same as the code in this wiki article that synchronizes an on-premise SQL Server to a SQL Database:
So why is the code so similar to the blog post that invoked SQL Server? Because the Sync Framework was designed to easily sync different data sources and SQL Database and SQL Server are not that different.
The synchronization code is also the same as the blog post:
The synchronization direction in the code above states that the synchronization is going in both directions. Writes made to the member database and moved to the hub and vice versa. One you have the code running on Windows Azure try changing the database with an update you will be surprised how fast everything is synchronized.
The functionality in the example is very similar to Data Sync Service for SQL Database. Now that we have the basics out of the way, in an upcoming blog post I will be showing row level filtering that allows you to move rows that meet specific criteria from the hub database to the member database.