Inside Microsoft Azure SQL Database

Inside Microsoft Azure SQL Database

With Microsoft Azure SQL Database, you can create SQL Server databases in the cloud. Using SQL Database, you can provision and deploy your relational databases and your database solutions to the cloud, without the startup cost, administrative overhead, and physical resource management required by on-premises databases. The paper will examine the internals of the SQL Databases, and how they are managed in the Microsoft Data Centers, to provide you high availability and immediate scalability in a familiar SQL Server development environment.

 Note
If you wish to contribute to this page, use the Edit tab at the top (sign-in required). If you wish to provide feedback for this documentation please either send e-mail to azuredocs@microsoft.com or use the Comment field at the bottom of this page (sign- in required).

Introduction

SQL Database is Microsoft’s cloud-based relational database service. Cloud computing refers to the applications delivered as services over the Internet and includes the systems, both hardware and software, providing those services from centralized data centers. This introductory section will present basic information about what SQL Database is and what it is not, define general terminology for use in describing SQL Databases and applications, and provide an overview of the rest of the paper.

What Is SQL Database?

Many, if not most, cloud-based databases provide computing resources using a virtual machine (VM) model. When you purchase your subscription from the vendor and set up an account, you are provided with a VM hosted in a vendor-managed data center. However, what you do with that VM is then entirely isolated from anything that goes on in any other VM in the data center. Although the VM may come with some specified applications preinstalled, you are free to install additional applications for your own business needs in your own personalized environment. Even though your applications can run in isolation, you are dependent on the hosting company to provide the physical infrastructure, and the performance of your applications is impacted by the load on the data center machines, from other VMs using the same CPU, memory, disk I/O, and network resources.

Microsoft Azure SQL Database uses a completely different model. The Microsoft data centers have installed large-capacity SQL Server instances on commodity hardware that are used to provide data storage to the SQL Databases created by subscribers. One SQL Server database in the data center hosts multiple client databases created through the SQL Database interface. In addition to the data storage, SQL Database provides services to manage the SQL Server instances and databases. More details regarding the relationship between the SQL Databases and the databases in the data centers, as well as the details regarding the various services that interact with the databases, are provided later in this paper.

What Is in This Paper

In this paper, we describe the underlying architecture of the SQL Server databases in the SQL Database data centers, in order to explain how SQL Database provides high availability and immediate scalability for your data. We tell you how SQL Database provides load balancing, throttling, and online upgrades, and we show how SQL Server enables you to focus on your logical database and application design, instead of having to worry about the physical implementation and management of your servers in an on-premises data center.

If you are considering moving your SQL Server databases to the cloud, knowing how SQL Database works will provide you with the confidence that SQL Database can meet your data storage and availability needs.

What Is NOT in This Paper

This paper is not a tutorial on how to develop an application using SQL Database, but some of the basics of setting up your environment will be included to set a foundation. We will provide a list of links where you can get more tutorial-type information.

This paper will not cover specific Microsoft Azure features such Microsoft Azure Table store and Blob store.

Also, this paper is not a definitive list of SQL Database features. Although SQL Database does not support all the features of SQL Server, the list of supported features grows with every new SQL Database release. Because things are changing so fast with SQL Database service updates every few months and new releases several times a year, users will need to get in the habit of checking the online documentation regularly. New service updates are announced in the common Microsoft Azure blog, found here: http://blogs.msdn.com/b/windowsazure/. In addition to announcing the updates, the blog is also your main source for information about new features included in the updates, as well as new options and new tools. Updates not visible to the customer may not have corresponding documentation or announcement.

You can check to see the date of the service update you are connected to by examining the @@version function. At this writing, we receive the following result:

Query:

SELECT @@version;

Result:

Microsoft Azure SQL Database (RTM) - 10.25.9386.0

Jul 21 2010 12:47:47

Copyright (c) 1988-2009 Microsoft Corporation

Terminology

The terms used in this paper will be based on the definitions in the UC Berkeley Technical Report -  “Above the Clouds: A Berkeley View of Cloud Computing”, EECS Technical Report, February 2009. Terms specific to Microsoft’s implementation of database cloud computing with SQL Database will be defined as they are introduced. This section lists and defines the basic terms used in the paper.

Cloud computing –refers to both applications delivered as services over the Internet and the hardware and software in the data centers that provide those services; in the case of SQL Database, the data centers are set up and managed by Microsoft.

The cloud – specifically refers to the hardware and software in the data centers.

Target Audience

This paper is intended to provide information about SQL Database for cloud database application developers, DBAs, and corporate decision makers.

Prerequisites

Basic knowledge on relational databases is expected, SQL Server and Windows knowledge from http://msdn.microsoft.com/en-us/ would be helpful. We strongly recommend that you read the paper entitled “Getting Started with Microsoft Azure SQL Database,” which is available at: http://msdn.microsoft.com/en-us/windowsazure/sqlazure


Setting Up SQL Database

 

Subscriptions

To use SQL Database, you must first create a Microsoft Azure platform account, which allows you to access all the Microsoft Azure-related services, including SQL Database. The account is used to set up and manage your subscriptions and to bill for consumption of any of the Microsoft Azure services including SQL Database. You can buy your subscription from the Microsoft Online Services Customer Portal (https://mocp.microsoftonline.com/site/default.aspx). After your purchase is complete, you receive a confirmation notification in email with instructions for accessing your account in the SQL Database platform. This confirmation includes the server name that you use in your connection strings or that you provide when registering a server in SQL Server Management Studio. In SQL Server 2008 R2, SQL Server Management Studio includes built-in support for SQL Database

From a billing and usage standpoint of your platform account, while it is considered a best practice to run applications that access a SQL Database on the Microsoft Azure platform, it is not mandatory. Also, SQL Database usage does not require Microsoft Azure usage as a prerequisite.

After you create a Microsoft Azure platform account, you can use the Microsoft Azure Platform Management portal (http://windows.azure.com) to create a single SQL Database server and its associated administrator account. To create additional SQL Database servers, create additional subscriptions. The Microsoft Azure Platform Management portal provides a user interface that you can use to create servers and to quickly create databases on those servers.

Each subscription allows one instance of SQL Server to be defined, which will initially include only a master database. For each server, you will also need to configure the firewall settings, to determine which connections will be allowed access.

We cover more details about SQL Database login and user security later in this paper. In addition, for more information about connection settings, logins, users, and configuring the SQL Database Firewall, see the following articles:

Databases

Each SQL Database server always includes a master database. Up to 149 additional databases can be created for each SQL Database server. Databases can be created using the Microsoft Azure Platform Management portal or the Transact-SQL CREATE DATABASE command or SQL Database Management API. You can also use SQL Server Management Studio to connect to your SQL Database server, and create a database using Transact-SQL via a query template. To open a query template in SQL Server Management Studio, right click Databases and then click New Database.

The Microsoft Azure Platform Management portal interface for creating a database is shown in Figure 1. You can specify the name, edition, and the maximum size of the database. There are two editions of SQL Databases: Web and Business, and when you create a database using the Microsoft Azure Platform Management portal, the maximum size you specify determines the edition you create. A Web Edition database can have a MAXSIZE value of 1 GB or 5 GB. A Business Edition database can have a MAXSIZE value of 10 GB, 20 GB, 30 GB, 40 GB, 50 GB, 100 GB, or 150 GB. If the size of your database reaches the limit you set in MAXSIZE, you receive an error code 40544. When this happens, you cannot insert data, update data, or create new database objects. However, you can still read and delete data, truncate tables, drop tables and indexes, and rebuild indexes. If you delete data or drop objects to free storage space, there can be as much as a fifteen-minute delay before you can insert new data or modify existing data.

Figure 1: Creating a new Web Edition SQL Database

To change the name, edition, or maximum size of your database after creation, you can use the ALTER DATABASE command. The SQL Database ALTER DATABASE command allows three options to be specified (name, maximum size, and edition) as shown here:

ALTER DATABASE database_name

{

MODIFY NAME = new_database_name

|MODIFY (MAXSIZE = {1 | 5 | 10 | 20 | 30 | 40 | 50 | 100 | 150} GB)

|MODIFY (EDITION = {'web' | 'business'})

}

Pricing for your SQL Database subscription is per database, and it is based on the edition. The per-database fee is amortized over the month and charged on a daily basis. You pay for the user databases you have on the days you have them. There is no fee for the master database. You should be aware, however, that if you drop and create a database with the same name several times in one day, each CREATE counts as a new database for that day.

There are also additional charges for data transfer volume, any time data comes into or out of the data center. You have your choice of running your application code on your own premises and connecting to your SQL Database in the data center, or of running your application code in Microsoft Azure, which is hosted in the same data center as your SQL Database. The first model is referred to as the code-far model and the second is referred to as code-near. The code-near model avoids the additional data transfer charges and is very close to an on-premises programming model. In either case, you should be aware of the Internet network latency that cannot be mitigated using either model.

The SQL Database data access model does not support cross-database queries in the current version, and the USE command is not supported. A connection is made to a single database; if you need data from another database, you must make a new connection. In SQL Server Management Studio, to specify the database to which you want to connect interactively, click Options in the New Connection dialog box. Programmatically, you can specify the database in your application’s connection string. You can code cross-database joins or comparisons in the application after the data has been returned from the appropriate databases.

The master Database

When first creating a SQL Database server, you have only one database available, called the master database. The master database in SQL Database is slightly different from the corresponding database in SQL Server. Like a SQL Server master database, the SQL Database master database is the store for login names, but the similarities end there. SQL Database doesn’t support server-level roles, and introduces two new database roles: dbmanager and loginmanager that are available only in the master database.

  • dbmanager – The SQL Database dbmanager role is similar to the dbcreator role for an on-premises instance of SQL Server. It is required in order to create databases.
  • loginmanager - Like the securityadmin role for an on-premises instance of SQL Server, the loginmanager role in SQL Database is required in order to create logins.

The SQL Database master database has some other key differences from an on-premises SQL Server master database:

  • The master database in SQL Database contains some catalog views that are unavailable in an on-premises instance of SQL Server. These include: sys.firewall_rules, sys.bandwidth_usage, and sys.database_usage.
  • Although users can be created in your SQL Database master database, those users cannot take full control of the database; in other words, no one other than server-level principal login (created by the initial SQL Database server creation process) can become a database owner of the master database.
  • Guest access to the master database is disabled (this is also true for all databases created on your SQL Database server.)

Security

Most security issues for your SQL Databases are managed by Microsoft within the SQL Database data center, with very little setup required by the users. However, there are some security issues that it will be useful to be aware of to fully understand how your data is kept secure.

Access to a SQL Database

Just as when accessing an on-premises database, a user must have a valid login and password in order to connect to the SQL Database. Because SQL Database supports only standard security, each login must be explicitly created. An administrator can create an unlimited number of logins using SQL Server Management Studio for SQL Server 2008 R2 or using Transact-SQL. Common login names such as ‘sa’, ‘admin’, ‘administrator’, ‘root’, and ‘guest’ are not allowed, requiring an illegal user to guess at valid login names.

Passwords are required to be strong, following the Windows and SQL Server guidelines for password strength.

In addition, the firewall can be configured on each SQL Database server to only allow traffic from specified IP addresses to access the SQL Database server. By default, no IP addresses are included in the list. This helps to greatly reduce any chance of a denial-of-service (DoS) attack. All communications between clients and SQL Database must be SSL encrypted, and clients should always connect with Encrypt = True to ensure that there is no risk of man-in-the-middle attacks. DoS attacks are further reduced by a service called DoSGuard that actively tracks failed logins from IP addresses and if it notices too many failed logins from the same IP address within a period of time, the IP address is blocked from accessing any resources in the service.

Database Users

The security model within a database is identical to that in SQL Server. Users are created and mapped to login names. Users can be assigned to roles, and users can be granted permissions. Data in each database is protected from users in other databases because the connections from the client application are established directly to the connecting user’s database. There is no way to break out of this connection.

For more basic information about security with SQL Database, see the white paper Security Guidelines for Microsoft Azure SQL Database (http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=4322517e-9d80-4ad3-8a75-bf0a10aa64d9).

Compatibility with SQL Server

Although we talk about your SQL Database being similar to a SQL Server database, there are some important differences. In particular, many SQL Server features are not currently supported in a SQL Database.

Because this list is changing with every SQL Database Service Update, be sure to check the list of supported and unsupported features. For more information, see SQL Server Feature Limitations (Microsoft Azure SQL Database) (http://msdn.microsoft.com/en-us/library/ff394115(v=MSDN.10).aspx).


SQL Database Architecture Overview

As discussed earlier, each SQL Database is associated with its own subscription. From the subscriber’s perspective, SQL Database provides logical databases for application data storage. In reality, each subscriber’s data is actually stored multiple times, replicated across at least three SQL Server databases that are distributed across multiple physical servers in a single data center. Many subscribers may share the same physical database, but the data is presented to each subscriber through a logical database that abstracts the physical storage architecture and uses automatic load balancing and connection routing to access the data. The logical database that the subscriber creates and uses for database storage is referred to as a SQL Database.

Logical Databases on a SQL Database Server

SQL Database subscribers access the actual databases, which are stored on multiple machines in the data center, through the logical server. The SQL Database Gateway service acts as a proxy, forwarding the Tabular Data Stream (TDS) requests to the logical server. It also acts as a security boundary providing login validation, enforcing your firewall and protecting the instances of SQL Server behind the gateway against denial-of-service attacks. The Gateway is composed of multiple computers, each of which accepts connections from clients, validates the connection information and then passes on the TDS to the appropriate physical server, based on the database name specified in the connection. Figure 2 shows the complex physical architecture represented by the single logical server.

Figure 2: A logical server and its databases distributed across machines in the data center (SQL stands for SQL Server)

In Figure 2, the logical server provides access to three databases: DB1, DB3, and DB4. Each database physically exists on one of the actual SQL Server instances in the data center. DB1 exists as part of a database on a SQL Server instance on Machine 6, DB3 exists as part of a database on a SQL Server instance on Machine 4, and DB4 exists as part of a SQL Server instance on Machine 5. There are other SQL Databases existing within the same SQL Server instances in the data center (such as DB2), available to other subscribers and completely unavailable and invisible to the subscriber going through the logical server shown here.

Each database hosted in the SQL Database data center has three replicas: one primary replica and two secondary replicas. All reads and writes go through the primary replica, and any changes are replicated to the secondary replicas asynchronously. The replicas are the central means of providing high availability for your SQL Databases. For more information about how the replicas are managed, see “High Availability with SQL Database” later in this paper.

In Figure 2, the logical server contains three databases: DB1, DB2, and DB3. The primary replica for DB1 is on Machine 6 and the secondary replicas are on Machine 4 and Machine 5. For DB3, the primary replica is on Machine 4, and the secondary replicas are on Machine 5 and on another machine not shown in this figure. For DB4, the primary replica is on Machine 5, and the secondary replicas are on Machine 6 and on another machine not shown in this figure. Note that this diagram is a simplification. Most production Microsoft SQL Database data centers have hundreds of machines with hundreds of actual instances of SQL Server to host the SQL Database replicas, so it is extremely unlikely that if multiple SQL Databases have their primary replicas on the same machine, their secondary replicas will also share a machine.

The physical distribution of databases that all are part of one logical instance of SQL Server means that each connection is tied to a single database, not a single instance of SQL Server. If a connection were to issue a USE command, the TDS might have to be rerouted to a completely different physical machine in the data center; this is the reason that the USE command is not supported for SQL Database connections.

Network Topology

Four distinct layers of abstraction work together to provide the logical database for the subscriber’s application to use: the client layer, the services layer, the platform layer, and the infrastructure layer. Figure 3 illustrates the relationship between these four layers.

Figure 3: Four layers of abstraction provide the SQL Database logical database for a client application to use

The client layer resides closest to your application, and it is used by your application to communicate directly with SQL Database. The client layer can reside on-premises in your data center, or it can be hosted in Microsoft Azure. Every protocol that can generate TDS over the wire is supported. Because SQL Database provides the TDS interface as SQL Server, you can use familiar tools and libraries to build client applications for data that is in the cloud.

The infrastructure layer represents the IT administration of the physical hardware and operating systems that support the services layer. Because this layer is technically not a part of SQL Database, it is not discussed further in this paper.

The services and platform layers are discussed in detail in the next sections.

Services Layer

The services layer contains the machines that run the gateway services, which include connection routing, provisioning, and billing/metering. These services are provided by four groups of machines. Figure 4 shows the groups and the services each group includes.

Figure 4: Four groups of machines provide the services layer in SQL Database

The front-end cluster contains the actual gateway machines. The utility layer machines validate the requested server and database and manage the billing. The service platform machines monitor and manage the health of the SQL Server instances within the data center, and the master cluster machines keep track of which replicas of which databases physically exist on each actual SQL Server instance in the data center.

The numbered flow lines in Figure 4 indicate the process of validating and setting up a client connection:

  1. When a new TDS connection comes in, the gateway, hosted in the front-end cluster, is able to establish a connection with the client. A minimal parser verifies that the command is one that should be passed to the database, and is not something such as a CREATE DATABASE, which must be handled in the utility layer.
  2. The gateway performs the SSL handshake with the client. If the client refuses to use SSL, the gateway disconnects. All traffic must be fully encrypted. The protocol parser also includes a “Denial of Service” guard, which keeps track of IP addresses, and if too many requests come from the same IP address or range of addresses, further connections are denied from those addresses.
  3. Server name and login credentials supplied by the user must be verified. Firewall validation is also performed, only allowing connections from the range of IP addresses specified in the firewall configuration.
  4. After a server is validated, the master cluster is accessed to map the database name used by the client to a database name used internally. The master cluster is a set of machines maintaining this mapping information. For SQL Database, partition means something much different than it means on your on-premises SQL Server instances. For SQL Database, a partition is a piece of a SQL Server database in the data center that maps to one SQL Database. In Figure 2, for example, each of the databases contains three partitions, because each hosts three SQL Databases.
  5. After the database is found, authentication of the user name is performed, and the connection is rejected if the authentication fails. The gateway verifies that it has found the database that the user actually wants to connect to.
  6. After all connection information is determined to be acceptable, a new connection can be set up.
  7. This new connection goes straight from the user to the back-end (data) node.
  8. After the connection is established, the gateway’s job is only to proxy packets back and forth from the client to the data platform.

Platform Layer

The platform layer includes the computers hosting the actual SQL Server databases in the data center. These computers are called the data nodes. As described in the previous section on logical databases on a SQL Database server, each SQL Database is stored as part of a real SQL Server database, and it is replicated twice onto other SQL Server instances on other computers. Figure 5 provides more details on how the data nodes are organized. Each data node contains a single SQL Server instance, and each instance has a single user database, divided into partitions. Each partition contains one SQL Database client database, either a primary or secondary replica.

Figure 5: The actual data nodes are part of the platform layer

A SQL Server database on a typical data node can host up to 650 partitions. Within the data center, you manage these hosting databases just as you would manage an on-premises SQL Server database, with regular maintenance and backups being performed within the data center. There is one log file shared by all the hosted databases on the data node, which allows for better logging throughput with sequential I/O/group commits. Unlike on-premises databases, in SQL Database, the database log files pre-allocate and zero out gigabytes of log file space before the space is needed, thus avoiding stalls due to autogrow operations.

Another difference between log management in the SQL Database data center and in on-premises databases is that every commit needs to be a quorum commit. That is, the primary replica and at least one of the secondary replicas must confirm that the log records have been written before the transaction is considered to be committed.

Figure 5 also indicates that each data node machine hosts a set of processes referred to as the fabric. The fabric processes perform the following tasks:

  • Failure detection: notes when a primary or secondary replica becomes unavailable so that the Reconfiguration Agent can be triggered
  • Reconfiguration Agent: manages the re-establishment of primary or secondary replicas after a node failure
  • PM (Partition Manager) Location Resolution: allows messages to be sent to the Partition Manager
  • Engine Throttling: ensures that one logical server does not use an disproportionate amount of the node’s resources, or exceed its physical limits
  • Ring Topology: manages the machines in a cluster as a logical ring, so that each machine has two neighbors that can detect when the machine goes down

The machines in the data center are all commodity machines with components that are of low-to-medium quality and low-to-medium performance capacity. At this writing, a commodity machine is a SKU with 32 GB RAM, 8 cores, and 12 disks, with a cost of around $3,500. The low cost and the easily available configuration make it easy to quickly replace machines in case of a failure condition. In addition, Microsoft Azure machines use the same commodity hardware, so that all machines in the data center, whether used for SQL Database or for Microsoft Azure, are interchangeable.

The term cluster refers to a collection of machines in the data center plus the operating system and network. A cluster can have up to 1,000 machines, and at this time, most data centers have one cluster of machines in the platform layer, over which SQL Database replicas can be spread. The SQL Database architecture does not require a single cluster, and if more than 1,000 machines are needed, or if there is need for a set of machines to dedicate all their capacity to a single use, machines can be grouped into multiple clusters.


High Availability with SQL Database

SQL Database offers the high availability of an enterprise data center without the administrative overhead usually associated with managing your own on-premises servers.

Concept

The goal for SQL Database is to maintain 99.9 percent availability for the subscribers’ databases. This goal is achieved by the use of commodity hardware that can be quickly and easily replaced in the case of machine or drive failure and the management of the replicas, one primary and two secondary, for each SQL Database.

Failure Detection

Management in the data centers needs to detect not only a complete failure of a machine, but also conditions where machines are slowly degenerating and communication with them is affected. The concept of quorum commit, discussed earlier, addresses these conditions. First, a transaction is not considered to be committed unless the primary replica and at least one secondary replica can confirm that the transaction log records were successfully written to disk. Second, if both a primary replica and a secondary replica must report success, small failures that might not prevent a transaction from committing but that might point to a growing problem can be detected.

Reconfiguration

The process of replacing failed replicas is called reconfiguration. Reconfiguration can be required due to failed hardware or to an operating system crash, or to a problem with the instance of SQL Server running on the node in the data center. Reconfiguration can also be necessary when an upgrade is performed, whether for the operating system, for SQL Server, or for SQL Database.

All nodes are monitored by six peers, each on a different rack than the failed machine. The peers are referred to as neighbors. A failure is reported by one of the neighbors of the failed node, and the process of reconfiguration is carried out for each database that has a replica on the failed node. Because each machine holds replicas of hundreds of SQL Databases (some primary replicas and some secondary replicas), if a node fails, the reconfiguration operations are performed hundreds of times. There is no prioritization in handling the hundreds of failures when a node fails; the Partition Manager randomly selects a failed replica to handle, and when it is done with that one, it chooses another, until all of the replica failures have been dealt with.

If a node goes down because of a reboot, that is considered a clean failure, because the neighbors receive a clear exception message.

Another possibility is that a machine stops responding for an unknown reason, and an ambiguous failure is detected. In this case, an arbitrator process determines whether the node is really down.

Although this discussion centers on the failure a single replica, it is really the failure of a node that is detected and dealt with. A node contains an entire SQL Server instance with multiple partitions containing replicas from up to 650 different databases. Some of the replicas will be primary and some will be secondary. When a node fails, the processes described earlier are performed for each affected database. That is, for some of the databases, the primary replica fails, and the arbitrator chooses a new primary replica from the existing secondary replicas, and for other databases, a secondary replica fails, and a new secondary replica is created.

The majority of the replicas of any SQL Database must confirm the commit. At this time, user databases maintain three replicas, so a quorum commit would require two of the replicas to acknowledge the transaction. A metadata store, which is part of the Gateway components in the data centers, maintains five replicas and so needs three confirmations to satisfy a quorum commit. The master cluster, which maintains seven replicas, needs four of them to confirm a transaction. However, for the master cluster, even if all seven replicas fail, the information is recoverable, because mechanisms are in place to rebuild the master cluster automatically in case of such a massive failure.

Failure of Primary Replica

Because all reads and writes take place on the primary replica first, it is immediately evident if and when the primary replica has failed and work cannot continue. When reconfiguring for a failed primary replica, the Partition Manager chooses one of the secondary replicas as the new primary. In general, the secondary replica on the node with the least load is chosen as the new primary replica. This process of promoting a secondary replica to the status of primary is virtually undetectable by most users, so that as far as the users running the applications are concerned, there is no down time. The gateway will pass a “disconnection” message to the client application, and it should immediately attempt to reconnect. Because it may take up to 30 seconds for the information about the new primary replica to propagate through all the gateway servers, the best practice is to try to reconnect several times with a smaller wait time after each failed attempt.

Failure of Secondary Replica

When a secondary replica fails, the database is left with only the two replicas needed for the quorum commit, with no spares. The reconfiguration process is similar to the process that follows failure of the primary replica, in which one of the secondary replicas is promoted to primary. In both cases, only one secondary replica remains, so the Partition Manager attempts, after a short wait to determine whether the failure is permanent, to re-create a new secondary replica.

In some cases, such as an operating system failure or an upgrade, what appears to be a secondary replica failure may not actually be a failure. Because a secondary replica on a failed node may only be out of service temporarily, a new replica is not built immediately. If the secondary replica comes back online, checkdisk and other verification commands are run to confirm the health of the replica.

If a replica stays in the failed state for more than two hours, the Partition Manager starts building a replacement replica. There are of course some cases in which this fixed time-out is not optimum, such as in the cases where the machine fails due to an unrecoverable hardware problem. Future enhancements to SQL Database might include the ability to differentiate different types of replica failure, and the ability to handle nonrecoverable failures more quickly.

If a node does permanently fail, to create a new secondary replica, a machine is chosen from those in the cluster that has sufficient disk space and CPU capacity. This new machine is now used for the new secondary replica. A copy of the database is made from the primary replica, and then it is connected to the existing configuration. The time required for copying the entire database contents is the limiting factor in the maximum manageable size of SQL Databases.

Availability Guarantees

As mentioned earlier, the goal for SQL Database is to maintain 99.9 percent availability. Because of the way that database replicas are distributed across multiple servers and the efficient algorithms for promoting secondary replicas to primary, up to 15 percent of the machines in the data center can be down and the availability can still be guaranteed. In addition, even with the loss of up to 15 percent of the machines, the supportable workload will not be reduced.


Scalability with SQL Database

One of the biggest benefits of hosting your databases with SQL Database is the built-in scalability. You add more databases only when and if you need them, and if the need is only temporary, you can then drop the unneeded databases. There are two components within SQL Database that allow this scalability by continuously monitoring the load on each node. One component is Engine Throttling, which ensures that the server doesn’t get overloaded. The other component is the Load Balancer, which ensures that a server isn’t continuously in the throttled state. In this section, we’ll look at these two components and discuss how engine throttling applies when predefined limits are reached and how load balancing works as the number of hosted database increases.

Throttling

Because of the multitenant use of each SQL Server in the data center, it is possible that one subscriber’s application could render the entire instance of SQL Server ineffective by imposing heavy loads. For example, under full recovery mode, inserting lots of large rows, especially ones containing large objects, can fill up the transaction log and eventually the drive that the transaction log resides on. In addition each instance of SQL Server in the data center shares the machine with other critical system processes that cannot be starved – most relevantly the fabric process that monitors the health of the system, which must be running for SQL Database to stay up.

To keep a data center server’s resources from being overloaded and jeopardizing the health of the entire machine, the load on each machine is monitored by the Engine Throttling component. In addition, each database replica is monitored to make sure that statistics such as log size, log write duration, CPU usage, the actual physical database size limit, and the SQL Database user database size are all below target limits. If the limits are exceeded, the result can be that a SQL Database rejects reads or writes for 10 seconds at a time. Occasionally, violation of resource limits may result in the SQL Database permanently rejecting reads and writes (depending on the resource type in question).

For more information on throttling, see Microsoft Azure SQL Database Connection Management. This article lists the reasons for connection-losses and provides guidelines and best coding practices on how to manage the connection life cycle in SQL Database.

Load Balancer

At this time, although there are availability guarantees with SQL Database, there are no performance guarantees. Part of the reason for this is the multitenant problem: many subscribers with their own SQL Databases share the same instance of SQL Server and the same computer, and it is impossible to predict the workload that each subscriber’s connections will be requesting. However, not having guarantees doesn’t mean that performance is not a critical aspect of the design of the SQL Database infrastructure. SQL Database provides load balancing services that evaluate the load on each machine in the data center.

When a new SQL Database is added to the cluster, the Load Balancer determines the locations of the new primary and secondary replicas based on the current load on the machines.

If one machine gets loaded too heavily, the Load Balancer can move a primary replica to a machine that is less loaded. The simplest way to do this move is to switch a primary and secondary replica for a SQL Database that is performing sluggishly. This switch can have a major and immediate impact on performance, because all reads and writes take place on the primary replica.


SQL Database Development Considerations

Although developing SQL Server applications for SQL Database is very similar to developing applications for an on-premises instance of SQL Server, there are some key differences that you need to be aware of. As mentioned earlier, make sure you monitor the SQL Database Team blog, where some of the most crucial differences are discussed, and where new features are announced when they are added. In addition, you should read the TechNet article Compare SQL Server with Microsoft Azure SQL Database, which is about the differences between SQL Server and SQL Database.

In the section, we point out some of the most important differences that you need to be aware of before you start planning your first SQL Database project.

Connections, Disconnections

When using a web-enabled database like SQL Database, you should be prepared to handle unexpected dropping of connections, including planning for them in your code. The best way to handle connection loss is to re-establish the connection and then re-execute any commands or queries that failed.

If a network problem causes a disconnection, SQL Database cannot return a meaningful error to the application before the session ends. However, if you attempt to reuse this connection (like when you use connection pooling with SQL Server), you receive an error message informing you that a transport-level error has occurred.

Like any other database, SQL Database will at times terminate sessions due to errors, resource shortages, and other transient reasons. In these situations SQL Database always attempts to return a specific error message if the client connection has an active request. However, it may not always be possible to return an error message to a client application if there are no pending requests. For example, if you are connected to your database through SQL Server Management Studio for longer than 30 minutes without an active request, your session will time out, and because there are no active requests, SQL Database can’t return an error message.

Clustered Index Requirement

Unlike SQL Server, every table in your SQL Databases needs to have a clustered index. You might be aware that when you declare a primary key on a table, a clustered index is created on the primary key column(s) by default. Keep in mind that a clustered index on the primary key is not the only way to create your clustered index, and sometimes it is not the best choice. Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order, and it is up to you to make sure every table has a clustered index.

SQL Database does not allow heap tables – a heap table, by definition, is a table that doesn't have a clustered index. However, this rule applies only to your SQL Databases. Temporary tables exist in the tempdb database that is part of the underlying SQL Server instances in the data center, and those tables can be heaps.

When additional nonclustered indexes are built on tables that have clustered indexes, those additional indexes reference the table data using the clustered key. In SQL Server, a nonclustered index on a heap references data using physical addresses of the data. Because of the way SQL Database can store multiple SQL Databases within one SQL Server database, referencing actual physical locations would restrict the flexibility of the SQL Database system.

All indexes in SQL Server and SQL Database are stored as B-Trees. The underlying high availability and replication technology in SQL Database is based on replicating B-Tree rows. Because of this structure, you can maintain machines independently, and your system can benefit from I/O optimizations that would not be possible with heaps.

Transaction Management

SQL Database supports local transactions with the normal Transact-SQL commands BEGIN TRANSACTION, ROLLBACK TRANSACTION, and COMMIT TRANSACTION. They work exactly the way they do on SQL Server.

Every SQL Databases is configured with the database options to enable snapshot isolation. Both READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION database options are set to ON. The default isolation level in both SQL Server and SQL Database is READ COMMITTED, and because of the database option READ_COMMITTED_SNAPSHOT, your SQL Database transactions run in optimistic concurrency. You cannot change the database options in a SQL Database. You may be able to control the isolation level explicitly on a connection using the SET TRANSACTION ISOLATION LEVEL command prior to beginning a transaction. However, you cannot change the default setting, READ COMMITTED with optimistic concurrency, to the SQL Server default setting, READ COMMITTED with pessimistic concurrency. The only way to duplicate the default SQL Server behavior is to use the locking hint WITH (READCOMMITTEDLOCK) with every table in every transaction.


SQL Database Management

Because your SQL Databases are hosted within larger SQL Server instances on machines in the data centers, the management work that you need to do is very limited. However, some maintenance tasks are still necessary. You can use familiar SQL Server tools and methods to perform these tasks. This section discusses common database management tasks such as troubleshooting and maintenance. It also covers tools you use to manage SQL Database, including supported metadata objects, well as jobs, alerts, and other tools.

Maintenance and Upgrades

As a DBA, your primary job is typically to manage regular backups and performance consistency checks. These tasks are now handled by Microsoft within the data centers. DBAs are not responsible for the physical management and placement of files and filegroups, or of monitoring file growth. All physical aspects of dealing with your databases are handled in the data center.

Upgrades are also a typical DBA task. In SQL Database, those are also handled in the data center, one replica at a time. In fact, SQL Database upgrades are released much more often than SQL Server upgrades. Currently, a new SQL Database version is released every couple of months. Upgrades are handled using the same reconfiguration process used when a node goes down, because the node is made unavailable while the upgrade is being performed. In addition to upgrades of SQL Database itself, the hosting SQL Server instance, as well as the operating system software, will occasionally need upgrades or patches, necessitating bringing the node offline temporarily.

For more information about the patching process, see the blog post Patching Microsoft Azure SQL Database (http://blogs.msdn.com/b/sqlazure/archive/2010/04/30/10004818.aspx).

Troubleshooting

Because you have no control over the physical provisioning or configuration of your machines or of the database files, your troubleshooting requirements are minimized. You may need to perform troubleshooting in areas that involve poorly performing queries and concurrency problems, such as blocking. Because your SQL Database runs by default in READ COMMITTED SNAPSHOT isolation level, you may not encounter many blocking problems.

The techniques for troubleshooting blocking problems or suboptimal execution plans are basically the same in SQL Database as for SQL Server, and a thorough discussion of techniques is beyond the scope of this paper. However, this paper does describe the most useful tools available for monitoring and troubleshooting your SQL Database applications. Just like in SQL Server, some of the main tools available for troubleshooting are the dynamic management views (DMVs). However, SQL Database supports only a fraction of the full set of dynamic management views that SQL Server provides, and some of the available dynamic management views work slightly differently. For example, in SQL Server, users need the VIEW SERVER STATE permission to see the contents of many of the views, but in SQL Database, users need the VIEW DATABASE STATE permission. In SQL Server, views such as sys.dm_tran_locks, sys.dm_exec_requests, and sys.dm_exec_query_stats show details for processes and queries across the entire server instance. In SQL Database, these views return information about your SQL Database only. For more information about supported dynamic management views, see “Metadata” later in this paper.

For more information about troubleshooting in SQL Database, see Monitoring Microsoft Azure SQL Database Using Dynamic Management Views (http://msdn.microsoft.com/en-us/library/ff394114.aspx).

Metadata

Metadata objects in SQL Database are a subset of those available in SQL Server. Metadata objects in SQL Server are divided into three categories: compatibility views, catalog views, and dynamic management views. Compatibility views handle backward compatibility, and SQL Database only supports one of these views, called sys.syslanguages. Catalog views are the preferred interface for accessing SQL Server metadata regarding objects, users, and schemas, and many of the SQL Server catalog views are supported in SQL Database. Some of the ones that aren’t supported deal with physical structures such as sys.filegroups and sys.allocation_units, or with features that are not supported in SQL Database, such as sys.fulltext_indexes and sys.service_broker_endpoints.

SQL Database partially supports three categories of dynamic management views useful for troubleshooting, monitoring, and system analysis:

  • Database-related dynamic management views.Only one view in this category is supported:
    • sys.dm_db_partition_stats.
  • Execution-related dynamic management views and functions. Seven objects in this category are supported, and the information returned pertains only to activity in the current SQL Database:
       
  • Transaction-related dynamic management views. Four views in this category are supported, and again, the information returned pertains only to activity in the current SQL Database:
       
    • sys.dm_tran_active_transactions
    • sys.dm_tran_database_transactions
    • sys.dm_tran_locks
    • sys.dm_tran_session_transactions

For the full list of both supported and unsupported metadata objects (which changes with each service update to SQL Database), see System Views (Microsoft Azure SQL Database) (http://msdn.microsoft.com/en-us/library/ee336238.aspx).

Jobs and Alerts

Many DBA tasks in an on-premises instance of SQL Server are managed using jobs and alerts through SQL Server Agent. Because SQL Database does not support SQL Server Agent, setting up any kind of recurring task is a bit more problematic. However, many tasks are not required or possible with SQL Database, such as backup operations. In addition, many common warnings for which you might want to configure alerts, such as database size and system resource usage, cannot be directly observed in SQL Database, so jobs and alerts are really not a feature in a SQL Database DBA’s life.

If you need functionality that is similar to SQL Server Agent for SQL Database for other tasks, you can use a Microsoft Azure worker role and your own custom code. For a series of blog posts that describe the details of how to set this up, see I Miss You SQL Server Agent (http://blogs.msdn.com/b/sqlazure/archive/2010/07/30/10044271.aspx).

Tools

As of this writing, the supported graphical interface for working with SQL Databases is the same tool, SQL Server Management Studio, you use to work with on-premises databases. The online SQL Database management portal also has a lightweight and easy-to-use web-based database management tool for SQL Databases, which was formerly referred to as Project "Houston". This tool is designed for Web developers and other technology professionals who want to quickly develop, deploy, and manage their data-driven applications in the cloud without downloading any client tools. Additionally, Microsoft Visual Studio integrates with SQL Database so developers can view and interact with database objects in their application development environment.


Future Plans for SQL Database

The list of features and capabilities of SQL Database is changing rapidly, and Microsoft is working continuously to release more enhancements.

For example, in SQL Database building big databases means harnessing the easy provisioning of SQL Databases and spreading large tables across many databases. Because of the SQL Database architecture, this also means scaling out the processing power. To assist with scaling out, Microsoft plans to make it easier to manage tables partitioned across a large set of databases. Initially, querying is expected to remain restricted to one database at a time, so developers will have to handle the access to multiple databases to retrieve data. In later versions, improvements are planned in query fan-out, to make the partitioning across databases more transparent to user applications.

Another feature in development is the ability to take control of your backups. Currently, backups are performed in the data centers to protect your data against disk or system problems. However, there is no way currently to control your own backups to provide protection against logical errors and use a RESTORE operation to return to an earlier point in time when a backup was made. The new feature involves the ability to make your own backups of your SQL Databases to your own on-premises storage, and the ability to restore those backups either to an on-premises database or to a SQL Database. Eventually Microsoft plans to provide the ability to perform SQL Database backups across data centers and also make log backups so that point-in-time recovery can be implemented.

Conclusion

Using SQL Database, you can provision and deploy your relational databases and your database solutions to the cloud, without the startup cost, administrative overhead, and physical resource management required by on-premises databases. In this paper, we examined the internals of the SQL Databases. By creating multiple replicas of each user database, spread across multiple machines in the Microsoft data centers, SQL Database can provide high availability and immediate scalability in a familiar SQL Server development environment.



See Also

The author of the original information contained in this article is Kalen Delaney.
Comments
  • Good article, thanks.

  • Thanks for such an in-depth article!! Cleared a lot of doubts.

  • Hi,

    In High-Availability section - it says "The reconfiguration process is similar to the process that follows failure of the primary replica, in which one of the secondary replicas is promoted to primary. In both cases, only one secondary replica remains, so the Partition Manager ..."

    1. I am little confused here "in which one of the secondary replicas is promoted to primary". When secondary replica failed, is it promoting to primary?

    2. "only one secondary replica remains" - when primary replica failed, it has two secondary replica in which one will promote to primary and left with one seconday. Is it correct.

    Please correct me if i am wrong.

  • @Thirumalai Muniswmy

    Hi,

    1. No, when a secondary replica fails, the steps to determine if the system needs to provide a new replica or not (could be a clean failure like update due to reboot) are similar to when the primary replica fails but there are no promotions. The promoting secundary to primary refers to primary failure.

    2. Correct.

  • @Thirumalai Muniswmy

    1. *(could be a clean failure like reboot due to some system update)

  • As I understand, the DoSGuard is implemented on application layer (when TDS connection comes in... protocol parser...). Why? AFAIK, such IP blacklisting can be done on network layer with much less resources consumption (maybe purely network hardware capabilities can be used, not involving higher level structures at all). Am I missing smth?

Page 1 of 1 (6 items)