This article describes the data storage offerings available on the Azure platform.
The following topics are discussed in this article:
If you are familiar with developing applications in a Windows Server environment, most of the data storage options on the Windows Azure platform map readily to concepts you already know. Some tasks, such as creating
or accessing a database in SQL Database, are nearly identical to the Windows Server environment, while others require different access methods.
Probably the most familiar storage offerings on the Windows Azure platform are SQL Database and Azure Queue storage.
SQL Database is, essentially, SQL Server on the Azure platform; you can access it using the same access methods and tools that you would use with SQL Server. For the purposes of this article, we will only discuss the basic database storage functionality.
Queue storage will be familiar to those of you who have used Microsoft Message Queuing (MSMQ), as it serves a similar purpose: durable storage for passing messages between processes.
If you’ve ever worked with a distributed application, or a clustered application, you’ve probably encountered the need for a durable, highly available, shared storage location. The most common solution in a Windows Server environment is a network file share.
This could be a cluster of dedicated file servers, a Distributed File System (DFS), or Network Attached Storage (NAS), but as far as your application is concerned, it is just a path to the file resource where the application stores its data. The Windows Azure
platform offers two specialized storage options that fill this purpose; Windows Azure
For temporary, per-instance data, Windows Azure offers local storage. Local storage provides fast access to the physical storage on the hardware node that your application instance is running on; however, it is not durable. If an application instance is stopped and then restarted on a different hardware node,
data stored in local storage does not follow the application instance.
When you move to the Windows Azure platform, there are some important differences between the Windows Azure platform and a Windows Server environment that you need to plan for:
The bottom line is that Windows Azure is a fully distributed platform. It calls for new storage offerings that meet the needs of distributed applications, as well as new thinking about how data is accessed and used.
There are three main storage offerings on the Windows Azure platform. One of them, Windows Azure Storage, is subdivided into more granular storage options:
Large binary objects such as video or audio
All of the storage offerings except local storage and Azure Drive are accessible from both inside and outside the Windows Azure platform. Local storage and Azure Drive are accessible only from applications running on Windows Azure.
Local storage is provided as part of the Windows Azure Compute offering and gives a running application instance temporary storage. A local store is a directory on the physical file system of the hardware that the application instance
is running on, and can hold any information that is specific to that running instance. Because a local store is a directory on the local file system, you can use standard .NET file I/O with it.
A local store is accessible only by the local instance. It can be configured to persist when the Web or Worker Role that the instance runs in is recycled; however, this applies only to a simple recycle of the role. If the instance is restarted on different hardware,
for example after a hardware failure or during hardware maintenance, data in the local store does not follow the instance, even if the store was configured to persist through a recycle. If you need reliable durability, want to share data between instances,
or need to access your data from outside Windows Azure, consider using a Windows Azure Storage account or SQL Database instead.
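Because a local store is just a directory, using it is ordinary file I/O. The following Python sketch shows that pattern under stated assumptions. The root directory is a hypothetical stand-in; in a .NET role it would typically come from `RoleEnvironment.GetLocalResource(...).RootPath`. The helper names `write_local` and `read_local` are also hypothetical. Since local storage is not durable, the reader treats a missing file as a normal cache miss rather than an error.

```python
import os
import tempfile
from typing import Optional


def write_local(root_path: str, name: str, data: bytes) -> str:
    """Write bytes into the local store with ordinary file I/O; return the file path."""
    os.makedirs(root_path, exist_ok=True)
    path = os.path.join(root_path, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


def read_local(root_path: str, name: str) -> Optional[bytes]:
    """Read bytes back, or return None if the file is gone.

    Local storage is not durable: after the instance moves to other hardware
    the directory may be empty, so a missing file is treated as a cache miss.
    """
    path = os.path.join(root_path, name)
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical stand-in for the local store's root directory.
    root = tempfile.mkdtemp(prefix="localstore-")
    write_local(root, "thumbnail.bin", b"cached bytes")
    print(read_local(root, "thumbnail.bin"))
    print(read_local(root, "missing.bin"))
```

Anything that must survive a move to new hardware should be written to durable storage, not only to the local store.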
You can create several local stores for each instance. Each store defaults to 1 MB, and its size can be increased up to the maximum allowed by your compute instance. The maximum disk space for a compute instance depends on the VM size selected
for your instance, and is listed in the following table:
For more information on local storage, see:
Blobs, tables, and queues are all available as part of a Windows Azure Storage account, and provide durable storage on the Windows Azure platform. All are accessible from both inside and outside the Windows Azure platform, either through classes in the
Windows Azure Storage Client SDK (http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.storageclient.aspx) or through URIs using the
REST APIs (http://msdn.microsoft.com/en-us/library/dd179355.aspx). Windows Azure Table storage also supports LINQ access.
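When you use the REST APIs, each service in a storage account is addressed through its own endpoint. The sketch below builds resource URIs from the standard `<account>.blob.core.windows.net`, `<account>.table.core.windows.net`, and `<account>.queue.core.windows.net` endpoint pattern. The account and resource names are hypothetical. The sketch deliberately omits request authentication (Shared Key signing), which real REST calls require.

```python
from urllib.parse import quote

# Public endpoint prefixes for the three Windows Azure Storage services.
SERVICES = ("blob", "table", "queue")


def storage_uri(account: str, service: str, resource_path: str = "", https: bool = True) -> str:
    """Build a REST resource URI such as https://myaccount.blob.core.windows.net/photos/cat.jpg.

    Only the address is built here; signing the request is out of scope.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown storage service: {service!r}")
    scheme = "https" if https else "http"
    # Percent-encode unsafe characters (such as spaces) but keep the '/' separators.
    path = quote(resource_path.lstrip("/"), safe="/")
    return f"{scheme}://{account}.{service}.core.windows.net/{path}"


if __name__ == "__main__":
    print(storage_uri("myaccount", "blob", "photos/cat 1.jpg"))
    print(storage_uri("myaccount", "queue", "orders/messages"))
```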
Unlike local storage, blobs, tables, and queues can be accessed by multiple applications or application instances simultaneously, and they provide persistent rather than temporary storage.
See Windows Azure Pricing (http://www.microsoft.com/windowsazure/pricing/) for the latest pricing information on Windows Azure Storage.
Blobs provide a way to store large amounts of unstructured, binary data, such as video, audio, images, etc. In fact, one of the features of blobs is streaming content such as video or audio. There are two types of blob storage available, each provides specific
Blob storage provides options for storing metadata with each blob and for taking snapshots of blobs for backups. Blobs can also use the Content Delivery Network (CDN), which caches blobs at datacenters located near your customers to ensure
fast access to the data stored in them.
Blob storage can also provide an NTFS file system to applications on the Windows Azure platform. This is called an Azure Drive. It is implemented as a page blob that contains an NTFS-formatted Virtual Hard Disk (VHD), which is mounted and exposed to the application as a
local drive letter (for example, X:\). Azure Drive is available only in Windows Azure Guest OS 1.1 or later, and the VHD can be mounted only by an application hosted within Windows Azure.
For more information on blob storage, see:
Queues provide storage for passing messages between applications, similar to Microsoft Message Queuing (MSMQ). Each message stored in a queue is limited to a maximum of 8 KB. Messages are generally stored and retrieved on a first in, first out (FIFO) basis;
however, FIFO ordering is not guaranteed.
Processing a message from a queue is a two-stage process: you get the message, and then you delete it after it has been processed. This pattern lets you implement guaranteed message delivery by leaving the message in the queue
until it has been fully processed. If the application processing the message fails before it finishes, the message is left in the queue and can be processed by another application. To prevent multiple applications from processing the same message
simultaneously, getting a message marks it as invisible. The message stays invisible until it is either deleted or a specified time interval (the visibility timeout) has passed. Peeking at a message reads it but does not mark it as invisible.
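The get, process, delete cycle and the visibility timeout can be modeled in a few lines. The sketch below is a hypothetical in-memory model, not the Windows Azure Storage Client API. It omits details of the real service such as pop receipts and dequeue counts. It also returns messages in insertion order, whereas the real service does not guarantee FIFO. The size check compares the raw body length against the 8 KB limit, which is a simplification.

```python
import itertools
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple

MAX_MESSAGE_BYTES = 8 * 1024  # per-message limit stated in the article


class InMemoryQueue:
    """Hypothetical model of queue put/peek/get/delete with a visibility timeout."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # message id -> [body, time at which the message becomes visible]
        self._messages = OrderedDict()
        self._ids = itertools.count(1)

    def put(self, body: bytes) -> int:
        if len(body) > MAX_MESSAGE_BYTES:  # simplification: checks raw body size
            raise ValueError("message exceeds 8 KB")
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0.0]
        return msg_id

    def _first_visible(self):
        now = self._clock()
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                return msg_id, entry
        return None, None

    def peek(self) -> Optional[Tuple[int, bytes]]:
        """Read the first visible message without hiding it."""
        msg_id, entry = self._first_visible()
        return None if entry is None else (msg_id, entry[0])

    def get(self, visibility_timeout: float) -> Optional[Tuple[int, bytes]]:
        """First stage: read the first visible message and hide it for the timeout."""
        msg_id, entry = self._first_visible()
        if entry is None:
            return None
        entry[1] = self._clock() + visibility_timeout
        return msg_id, entry[0]

    def delete(self, msg_id: int) -> None:
        """Second stage: remove the message once processing has succeeded."""
        self._messages.pop(msg_id, None)
```

A consumer calls `get`, processes the body, and calls `delete` only after processing succeeds. If the consumer crashes in between, the message becomes visible again once the timeout expires, so another consumer can pick it up.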
For more information on queue storage, see:
Table storage is a collection of row-like entities, each of which can contain up to 255 properties. Unlike tables in a database, however, there is no schema that enforces a particular set of values on all the rows within a table. And while a table stores structured
data, it provides no way to represent relationships between data. Windows Azure Storage tables are more like rows in a spreadsheet application such as Excel than rows in a database such as SQL Database, in that each row can contain a different
number of columns, of different data types, than the other rows in the same table.
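To make the "no schema" point concrete, the sketch below models two entities in the same table as plain Python dictionaries with different property sets. The entity values and helper names are hypothetical. The point is that code reading a table cannot assume every row carries every property.

```python
from typing import Any, Dict, List

# Hypothetical entities stored in the same table: each carries its own property set.
customers: List[Dict[str, Any]] = [
    {"PartitionKey": "customers", "RowKey": "001", "Name": "Contoso", "Country": "US"},
    {"PartitionKey": "customers", "RowKey": "002", "Name": "Fabrikam",
     "CreditLimit": 5000.0, "Active": True},
]


def property_usage(entities: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count how many entities carry each property name."""
    counts: Dict[str, int] = {}
    for entity in entities:
        for name in entity:
            counts[name] = counts.get(name, 0) + 1
    return counts


def country_of(entity: Dict[str, Any]) -> str:
    # With no schema, readers must tolerate properties that a given row lacks.
    return entity.get("Country", "unknown")
```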
While table storage supports basic operations such as insert, update, delete, and select, it does not support joins, foreign keys, stored procedures, triggers, or any other processing on the storage engine side, all of which SQL Database provides. Queries that return
a large number of results, or that time out, return partial results along with a continuation token that allows the query to be resumed.
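Client code that reads a large table therefore loops until no continuation token comes back. The sketch below shows that loop against a hypothetical `fetch_page(token)` callable that returns `(rows, next_token)`. It does not use the real Storage Client API. The fake server is only there to make the sketch runnable. The loop stops on a missing token rather than on an empty page, because a partial result can be short or even empty while a token is still present.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, Any]
Page = Tuple[List[Row], Optional[str]]


def query_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Row]:
    """Yield every row by following continuation tokens until none is returned."""
    token: Optional[str] = None
    while True:
        rows, token = fetch_page(token)
        yield from rows
        if token is None:  # stop only when no token comes back, not on an empty page
            return


def make_fake_server(rows: List[Row], page_size: int) -> Callable[[Optional[str]], Page]:
    """Hypothetical server that returns at most page_size rows per call."""
    def fetch(token: Optional[str]) -> Page:
        start = int(token) if token else 0
        end = start + page_size
        return rows[start:end], (str(end) if end < len(rows) else None)
    return fetch
```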
A table can be up to 100 TB in size, and each entity within a table can be up to 1 MB in size. Each entity can contain up to 255 properties (columns), three of which are always present: a partition key that identifies the partition within the table that the entity belongs to, a row key that uniquely
identifies the entity within its partition, and a timestamp. Partition and row keys are each limited to 1 KB in size, and together they form the only index that exists on data within the table. Tables are partitioned on the partition key for load balancing and scale-out,
as each partition can be located on a different storage node on the Windows Azure platform.
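The key rules above can be checked client-side before an insert. The sketch below uses hypothetical helper names and applies the limits stated in this article. With three system properties, 252 properties are left for your own data. (PartitionKey, RowKey) is treated as the entity's unique key, so the same row key may appear in different partitions. One assumption is labeled loudly: the 1 KB key limit is measured here on the UTF-16 encoding of the key string.

```python
from typing import Any, Dict, List, Tuple

SYSTEM_PROPERTIES = ("PartitionKey", "RowKey", "Timestamp")
MAX_PROPERTIES = 255                                            # includes the 3 system properties
MAX_CUSTOM_PROPERTIES = MAX_PROPERTIES - len(SYSTEM_PROPERTIES)  # 252
MAX_KEY_BYTES = 1024                                            # 1 KB per key

Entity = Dict[str, Any]


def validate_entity(entity: Entity) -> None:
    """Raise ValueError if the keys or property count break the stated limits."""
    for key in ("PartitionKey", "RowKey"):
        value = entity.get(key)
        if not isinstance(value, str):
            raise ValueError(f"{key} is required and must be a string")
        # Assumption: key size is measured on the UTF-16 encoding of the string.
        if len(value.encode("utf-16-le")) > MAX_KEY_BYTES:
            raise ValueError(f"{key} exceeds 1 KB")
    custom = [name for name in entity if name not in SYSTEM_PROPERTIES]
    if len(custom) > MAX_CUSTOM_PROPERTIES:
        raise ValueError(f"{len(custom)} custom properties; at most {MAX_CUSTOM_PROPERTIES} allowed")


def index_entities(entities: List[Entity]) -> Dict[Tuple[str, str], Entity]:
    """Index entities by (PartitionKey, RowKey), the table's only key."""
    index: Dict[Tuple[str, str], Entity] = {}
    for entity in entities:
        validate_entity(entity)
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in index:
            raise KeyError(f"duplicate key {key}")
        index[key] = entity
    return index


def partitions(index: Dict[Tuple[str, str], Entity]) -> Dict[str, List[str]]:
    """Group row keys by partition, the unit the platform can spread across nodes."""
    groups: Dict[str, List[str]] = {}
    for pk, rk in sorted(index):
        groups.setdefault(pk, []).append(rk)
    return groups
```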
Properties within an entity are name/value pairs that conform to a subset of the data types defined by the ADO.NET Data Services specification. The supported types are:
byte[]: An array of bytes up to 64 KB in size.
bool: A Boolean value.
DateTime: A 64-bit value expressed as Coordinated Universal Time (UTC). The supported DateTime range begins at 12:00 midnight, January 1, 1601 A.D. (C.E.), UTC, and ends at December 31, 9999.
double: A 64-bit floating point value.
Guid: A 128-bit globally unique identifier.
Int32 or int: A 32-bit integer.
Int64 or long: A 64-bit integer.
String: A UTF-16-encoded value. String values may be up to 64 KB in size.
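A client that stores data from a language other than .NET has to map its own values onto these types. The sketch below is one hypothetical Python mapping that enforces the size and range limits listed above. Two design choices are the sketch's own and are not rules of the service. First, integers that fit in 32 bits are mapped to Int32 and larger ones to Int64. Second, naive datetimes are rejected, so only explicit UTC-aware values are accepted.

```python
import datetime as dt
import uuid
from typing import Any

MAX_VALUE_BYTES = 64 * 1024
MIN_DATETIME = dt.datetime(1601, 1, 1, tzinfo=dt.timezone.utc)
# No upper-bound check is needed: Python's datetime already stops at December 31, 9999.


def table_type_of(value: Any) -> str:
    """Return the supported table type for a Python value, or raise if it does not fit."""
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return "Int32"
        if -2**63 <= value < 2**63:
            return "Int64"
        raise ValueError("integer outside the Int64 range")
    if isinstance(value, float):
        return "double"
    if isinstance(value, uuid.UUID):
        return "Guid"
    if isinstance(value, dt.datetime):
        if value.tzinfo is None:
            raise ValueError("DateTime values should be timezone-aware UTC")
        if value < MIN_DATETIME:
            raise ValueError("DateTime earlier than January 1, 1601 UTC")
        return "DateTime"
    if isinstance(value, (bytes, bytearray)):
        if len(value) > MAX_VALUE_BYTES:
            raise ValueError("byte[] value exceeds 64 KB")
        return "byte[]"
    if isinstance(value, str):
        if len(value.encode("utf-16-le")) > MAX_VALUE_BYTES:
            raise ValueError("String value exceeds 64 KB when UTF-16 encoded")
        return "String"
    raise TypeError(f"unsupported property type: {type(value).__name__}")
```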
For more information on table storage, see:
SQL Database provides a relational database management system for the Windows Azure platform and is based on SQL Server technology. Like SQL Server, SQL Database exposes a tabular data stream (TDS) interface and supports Transact-SQL (T-SQL), so many of the
tools and applications that work with SQL Server also work with SQL Database. Applications that use existing technologies such as ADO.NET and ODBC to communicate with SQL Server can access SQL Database with minimal code changes. SQL Database
also provides standard SQL Server features such as stored procedures, views, multiple indexes, joins, and aggregation.
Because the Windows Azure platform does not provide direct access to the underlying hardware, administration tasks that involve hardware access, such as choosing where the database files are located, are not available in SQL Database. Physical administration
tasks are handled automatically by the platform, though you must still perform logical administration tasks such as creating logins, users, and roles. Because you cannot access the hardware directly, SQL Server and SQL Database differ
in administration, provisioning, T-SQL support, programming model, and features. For more information, see
General Guidelines and Limitations (Windows Azure SQL Database) (http://msdn.microsoft.com/en-us/library/ee336245.aspx).
Unlike the other storage offerings discussed so far, SQL Database provides more than simple data storage. It also provides server-side processing, which lets you perform complex processing on stored data without retrieving and processing the entire
data set within your application. For example, consider a query that finds all salespeople in a specific region whose sales in the past year exceeded $1,000.00 and returns the sum total of each one's sales. SQL Database can run this query entirely on the server, and only the
results are returned to your application.
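The sales example can be sketched as a single query in which filtering, grouping, and summing all happen in the database engine. The sketch uses Python's built-in `sqlite3` purely as a local stand-in for SQL Database so that it runs anywhere. The `sales` schema and data are hypothetical, and "sales greater than $1,000.00" is interpreted as a yearly total above that amount. The SQL itself is close to what you would send as T-SQL, though parameter placeholder syntax depends on the client library.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema; sqlite3 stands in for SQL Database so the sketch runs locally.
SCHEMA = "CREATE TABLE sales (salesperson TEXT, region TEXT, sale_date TEXT, amount REAL)"

# ISO-8601 date strings compare correctly as text, so ">=" works as a date filter here.
TOTALS_QUERY = """
    SELECT salesperson, SUM(amount) AS total
    FROM sales
    WHERE region = ? AND sale_date >= ?
    GROUP BY salesperson
    HAVING SUM(amount) > ?
    ORDER BY total DESC
"""


def regional_totals(conn: sqlite3.Connection, region: str, since: str,
                    threshold: float) -> List[Tuple[str, float]]:
    """Filter, group, and sum in the database engine; only the totals come back."""
    return conn.execute(TOTALS_QUERY, (region, since, threshold)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("Ana", "West", "2012-03-01", 800.0),
        ("Ana", "West", "2012-06-01", 400.0),
        ("Ben", "West", "2012-05-01", 900.0),
        ("Cy", "East", "2012-05-01", 5000.0),
    ])
    print(regional_totals(conn, "West", "2012-01-01", 1000.0))
```

Only one row per qualifying salesperson crosses the wire, instead of every sale record. That difference matters most when the application and the database are separated by a network.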
A SQL Database can be up to 150 GB in size and can contain multiple tables with complex relationships between the data in them. Rows can be up to 8 MB in size and can contain up to 1,024 columns. A table within SQL Database can have one clustered index on any
column and up to 999 secondary indexes.
See Windows Azure Pricing (http://www.microsoft.com/windowsazure/pricing/) for the latest pricing information on SQL Database.
For more information on SQL Database, see: