Data Storage Offerings on the Windows Azure Platform

This article describes the data storage offerings available on the Windows Azure platform.

Comparing Windows Server and Windows Azure Data Storage

If you are familiar with developing applications in a Windows Server environment, the data storage options available on the Windows Azure platform can be readily mapped to concepts that you are probably already familiar with. Some things, such as creating or accessing a database in SQL Database, are nearly identical to the Windows Server environment, while others will require different access methods.

Probably the most familiar storage offerings on the Windows Azure platform are SQL Database and Windows Azure Queue storage. SQL Database is, essentially, SQL Server on the Azure platform; you can access it using the same access methods and tools that you would use with SQL Server. For the purposes of this article, we will only discuss the basic database storage functionality.

Windows Azure Queue storage will be familiar to those of you who have used Microsoft Message Queuing (MSMQ), as it serves a similar purpose: durable storage for passing messages between processes.

If you’ve ever worked with a distributed application, or a clustered application, you’ve probably encountered the need for a durable, highly available, shared storage location. The most common solution in a Windows Server environment is a network file share. This could be a cluster of dedicated file servers, a Distributed File System (DFS), or Network Attached Storage (NAS), but as far as your application is concerned it’s just a path to the file resource that the application stores data on. The Windows Azure platform offers two specialized storage options that fill this purpose: Windows Azure Blob storage and Windows Azure Table storage.

For the storage of temporary data on a per-instance basis, Windows Azure offers local storage. This provides fast access to the physical storage on the hardware node that your application instance is running on; however, it is not durable storage. If an application instance is stopped and then restarted on a different hardware node, data stored in local storage does not follow the application instance.

What is Different about Data Storage on Windows Azure

When you move to the Windows Azure platform, there are some important differences between the Windows Azure platform and a Windows Server environment that you need to plan for:

  • While you can use normal .NET file IO classes with local storage, and ODBC or ADO.NET with SQL Database, both Blob and Table storage require different access methods than you may be familiar with from a Windows Server environment background.
  • Your application must gracefully handle failures when it loses a connection to data storage. Data storage instances or SQL Database servers may be stopped and restarted on other hardware nodes for maintenance, and hardware may fail. Applications written for the Windows Azure platform need to implement retry logic that allows them to handle such occurrences gracefully.
  • Your data store or SQL Database server may be sharing a hardware node with someone else's data or SQL Database server; don't assume that you have unlimited bandwidth or processing usage when accessing your data or running stored procedures. Consider designing your data to be scaled out, or cache it in local storage for faster access.
  • You have to pay monthly for the storage you use, and some storage options cost extra when accessed from outside the Windows Azure platform or when accessed more than a certain number of times. Allocate only what you need to use and consider batching IO requests or caching data and periodically refreshing the cache with a worker process.
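
The retry logic mentioned above can be sketched as follows. This is a minimal illustration in Python rather than platform code: TransientStorageError, with_retries, and make_flaky_read are hypothetical names invented for the example, and the attempt count and delay values are arbitrary assumptions. The sketch retries a failing operation with exponentially growing, jittered delays and gives up after a fixed number of attempts.

```python
import random
import time


class TransientStorageError(Exception):
    """Hypothetical error standing in for a lost connection to data storage."""


def with_retries(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on TransientStorageError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientStorageError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Double the delay on each attempt and add jitter so that many
            # instances do not all retry at the same moment.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))


def make_flaky_read(failures):
    """Return a fake read that fails `failures` times before succeeding."""
    state = {"calls": 0}

    def read():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientStorageError("connection lost")
        return "payload"

    return read
```

Calling with_retries(make_flaky_read(2)) returns "payload" on the third attempt. The sleep parameter is injectable only so that the delays can be observed or skipped in tests; a real application would likely rely on the retry support of whichever client library it uses.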

The bottom line is that Windows Azure is a fully distributed platform that calls for new storage offerings to meet the needs of distributed applications, as well as new thinking about how data is accessed and used.

Windows Azure Platform Storage Offerings

There are three main storage offerings on the Windows Azure platform, one of which (Windows Azure Storage) is subdivided into more granular storage options:

Storage Offering      | Purpose                                     | Maximum Size                            | Cost
Local Storage         | Per-instance temporary storage              | 250 GB to 2 TB                          | Included in Compute account price
Windows Azure Storage | Durable storage for.....                    |                                         | See Windows Azure Pricing
  Blob                | Large binary objects such as video or audio | 200 GB (block blob) or 1 TB (page blob) |
  Table               | Structured data                             | 100 TB                                  |
  Queue               | Inter-process messages                      | 100 TB                                  |
SQL Database          | Relational Database Management System       | 150 GB                                  | See Windows Azure Pricing

For the latest pricing information on Compute accounts, Storage accounts, and SQL Database, see Windows Azure Pricing (http://www.microsoft.com/windowsazure/pricing/).

All of the storage offerings except local storage and Azure Drive are accessible from both inside and outside the Windows Azure Platform. Local storage and Azure Drive are only accessible from applications running on Windows Azure.

Local Storage

Local storage is provided as part of the Windows Azure Compute offering, and provides temporary storage for a running application instance.  Local storage represents a directory on the physical file system of the underlying hardware that an application instance is running on, and can be used to store any information that is specific to the local running application instance. Since a local store is a directory on the local file system, you can use standard .NET file IO with it.

A local store is only accessible by the local instance, and can be configured to persist when the Web or Worker Role the instance runs in is recycled; however, this only applies to a simple recycle of the role. If the instance is restarted on different hardware, such as in the case of hardware failure or hardware maintenance, data in the local store will not follow the instance even if it was configured to persist through a recycle. If you require reliable durability of your data, want to share data between instances, or need to access your data from outside Windows Azure, consider using a Windows Azure Storage account or SQL Database instead.

You can create a number of local stores for each instance; the default size of each store is 1 MB. The store size can be increased to the maximum allowed by your compute instance. The maximum disk space for a compute instance depends on the VM size selected for your instance, and is listed in the following table:

VM Size    | CPU Cores | Memory | Disk Space for Local Storage Resources
Small      | 1         | 1.7 GB | 250 GB
Medium     | 2         | 3.5 GB | 500 GB
Large      | 4         | 7 GB   | 1 TB
ExtraLarge | 8         | 14 GB  | 2 TB

Windows Azure Storage

Blobs, Tables, and Queues are all available as part of the Windows Azure Storage account, and provide durable storage on the Windows Azure platform. All are accessible from both inside and outside the Windows Azure platform by using classes in the Windows Azure Storage Client SDK (http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.storageclient.aspx), or via URI using REST APIs (http://msdn.microsoft.com/en-us/library/dd179355.aspx). Windows Azure Table Storage also supports LINQ access.

Unlike local storage, blobs, tables, and queues are accessible by multiple applications or application instances simultaneously, and provide durable rather than temporary storage.

See Windows Azure Pricing (http://www.microsoft.com/windowsazure/pricing/) for latest pricing information on Windows Azure Storage.

Blob

Blobs provide a way to store large amounts of unstructured binary data, such as video, audio, and images. One of the features of blobs is streaming content such as video or audio. There are two types of blob storage available, each of which provides specific functionality:

Block Blob

  • Optimized for streaming (upload and download)
  • Composed of blocks up to 4MB (largest block that can be submitted in one operation)
  • Blocks referenced by a unique Block ID
  • Allows blocks to be uploaded before being committed
  • Maximum size of 200GB (50,000 blocks)
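
The block model above (upload blocks identified by ID, then commit them) can be illustrated with a small in-memory sketch. It is written in Python and does not call any real storage API: split_into_blocks and FakeBlockBlob are hypothetical names, and the block ID scheme (base64-encoded, zero-padded counters) is an assumption chosen only because it yields unique IDs. The two constants come from the limits listed above; 50,000 blocks of 4 MB each works out to roughly 200 GB.

```python
import base64

MAX_BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB per block, as listed above
MAX_BLOCKS = 50_000               # blocks per blob, as listed above


def split_into_blocks(data, block_size=MAX_BLOCK_SIZE):
    """Return a list of (block_id, chunk) pairs that cover data in order."""
    if not 0 < block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block_size must be between 1 byte and 4 MB")
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if len(chunks) > MAX_BLOCKS:
        raise ValueError("blob would need more than 50,000 blocks")
    # Assumed ID scheme: any IDs that are unique within the blob would do.
    return [(base64.b64encode(b"%06d" % n).decode("ascii"), chunk)
            for n, chunk in enumerate(chunks)]


class FakeBlockBlob:
    """In-memory stand-in: blocks are staged first, then committed as a list."""

    def __init__(self):
        self.uncommitted = {}  # block ID -> bytes uploaded but not yet committed
        self.content = b""     # what readers see; changes only on commit

    def put_block(self, block_id, chunk):
        self.uncommitted[block_id] = chunk

    def commit(self, block_ids):
        # The committed blob is the staged blocks joined in the order given.
        self.content = b"".join(self.uncommitted[b] for b in block_ids)
        self.uncommitted.clear()
```

Because staged blocks are not visible until the commit, an interrupted upload in this sketch leaves the previously committed content untouched, which is the main point of the two-step design.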

Page Blob

  • Optimized for random access
  • Composed of pages that are referenced by offsets from the beginning of the blob
  • Maximum size of 1TB, which can be composed of multiple pages, or a single 1TB page

Blob storage provides options for storing metadata for each blob and for taking snapshots of blobs for backups. Blobs can also leverage the Content Delivery Network (CDN), which can be used to cache blobs at a datacenter located near your customers to ensure fast access to the data stored in the blob.

Blob storage can also be used to provide an NTFS file system to applications on the Windows Azure platform. This is called an Azure Drive, and is implemented as a page blob that contains an NTFS-formatted Virtual Hard Disk (VHD). This VHD is mounted and exposed to the application as a local drive letter (for example, X:\). The Azure Drive VHD is only available in Windows Azure Guest OS 1.1 or later, and is only mountable by an application hosted within Windows Azure.

Queue Storage

Queues provide storage for passing messages between applications, similar to Microsoft Message Queuing (MSMQ). Messages stored to the queue are limited to a maximum of 8 KB in size, and are generally stored and retrieved on a first in, first out (FIFO) basis; however, FIFO order is not guaranteed.

Processing messages from a queue is a two-stage process: getting the message, and then deleting it after it has been processed. This pattern allows you to implement guaranteed message delivery by leaving the message in the queue until it has been fully processed. If the application processing the message fails before it has completed processing, the message is left in the queue and can be processed by another application. To prevent the message from being processed by multiple applications simultaneously, getting the message marks it as invisible; it remains invisible until it is either deleted or a specified time interval has passed. Peeking at the message reads it but does not mark it as invisible.
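
The get, process, delete pattern described above can be modeled with a small in-memory queue. This is a sketch in Python under stated assumptions, not the Queue service API: FakeQueue and its method names are hypothetical, the 30-second default timeout is an arbitrary choice, and the clock is passed in so that the passage of time can be simulated.

```python
import itertools


class FakeQueue:
    """In-memory model of get / delete with a visibility timeout."""

    def __init__(self, clock):
        self._clock = clock              # callable returning the time in seconds
        self._messages = {}              # message ID -> [body, time it becomes visible]
        self._ids = itertools.count(1)

    def put(self, body):
        self._messages[next(self._ids)] = [body, 0.0]

    def peek(self):
        """Read the next visible message without hiding it."""
        now = self._clock()
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                return msg_id, body
        return None

    def get(self, visibility_timeout=30.0):
        """Read the next visible message and hide it for visibility_timeout seconds."""
        found = self.peek()
        if found is not None:
            self._messages[found[0]][1] = self._clock() + visibility_timeout
        return found

    def delete(self, msg_id):
        """Remove a message once it has been fully processed."""
        self._messages.pop(msg_id, None)
```

A worker in this model calls get, does its work, and calls delete only on success. If the worker fails before deleting, the message becomes visible again once the timeout has passed and another worker can pick it up, which also means a message may be processed more than once.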

Table Storage

Table storage is a collection of row-like entities, each of which can contain up to 255 properties; however, unlike tables in a database, there is no schema that enforces a certain set of values on all the rows within a table. And while a table stores structured data, it does not provide any way to represent relationships between data. Windows Azure Storage tables are more like rows within a spreadsheet application such as Excel than rows within a database such as SQL Database, in that each row can contain a different number of columns, of different data types, than the other rows in the same table.

While table storage does support basic operations such as insert, update, delete, and select, it does not support joins, foreign keys, stored procedures, triggers, or any processing on the storage engine side, as SQL Database does. Queries returning a large number of results, or queries that time out, return partial results along with a continuation token that allows the query to be resumed.
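
The continuation token behavior described above implies a loop on the client side: keep re-issuing the query with the last token until no token comes back. The following Python sketch shows that loop against a fake service call. The names query_page and query_all are hypothetical, and the token here is simply an integer offset, which is an assumption made for the example; a real token should be treated as an opaque value.

```python
def query_page(rows, page_size, continuation=None):
    """Fake service call: return up to page_size rows plus a token, or None when done."""
    start = continuation or 0
    page = rows[start:start + page_size]
    next_start = start + len(page)
    token = next_start if next_start < len(rows) else None
    return page, token


def query_all(rows, page_size):
    """Collect every result by following continuation tokens until none is returned."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    results, token = [], None
    while True:
        page, token = query_page(rows, page_size, token)
        results.extend(page)
        if token is None:
            return results
```

An application that does not need every result at once can instead process each page as it arrives and keep the token, so that the query can be resumed later.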

A table can be up to 100 TB in size, and each row entity within a table can be up to 1 MB in size. Each row entity can contain up to 255 properties (columns), three of which are always present: a partition key that identifies the partition the entity belongs to, a row key that uniquely identifies the row entity within that partition, and a timestamp. Partition and row keys are limited to 1 KB in size, and together form the only index that exists on data within the table. Tables are partitioned based on the partition key for load balancing or scale out, as each partition can be located on a different storage node on the Windows Azure platform.
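
To make the consequences of this data model concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Table service API: FakeTable and its methods are hypothetical, and size limits other than the 255-property cap are not enforced. It shows entities in one table carrying different properties, a fast point lookup through the partition key and row key, and a filter on any other property falling back to a full scan.

```python
from collections import defaultdict
from datetime import datetime, timezone

MAX_PROPERTIES = 255  # includes PartitionKey, RowKey, and Timestamp


class FakeTable:
    """Toy model of a schema-less table indexed only by (PartitionKey, RowKey)."""

    def __init__(self):
        # partition key -> {row key -> entity}; in the real service each
        # partition could be served by a different storage node.
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, row_key, **properties):
        if len(properties) + 3 > MAX_PROPERTIES:
            raise ValueError("an entity can hold at most 255 properties")
        entity = dict(properties, PartitionKey=partition_key, RowKey=row_key,
                      Timestamp=datetime.now(timezone.utc))
        self._partitions[partition_key][row_key] = entity

    def get(self, partition_key, row_key):
        """Point lookup through the only index."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def scan(self, predicate):
        """Any other filter has to examine every entity in every partition."""
        return [entity
                for rows in self._partitions.values()
                for entity in rows.values()
                if predicate(entity)]
```

For example, after inserting ("customers-us", "001", Name="Ann", Email="ann@example.com") and ("customers-us", "002", Name="Bob", Age=41), the two entities sit in the same partition with different property sets. The practical design lesson is to choose partition and row keys that match your most frequent queries, since nothing else is indexed.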

Properties within a row entity are name value pairs that conform to a subset of the data types defined by the ADO.NET Data Services specification. The following table shows the supported types:

ADO.NET Data Services type | Common Language Runtime type | Details
Edm.Binary                 | byte[]                       | An array of bytes up to 64 KB in size.
Edm.Boolean                | bool                         | A Boolean value.
Edm.DateTime               | DateTime                     | A 64-bit value expressed as Coordinated Universal Time (UTC). The supported DateTime range begins from 12:00 midnight, January 1, 1601 A.D. (C.E.), UTC. The range ends at December 31, 9999.
Edm.Double                 | double                       | A 64-bit floating point value.
Edm.Guid                   | Guid                         | A 128-bit globally unique identifier.
Edm.Int32                  | Int32 or int                 | A 32-bit integer.
Edm.Int64                  | Int64 or long                | A 64-bit integer.
Edm.String                 | String                       | A UTF-16-encoded value. String values may be up to 64 KB in size.

Windows Azure SQL Database

SQL Database provides a Relational Database Management System for the Windows Azure platform, and is based on SQL Server technology. Similar to SQL Server, SQL Database exposes a tabular data stream (TDS) interface and Transact-SQL (T-SQL), so many of the tools and applications that work with SQL Server also work with SQL Database. Applications written using existing technologies such as ADO.NET and ODBC that communicate with SQL Server can be used to access SQL Database with minimal code changes. SQL Database also provides standard SQL Server features such as stored procedures, views, multiple indexes, joins, and aggregation.

Since the Windows Azure platform does not provide direct access to the underlying hardware, administration tasks that involve hardware access, such as defining where the database file is located, are inaccessible in SQL Database. Physical administration tasks are handled automatically by the platform, though you must still perform logical administration tasks such as creating logins, users, and roles. Because you cannot directly access the hardware, there are some differences between SQL Server and SQL Database in terms of administration, provisioning, T-SQL support, programming model, and features. For more information, see General Guidelines and Limitations (Windows Azure SQL Database) (http://msdn.microsoft.com/en-us/library/ee336245.aspx).

Unlike the other storage offerings discussed so far, SQL Database provides more than simple data storage; it also provides server-side processing, which allows you to perform complex processing on stored data without having to retrieve and process the entire data set within your application. For example, a query to find all salesmen with sales greater than $1,000.00 in the past year, within a specific region of the country, and provide a sum total of all their sales, can be run completely by SQL Database and the results returned to your application.
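
The sales query described above might look like the following sketch. It is written in Python and uses the standard library's sqlite3 module as a local stand-in so that it runs anywhere; SQLite is not SQL Database, the SQL dialect differs from T-SQL in places, and an application would normally connect through ADO.NET or ODBC instead. The sales table, its columns, and the sample rows are all hypothetical. The point is that filtering, grouping, and summing happen inside the database engine, and only the totals are returned.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (salesperson TEXT, region TEXT, sold_on TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("Ann", "West", "2012-03-01", 800.0),
        ("Ann", "West", "2012-06-15", 450.0),
        ("Bob", "West", "2012-04-10", 300.0),
        ("Cy", "East", "2012-05-05", 5000.0),
        ("Ann", "West", "2010-01-01", 9000.0),  # too old to count
    ])
    return conn


def top_sellers(conn, region, since, minimum):
    """Return (salesperson, total) for totals above minimum, computed by the engine."""
    query = """
        SELECT salesperson, SUM(amount) AS total
        FROM sales
        WHERE region = ? AND sold_on >= ?
        GROUP BY salesperson
        HAVING SUM(amount) > ?
        ORDER BY salesperson
    """
    return conn.execute(query, (region, since, minimum)).fetchall()
```

With the sample rows, top_sellers(build_demo_db(), "West", "2012-01-01", 1000) returns only Ann with a total of 1250.0. Doing the same with Table storage would mean retrieving every matching entity and summing the amounts in application code.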

A SQL Database can be up to 150GB in size and can contain multiple tables with complex relationships between data in the tables. Rows can be up to 8MB in size, and can contain 1024 columns. A table within SQL Database can have one clustered index on any column, and up to 999 secondary indexes.

See Windows Azure Pricing (http://www.microsoft.com/windowsazure/pricing/) for latest pricing information on SQL Database.
