With Storage Spaces, Windows Server 2012 provides the storage performance, flexibility, and scale required by today’s enterprise workloads. This paper focuses on the parameters that impact the performance of Storage Spaces and how these parameters can be optimized for specific workloads and enable Storage Spaces to achieve optimal performance.
This paper applies to the following operating systems:
The suggested audience for this paper is:
The performance of a storage solution is important when evaluating its cost efficiency and usefulness. Storage Spaces is a storage platform that was designed with performance as one of its primary goals. The premise was to deliver what the underlying hardware could provide. There exist a large number of variables that affect performance, responsiveness, and resiliency. Most notably, the characteristics of a given workload, such as Microsoft SQL Server, Microsoft Exchange, or virtual desktop infrastructure (VDI), determine how the storage can be optimally configured. In some scenarios raw throughput – often measured in MB/s – is the most important factor, while workloads with random data transfer patterns typically measure effectiveness in terms of input/output operations (I/Ops). Storage Spaces is capable of being a storage back-end to a variety of workloads. This paper describes how you can configure Storage Spaces for optimal performance and what hardware to select for a given workload.
For a glossary of common storage terms, see the end of the paper.
Windows Server 2012 introduces Storage Spaces, which is a new capability in Windows that enables the aggregation and virtualization of physical disks into logical groups, known as storage pools and storage spaces. You can choose the resiliency type of a storage space based on business needs, with choices of Simple (no resiliency), Mirror, and Parity. Below is a short explanation of how each resiliency type works.
For an introduction to Storage Spaces, please see this blog post: http://go.microsoft.com/fwlink/?LinkId=260762
The concept of striping is most easily explained using the example of a simple space. Striping refers to the idea of writing data across multiple disks to reduce access and response times. To use a simplified example: imagine you have four disks and 1 MB of data to write to these disks. You could either write all of the data to a single disk and access it from there, or write 256KB to each of the four disks simultaneously, resulting in a 4x decrease in write times. This general principal holds true for sequential, as well as random workloads. The more disks the storage space can stripe across, the better the performance will be. Note however that this can only be realized if your applications have enough data outstanding. Striping in storage spaces can be manipulated by the NumberOfColumns and Interleave parameters – explained later in this paper.
The diagram below shows a simple example of striping. Using the example of a simple space, you can see that 256 KB are allocated on each of the four disks to write a total of 1 MB of data. A1 through A4 represent that data in the diagram. The next write to this storage space will allocate another four 256 KB chunks and write data to it, depicted by B1 through B4. Note that the 256 KB chunks are not necessarily in the same location in the below diagram:
Simple spaces are not resilient to disk failures and should thus be used only for temporary data and data that can easily be recreated, for example intermediary video rendering files or caches. From a performance standpoint, simple spaces provide the best overall performance when considering reads and writes. They provide the cumulative performance of multiple disks, depending on how many disks are in the storage pool.
Mirroring refers to creating two or more copies of data and storing them in separate places, so that if one copy gets lost the other is still available. Mirror spaces use this concept to become resilient to one or two disk failures, depending on the configuration.
Take, for example, a two-column two-way mirror space. Mirror spaces add a layer of data copies below the stripe, which means that one column, two-way mirror space duplicates each individual column's data onto two disks.
Assume 512 KB of data are written to the storage space. For the first stripe of data in this example (A1), Storage Spaces writes 256 KB of data to the first column, which is written in duplicate to the first two disks. For the second stripe of data (A2), Storage Spaces writes 256 KB of data to the second column, which is written in duplicate to the next two disks. The column-to-disk correlation of a two-way mirror is 1:2; for a three-way mirror, the correlation is 1:3.
Reads on mirror spaces are very fast, since the mirror not only benefits from the stripe, but also from having 2 copies of data. The requested data can be read from either set of disks. If disks 1 and 3 are busy servicing another request, the needed data can be read from disks 2 and 4.
Mirrors, while being fast on reads and resilient to a single disk failure (in a two-way mirror), have to complete two write operations for every bit of data that is written. One write occurs for the original data and a second to the other side of the mirror (disk 2 and 4 in the above example). In other words, a two-way mirror requires 2 TB of physical storage for 1 TB of usable capacity, since two data copies are stored. In a three-way mirror, two copies of the original data are kept, thus making the storage space resilient to two disk failures, but only yielding one third of the total physical capacity as useable storage capacity. If a disk fails, the storage space remains online but with reduced or eliminated resiliency. If a new physical disk is added or a hot-spare is present, the mirror regenerates its resiliency.
Parity leverages computation to recover data from a failed disk. One of its columns (explained later) in this type of storage space is used to store a parity-bit, which can be used to reconstruct data from a failed disk. Storage Spaces utilizes rotating parity which means that the parity bit for all stripes does not reside on a single disk, but that it rotates from stripe to stripe across different disks. In essence, every disk in a parity space contains data as well as parity information.
In the diagram below, a 768 KB stripe of data is written across disks 1 through 3 (A1, A2, A3), while the corresponding parity bit (AP) is placed on disk 4. For the second stripe of data, Storage Spaces writes the data on disks 1, 2 and 4, thereby rotating the parity to disk 3 (BP). Since parity is striping across all but one disk in the storage space it has good read performance while still providing resiliency to a single disk failure. Capacity utilization is more efficient than a mirror space.
The caveat of a parity space is low write performance compared to that of a simple or mirrored storage space, since existing data and parity information must be read and processed before a new write can occur. Parity spaces are an excellent choice for workloads that are almost exclusively read-based, highly sequential, and require resiliency, or workloads that write data in large sequential append blocks (such as bulk backups).
The sequential write performance of parity spaces can be improved by using dedicated journals. A journal is used to stage and coalesce writes to provide resiliency for in-flight I/Os. This journal resides on the same disks as the parity space unless you designate physical disks in the pool as journal disks. The journal is a mirror space and thus resilient by itself. The advantage of dedicated journal disks is a significant improvement in sequential write throughput for parity spaces. Incoming writes are de-staged to the parity space from dedicated disks, thus significantly reducing seeking on the disks used by the parity space. Using two SSDs as a dedicated journal to a three disk parity space, a sequential throughput increase of ~150% was observed under lab conditions. Note that the throughput of the journal disks will now be the overall throughput limit to all parity spaces created on this specific storage pool and you might trade extra capacity for performance. In other words, ensure that dedicated journal disks are very fast and scale the number of journal disks with the number of parity spaces on the pool.
The type of disk that is used for a given workload is important, as there are significant performance and capacity differences between them. The list below is a quick overview of what types of disks are generally available and what their characteristics are:
To ensure that SSDs do not wear out prematurely Windows, supports trim and unmap commands. The storage optimizer sends trim commands to SSD devices to maximize their longevity. Furthermore, if applications support trim and/or un-map, Windows honors those commands and passes them on to the underlying device.
Many storage systems contain mixed media types. This means that a few high capacity 7,200 RPM disks are available for storing backups, or data that is not accessed very often. 10,000 RPM or 15,000 RPM disks might be deployed to house databases or other data that needs to be accessed and manipulated relatively quickly. Lastly, SSDs could be found in such a system as well, storing data that has to be readily available or data that is very sensitive to access latency, such as transaction logs or parent VHDs in a VDI-type deployment. This type of provisioning is also possible with Storage Spaces and allows the user to take advantage of the various characteristics of different disk types without having to buy the most high-performing type of disk for the job. The system administrator can create pools of different disk types, or group different disk types in a single pool and manually select which disks should be used for which storage space. These types of heterogeneous storage deployments are fully supported by Storage Spaces.
Storage Spaces supports disks using SAS, SATA, and USB interfaces. SAS is typically considered enterprise-grade, while SATA equates to consumer storage. The major difference is that most SAS disks have two links to connect to, as opposed to just one for SATA. This allows SAS disks to be shared resources and thus be used as storage in clustered and high-availability environments and be resilient to a single link failure. Additionally, SATA disks attached via a “just a bunch of disks” (JBOD) enclosure typically incur a slight performance penalty when compared to SAS. SATA SSDs further exacerbate the problem by having an order of magnitude larger CPU overhead. USB disks, while supported by Storage Spaces, typically do not qualify for high-performing system configurations. We recommend utilizing dual-port SAS disks when possible.
Storage Spaces takes full advantage of the performance capabilities of the underlying storage. The following charts show the performance scaling of a Simple Space with up to 32 disks, with data striped across the disks in a simple space with no resiliency:
At TechEd North America 2012, as well as TechEd Europe 2012, Microsoft presented a three-node cluster system backed by 24 enterprise-level SSDs and Storage Spaces. The performance of this setup showcased a random read 1.4 million I/Ops and 10.9 GB/s of sequential throughput. More details about this system can be found in the Sample Deployments section of this paper.
Two parameters that are important in any workload are block size and interleave (stripe size). Block size is a measurement (usually in KB) of the I/Os that the storage sub-system has to handle. Many database-style applications such as Microsoft SQL Server and Microsoft Exchange use relatively small blocks (8KB – 32KB), though other applications may use larger blocks. Sequential workloads profit significantly from larger block sizes. In general, IOPs and throughput are inversely related to block size: a small block size yields high IOPs with low throughput, while a large block size results in high throughput with low IOPS.
Storage Spaces can utilize multiple physical disks by striping data across them. One of the parameters that can be adjusted when creating a storage space is how large this stripe is supposed to be. The default setting is 256 KB. This means that Storage Spaces stores 256 KB of data on each column that was assigned to the storage space. If, for example, you have eight disks in the storage pool and want to write 4 MB of data, each disk receives two 256KB stripes (4MB / 256KB (Stripe) / 8 Columns = 2x 256KB stripes per column).
To maximize performance, ensure that the interleave value used by the storage space is at least as large as the I/Os of your workload. I/Os that exceed the interleave size are split into multiple stripes, turning one write into multiple writes, reducing performance.
The number of columns specifies how many physical disk data is striped across and has an impact on performance regardless of stripe and block size. Storage Spaces intelligently scales the column count up to eight by default, but you can adjust this parameter by using Windows PowerShell.
One example of how the number of columns affects performance is as follows. Assume that you create a simple (no resiliency) storage space in a storage pool with 12 disks, and that each disk is capable of 150 MB/s sequential throughput. Creating a simple space with one column yields a maximum throughput of 150 MB/s (the concept of spanning creates exceptions – see next section). Using two columns increases this to almost 300 MB/s, with further increases scaling linearly (for example, four columns yields close to 600 MB/s throughput). The more columns a storage space is assigned, the more disks can be striped across and the higher performance.
If you choose a high column count storage space and you intend to expand the capacity of the storage space in the future, you should add at least as many disks as the storage space has columns (in case of a simple or parity space), or columns times number of data copies (in the case of a mirrored space). For instance, when deploying a two-way mirrored space with four columns, you should have eight disks with available capacity (two columns times two data copies), and add disks in groups of eight when expanding the storage pool.
The following table provides real-world performance numbers comparing two simple spaces with different numbers of columns. In both conditions, the storage pool contained four SSDs, and each storage space received a sequential read request of 1 MB I/Os, akin to reading a large number of 1 MB files.
Disks
Total Capacity (TB)
Columns
Throughput (MB/s)
Avg. Latency (ms)
4
1.45
2
902.45
8
1622.50
The data shows that increasing the number of columns can significantly improve performance for sequential workloads, though at the expense of some flexibility when expanding capacity (you should add disks in groupings that match the number of columns x data copies). Random workloads do not experience as significant a performance increase, exhibiting more uniform performance across different column counts.
Another factor that ties in directly with the column count of the space is the amount of outstanding I/Os or data that is to be read from or written to the storage space. A large number of columns benefits applications that generate enough load to saturated multiple disks, but introduce unnecessary limitation on capacity expansion for less demanding applications.
The following example describes this trade-off: A typical hard disk is saturated writing a large number of 1MB files or block data in a sequential fashion. To saturate a simple space with four hard disks and the same workload, it is necessary to provide roughly 32 outstanding I/Os of this type. If less data is outstanding, not all disks are fully utilized. If the application using this storage space will never have more than 16 I/Os outstanding at a time, creating a storage space with two manually assigned disks could service this workload and preserve the other two disks for capacity expansion or for use with another application (using its own manually allocated storage space). This also has the advantage that the administrator can add two disks to extend the storage space as opposed to four.
To adjust the number of columns used by storage spaces, you must use Windows PowerShell. For more information, see: http://go.microsoft.com/fwlink/p/?LinkId=260763
There are a few more points to consider and adjust when maximizing the performance of Storage Spaces. To get the most out of a storage subsystem it is important to ensure that the physical disks are the bottleneck and not the rest of the infrastructure, such as cables, HBAs, PCI slots, CPUs, or MPIO, as discussed below:
As with all systems, there are a few important considerations for every deployment. Here is a short list of Storage Spaces best practices.
The following example describes how to deploy Storage Spaces in a standalone Microsoft SQL Server instance.
To build a resilient system it is important to have redundant paths from the server to the storage. The below diagram depicts how a standalone server can be connected to two JBODs in such a fashion that the system would be resilient to path-, HBA-, Disk-, I/O-Module- and JBOD failure.
The typical workload of an actively used SQL database is random in nature with relatively small block sizes (8KB – 32KB). The read / write percentage can vary significantly, based on what kind of application(s) the database supports. As a result of these characteristics, it is advisable to utilize fast disks (10,000 RPM or faster) and keep the storage space stripe size relatively small to accommodate the blocks read and written by the database.
All 48 disks can be aggregated into a single storage pool. The active database tables and indexes can be stored on two mirror spaces with 12 10,000 RPM manually assigned and the following parameters:
Database Tables – Storage Provisioning (2 Mirror Spaces)
Number of Columns
6
Number of Data Copies
Resiliency Setting
Mirror
Provisioning Type
Fixed
Interleave
64KB
Physical Disks To Use
<6 10,000 RPM disks from each JBOD>
This will allow for fast access times to the data and 7.2 TB of total capacity for the database tables.
Logs are important in SQL deployments and require relatively low (fast) read/write latencies to ensure smooth operation of the database. The SQL logs will be placed on a mirror space in a pool with 15,000 RPM disks from each JBOD. The parameters are almost the same as above:
Database Logs – Storage Provisioning (1 Mirror Space)
<4 15,000 RPM disks from each JBOD>
Lastly, we need Space for backups of the database. Backup disks typically do not have to be very fast as the access pattern changes here. Backups are often in larger chunks and more sequential than day-to-day operations. Capacity is of importance though. The 1 TB, 7,200 RPM disks are a good capacity option and can be configured with the following parameters:
Database Backup – Storage Provisioning (1 Mirror Space)
256 KB
<8 7.2k RPM disks from each JBOD>
This provisioning scheme yields a total database capacity of ~7.2 TB, log capacity of ~584 GB, and ~8 TB of backup capacity while being highly focused on resiliency to component failure. Individual disk failures in each storage space can be tolerated, as well as adapter, cable, or JBOD failure. If this degree of resiliency is not required, then the provisioning can change to provide more performance or capacity.
This cluster deployment element is designed to provide extremely high levels of performance at a low price point. Over 1.4 million random read I/Ops and 10.9GB/s of sequential throughput have been demonstrated on the below hardware with Storage Spaces.
The below link is a recording of the TechEd talk presenting Storage Spaces in depth. The performance demo can be found in the talk “Windows Server 2012 Storage Solutions: Storage Capabilities for Everyone” at time 23:51: http://video.ch9.ms/teched/2012/na/WSV327.wmv
As shown in this article, Storage Spaces is capable of scaling to multi-million I/Ops while satisfying many different workload types that will shape the behavior and configuration of the storage solution. Further, examples were provided of a near doubling in sequential performance when foregoing some flexibility. Storage Spaces has parameters that can be fine-tuned toward workload needs in order to achieve optimal performance for a wide range of different applications, thus allowing customization toward specific business needs. The use of commodity hardware makes Storage Spaces deployments flexible, easily expandable, and cost-efficient.
Below is a list of terminology used in this paper:
Term:
Definition:
Column
Columns correlate to underlying physical disks across which one stripe of data for a storage space is written. Typically the column count will be equal to the number of physical disks of the storage space (for simple spaces) or half of the number of disks (for mirror spaces). The column count can be lower than the number of physical disks but never higher.
Interleave represents the amount of data written to a single column per stripe.
Block Size
Block size is a measure of how large or small, in kilobytes or megabytes a read or write request is. This is one of the most important parameters of a workload.
Outstanding I/Os
Also referred to as “Queue Depth”, outstanding I/Os measures how many read or write requests are queued up to be serviced by a disk, Space or other device.
Hat tip to OP -- I've been struggling to find detailed technical articles and best practices for Storage Spaces, this is exactly the kind of thing I've been searching for.
Yeah, this is awesome. Thanks for taking the time to post this. Great article. Very informative.
Really good article !, I added it as a nonimee to be featured on the technet homepage
Really good article ! A added it to be a nominee to be featured