This document is part of a collection of documents that comprise the
Reference Architecture for Private Cloud
document set. The Reference Architecture for Private Cloud documentation is a community collaboration project. Please feel free to edit this document to improve its quality. If you would like to be recognized for your work on improving this article,
please include your name and any contact information you wish to share at the bottom of this page.
This article is no longer being updated by the Microsoft team that originally published it. It remains online for the community to update, if desired. Current documents from Microsoft that help you plan for cloud solutions with
Microsoft products are found in the TechNet Library under Solutions or Cloud and Datacenter.
When orchestrating the provisioning or migration of these workloads, the fabric may employ several methods to select an appropriate host to accommodate the resource instance. One such method is intelligent placement based on host utilization at the time the selection is made. Utilization measurements include processor, memory, and I/O utilization, or may be an overall aggregation of these values. Another method is to pre-configure resource classifications by group and associate each group with host resources. This would cause all Extra-Small instances to be hosted by one group of hosts, all Small instances by another, and so on.
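The two placement methods just described can be sketched as follows. This is an illustrative sketch only: the host records, utilization weights, and size-class names are assumptions for the example, not part of the reference architecture.

```python
# Sketch of the two placement methods described above.
# Host data, weights, and size-class names are hypothetical.

def place_by_utilization(hosts, weights=(0.5, 0.3, 0.2)):
    """Intelligent placement: pick the host with the lowest aggregated utilization."""
    cpu_w, mem_w, io_w = weights
    def score(h):
        return cpu_w * h["cpu"] + mem_w * h["mem"] + io_w * h["io"]
    return min(hosts, key=score)

def place_by_classification(host_groups, size_class):
    """Pre-configured classification: pick the least busy host in the class's group."""
    return min(host_groups[size_class], key=lambda h: h["cpu"])

hosts = [
    {"name": "host-a", "cpu": 0.70, "mem": 0.60, "io": 0.40},
    {"name": "host-b", "cpu": 0.30, "mem": 0.50, "io": 0.20},
]
groups = {"Extra-Small": [hosts[0]], "Small": [hosts[1]]}

print(place_by_utilization(hosts)["name"])               # host-b
print(place_by_classification(groups, "Small")["name"])  # host-b
```

Either method can be exposed to fabric management as a pluggable host-selection step; the aggregation weights would normally be tuned per datacenter.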
Therefore, the selection of a scale unit based on technical characteristics will evaluate the anticipated workload demands on the infrastructure and the resource classification mix each workload requires.
The operational characteristics that influence the selection of a scale unit are driven by the technology innovations by hardware vendors to grow resource density while reducing the physical footprint. High unit densities in smaller areas require new and
innovative techniques to package and integrate resources. Modern server configurations include a chassis supporting multiple compute blades with storage and networking connections integrated into the backplane.
Incorporating these high-density chassis into traditional datacenter racks, and deploying those racks within the datacenter, has driven new designs for how power and cooling are implemented, including the creation of cold and hot rows in the datacenter to provide efficient and effective airflow distribution and exhaust.
These physical characteristics and deployment methods may influence scale unit selection, since the build-out and reconfiguration of datacenter environmental infrastructure may be prohibitive or impractical.
Another operational characteristic involves the platform vendor's approach to exposing holistic management interfaces for performing low-level operations on the hardware platform.
Extending fabric management down to the lowest levels of physical platform provisioning enables a private cloud to respond elastically by dynamically bringing host resources online and offline as needed. This can only be accomplished if the host platform supports a rich administration capability that can be integrated into private cloud management and operations.
Therefore, the management capabilities provided by the host platform, together with its physical characteristics, influence the selection of the private cloud scale unit at a given point in time, and that selection is in turn influenced by the design and operational aspects of the datacenter.
The selection of scale unit configuration and size is left to the IT architect and implementer to decide based on a combination of business, technical, and operational drivers. There is no one size that fits all. A scale unit may be as small as one integrated blade chassis configured with redundancy to implement a physical fault domain, or as large as a container of racks and resources that implements multiple physical fault domains.
The server hosts that make up the physical hardware of a Private Cloud Infrastructure as a Service platform have two primary functions. The first is to physically host workload applications that are incorporated into services hosted on the private cloud.
While hosting workloads directly on physical hardware is technically possible, workloads are realistically more likely to be hosted in a virtualized environment in the private cloud. That said, this scenario is also used to host workloads that have some constraint, restriction, or other limitation that prevents them from being virtualized.
The second and most common use for the physical servers is running the core operating system platform that hosts the virtual machine instances provisioned by the Management and Operations Layers of the private cloud. The primary goal of fabric management is to configure and provision virtual machine instances in response to a service management request; those virtual machine instances must be hosted somewhere in the private cloud infrastructure. That somewhere is the virtual machine hosts.
Ideally, in the most mature organizations, the IT processes for configuration, management, and operations of virtual machine hosts are encoded into automated scripts and workflows that can be integrated into private cloud fabric management.
This is referred to as bare-metal provisioning of physical host resources. Consider the notion of capacity in the private cloud. Capacity is defined by the ability of fabric management to compose compute resources into pools classified by capability. These
pools are allocated into resources that make up a service and assigned to an upgrade domain for service deployment. Capacity is limited by the number of resource instances that can be hosted by the physical hosts in the private cloud infrastructure.
Dynamically growing (or shrinking) capacity to meet demand implies an ability to add or remove physical host resources in the fabric. Minimally, these resources can be configured, brought online manually, and made available to the fabric. However, to reach the maturity level expected of a private cloud, physical host resource provisioning must be integrated into the Fabric Management component of the private cloud Management and Operations Layer.
This dynamic capability allows a private cloud to respond to a service request for more resources than can be accommodated by the host capacity currently online. To accommodate the request, fabric management initiates an automated workflow that performs bare-metal provisioning of a new host and configures the resource into fabric management, thereby increasing the overall capacity of the private cloud. The process of dynamically growing capacity is transparent to the user or service owner, so the capacity of the private cloud appears infinite.
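As a rough illustration of that flow, the sketch below compares online capacity against a request and triggers a stubbed provisioning step until the request fits. The function names and the flat guests-per-host capacity model are assumptions for the example; a real fabric would call its bare-metal provisioning workflow here.

```python
# Hedged sketch of fabric management growing capacity on demand.
# `provision_host` stands in for the automated bare-metal provisioning workflow.

def ensure_capacity(requested_vms, online_hosts, vms_per_host, provision_host):
    """Bring additional hosts online until the request can be accommodated."""
    while len(online_hosts) * vms_per_host < requested_vms:
        online_hosts.append(provision_host())  # bare-metal provision a new host
    return online_hosts

# Usage: two hosts online at 20 guests each, and a request for 50 guests.
hosts = ["host-1", "host-2"]
new_names = iter(["host-3", "host-4"])
ensure_capacity(50, hosts, 20, lambda: next(new_names))
print(hosts)  # ['host-1', 'host-2', 'host-3']
```

One host is enough here: the third host raises capacity to 60 guests, so the workflow stops without provisioning a fourth.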
We already touched on some of the host server characteristics required to realize this level of IT maturity and service management. In the next section we’ll expand on the host server architecture in more detail.
The host server architecture is a critical component of the private cloud infrastructure, as well as a key variable in the consolidation ratio and cost analysis. The ability of the host server to handle the workload of a large number of consolidation candidates
increases the consolidation ratio and helps provide the desired cost benefit. This section outlines the major host server architecture considerations and Section 7 provides three host server reference architectures.
Microsoft does not recommend specific server vendors or models in this section because there are many excellent candidates. Microsoft does recommend finding the best cost/performance ratio in the host server choice by focusing on mainstream Intel or AMD servers
toward the high end of the commodity server scale. The “sweet spot” is typically in the two- to four-socket servers running the highest or second highest CPU speed with multi-core processors.
The system architecture of the host server refers to the general category of the server hardware itself. Examples include rack mounted servers, blade servers, and large symmetric multiprocessing (SMP) servers. The primary tenet to consider when selecting a system architecture is that each virtualization host will contain multiple guests running multiple workloads. Processor, RAM, storage, and network capacity are critical, as are high I/O throughput and low latency. It is critical to ensure that the host server is able to provide the required capacity in each of these categories.
The most common system architecture is a standard rack mounted server. Typically found in 2U or 4U form factors, these servers usually contain 2 to 4 CPU sockets, 2 to 8 PCI-E or PCI-X slots, and 4 to 6 hard disk bays. Rack mounted servers are excellent choices for Hyper-V hosts due to the low cost of commodity 2- and 4-socket servers and their inherent scalability and expandability through additional network adapter and HBA slots.
Blade servers have dramatically increased in popularity and capability due to the ever increasing need for capacity and server density. Blade server architectures are a primary area of R&D for the server manufacturers, resulting in a significant amount of
innovation in this space. The downside of blade servers is limited standards and interoperability between manufacturers and, in some cases, within the same manufacturer when they change their blade chassis architecture.
The processor and space density benefits of blade servers initially came at the expense of expandability and the quantity of network adapters and HBAs that can be supported in the first several generations of blade servers.
Recently, the advent of blade architectures in which each blade contains 8 to 16 cores, up to 64 GB of RAM, and, most importantly, 6 or more I/O interfaces has eliminated many of the disadvantages that previously weighed against using blade server architectures for virtualization hosts.
The network and storage I/O required to support the desired number of guests on each host server must be considered carefully to ensure that each blade, and the blade chassis itself, can support it.
The host server architecture must be considered when evaluating blade server system architectures. For the Two-Node and Server Farm patterns, dedicated network adapters are required for the cluster heartbeat network. If an iSCSI storage system is planned, two
additional dedicated network adapters are required for access to the storage and for redundancy. Finally, at least two network adapters should be dedicated to guest network I/O. The number of network adapters required per host can easily grow to 4 to 6 or more, which is frequently beyond the number supported by many blade servers.
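The adapter arithmetic above can be made explicit. The breakdown below mirrors the text (heartbeat, iSCSI plus redundancy, guest network I/O, and the dedicated host management adapter recommended later in this document); the function itself is purely illustrative.

```python
# Illustrative count of network adapters per host for the patterns discussed.
# The breakdown follows the text; the defaults are assumptions.

def required_network_adapters(clustered=True, iscsi=True, management=True):
    adapters = 2          # at least two dedicated to guest network I/O
    if clustered:
        adapters += 1     # dedicated cluster heartbeat network
    if iscsi:
        adapters += 2     # iSCSI storage access plus redundancy
    if management:
        adapters += 1     # host management and host network I/O
    return adapters

print(required_network_adapters())             # 6
print(required_network_adapters(iscsi=False))  # 4
```

The range of 4 to 6 adapters falls straight out of whether iSCSI storage is in the design, which is why fibre-channel blades often fit where iSCSI blades do not.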
For the purposes of this document, large SMP servers are defined as those that have 8 or more CPU sockets. At the very high end, Windows Server 2008 R2 Datacenter Edition on 64-bit hardware can support servers with up to 64 CPU sockets and 2 TB of RAM. Many
of these very high end servers include advanced features such as hardware partitioning, hot-add of resources, hot spare components, and so on. Hardware of this capacity has the potential to host hundreds of virtual machine guests.
A large SMP server or large SMP server cluster may be appropriate if a very large number of servers will be consolidated, if the organization has operational experience with large "mainframe" class servers, or if it has already standardized on large SMP servers.
Windows Server 2008 R2 with Hyper-V requires an x64 processor architecture from Intel or AMD, as well as support for hardware execute disable and hardware virtualization such as Intel Virtualization Technology or AMD Virtualization.
Both Intel and AMD provide a wide range of processors that are appropriate for host servers. The industry competition between the two is very tight, and at any one time one may have a performance advantage over the other. Regardless of which manufacturer is
chosen, several performance characteristics are important.
The number of processor cores is a key performance characteristic. Windows Server 2008 R2 with Hyper-V makes excellent use of multi-core processors, so the more cores the better. Another important characteristic is the processor clock speed, which is the speed at which all cores in the processor operate. It is important because it will be the clock speed of all of the guest virtual machines. This is a key variable in the consolidation ratio because it impacts both the number of candidates that the host server can handle and the speed at which those guests will operate. For example, choosing a 2 GHz processor rather than a 3 GHz processor on a server that will host 20 guests means that all of those guests run at only 2 GHz.
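The trade-off can be put in numbers. The socket and core counts below are illustrative figures for a typical two-socket, quad-core host, not a recommendation.

```python
# Worked example of the clock-speed trade-off above (illustrative figures).
# Every guest virtual processor runs at the host core clock, so the 2 GHz vs
# 3 GHz choice affects both the total pool and each individual guest.

def aggregate_compute_ghz(sockets, cores_per_socket, clock_ghz):
    """Total compute available for guests, ignoring hypervisor overhead."""
    return sockets * cores_per_socket * clock_ghz

print(aggregate_compute_ghz(2, 4, 2.0))  # 16.0 GHz pool shared by the guests
print(aggregate_compute_ghz(2, 4, 3.0))  # 24.0 GHz for the same socket count
```

The 50% clock-speed difference raises the shared pool proportionally while also raising the ceiling seen by every individual guest, which is why the text treats clock speed as a key consolidation-ratio variable.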
At a lower level of detail, the server processor architectures make design choices in terms of the type and quantity of processor cache, memory controller architecture, and bus/transport architecture. A detailed analysis of these factors is beyond the scope
of this document.
Once the system architecture and processor architecture choices are made, there are relatively few options remaining for memory architecture because it is usually predetermined by the manufacturer/system/processor combination. The memory architecture choices
that remain are typically quantity, speed, and latency. For supporting virtualized workloads, the most important memory architecture choice is the quantity of RAM.
The quantity of RAM is a more important factor than RAM speed or latency. Once the maximum amount of RAM that is cost-effective has been determined, if a choice remains between speed and latency, choosing the memory with lower latency is recommended.
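A simple sizing sketch makes the "quantity first" guidance concrete. The per-guest allocation and host OS reserve below are assumed figures for illustration, not Microsoft sizing rules.

```python
# Rough RAM quantity estimate for a virtualization host.
# Per-guest allocation and host OS reserve are assumed figures.

def host_ram_gb(guests, ram_per_guest_gb=4, host_reserve_gb=4):
    """Total host RAM needed: guest allocations plus a reserve for the host OS."""
    return guests * ram_per_guest_gb + host_reserve_gb

print(host_ram_gb(12))  # 52 GB for 12 guests at 4 GB each
```

Working the estimate backwards from a candidate server's maximum cost-effective RAM gives the guest count that memory, rather than CPU, will cap.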
The network architecture of the host server is a frequently overlooked topic in host server sizing because gigabit Ethernet network adapters are now very inexpensive and most servers have at least two built in. The topic is important, however, because it is directly
impacted by the host server architecture pattern selected. If one of the two host server cluster patterns is selected, a dedicated network adapter per server is required for the cluster private (heartbeat) network. As mentioned previously, if an iSCSI storage
architecture is being utilized, network adapters will need to be dedicated to storage I/O traffic. Gigabit Ethernet is a high-speed network transport, though a host server with a large number of guests may require greater than gigabit speed, thus requiring
additional network adapters. Finally, it is recommended that each host server have a network adapter dedicated to the host itself for network I/O and management.
As described earlier, a fairly large number of network adapters per host server may be required. This is the one factor that can weigh against blade servers in some instances. Recently, 10-GB Ethernet has become commonly available and is starting to come down in price, much as gigabit Ethernet did over the years. The ability for servers to utilize 10-GB Ethernet network adapters is a significant factor in increasing the consolidation ratio.
Use multiple network adapters and multi-port network adapters on each host server.
Dedicate at least one network adapter/port on each host server to guest virtual machine network I/O. For maximum consolidation ratio, utilize one or more 10-GB Ethernet network adapters for virtual machine network I/O.
The disk storage for all guest virtual machines is one or more VHD files housed on the storage system being utilized by the host server. Host storage I/O, in addition to the system, processor, memory, and network architectures described earlier, is the final
major component of host server sizing. Hyper-V I/O consists of a large number of read and write IOPS to the storage system due to the large number of guests running on each server and their various workloads.
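A back-of-the-envelope IOPS estimate helps size the storage system for this aggregate load. The per-guest IOPS figure and read/write mix below are assumptions for illustration; real values should come from measuring the consolidation candidates.

```python
# Rough host storage IOPS estimate for sizing the storage system.
# Per-guest IOPS and the read/write mix are assumed figures, not sizing rules.

def host_storage_iops(guests, iops_per_guest=50, read_fraction=0.6):
    """Aggregate IOPS the host's guests will drive, split into reads and writes."""
    total = guests * iops_per_guest
    reads = round(total * read_fraction)
    return {"total": total, "read": reads, "write": total - reads}

print(host_storage_iops(20))  # {'total': 1000, 'read': 600, 'write': 400}
```

The split matters because RAID write penalties mean the write portion often dominates the spindle or SSD count, even when reads are the majority of the workload.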
If direct attached storage is being utilized, a SATA II or SAS RAID controller internal to the server is recommended as described earlier. If a storage array and SAN are being utilized, host bus adapters (HBAs) are required in the host server. The HBA provides
access to the storage array from the host server. The storage connectivity is a critical component for high availability and performance.