Published: June 26, 2013
Version: 1.1
Abstract: The Hybrid Cloud Infrastructure Design Considerations guide provides the enterprise architect and designer with a collection of critical design considerations that must be addressed before making the design decisions that will drive a hybrid cloud infrastructure implementation. This article can be used together with the Hybrid Cloud Solution for Enterprise IT reference implementation guidance set to create a core hybrid cloud infrastructure.
 


To provide feedback on this article, leave a comment at the bottom of the article or send e-mail to SolutionsFeedback@Microsoft.com. To easily save, edit, or print your own copy of this article, please read How to Save, Edit, and Print TechNet Articles. When the contents of this article are updated, the version is incremented and changes are entered into the change log. The online version is the current version. See the bottom of this article for a list of technologies discussed in this article.

1.0 Introduction

Most enterprise information technology (IT) organizations have data centers with limited IT staff, data center space, hardware, and budgets. To avoid adding more of these resources, or to use the resources they already have more effectively, many organizations now use external IT services to augment their internal capabilities and services. Examples of such services are Microsoft Office 365 and Microsoft Dynamics CRM Online. Services that are provided by external providers typically exhibit the five essential characteristics of cloud computing (on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service) that are defined in The NIST Definition of Cloud Computing.

In the remainder of this document, the term “cloud services” refers to services that exhibit the essential characteristics of cloud computing defined by the United States National Institute of Standards and Technology (NIST). Services that do not exhibit these characteristics are referred to simply as “services.” Services often don’t exhibit many, if any, of the essential characteristics. Another term that is used throughout this document is “technical capabilities”: the functionality provided by hardware or software which, when used together in specific configurations, provides a service, or even a cloud service.

For example, to provide a messaging service in your environment, you’d use, at a minimum, email server application, network, server, name resolution, storage, authentication, authorization, and directory technical capabilities. If you wanted to provide that same messaging service as a cloud service, you’d add capabilities such as a self-service portal and probably an orchestration capability (to execute the tasks that support the essential cloud characteristic of on-demand self-service).
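The composition described above can be sketched as data. The capability names are taken directly from the messaging example, and the set representation is purely illustrative, not a formal model:

```python
# The technical capabilities that compose the example messaging service.
# Names come from the messaging example in the text; the sets are illustrative.
BASE_MESSAGING = {
    "email server application", "network", "servers", "name resolution",
    "storage", "authentication", "authorization", "directory",
}

# Turning the service into a cloud service adds self-service and
# orchestration capabilities to support on-demand self-service.
CLOUD_MESSAGING = BASE_MESSAGING | {"self-service portal", "orchestration"}

# The delta between a service and a cloud service:
print(sorted(CLOUD_MESSAGING - BASE_MESSAGING))
# → ['orchestration', 'self-service portal']
```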

Rather than several people independently consuming cloud services from external providers, typically a department within the IT organization establishes a relationship with an external provider at the organizational level. This department consumes the service at the organizational level, integrates the service with some of its own internal technical capabilities and/or services, and then provides the integrated hybrid service to consumers within the organization. The consumers within the organization are often unaware of whether the service is owned and managed by their own IT organization or by an external provider. And they don’t care who owns it, as long as the service meets their requirements.

An important consideration when dealing with a hybrid cloud infrastructure is that while the in-house IT department will be seen as a provider of cloud services to the corporate consumer of the hybrid cloud solution, it is also true that the IT organization itself is a consumer of cloud services. That means that there are multiple levels of consumers. The corporate consumer might be considered a second-level consumer of the public cloud services, while the IT organization might be considered a first-level consumer of the service. This has important implications when thinking about the architecture of the solution. This issue will be covered later in this document. 

This document details the design considerations and configuration options for integrating Windows Azure Infrastructure Services (virtual machines (or “compute”), network, and storage cloud services) with the infrastructure capabilities and/or services that currently exist within typical organizations. This discussion will be driven by requirements and capabilities. Microsoft technologies are mentioned within the context of the requirements and capabilities and not vice versa. It is our expectation that this approach will resonate better with architects and designers who are interested in what problems must be solved and what approaches are available for solving these problems. Only then is the technology discussion relevant. 

1.1 Audience

The primary audience for this document is the enterprise architect or designer who wants to understand the issues that must be considered before engaging in a hybrid cloud project, and the options available for meeting the requirements that arise from key infrastructure issues. Others who might be interested include IT implementers who want to understand the design considerations behind the hybrid cloud infrastructure they are tasked to build.

1.2 Document Purpose

The purpose of this document is two-fold. The first purpose is to provide the enterprise architect or designer a collection of issues and questions that need to be answered for each of the issues for building a hybrid cloud infrastructure. The second purpose is to provide the enterprise architect or designer a collection of options that can be evaluated and chosen based on the answers to the questions. While the questions and options can be used with any public cloud service provider's solution, examples of available options will focus on Windows Azure.

In addition, this document includes:

  • The relevant design requirements and environmental constraints that must be gathered before integrating Windows Azure Infrastructure Services into an environment.
  • Conceptual design considerations for integrating infrastructure cloud services into an existing environment, regardless of who is the external provider of the cloud services.
  • Physical design considerations to evaluate when integrating Windows Azure Infrastructure Services into an existing environment.

This document was conceived and written with the conviction that enterprise IT should not replicate its current datacenter in the cloud. Instead, it is assumed that enterprise IT would like to base a new solution on new architectural principles specific to a hybrid cloud environment. This document focuses on the hybrid cloud infrastructure because core infrastructure issues need to be addressed before creating even a single production virtual machine. Issues of security, availability, performance, and scalability need to be considered in the areas of networking, storage, compute, and identity before embarking on a production environment. We recognize that there is a tendency to want to stand up applications as soon as the public cloud infrastructure service account is created, but we encourage you to resist that urge and read this document so that you can avoid unexpected complications that could put your hybrid cloud project at risk.

Note that the existing environment can be a private cloud or a traditional data center. The goal is to enable you to integrate your current environment with a public cloud provider of infrastructure services (an Infrastructure as a Service [IaaS] provider). 

While this document does explain design considerations and the relevant Microsoft technology and configuration options for integrating Windows Azure Infrastructure Services with the existing infrastructure of technical capabilities and/or services in an environment, it does not provide any example designs for doing so. A future document set will address a specific design example. You can find more information about this on the Cloud and Datacenter Solutions Hub at http://technet.microsoft.com/en-US/cloud/dn142895.  

If you’re also interested in guidance that includes lab-tested designs that integrate infrastructure cloud services into existing environments, it is available separately. For more information, see http://technet.microsoft.com/en-US/cloud/dn142895.

2.0 Hybrid Cloud Problem Definition

The following problems or challenges typically drive the need to integrate infrastructure cloud services from external providers into existing environments:

  • Existing hardware, software, or staff resources cannot meet the demand for new technical capabilities and/or services within the environment.
  • Periodic demand “spikes” require acquisition of hardware and software resources that sit idle during normal, non-spike usage periods.
  • On-premises cloud services are usually not as cost effective as consuming the services from an external provider. While private cloud solutions make sense for maximizing flexibility and efficiency on premises, and provide a path to integrating with or migrating to public cloud services, we should not forget that extreme economies of scale will only be realized via public cloud service offerings. The Economics of the Cloud whitepaper from Microsoft estimated a tenfold reduction in cost when fully utilizing the public cloud.
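To make the spike-capacity point concrete, here is a minimal cost comparison. All of the numbers (VM counts, monthly cost, hourly rate) are invented for illustration; they are not Microsoft pricing:

```python
# Illustrative comparison of owning peak capacity vs. paying per use.
# All figures are hypothetical assumptions, not actual provider pricing.

HOURS_PER_MONTH = 730

def on_premises_cost(peak_vms, cost_per_vm_month):
    # On premises you must provision for peak demand, so you pay for
    # peak_vms every month even when most of them sit idle.
    return peak_vms * cost_per_vm_month

def pay_per_use_cost(avg_vms, hourly_rate):
    # With a metered public cloud service you pay only for hours used.
    return avg_vms * hourly_rate * HOURS_PER_MONTH

# A workload that peaks at 100 VMs but averages 20:
owned = on_premises_cost(peak_vms=100, cost_per_vm_month=200)  # 20000
metered = pay_per_use_cost(avg_vms=20, hourly_rate=0.10)       # ~1460
print(f"on-premises: {owned}, pay-per-use: {metered:.0f}")
```

The gap widens as the ratio of peak to average demand grows, which is exactly the "spike" scenario described above.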

Organizations with a large application portfolio will need to determine hybrid cloud infrastructure requirements before starting new applications or moving existing applications into a cloud environment. Different applications will have different demands in the areas of networking, storage, compute, identity, security, availability, and performance. You will need to determine whether the public cloud infrastructure service provider you choose can deliver on the requirements you define in each of these areas. In addition, you will need to consider regulatory issues specific to your organization's geo-political alignment.

3.0 Envisioning the Hybrid Cloud Solution

After clearly defining the problem you’re trying to solve, you can begin to define a solution to the problem that satisfies your consumer’s requirements and fits the constraints of the environment in which you’ll implement your solution.

3.1 Solution Definition

To solve the problems previously identified, many organizations are beginning to integrate infrastructure cloud services from external providers into their environments. In many organizations today, a department within the organization owns and manages network, compute (virtual machine), and storage technical capabilities. The people in this department may provide these technical capabilities for use by people in other departments within the organization, and/or, with additional technical capabilities, provide these capabilities as services, or even cloud services within their environment.  

The design considerations in this document are for a solution that enables an organization to: 

  • Set up an organization-level account and billing with an external provider of cloud infrastructure services, so that its consumers don’t do so at an individual level.
  • Allow its consumers to provision new virtual machines with the external provider that have capabilities similar to the capabilities of virtual machines that are provided on premises.
  • Allow its consumers to move existing applications that run on the organization’s on-premises network into a public cloud infrastructure as a service offering.
  • Allow consumers of applications in the organization to resolve names and authenticate to resources that are running on the external provider’s infrastructure cloud services, just as they do with resources that are running on premises.  
  • Enable core security, data access controls, business continuity, disaster recovery, availability, and scalability requirements.

3.2 Solution Requirements

Before integrating infrastructure cloud services from an external provider with existing infrastructure technical capabilities and/or services to solve the problems that were previously listed, you must first define a number of requirements for doing so, as well as the constraints for integrating the services. Some of the requirements and constraints are defined by the consumers of the capabilities, while others are defined by your existing environment, in terms of existing technical capabilities, services, policies, and processes.  

Determining the requirements, constraints, and design for integrating the services is an iterative process. Initial requirements, coupled with the constraints of your environment, may drive an initial design that can’t meet all of the initial requirements, necessitating changes to the requirements and subsequent design. Multiple iterations through requirements definition and solution design may be necessary before both are finalized. Therefore, do not expect your first pass through this document to be the last one; you’ll find that decisions made earlier can exclude preferred options that you might want to select later.

The answers to the questions in this section provide a comprehensive list of requirements for integrating infrastructure cloud services from an external provider with the existing infrastructure technical capabilities and/or services in your environment.

3.2.1 Service Delivery Requirements

Before integrating cloud infrastructure services from an external provider with existing infrastructure technical capabilities and/or services in your environment, you’ll need to work with the consumer(s) of these cloud services in your environment to answer the questions in the sections that follow. The questions are aligned to the Service Delivery processes that are defined in the Cloud Services Foundation Reference Model (CSFRM). The initial answers to these questions that you get from your consumer(s) are the initial Service Delivery requirements for your initial design.

After you further understand the constraints of your environment and the products and technologies that you will ultimately use to extend your existing infrastructure technical capabilities to an external provider, however, you will likely find that not all of the initial requirements can be met. As a result, you’ll need to work with your consumers to adjust the initial requirements and continue iterating until you have a final design that satisfies both the requirements and the constraints of your environment.

The outcome of this process is a clear definition of the functionality that will be provided, the service-level metrics it will adhere to, and the cost at which the functionality will be provided. The service design incorporates the answers to the following questions.

The following table contains questions that you’ll need to address in these areas.

Service delivery requirements: questions to ask

Demand and capacity management

  • Do your customers have specific requirements for the infrastructure that is needed to support the services that you will provide them?
  • Do your customers have information on how much capacity they will require and for how long they will need it?
  • Do they have information on service use patterns or predictions on what use patterns might be in the future?

Availability and continuity management

  • What level of availability will your consumers require for their services?
  • Are they willing to pay a premium for higher availability?
  • Will you provide a tiered service that offers varying levels of reliability?
  • What type of business continuity approach do you plan to use?
  • Will you enable business continuity for just the hybrid cloud infrastructure, or will you provide business continuity for both the infrastructure and the tenants using the hybrid cloud infrastructure? 

Information security management

  • What is the business impact of the information your customers want to keep in your hybrid cloud infrastructure?
  • Will you have a mechanism that forces information and services that must be on premises to stay on premises?
  • Will your hybrid cloud infrastructure provide support for encryption of customer data both when it is at rest and in flight?
  • If so, will this be part of your service offering and charged as a premium service?

Regulatory and compliance management

  • What exposure will the information and services that your customers place into the hybrid cloud infrastructure have to current government and industry regulatory and compliance measures?
  • Will your hybrid cloud infrastructure support all those measures that are required by the consumers of the hybrid cloud services?
  • If you plan to host information that is subject to regulatory and compliance guidelines, how will you provide access to the hybrid cloud infrastructure so that auditors can evaluate your solution for certification purposes?

Financial management

  • Who will be responsible for assuming the costs of extending on-premises infrastructure into the public cloud?
  • Do you plan to provide chargeback billing to the consumers, or just “show back” so that management knows the relative costs incurred by different divisions that consume the hybrid cloud infrastructure?
  • How will you determine that the move from private cloud or traditional data center to hybrid cloud is more cost effective?
  • Do you have a good understanding of your current costs?
  • Do you have a functional cost model? 

3.2.2 Service Operations Requirements

You have a variety of operational processes that are applied to the delivery of all services and technical capabilities in your environment. As a result, you need to answer the questions in the following sections to determine how the hybrid cloud infrastructure you’re designing will apply to and comply with your operational processes. The questions are aligned to the service operations processes defined in the CSFRM. The answers to these questions become the service operations requirements for the design of your hybrid cloud infrastructure. Questions you need to ask to address these areas are included in the following table. 

Service operations requirements: questions to ask

Request fulfillment

  • How will consumers of your hybrid cloud infrastructure request access to the service?
  • Will there be a self-service portal?
  • Who is responsible for the design and creation of the self-service portal?
  • Will consumers of the hybrid cloud infrastructure be able to create and manage their own virtual machines at the server level or will they have access only to the virtual machines themselves?
  • Will you use automation and orchestration to fulfill requests for services so that the service is deployed when the consumer completes the self-service form?
  • If you will not use comprehensive automation and orchestration, will you have the information gathered in the form generate a service ticket that is reviewed by a member of the hybrid cloud infrastructure group who then fulfills the request?

Service asset and configuration management

  • What level of latitude will you allow your customers for customization of the compute, networking, and storage elements?
  • Will you provide specific service levels or will the customers be able to come up with their own?
  • How will you manage the configuration of the on-premises side of the hybrid cloud infrastructure?
  • Will you use existing configuration management solutions or will you need to create a separate configuration management system to support the hybrid cloud infrastructure?
  • How will you manage the configuration of the public side of the hybrid cloud infrastructure?
  • Does the public cloud infrastructure service provider have tools that will integrate with your on-premises configuration management solution?
  • If not, is there a way that you can pool the configuration information from both sides and give the appearance that there is a unified configuration management and administration experience?

Change management

  • What change management system will you use for your hybrid cloud infrastructure?
  • Will the management system you use integrate with the public cloud infrastructure provider’s system?
  • If not, is there a way to consolidate the information to provide a unified change management solution for the hybrid cloud infrastructure?

Release and deployment management

  • What type of release management system will you use in the hybrid cloud infrastructure?
  • What type of deployment management system will you use in your hybrid cloud infrastructure?

Access management

  • What systems does the public cloud infrastructure service provider have in place to control what access members of the IT group have to the service?
  • Does the public cloud infrastructure provider enable any level of role-based access control over various components of the service?
  • Does the public cloud infrastructure service provider maintain records of who accessed the service, when they accessed it, and what they did while they were there? Are these records available to you, the consumer of the service? If so, how quickly can you gain access to that information?
  • Will you support role-based access control to the service that you provide to your consumers?
  • Will you support the logging of who signed in to your hybrid cloud infrastructure service, when they signed in, and what they did while they were there?
  • What process will you use to provision administrators of the service in the public cloud infrastructure service?
  • What deprovisioning process will you use?
  • Will you integrate your on-premises identity infrastructure with the cloud infrastructure service provider?

Systems administration

  • Who will be responsible for managing the public portion of your hybrid cloud infrastructure?
  • Will all administrators of the hybrid infrastructure have the same level of access to the private and public components?
  • If not, what is the workflow process for getting things done between the on-premises and public cloud infrastructure service administrators?

Knowledge management

  • How will you document the hybrid cloud infrastructure solution?
  • Who will be responsible for this documentation?
  • Who will review the documentation?
  • What will you include in the documentation?
  • What is the scheduled update cycle for the documentation?
  • Where will the documentation be stored?

Incident and problem management

  • How will you handle incident management for the hybrid cloud infrastructure?
  • Will you integrate your current incident management tools and processes?
  • Will you need new incident management tools and processes to integrate with the public cloud infrastructure service?
  • Will you communicate incidents on the public cloud infrastructure with your customers, and if so, how and when?
  • How will customers of your hybrid cloud infrastructure communicate issues regarding performance, availability, and functionality?
  • Will your consumers submit a service ticket?
  • Will your consumers call you on the telephone?
  • What service-level agreement (SLA) do you plan to provide consumers of your cloud service? Does the SLA include only uptime of the virtual machines that are hosted in your service, or the entire application stack across the hybrid cloud infrastructure?
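The SLA question above has a quantitative side worth keeping in mind: when an application depends on components in series (an on-premises tier, a network link, and a cloud-hosted tier), its end-to-end availability is at best the product of the component availabilities. A small sketch, with hypothetical availability figures rather than any provider's published SLA numbers:

```python
# End-to-end availability of serially dependent components is the
# product of their individual availabilities. The figures below are
# hypothetical, not any provider's published SLA commitments.

def composite_availability(*availabilities):
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# An on-premises tier at 99.9%, a site-to-site VPN link at 99.5%,
# and a cloud-hosted VM tier at 99.9%:
overall = composite_availability(0.999, 0.995, 0.999)
print(f"{overall:.2%}")  # roughly 99.3%
```

This is why an SLA that covers only the hosted virtual machines can overstate what consumers of the full application stack will actually experience.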

3.2.3 Management and Support Technical Capability Requirements

Every organization uses a variety of technical capabilities to manage and support services in their environment. As the provider of this service, you need to work with the people in your organization who provide these technical capabilities to determine the answers to the questions in this section. The questions are aligned to the Management and Support Technical Capabilities that are defined in the CSFRM. The answers to these questions become the Management and Support Technical Capability Requirements and constraints for the design of your hybrid cloud infrastructure. 

While the introduction of this service may require unique changes to the existing capabilities in your environment, it’s assumed that because such changes start to de-standardize the existing capabilities, they should be avoided whenever possible. 

When thinking about management and support technical capabilities, you should ask the questions in the following table.

Management and support technical capability: questions to ask

Service reporting

  • Will you provide service reporting to consumers of your hybrid cloud infrastructure?
  • What will you report on and at what level?
  • Will consumers be able to obtain historical information?
  • Does your public cloud infrastructure provider offer service-level reporting to your IT organization? What does it report on, and at what level?
  • Will you be able to obtain historical data from the provider?

Service Management

  • What type of service management system will you use to manage the hybrid cloud infrastructure as a whole? Will one system be able to manage both the on-premises and public cloud components?
  • If not, is there a way to integrate the two systems?
  • Will the service management system support the level of automation and orchestration that are required for your hybrid cloud infrastructure?
  • If not, are there ways that you can mitigate this or find an alternate system that meets your requirements?

Service Monitoring

  • What level of service monitoring will be conducted in the hybrid cloud infrastructure?
  • What aspects of the combined service will you monitor?
  • How will you prioritize which components are most important to monitor?
  • What instrumentation will you use for service monitoring for the hybrid cloud infrastructure?
  • Can you use your current monitoring toolset to monitor the public cloud infrastructure service?
  • If not, can you integrate with the public cloud provider’s service?
  • If not, is there a way to combine data and present it in a holistic view?
  • Will you provide any level of service monitoring for the services running in the hybrid cloud infrastructure?
  • If so, will it be limited to monitoring the virtual machine status, or will you also reach into monitoring the services that are running on the virtual machines that are hosted in the hybrid cloud infrastructure?

Configuration Management

  • What configuration management system will you use to ensure that the virtual machines running in the hybrid cloud infrastructure will conform to corporate security and compliance guidelines?
  • Will your on-premises configuration management system integrate with the cloud infrastructure service provider’s configuration management system?
  • If not, is there a way to ensure that the cloud infrastructure service provider’s configuration management system can enforce corporately mandated configurations?

Fabric Management

On the cloud infrastructure service provider’s end of the hybrid cloud infrastructure, fabric management is the responsibility of the provider. However, you will need information about how the provider’s fabric management system works in order to determine whether it will work with your own. This assumes that you are using an on-premises private cloud; if not, fabric management is less of an issue on the on-premises side.

  • Does the service provider’s fabric management system integrate with your on-premises system?
  • If not, are there methods or technologies that you can apply to create a hybrid cloud infrastructure fabric management overlay?

Deployment and Provisioning

  • What systems will you use to aid in the deployment and provisioning for the virtual machines that run in your hybrid cloud infrastructure?
  • What systems are available on the service provider’s end, and what systems do you already run on premises?
  • Will your existing deployment and provisioning infrastructure work with the service provider’s system?
  • If not, are there any products or technologies that you can use to get them to work with one another?
  • Are there systems available to let the consumers of your hybrid cloud infrastructure know that deployment and provisioning is complete?
  • Are there systems in place to inform them if there are errors or failures in the deployment? How will you surface this information?
  • Are there systems in place that will attempt to automatically remediate failures in the deployment before informing the hybrid cloud infrastructure team and consumers of the hybrid cloud service that there was a failure? Does the service provider give the corporate IT organization this information?

Data Protection

  • How will you implement data protection for the hybrid cloud infrastructure?
  • How will you ensure that the virtual machines in both the on-premises and public cloud infrastructure service are backed up? Where will these backups be stored?
  • How often will you back up?
  • What SLA do you plan to support?
  • Will you provide data protection services to consumers of the hybrid cloud infrastructure? If so, what products or technologies will you use?

Network Support

  • How will you connect your on-premises infrastructure to the cloud service provider’s network?
  • Will that connection provide the bandwidth that is required for the applications that have components both in the cloud infrastructure service provider cloud and in the on-premises cloud or data center?
  • What latencies will you be able to tolerate for applications that have components both on and off premises?
  • If there are applications that are designed for data center network latencies, is the option available to recode them to work in a hybrid cloud infrastructure?
  • What level of availability will you need to provide?
  • Will Internet-based site-to-site VPN provide that level of availability?
  • Do you need the equivalent of a dedicated WAN link to the cloud infrastructure service provider’s network?
  • Does the public cloud infrastructure service provider enable service monitoring at the network level?

Billing

  • Do you plan to bill consumers of the hybrid cloud infrastructure? If so, will it be actual chargeback, or will you begin with “show back”?
  • If you begin with “show back”, what factors will you use to determine when to cut over to chargeback?
  • What technology and tools will you use to generate chargeback reports?
  • What components of the hybrid cloud infrastructure should feed into the chargeback equation?
  • Will you bill for only the public cloud infrastructure service provider components, or will you bill for both the service provider and the in-house infrastructure? The cost model is already defined for the provider’s components, but you will need to define your own cost model for the on-premises infrastructure.
  • Do you have the tools and technology to determine your cost model?

Self-Service

  • What level of self-service does the cloud infrastructure service provider give you?
  • Is the level of self-service the public cloud infrastructure service provider gives you enough so that you do not need to call anyone in order to get the compute, networking, and storage resources that you require?
  • Does the cloud infrastructure service provider have “soft limits” that might require human intervention for you to exceed them?
  • If there are soft limits, do you anticipate running up against these soft limits?
  • Is there a way to increase these soft limits in advance so that future initiatives are not stalled by a call to the service provider?
  • What level of self-service do you plan to provide to the consumers of your hybrid cloud infrastructure? If you do not currently run a private cloud, then self-service isn’t applicable.
  • If you currently run a private cloud, then how will you integrate your current self-service mechanism with resources in the public cloud infrastructure service provider?
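To make the soft-limit questions concrete, the following sketch shows one way a self-service request might be screened against hypothetical soft and hard limits. The resource names and limit values are assumptions, not any provider’s actual quota API.

```python
# Hypothetical self-service quota check. "Soft" limits can be raised by
# a request to the provider; "hard" limits cannot. All names and values
# below are illustrative assumptions.

SOFT_LIMITS = {"cores": 20, "virtual_machines": 25}
HARD_LIMITS = {"cores": 100, "virtual_machines": 200}

def check_request(resource: str, current: int, requested: int) -> str:
    total = current + requested
    if total > HARD_LIMITS[resource]:
        return "denied"          # hard limit: redesign or split the workload
    if total > SOFT_LIMITS[resource]:
        return "needs_approval"  # soft limit: raise it in advance to avoid stalls
    return "granted"

print(check_request("cores", current=16, requested=2))   # → granted
print(check_request("cores", current=16, requested=10))  # → needs_approval
```

Raising soft limits in advance of planned initiatives keeps a “needs_approval” outcome from stalling provisioning with a call to the service provider.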

Authentication

  • How will the IT organization authenticate to access the public cloud infrastructure service provider?
  • Will user accounts be stored at the provider?
  • Will it be possible to leverage existing Active Directory accounts?
  • Is there a way to sync accounts or use some type of federation?
  • How will users authenticate to access self-service portals? Will they use an existing Active Directory account or will they need to create accounts that are dedicated to the hybrid cloud infrastructure?
  • What authentication capabilities do you want to make available to services set up by consumers of your hybrid cloud service?
  • Will you enable consumers to set up services that require Active Directory?
  • Will you force consumers of the hybrid cloud infrastructure to set up their own authentication systems?

Authorization

  • What authorization mechanisms are in place at the public cloud infrastructure provider?
  • Does the public cloud infrastructure service provider support role-based access control so that members of the IT organization have access only to components of the public cloud infrastructure service that they need access to?
  • How will you control account access in the event that a member leaves the IT organization that is responsible for the public cloud infrastructure service?
  • Is there an automatic account de-provisioning process that is connected to the HR database?
  • If not, is there a formal process for informing the IT organization when a member leaves, so that the exiting member does not have access after being released?
  • Will all members of the company be allowed to request resources from the hybrid cloud infrastructure?
  • If not, will you define a subset of users within the company who will be allowed access to the self-service portal?
  • If you limit access to self-service components, how will you define the attributes of the users who qualify for access?

Directory

  • What directory services will be available at the public cloud infrastructure service provider?
  • Will you be able to tie the public cloud infrastructure service provider’s directory services infrastructure into your local identity repository?
  • What directory services will be used to support the on-premises side of the hybrid cloud infrastructure?
  • What directory services will be used to support users who will make requests for hybrid cloud infrastructure resources?
  • If the hybrid cloud infrastructure and the users who request resources from the infrastructure use different directory services, will you provide a way to enable single sign-on? Or will you require that they authenticate separately to access the hybrid cloud infrastructure?

Orchestration

  • What orchestration capabilities does the public cloud infrastructure provider support?
  • What orchestration capabilities does the on-premises side of the hybrid cloud infrastructure support?
  • Do the public cloud infrastructure service provider’s orchestration technologies integrate with your on-premises orchestration technologies? If not, are there any additional technologies or workarounds that you can use to get them to work with each other?

3.2.4 Infrastructure Services Capabilities Requirements

Every organization uses a variety of infrastructure technical capabilities, or infrastructure services, or some combination of the two to host IT services. As the provider of this service, you need to work with the people in your organization who provide these technical capabilities to determine the answers to the questions in this section. The questions are aligned to the Infrastructure Technical Capabilities that are defined in the CSFRM. The answers to these questions become the Infrastructure Technical Capability Requirements and constraints for the design of your hybrid cloud infrastructure. 

While the introduction of this service may require unique changes to the existing capabilities in your environment, such changes start to de-standardize those capabilities, so it’s assumed they should be avoided whenever possible. 

When considering infrastructure capability requirements, you should start by asking the questions in the following table. 

Infrastructure services requirements

Questions to ask

Network

  • On the on-premises side of the hybrid cloud infrastructure, what network services do you plan to provide to consumers of your hybrid cloud infrastructure?
  • Do you plan to provide quality-of-service options?
  • Do you plan to provide options to encrypt consumers’ information while it is on the wire?
  • How do you plan to isolate each consumer’s network traffic from other consumers?
  • Will you create different security zones for network traffic? Will those security zones be in sync with your current data classification scheme? Or will they be scoped to services based on the regulatory and compliance guidelines that those services need to meet?
  • Will you implement controls to prevent tenants from standing up rogue DHCP servers?
  • Will you implement controls to prevent exploits by tenants such as router advertisements?
  • Will you use port ACLs or a service-level firewall to enable access controls between tenants, and between tenants and the on-premises cloud infrastructure or traditional data center?
  • Will you use IPsec to control traffic moving through the on-premises infrastructure?
  • Will you provide IPsec support for tenants who want to use it? If so, will you enable hardware offloads for this?
  • Will you use hardware offloads to enhance networking performance?
  • How will you enable name resolution so that on-premises systems can resolve names of systems in the public cloud infrastructure provider’s cloud?
  • How will you connect the on-premises sites to the public cloud infrastructure provider’s site? Site-to-site VPN connection? Dedicated WAN link? Other?
  • How will you enable high availability for the connection between the on-premises site and the public cloud provider’s site?
  • How will you manage bandwidth for the connection between the on-premises network and the public cloud infrastructure service provider’s network?
  • Will you enable network level access controls between the on-premises network and the public cloud provider’s network? If so, do you have a good understanding of the protocols that are required in both directions? How will you discover what protocols are required? Do you have someone who is versed in network analysis?
  • Does your public cloud infrastructure services provider support load balancing of connections coming from the Internet to machines located on the service provider's network?
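Before enabling network-level access controls between the on-premises network and the provider’s network, it helps to document the required protocols as data that both network analysis and firewall configuration can be checked against. The flow entries below are illustrative examples only, not a complete or recommended rule set.

```python
# Sketch of documenting the protocols required across the site-to-site
# link in both directions. The entries are illustrative examples; build
# the real list from network analysis of your own workloads.

REQUIRED_FLOWS = [
    # (direction, protocol, port, purpose)
    ("on-prem -> provider", "TCP", 443, "management API / portal"),
    ("on-prem -> provider", "UDP", 500, "IKE for the site-to-site VPN"),
    ("provider -> on-prem", "TCP", 389, "LDAP to on-premises directory"),
]

def allowed(direction: str, protocol: str, port: int) -> bool:
    """Check a flow against the documented requirements."""
    return any(
        d == direction and p == protocol and pt == port
        for d, p, pt, _purpose in REQUIRED_FLOWS
    )

print(allowed("on-prem -> provider", "TCP", 443))  # → True
print(allowed("provider -> on-prem", "TCP", 25))   # → False
```

Keeping the purpose alongside each flow makes the access-control list auditable when someone later asks why a port is open.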

Virtual Machine

  • What types of virtual machines will you provide? Will you use “t-shirt sizes” or will you allow more granular customization?
  • How many processors per virtual machine will you support?
  • How much memory per virtual machine will you support?
  • How much storage per virtual machine will you support?
  • How much bandwidth per virtual machine will you support?
  • Will you place limits on the number of virtual machines that a user can provision? Will these be “hard” or “soft” limits? If the user wants to exceed these limits, what procedures do you plan to have in place to allow them to do so?
  • How will users release virtual machines? Will you make it a one-step process to remove the virtual machine and its associated storage from the pool? Or will the user have to remove the virtual machine and its storage separately?
  • When removing virtual machines, does your public cloud infrastructure service provider charge for processor time separately from storage and networking? Or, does your provider charge for compute and storage together?
  • Will you enable users to dynamically expand the amount of memory or the number of processors that are required, based on resource utilization of a specific virtual machine? Or will the user need to provision a new virtual machine and attempt to scale out instead of scale up?
  • Will users have the ability to upload virtual machines that are running on-premises to the public cloud provider’s network? Are there any special configuration requirements for established virtual machines before moving them to the cloud provider’s network?
  • Will users be able to create virtual machines from a library available on premises and/or in the cloud provider’s network? Will you mirror the image offerings on the public cloud infrastructure provider’s network on your own on-premises infrastructure?
  • Will your public cloud service provider support stateful services running on virtual machines? If so, will the on-premises side of the hybrid cloud infrastructure support technologies that enable stateful applications?
  • Does your cloud service provider enforce password complexity for virtual machines that run on its network?
  • Will you need to make any changes to account names based on the cloud infrastructure service provider’s policy for account naming?
  • Does your public cloud infrastructure service provider support auto-scaling of virtual machines so that machines can be automatically added or removed depending on service performance characteristics?
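To illustrate the “t-shirt size” question, the following sketch defines a hypothetical size catalog and picks the smallest size that fits a request. The size names and per-VM caps are assumptions and should be aligned with what your provider actually offers.

```python
# Hypothetical "t-shirt size" VM catalog, ordered smallest to largest.
# The sizes and caps are illustrative, not any provider's actual SKUs.
from typing import Optional

VM_SIZES = {
    #         (processors, memory_gb, storage_gb, bandwidth_mbps)
    "small":  (1, 2, 50, 100),
    "medium": (2, 4, 100, 200),
    "large":  (4, 8, 500, 400),
}

def smallest_fit(cores: int, memory_gb: int) -> Optional[str]:
    """Pick the smallest catalog size that satisfies the request."""
    for name, (c, m, _storage, _bandwidth) in VM_SIZES.items():
        if c >= cores and m >= memory_gb:
            return name
    return None  # nothing fits; a more granular offering would be needed

print(smallest_fit(2, 4))   # → medium
print(smallest_fit(8, 64))  # → None
```

A `None` result is exactly the case the questions above probe: either you allow more granular customization, or the consumer must scale out across multiple smaller virtual machines.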

Storage

  • What storage limits will be placed on the virtual machines in the public cloud infrastructure provider’s network? What storage limits are placed on the operating system drives? What storage limits are placed on the data drives?
  • Does the public cloud infrastructure service provider enable read/write caching? Can you choose to turn read/write caching on or off?
  • Will there be access to a caching drive for virtual machines in the cloud infrastructure service provider’s network? If not, are there any other storage-based optimizations available?
  • Will there be access to storage tiers, so that high IOPS workloads can benefit from higher-performance storage?
  • Does the public cloud infrastructure service provider make commodity storage available for storing backups and disaster recovery files?
  • Does the cloud infrastructure service provider maintain redundant copies of your information in storage? If so, what do they do to ensure high availability for your stored assets?
  • Does the cloud infrastructure service provider enable you to encrypt your data on disk? If not, what security measures does the provider apply to storage to protect it from being stolen (besides facilities-based security measures)?
  • Will your cloud infrastructure service provider charge you for storage separately from compute (processors and memory)? Will you be billed if you decommission a virtual machine and do not delete its storage?

3.2.5 Infrastructure Technical Capability Requirements

Every organization uses a variety of infrastructure services, or infrastructure technical capabilities, or some combination of the two to host IT services. As the provider of this service, you need to determine how best to use the existing infrastructure services in your environment. The infrastructure services that your environment uses may be provided by your own organization, by external organizations, or by some combination of the two. If your environment uses existing internal or external infrastructure (and in almost all cases it will), then that infrastructure will drive your organization’s technical capabilities, which in turn support the infrastructure capabilities that are mentioned in the Infrastructure component section.  

One of the core tenets of cloud computing is that the infrastructure should be completely transparent to the user. So the users of the cloud service should never know (nor should they care) what the infrastructure services are that support the cloud infrastructure. 

The questions in this section are aligned to the Infrastructure component in the CSFRM. The answers to these questions become the Infrastructure Requirements and constraints for the design of your hybrid cloud infrastructure. Note that the CSFRM does not assume that the infrastructure for your environment is provided by your own organization. The infrastructure might be provided by your organization, or it might be provided by an external organization. However, in a hybrid cloud infrastructure, infrastructure is provided by both the company’s IT organization and the public cloud infrastructure provider.  

While the introduction of public cloud infrastructure may require unique changes to the existing services in your environment, such changes start to de-standardize those services, so it’s assumed they should be avoided whenever possible. 

The following table includes questions you should ask about infrastructure requirements. 

Infrastructure requirements

Questions to ask

Network

  • What on-premises networking equipment will you need to connect to the public cloud infrastructure services network?
  • What requirements does the public cloud infrastructure service provider have for your on-premises networking equipment?
  • Does the public cloud infrastructure service provider support high availability on the on-premises side of the connection to the provider’s network?

Compute

  • Does the public cloud infrastructure service provider enable you to acquire the number of processors you need to scale up applications?
  • Does the public cloud infrastructure service provider enable you to acquire enough memory to scale up applications?

Virtualization

  • What virtual disk types does the cloud infrastructure service provider support? If they do not support the disk type that you are using for your on-premises virtualization platform, will they provide a disk conversion service for you?
  • Will the public cloud infrastructure service provider support the operating systems that you want to run?
  • Does your public cloud infrastructure service provider support NUMA-aware applications? Does their fabric management system place virtual machines based on available NUMA nodes?
  • Does your public cloud infrastructure service provider enable you to configure specific virtual machines to never be co-located or to always be co-located?
  • Does your public cloud infrastructure services provider enable you to put virtual machines into different fault domains?
  • Does your public cloud infrastructure service provider enable you to put virtual machines into different update domains?
  • Does your public cloud infrastructure service provider support automatic restarting of virtual machines on an alternative host in the event that the host the virtual machine is running on becomes disabled?
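The fault-domain and update-domain questions can be visualized with a small placement sketch. The round-robin scheme and domain counts below are illustrative assumptions, not any provider’s actual placement algorithm.

```python
# Sketch of spreading a service's instances across fault domains and
# update domains so that a single hardware failure or maintenance pass
# takes down at most one instance per domain. Counts are illustrative.

FAULT_DOMAINS = 3   # groups of hosts likely to fail together (rack, power)
UPDATE_DOMAINS = 5  # groups of hosts updated (and rebooted) together

def place(instance_index: int) -> tuple:
    """Round-robin an instance into a (fault_domain, update_domain) pair."""
    return (instance_index % FAULT_DOMAINS, instance_index % UPDATE_DOMAINS)

for i in range(6):
    fd, ud = place(i)
    print(f"instance {i}: fault domain {fd}, update domain {ud}")
```

With this spread, losing one fault domain, or draining one update domain for maintenance, leaves the remaining instances of a multi-instance service running.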

Storage

  • Does the public cloud infrastructure services provider enable you to integrate your storage systems with the provider’s system to support backup, snapshots and replication?

3.2.6 Platform Requirements

Some organizations provide platform services that are consumed by application developers and the software services that they develop for the organization. As the provider of this service, you need to determine whether your organization currently provides its own platform services or uses platform services from external providers. If you use external providers, you need to know how this service will use their platform services. If your environment has its own existing platform services or uses external services, then your organization’s self-service technical capability will include a service catalog, which lists the available services in the environment and the service-level metrics they adhere to.  

The questions in this section are aligned to the Platform Services that are defined in the CSFRM. The answers to these questions become the Platform Services Requirements and constraints for the design of your hybrid cloud infrastructure. Note that the CSFRM does not assume that the platform services for your environment are provided by your own organization. The platform services might be provided by your organization, or they might be provided by an external organization. 

While the introduction of this service may require unique changes to the existing services in your environment, such changes start to de-standardize those services, so it’s assumed they should be avoided whenever possible. 

The following table includes questions you should ask about platform service requirements. 

Platform service requirements

Questions to ask

Structured data

  • Does your data already reside in a relational database? Do you plan to move any of that structured data into the public cloud infrastructure service provider’s system?
  • Does the public cloud infrastructure service provider support a PaaS-based cloud relational database that you can move your structured data into?
  • If so, are there any special requirements for moving your structured data into the cloud provider’s relational database system?
  • What is the size of the database set that you want to move into the service provider’s system?
  • Does the service provider’s system have database size limits that allow you to move your structured data into its system?

Unstructured data

  • Do you have unstructured data that you want to move into the public cloud infrastructure service provider’s network? Do you need to store data files that are typically hosted by a file server?
  • Do you need high-performance distributed-database computing to support unstructured data?
  • Does your cloud service provider support distributed-database computing, such as Hadoop?

Application server

  • Do you have applications that might work best in a cloud service provider’s PaaS offering, such as web servers?
  • Does your cloud service provider offer PaaS-based application servers?
  • Do you have the development resources to move existing applications to, or install new applications in, the cloud service provider’s PaaS offering?

Middleware server

  • Do you have applications that might work best in a cloud service provider’s PaaS offering, such as middleware servers?
  • Does your cloud service provider offer PaaS-based middleware servers?
  • Do you have the development resources to move existing applications to, or install new applications in, the cloud service provider’s PaaS offering?

Service bus

  • Do you need to move applications that require messaging capabilities between tiers into the service provider’s network?
  • Does your cloud service provider offer a messaging capability that can be used by your applications?

4.0  Conceptual Design Considerations

After determining the requirements and constraints for integrating cloud infrastructure services from a public cloud infrastructure provider into your environment, you can begin to design your solution. Before creating a physical design, it’s helpful to first define a conceptual model (commonly referred to as a “reference model”), and some principles that will work together as a foundation for further design.

4.1  Reference Model

A reference model is a vendor-agnostic depiction of the high-level components of a solution. A reference model can provide common terminology when evaluating different vendors’ product capabilities. A reference model also helps to illustrate the relationship of the problem domain it was created for to other problem domains within your environment. As a starting point, we can use the previously mentioned Cloud Services Foundation Reference Model (CSFRM). 

We won’t include a detailed explanation of the CSFRM in this document, but if you’re interested in understanding it further, you’re encouraged to read the Microsoft Cloud Services Foundation Reference Model document. It will be available as part of the Microsoft Cloud Services Foundation Reference Architecture guidance set. To stay abreast of the work in this area, please see http://aka.ms/Q6voj9. 

Although it is from Microsoft, this reference model is vendor-agnostic. It can serve as a foundation for hosting cloud services and can be extended, as appropriate, by anyone. If you decide to use it in your environment, you’re encouraged to adjust it appropriately for your own use. Figure 1 illustrates the CSFRM.  

Figure 1: Microsoft Cloud Services Foundation Reference Model

Recall from the Solution Definition section of this document that the solution to the problems defined in the Problem Definition section is to host virtual machines with an external provider such that consumers within the organization can provision new virtual machines in a manner similar to how they provision virtual machines that are hosted on premises today.  

The solution also requires that the virtual machines that are hosted by an external provider have capabilities that are similar to the capabilities of the on-premises virtual machines. As mentioned previously, the components, or boxes, in the reference model either change the way existing technical capabilities and/or services are provided in an environment, or introduce new services into an environment.  

The Physical Design Considerations section of this document will discuss the design considerations for all of the black-bordered boxes in Figure 1.

4.2  Hybrid Cloud Architectural Principles

After you’ve defined a reference model, you can establish some principles for integrating infrastructure cloud services from an external provider. Principles serve as “guidelines” for physical designs to adhere to. You can use the principles that follow as a starting point for defining your own. They are a combination of principles from the CSFRA and principles unique to integrating infrastructure cloud services from an external provider. 

The Microsoft Private Cloud Reference Architecture (PCRA) provides a number of vendor-agnostic principles, patterns, and concepts to consider before designing a private cloud. Although they were defined with private clouds in mind, they are in fact applicable to any cloud-based solution. You are encouraged to read through the document, Private Cloud Principles, Concepts and Patterns, in full, as the information in it contains valuable insight for almost any type of cloud infrastructure planning, including the hybrid cloud infrastructure that is discussed in this document. 

As mentioned previously, designing a cloud infrastructure may be different from how you’ve historically designed infrastructure. In the past, you often purchased and managed individual servers with specific hardware specifications to meet the needs of specific workloads. Because these workloads and servers were unique, automation was often difficult, if for no other reason than the sheer volume of variables within the environment. You may have also had different service level requirements for the different workloads you were planning infrastructure for, often causing you to plan for redundancy in every hardware component. 

When designing a cloud infrastructure for a mixture of workload types with standardized cost structures and service levels, you need to consider a different type of design process. Consider the following differences between how you planned for and designed unique, independent infrastructures for specific workloads in the past, and how you might plan for and design a highly standardized infrastructure that supports a mixture of workloads for the future. 

A hybrid cloud infrastructure introduces new variables, because even if you currently host a private cloud infrastructure on premises, you are not responsible for enabling the essential cloud characteristics in the public cloud infrastructure service provider’s side of the solution. And if you don’t have a private cloud on premises, you can still have a hybrid cloud infrastructure. In that case, you’re not at all responsible for providing any of the essential characteristics of cloud computing, because the only cloud you’re working with is the one on the public cloud infrastructure side. 

The following table provides some perspective on specific design aspects of a cloud-based solution versus how you have done things in a traditional data center environment. 

Design aspect

Non-cloud infrastructure

Cloud infrastructure

Hardware acquisition

Non-cloud: Purchase individual servers and storage with unique requirements to support unique workload requirements.

Private Cloud: Purchase a collection (in a blade chassis or a rack) of servers, storage, and network connectivity devices pre-configured to act as one large single unit with standardized hardware specifications for supporting multiple types of workloads. These are referred to as scale units. Adding capacity to the data center by purchasing scale units, rather than individual servers, lowers the setup and configuration time and costs when acquiring new hardware, although it needs to be balanced with capacity needs, acquisition lead time, and the cost of the hardware.

Public Cloud: No hardware acquisition costs other than possible gateway devices that are required to connect the corporate network to the cloud infrastructure service provider’s network.

Hardware management

Non-cloud: Manage individual servers and storage resources, or potentially aggregations of hardware that collectively support an IT service.

Private Cloud: Manage an infrastructure fabric. To illustrate this simplistically, think about taking all of the servers, storage, and networking that support your cloud infrastructure and managing them like one computer. While most planning considerations for fabric management are not addressed in this guide, it does include considerations for homogenization of fabric hardware as a key enabler for managing it like a fabric.

Public Cloud: No need to manage new in-house servers or storage devices—host servers and storage infrastructure are managed by the public cloud infrastructure service provider.

Hardware utilization

Non-cloud: Acquire and manage separate hardware for every application and/or business unit in the organization.

Private Cloud: Consolidate hardware resources into resource pools to support multiple applications and/or business units as part of a general-purpose cloud infrastructure.

Public Cloud: Set up virtual machines and virtual networks for specific applications and business units. No hardware acquisition required.

Infrastructure availability and resiliency

Non-cloud: Purchase infrastructure with redundant components at many or all layers. In a non-cloud infrastructure, this was typically the default approach, as workloads were usually tightly coupled with the hardware they ran on, and having redundant components at many layers was generally the only way to meet service-level guarantees.

Private Cloud: With a fabric that is designed to run a mixture of workloads that can move dynamically from physical server to physical server, and a clear separation between consumer and provider responsibilities, the fabric can be designed to be resilient, and doesn’t require redundant components at as many layers, which can decrease the cost of your infrastructure. This is referred to as designing for resiliency over redundancy. To illustrate, if a workload running in a virtual machine can be migrated from one physical server to another with little or no downtime, how necessary is it to have redundant NICs and/or redundant storage adapters in every server, as well as redundant switch ports to support them? To design for resiliency, you’ll first need to determine what the upgrade domain (portion of the fabric that will be upgraded at the same time) and physical fault domain (portion of the fabric that is most likely to fail at the same time) are for your environment. This will help you determine the reserve capacity necessary for you to meet the service levels you define for your cloud infrastructure.

Public Cloud: Infrastructure resiliency is built into the public cloud infrastructure service provider’s offering. You don’t need to purchase additional equipment or add redundancy.
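The resiliency-over-redundancy trade-off described above comes down to simple reserve-capacity arithmetic: hold back enough capacity to absorb the loss of one physical fault domain (or one upgrade domain during maintenance). The host and domain counts in this sketch are illustrative.

```python
# Reserve-capacity sketch for a private cloud fabric. The counts are
# illustrative; derive the real figures from your own fault-domain and
# upgrade-domain definitions.

total_hosts = 16
fault_domains = 4  # e.g., one rack per fault domain
hosts_per_fault_domain = total_hosts // fault_domains  # 4

# Capacity you can actually commit if you must survive the loss of one
# whole fault domain while still meeting your service levels:
usable_hosts = total_hosts - hosts_per_fault_domain
reserve_fraction = hosts_per_fault_domain / total_hosts

print(f"usable hosts: {usable_hosts}")              # → usable hosts: 12
print(f"reserve capacity: {reserve_fraction:.0%}")  # → reserve capacity: 25%
```

The smaller each fault domain is relative to the fabric, the less reserve capacity you need, which is one argument for spreading the fabric across more, smaller fault domains.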

A hybrid cloud infrastructure shares all of the principles of a private cloud infrastructure. Principles provide general rules and guidelines to support the evolution of a cloud infrastructure. They are enduring, seldom amended, and inform and support the way a cloud fulfills its mission and goals. They should also be compelling and aspirational in some respects because there needs to be a connection with business drivers for change. These principles are often interdependent, and together they form the basis on which a cloud infrastructure is planned, designed, and created. 

After you’ve defined a reference model, you can then define principles for integrating infrastructure cloud services from a public provider with your on-premises services and technical capabilities. Principles serve as “guidelines” for physical designs to adhere to, and they are oftentimes aspirational, as fully achieving them often takes time and effort. The Microsoft Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns article lists several principles that can be used as a starting point when defining principles for both private and hybrid cloud services. While all of the Microsoft Cloud Services Foundation Reference Architecture principles are relevant to designing hybrid cloud services, the principles listed below are the most relevant, and are applied specifically to hybrid cloud services:

4.2.1 Perception of Infinite Capacity

Statement:

From the consumer’s perspective, a cloud service should provide capacity on demand, only limited by the amount of capacity the consumer is willing to pay for.

Rationale:

The rationale for applying each of the following principles is the same as the rationale for each principle listed in the Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns, so each rationale is not restated in this article.

Implications:

Combining capacity from a public cloud with your own existing private cloud capacity can typically help you achieve this principle more quickly, easily, and cost-effectively than by adding more capacity to your private cloud alone. Among other reasons, this is because you don’t need to manage the physical acquisition process and its delays; that process is now the public provider’s responsibility.

4.2.2 Perception of Continuous Service Availability

Statement:

From the consumer’s perspective, a cloud service should be available on demand from anywhere, on any device, and at any time.

Implications:

Designing for availability and continuity often requires some amount of normally unused resources. These resources are utilized only in the event of failures. Utilizing on-demand resources from a public provider in service availability and continuity designs can typically help you achieve this principle more cost-effectively than with private cloud resources alone. To illustrate this point, if your organization doesn’t currently have its own physical disaster recovery site and is evaluating whether to build one, consider the costs in real estate, additional servers, and software that a disaster recovery site would require. Compare that cost against utilizing a public provider for disaster recovery. In most cases, the cost savings of using a public provider for disaster recovery could be significant.

4.2.3 Optimization of Resource Usage

Statement:

The cloud should automatically make efficient and effective use of infrastructure resources.

Implications:

Some service components may have requirements, such as specific security or regulatory requirements, that allow them to be hosted only within a private cloud. Other service components may have requirements that allow them to be hosted on public clouds. Individual service components may support several different services within an organization, and each component may be hosted on a private or public cloud. According to Microsoft’s The Economics of the Cloud whitepaper, hosting service components on a private cloud can cost up to 10 times as much as hosting them with a public cloud provider. As a result, organizations can optimize the usage of their private cloud resources by augmenting them with public cloud resources.

4.2.4 Incentivize Desired Behavior

Statement:

Enterprise IT service providers must ensure that their consumers understand the cost of the IT resources that they consume so that the organization can optimize its resources and minimize its costs.

Implications:

While this principle is important in private cloud scenarios, it's often a challenge to adhere to if the IT organization does not fully understand the actual costs of providing its services, or if consumers of private cloud services are not actually charged for their consumption, but rather, only shown their consumption.  When utilizing public cloud resources however, consumption costs are clear, consumption is measured by the public provider, and the consumer is billed on a regular basis.  As a result, the actual cost to an organization for consuming public cloud services may be much more tangible and measurable than the cost of consuming private cloud services.  These clear consumption costs may, in turn, make it easier to incentivize desired behavior from internal consumers.

4.2.5 Create a Seamless User Experience

Statement:

Within an organization, consumers should not need to know who the provider of a cloud service is, and should have similar experiences with all services provided to them.

Implications:

Many organizations have spent several years integrating and standardizing their systems to provide seamless user experiences for their users, and don't want to go back to multiple authentication mechanisms and inconsistent user interfaces when integrating public cloud resources with their private cloud resources. The myriad of application user interfaces and authentication mechanisms utilized across various applications has made achieving this principle very difficult.  The user interfaces and authentication mechanisms utilized across multiple public cloud service providers can make achieving this principle even more difficult.  It's important to define clear requirements to evaluate public cloud providers against.  These requirements may include specific authentication mechanisms, user interfaces, and other requirements that public providers must adhere to before you incorporate their services into your hybrid service designs.

4.3 Hybrid Cloud Architectural Patterns

Patterns are specific, reusable ideas that have been proven solutions to commonly occurring problems. The Microsoft Cloud Services Foundation Reference Architecture - Principles, Concepts, and Patterns article lists and defines the patterns below.  In this article, the definitions are not repeated, but considerations for applying the patterns specifically to hybrid infrastructure and service design are discussed for each pattern.

4.3.1 Resource Pooling

Problem: When dedicated infrastructure resources are used to support each service independently, their capacity is typically underutilized. This leads to higher costs for both the provider and the consumer.

Solution: When designing hybrid cloud services, you may have pools of resources on premises and may treat the resources at a public provider as a separate pool of resources.  Further, you may separate public provider resources into separate resource partition pools for reasons such as service class, systems management, or capacity management, just as you might for your on-premises resources. For example, an organization may define two separate service class partition resource pools, one within its private cloud, which might host medium and high business impact information, and one in its public cloud, which might host only low business impact information.
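The service-class partitioning in the example above amounts to a placement rule. A minimal sketch, assuming hypothetical pool names and a simple impact classification:

```python
# Map each business-impact level to a resource-pool partition, mirroring the
# example of private pools for medium/high impact and a public pool for low
# impact. Pool and workload names are illustrative assumptions.
POOL_BY_IMPACT = {
    "high": "private-cloud-pool",
    "medium": "private-cloud-pool",
    "low": "public-cloud-pool",
}

def assign_pool(workload):
    """Return the resource-pool partition a workload should be placed in."""
    return POOL_BY_IMPACT[workload["impact"]]

workloads = [
    {"name": "payroll-db", "impact": "high"},
    {"name": "marketing-site", "impact": "low"},
]
placements = {w["name"]: assign_pool(w) for w in workloads}
print(placements)  # {'payroll-db': 'private-cloud-pool', 'marketing-site': 'public-cloud-pool'}
```

A real placement policy would also weigh regulatory constraints and capacity, but the pattern is the same: classify the component, then select the pool.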

4.3.2 Scale Unit

Problem: Purchasing individual servers, storage arrays, network switches, and other cloud infrastructure resources requires procurement, installation, and configuration overhead for each individual resource.

Solution: When designing physical infrastructure, application of this pattern usually encompasses purchasing pre-configured collections of several physical servers and storage.  While a public provider's scale unit definition strategy is essentially irrelevant to its consumers, you may still choose to define units of scale for the resources you utilize with a public cloud provider.  With a public provider, since you typically pay for every resource consumed, and you have no wait time for new capacity like you do when adding capacity to your private cloud, you may decide that your compute scale unit, for example, is an individual virtual machine.  As you near capacity thresholds, you can simply add and remove individual virtual machines, as necessary.

4.3.3 Capacity Plan

Problem: Eventually every cloud infrastructure runs out of physical capacity. This can cause performance degradation of services, the inability to introduce new services, or both.

Solution: The capacity plan in a hybrid solution design incorporates all the same elements of a capacity plan for an on-premises-only solution design.  Service designers however, will likely find it to be much less effort to add or remove capacity on demand when utilizing resources from a public provider, if for no other reason than doing so doesn't require them to order and wait for the arrival of new hardware. Meeting spikes in capacity needs will often prove to be more cost-effective when using public provider resources than when using only dedicated on-premises resources, since when the spike is over, you no longer need to pay for the extra capacity that was required to meet the demand spike.  Some public providers also offer auto-scaling capabilities, where their systems will automatically scale service component tiers based on user-defined thresholds.
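The threshold-driven auto-scaling behavior described above can be sketched as a simple decision function, where the scale unit is one virtual machine. The threshold values and limits are hypothetical user-defined settings, not provider defaults:

```python
def scale_decision(cpu_percent, instance_count, scale_out_at=80, scale_in_at=30,
                   min_instances=2, max_instances=10):
    """Return the new instance count for a tier given a utilization sample.

    Thresholds and instance limits are user-defined, as with the
    auto-scaling capabilities some public providers offer.
    """
    if cpu_percent > scale_out_at and instance_count < max_instances:
        return instance_count + 1      # add one VM (the scale unit)
    if cpu_percent < scale_in_at and instance_count > min_instances:
        return instance_count - 1      # remove one VM once the spike subsides
    return instance_count              # within the healthy band: no change

print(scale_decision(92, 4))  # spike: scale out to 5
print(scale_decision(12, 5))  # quiet: scale in to 4
```

Scaling in after the spike is what makes public capacity cost-effective: the extra virtual machines stop accruing charges as soon as they are removed.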

4.3.4 Health Model

Problem: If any component used to provide a service fails, it can cause performance degradation or unavailability of services.

Solution: Initially, you might think that defining health models for hybrid services will be more difficult than defining health models for services that only include components hosted on your private cloud.  Part of the reason for this may be a fear of the unknown.  You understand your private cloud systems, and can do deep troubleshooting on them, if necessary.  When using a public provider however, you have little understanding of the underlying hardware configuration, and no troubleshooting capability.  While this might initially be concerning, after your confidence in a public provider grows, you'll likely find that defining health models for service components hosted on a public cloud is even easier than when the components are hosted on your private cloud, since all of the hardware configuration and troubleshooting responsibility is now the public provider's, not yours.  As a result, your health models will have significantly fewer failure or degradation conditions, which also means fewer conditions that your systems must monitor for and remediate.  Some public cloud providers offer service level agreements (SLAs) that include an availability level that they commit to meet each month.  As long as your service provider meets this SLA, you no longer need to be concerned with how the provider meets the SLA, only that it did meet it.  While this is true when consuming infrastructure as a service functionality from a public provider, it's even more true when consuming platform as a service (PaaS) capabilities from public providers.

4.3.5 Application

Problem: Not all applications are optimized for cloud infrastructures and may not be able to be hosted on cloud infrastructures.

Solution: Not all public cloud service providers support the same application patterns.  For example, if you have an application that relies upon Microsoft Windows Server Failover Clustering as its high-availability mechanism, this application can be thought of as using the stateful application pattern.  This application could be deployed with some public service providers, but not with others.  Among other reasons, Windows Server Failover Clustering requires some form of shared storage, a capability that few public service providers currently support.  It's important to understand which application patterns are used within the organization.  It's also important to identify which application patterns a public provider supports.  It's only possible to migrate applications that were designed with patterns supported by the public service provider.
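Deciding which applications can move to a given provider reduces to comparing each application's required patterns against the patterns the provider supports. A minimal sketch, where the pattern names and the provider capability set are illustrative assumptions (the shared-storage case mirrors the failover-clustering example above):

```python
# Patterns a hypothetical public provider supports. Note the absence of
# "shared-storage", which Windows Server Failover Clustering would require.
PROVIDER_SUPPORTED_PATTERNS = {"stateless", "stateful-no-shared-storage"}

# Each application mapped to the set of patterns it was designed with.
apps = {
    "web-frontend": {"stateless"},
    "clustered-db": {"stateful-no-shared-storage", "shared-storage"},
}

def migratable(app_patterns, supported=PROVIDER_SUPPORTED_PATTERNS):
    """An application can migrate only if every pattern it uses is supported."""
    return app_patterns <= supported   # subset test

print({name: migratable(p) for name, p in apps.items()})
# {'web-frontend': True, 'clustered-db': False}
```

The inventory on the left of this comparison is why the text stresses cataloging which application patterns are used within the organization.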

4.3.6 Cost Model

Problem: Consumers tend to use more resources than they really need if there's no cost to them for doing so.

Solution: While a public provider will charge your organization based on consumption, you must decide what costs you'll show or charge your internal consumers for the resources.  You will likely show or charge a higher cost to your internal consumers than you were charged by the public provider.  This is largely due to the fact that you will probably integrate the public cloud provider's functionality with your private cloud functionality, and that integration most likely has a cost.  For example, you probably currently show or charge your internal consumers when they use a virtual machine on your private cloud.  You may provide some type of single sign-on capability to your internal consumers, and offer that capability with the virtual machines that are hosted on your private cloud.  As a result, some portion of the cost that you show or charge internal consumers for that virtual machine is the cost to provide the single sign-on capability.  A similar cost should be added to the virtual machines that are hosted with a public provider, if you also offer the same single sign-on capability for them.  You may add further costs to support additional capabilities such as monitoring, backup, or other capabilities for public cloud virtual machines too.
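The loaded-cost calculation described above can be expressed directly: the amount shown or charged to an internal consumer is the provider's raw virtual machine charge plus the per-VM cost of each integration capability layered on top. All rates below are illustrative assumptions:

```python
# Hypothetical monthly per-VM cost of each integration capability the IT
# organization layers on top of a public-cloud virtual machine.
INTEGRATION_COSTS = {"sso": 4.00, "monitoring": 6.50, "backup": 9.00}

def showback_price(provider_charge, services):
    """Monthly price shown or charged to an internal consumer for one
    public-cloud VM: the provider's charge plus integration costs."""
    return provider_charge + sum(INTEGRATION_COSTS[s] for s in services)

# A VM billed at $65/month by the provider, offered internally with
# single sign-on and monitoring included.
price = showback_price(provider_charge=65.00, services=["sso", "monitoring"])
print(price)  # 75.5
```

Keeping the integration costs itemized this way also makes it easy to show internal consumers why the internal price exceeds the provider's published rate.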

5.0  Physical Design Considerations

With an understanding of the requirements detailed in the Envisioning the Hybrid Cloud Solution section of this document, and the reference model and principles, you can select appropriate products and technologies to implement the hybrid cloud infrastructure design. The following table lists the hardware vendor-agnostic and Microsoft products, technologies, and services that can be used to implement various entities from the reference model that is defined in this document. 

Reference model entity

Product/technology/external service

Network (support and services)

  • Windows Azure Virtual Networks
  • Windows Server 2012 DNS services

Authentication (support and services)

  • Active Directory Domain Services (AD DS)
  • Windows Azure Active Directory (WAAD)

Directory (support and services)

  • Active Directory Domain Services (AD DS)
  • Windows Azure Active Directory (WAAD)

Compute (support and services)

  • Windows Azure Infrastructure Service Virtual Machines

Storage (support and services)

  • Azure Infrastructure Services blob storage

Network infrastructure

  • On premises VPN gateway
  • Existing on-premises network infrastructure
  • Windows Azure Virtual Networks

Compute infrastructure

  • Existing on-premises compute infrastructure—could be traditional data center, virtualized datacenter, or private cloud
  • Windows Azure Windows Server 2012–based compute infrastructure and virtualization platform

Storage infrastructure

  • Existing on-premises storage infrastructure
  • Azure Infrastructure Services storage infrastructure

After selecting the products, technologies, and services to implement the hybrid cloud infrastructure, you can continue the design of the hybrid cloud infrastructure solution. The sections that follow outline a logical design process for the service, but, as mentioned in the Envisioning the Hybrid Cloud Solution section of this document, the design and requirements definition process is iterative until it’s complete. As a result, decisions that you make in later sections of this document may require you to re-evaluate decisions that you made in earlier sections. 

The primary sub-sections of this section form the “functional” design for the service and align to entities in the reference model. Lower-level sub-sections then address specific design considerations, which may range from functional to service-level considerations. 

The remainder of the document addresses design considerations and the products, technologies, and services listed in the preceding table. In cases where multiple Microsoft products, technologies, and services can be used to address different design considerations, the trade-offs between them are discussed. In addition to Microsoft products, technologies, and services, relevant vendor-agnostic hardware technologies are also discussed. 

5.1  Overview

The physical design of the hybrid cloud infrastructure brings together the answers to the questions that were presented earlier in the document and the technology capabilities and options that are made available to you. The physical design that is discussed in this document uses a Microsoft–based, on-premises infrastructure and a Windows Azure Infrastructure Services–based public cloud infrastructure component. With that said, the design options and considerations can be applied to any on-premises and public cloud infrastructure provider combination. 

When considering the hybrid cloud infrastructure from the physical perspective, the primary issues that you need to address include:

  • Public cloud infrastructure service account acquisition and billing considerations
  • Public cloud infrastructure server provider authentication and authorization considerations
  • Network design considerations
  • Storage design considerations
  • Compute design considerations
  • Application authentication and authorization considerations
  • Management and support design considerations

We will discuss each of these topics in detail and will discuss the advantages and disadvantages of each of the options. In many cases, you will find that there is only a single option. When this is true, we will discuss its capabilities and possible limitations, and how you can work with or around those limitations.  

5.2  Service Account Acquisition and Billing Considerations for Public Cloud Infrastructure

When designing a hybrid cloud infrastructure, the first issue you need to address is how to obtain and provision accounts with the public cloud infrastructure service provider. In addition, if the public cloud infrastructure service provider supports multiple payment options, you will need to determine which payment option best fits your needs now, and whether, in the future, you might want to reconsider the payment options that you’ve selected. 

For example, Windows Azure offers several payment plans:

  • Pay as you go—no up-front time commitment and you can cancel at any time
  • 6-months—pay monthly for six months
  • 6-months—pay for six months up front
  • 12-months—pay monthly for twelve months
  • 12-months—pay for twelve months up front

Pay as you go is the most expensive. Discounts are offered for each of the other four plans. You also have the choice to have the service billed to your credit card or your organization can be invoiced. 
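The effect of the commitment discounts can be illustrated with simple arithmetic. The discount rates below are hypothetical placeholders (actual Windows Azure discounts varied by plan and usage level; see the pricing link that follows):

```python
def annual_cost(monthly_usage, discount=0.0):
    """Yearly spend at a given plan discount (pay as you go has none).

    monthly_usage is the consumption valued at pay-as-you-go rates;
    discount is the fractional reduction a commitment plan offers.
    """
    return monthly_usage * 12 * (1 - discount)

usage = 1_000.00  # hypothetical monthly consumption at pay-as-you-go rates
print(annual_cost(usage))                 # pay as you go: 12000.0
print(annual_cost(usage, discount=0.20))  # 12-month commitment (assumed 20%): 9600.0
```

Running this kind of projection against your expected consumption is a reasonable way to decide whether a 6- or 12-month commitment is worth the reduced flexibility.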

For more information on Windows Azure pricing plans, see Windows Azure Purchase Options

You also need to consider whether you want the same person who owns the account (and therefore is responsible for paying for the service) to also have administrative control over the services that are running the public side of your hybrid cloud infrastructure. In most cases, the payment duties and the administrative duties will be separate. Determine whether your cloud service provider enables this type of role-based access control. 

For example, Windows Azure has the notions of accounts and subscriptions. The Windows Azure subscription has two aspects:

  • The Windows Azure account, through which resource usage is reported and services are billed.
  • The subscription itself, which governs access to and use of the Windows Azure services that are subscribed to. The subscription holder manages services (for example, Windows Azure, SQL Azure, Storage) through the Windows Azure Platform Management Portal.

A single Windows Azure account can host multiple subscriptions, which can be used by multiple teams responsible for the hybrid cloud infrastructure if you need additional partitioning of your services. 

It’s important to be aware that using a single subscription for multiple projects can be challenging from an organizational and billing perspective. The Windows Azure management portal provides no method of viewing only the resources used by a single project, and there is no way to automatically break out billing on a per-project basis. While you can somewhat alleviate organizational issues by giving similar names to all services and resources that are associated with a project (for example, HRHostedSvc, HRDatabase, HRStorage), this does not help with billing. 
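If you do adopt the project-prefix naming convention described above, you can at least approximate a per-project breakout by grouping usage line items on those prefixes yourself. The line-item format below is a hypothetical simplification, not the actual Azure billing export format:

```python
from collections import defaultdict

# Hypothetical usage line items: (resource name, cost). Names follow the
# project-prefix convention (HRHostedSvc, HRDatabase, and so on).
line_items = [
    ("HRHostedSvc", 120.00),
    ("HRDatabase", 45.50),
    ("FinanceHostedSvc", 210.00),
]

def by_project(items, prefixes):
    """Sum line-item costs per project, keyed on resource-name prefix."""
    totals = defaultdict(float)
    for name, cost in items:
        for prefix in prefixes:
            if name.startswith(prefix):
                totals[prefix] += cost
                break
    return dict(totals)

print(by_project(line_items, ["HR", "Finance"]))
# {'HR': 165.5, 'Finance': 210.0}
```

This is a workaround rather than a solution; resources that don't follow the convention fall through the grouping, which is one more argument for the per-project subscriptions discussed next.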

Due to the challenges with granularity of access, organization of resources, and project billing, you may want to create multiple subscriptions and associate each subscription with a different project. Another reason to create multiple subscriptions is to separate the development and production environments. A development subscription can allow administrative access by developers while the production subscription allows administrative access only to operations personnel.  

Separate subscriptions provide greater clarity in billing, greater organizational clarity when managing resources, and greater control over who has administrative access to a project. However, this approach can be more costly than using a single subscription for all of your projects. You should carefully consider your requirements against the cost of multiple subscriptions. 

For more information on Windows Azure accounts and subscriptions, see What is an Azure Subscription

For more information on account acquisition and subscriptions, see Provisioning Windows Azure for Web Applications

You need to determine how the public cloud service provider partitions the services for which you will be billed. For example:

  • Does the public cloud infrastructure service provider surface all cloud infrastructure services that are part of a single offering?
  • Does the public cloud infrastructure service provider require that you purchase each infrastructure service separately?
  • Does the public cloud infrastructure provider provide their entire range of cloud infrastructure services as a single entity, but also make available some value-added services that you can purchase separately?  

Note that in all three of these cases, the public cloud service provider would bill based on usage, because measured service is an essential characteristic of cloud computing.  

For example, Windows Azure Infrastructure Services is a collection of unique service offerings within the entire portfolio of Azure service offerings. Specifically, Azure Infrastructure Services includes Azure Virtual Machines and Azure Virtual Networks. In addition, Azure Virtual Networks takes advantage of some of the PaaS components of the system to enable the site-to-site and point-to-site VPN gateway. However, when you obtain an Azure account and set up a subscription, all of the Windows Azure services are available to you with the exception of some additional value-added services that you can purchase separately. 

5.3  Network Design Considerations


In most cases, a hybrid cloud infrastructure requires you to extend your corporate network to the cloud infrastructure service provider’s network so that communications are possible between the on-premises and off-premises components. There are several primary issues that you need to consider when designing the networking component to support the hybrid cloud infrastructure. These include:
  • On-premises physical network design
  • Inbound connectivity to the public infrastructure service network
  • Load balancing inbound connections to public infrastructure service virtual machines
  • Name resolution for the public infrastructure service network

This section expands each of these issues.

5.3.1  On-Premises Physical Network Design

You need to consider the following issues when deciding what changes you might need to make to the current physical network:

  • How will you connect the on-premises network to the public infrastructure services network?
  • What path should the on-premises users take to access resources in the public cloud infrastructure provider’s network?
  • What network access controls will you use to control access between on-premises and off-premises resources?

5.3.1.1  Network Connection Between On-Premises and Off-Premises Resources

There are typically three options available to you to connect on-premises and off-premises resources:
  • Site-to-site VPN connection
  • Dedicated WAN link
  • Point-to-site connection

Site-to-Site VPN
A site-to-site VPN connection enables you to connect entire networks together. Each side of the connection hosts at least one VPN gateway, which essentially acts as a router between the on-premises and off-premises networks. The routing infrastructure on the corporate network is configured to use the IP address of the local VPN gateway to reach the network ID(s) on the public cloud provider’s network that host the virtual machines that are part of the hybrid cloud solution. 

For more information about site-to-site VPNs, see What is VPN? 

Windows Azure Virtual Networks
Windows Azure enables you to create a virtual network that is contained within the Windows Azure infrastructure and to place virtual machines into it. When virtual machines are placed into an Azure Virtual Network, they are automatically assigned IP addresses by Windows Azure, so all virtual machines must be configured as DHCP clients. However, even though the virtual machines are configured as DHCP clients, they keep their IP addressing information for the lifetime of the virtual machine.       

Note:
The only time when a virtual machine will not keep an IP address for the life of the virtual machine on an Azure Virtual Network is when a virtual machine might need to be moved as a consequence of “service healing.” If a virtual machine is created in the Windows Azure portal, and it then experiences service healing, that virtual machine is assigned a new IP address. You can avoid this by creating the virtual machine by using PowerShell instead of creating it in the Windows Azure portal. For more information on service healing, please see Troubleshooting Deployment Problems Using the Deployment Properties.

Virtual machines on the same Azure Virtual Network will be able to communicate with one another only if those virtual machines are part of the same cloud service. If the virtual machines are on the same virtual network and are not part of the same cloud service, those virtual machines will not be able to communicate with one another directly over the Azure Virtual Network connection. 

You can use an IPsec site-to-site VPN connection to connect your corporate network to one or more Azure Virtual Networks. Windows Azure supports several VPN gateway devices that you can put on your corporate network to connect your corporate network to an Azure Virtual Network. The on-premises gateway device must have a public address and must not be placed behind a NAT device.  

For more information on which VPN gateway devices are supported, see About VPN Devices for Virtual Network

Note:
While you can connect your on-premises network to multiple Azure Virtual Networks, you cannot connect a single Azure Virtual Network to multiple on-premises points of presence. 

A single Azure Virtual Network can be assigned IP addresses in multiple network IDs. You can obtain a summarized block of addresses that represents the number of addresses you anticipate you will need and then you can subnet that block. However, connections between the IP subnets are not routed, and therefore there are no router ACLs that you can apply between the IP subnets.  
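Carving a summarized block into subnets is straightforward with the standard library. The block and subnet sizes below are illustrative, not a recommendation:

```python
import ipaddress

# A hypothetical summarized block obtained for the Azure Virtual Network.
block = ipaddress.ip_network("10.4.0.0/16")

# Subnet the block into /24s, one per role (see the role-based
# subnetting discussion that follows).
subnets = list(block.subnets(new_prefix=24))   # 256 possible /24 subnets

frontend, backend, management = subnets[0], subnets[1], subnets[2]
print(frontend, backend, management)  # 10.4.0.0/24 10.4.1.0/24 10.4.2.0/24
```

Doing this math before creating any virtual machines matters because, as noted below, the addressing scheme should be settled before virtual machines are placed on the virtual networks.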

However, you should still consider whether you will want multiple subnets. One reason for multiple subnets is for accounting purposes, where virtual machines that match certain roles within your hybrid cloud infrastructure are placed on specific subnets that are assigned to those roles. Also note that you can use Network ACLs to control traffic between virtual machines in an Azure Virtual Network. For more information on Network ACLs in Azure Virtual Networks, please see Setting an Endpoint ACL on a Windows Azure VM

You should also consider the option of using multiple Azure Virtual Networks to support your hybrid cloud infrastructure. While different Azure Virtual Networks can’t directly communicate with each other over the Azure network fabric, they can communicate with each other by looping back through the on-premises VPN gateway. Keep in mind that there are egress traffic costs that are involved with this option, so you need to assess cost issues when considering this option. This is also the case when you host some virtual machines in the Windows Azure PaaS services (which are part of a different cloud service than the virtual machines). The virtual machines in the PaaS services need to loop back through the on-premises VPN gateway to reach the virtual machines in the Azure Infrastructure Services Azure Virtual Networks.  

You should decide on the IP addressing scheme, whether to use subnets, and the number of Azure Virtual Networks you will need before creating any virtual machines. After these decisions are made, you should create or move virtual machines onto those virtual networks.  

Another important consideration is that Azure site-to-site VPN uses pre-shared keys to support the IPsec connection. Some enterprises may not consider pre-shared keys an enterprise-ready approach for supporting IPsec site-to-site VPN connections, so you will want to confer with your security team to determine whether this approach is consistent with corporate security policy. For more information on this issue, please see Preshared Key Authentication. Note that your IT organization may consider the security and management issues for pre-shared keys to be a remote access VPN client problem only. 

For more information on Azure Virtual Networks and how to configure and manage them, see Windows Azure Virtual Network Overview

Dedicated WAN Link
A dedicated WAN link is a permanent telco connection that is established directly between the on-premises network and the cloud infrastructure service provider’s network. Unlike the site-to-site VPN, which represents a virtual link layer connection over the Internet, the dedicated WAN link enables you to create a true link layer connection between your corporate network and the service provider’s network.  

For more information on dedicated WAN links, see Wide Area Network

At the time this document was written, Windows Azure did not support dedicated WAN link connections between the on-premises network and Azure Virtual Networks.  

Point-to-Site Connections
A point-to-site connection (typically referred to as a remote access VPN client connection) enables you to connect individual devices to the public cloud service provider’s network. For example, suppose you have a hybrid cloud infrastructure administrator working from home from time to time. The administrator could establish a point-to-site connection from his computer in his home to the entire public cloud service provider’s network that hosts the virtual machines for his organization. 

For more information on remote access VPN connections, see Remote Access VPN Connections

Windows Azure supports point-to-site connectivity that uses a Secure Socket Tunneling Protocol (SSTP)–based remote access VPN client connection. This VPN client connection is done using the native Windows VPN client. When the connection is established, the VPN client can access any of the virtual machines over the network connection. This enables administrators to connect to the virtual machines using any administrative web interfaces that are hosted on the virtual machines, or by establishing a Remote Desktop Protocol (RDP) connection to the virtual machines. This enables hybrid cloud infrastructure administrators to manage the virtual machines at the machine level without requiring them to open publicly accessible RDP ports to the virtual machines.  

In order to authenticate VPN clients, certificates must be created and exported. If you have a PKI, you can use an X.509 certificate issued by your CA. If you don’t have a PKI, you must generate a self-signed root certificate and client certificates chained to the self-signed root certificate. You can then install the client certificates, with their private keys, on every client computer that requires connectivity. 

For more information on point-to-site connections to Windows Azure Virtual Networks, see About Secure Cross-Premises Connectivity.  

The following table lists the advantages and disadvantages of each of the approaches that are discussed in this section. 

Connectivity option: Site-to-site VPN

Advantages:

  • Low cost.
  • Mature technology.
  • Easy to configure.
  • Existing on-premises equipment typically supports site-to-site VPNs.

Disadvantages:

  • Depends on Internet connectivity.
  • Performance may be limited.
  • May not support high availability technologies/arrays.
  • Protocol overhead from VPN and encryption protocols.

Connectivity option: Dedicated WAN link

Advantages:

  • High performance.
  • High availability.
  • Link layer connection.
  • Routing protocol support.

Disadvantages:

  • Costly.
  • Cloud infrastructure service provider might not offer this option.
  • May require telco support and equipment to be installed.

Connectivity option: Point-to-site connection (remote access VPN client connection)

Advantages:

  • Easy to configure.
  • Enables remote administration without enabling publicly accessible RDP connectivity.
  • Forces authenticated access to the Virtual Network before connecting to virtual machines.
  • Can be used by VPN clients both on and off premises.
  • Can traverse web proxies and firewalls.

Disadvantages:

  • May not be a viable solution to support on-premises components of hybrid applications.
  • Does not connect entire networks, only single VPN clients to the virtual network.
  • Is not designed for server-to-server communications in a hybrid cloud infrastructure.

5.3.2  Inbound Connectivity to the Public Cloud Infrastructure Service Network

Inbound connectivity to the public cloud infrastructure provider’s network is about how users will connect to the services that are hosted by the virtual machines within the provider’s network. Important options to consider include:

  • All access to services that are hosted on the public cloud infrastructure provider’s network will be done over the Internet.
  • All access to services that are hosted in the public cloud infrastructure provider’s network will be done over the corporate network and any site-to-site VPN or dedicated WAN link connection that connects the corporate network to the public cloud infrastructure service provider’s network.
  • Some access to services that are hosted on the public cloud infrastructure provider’s network will be done over the Internet and some will be done from within the corporate network.

All Access to Cloud Hosted Services is Through the Internet
With the first option, all connections to services in the public cloud infrastructure service provider’s network will be made over the Internet. It doesn’t matter whether the client system is inside the corporate network or outside the corporate network. With this configuration you need to maintain only a single DNS entry for inbound access to the service, because all client machines will be accessing the same IP address. In Windows Azure, this is the address of the VIP that is assigned to the front-ends of the service that is hosted in the Azure Infrastructure Services Virtual Network.  

All Access to Cloud Hosted Services is Through Site-to-Site VPN or WAN Link
The second option represents the opposite of the first, in that all clients that need to connect to parts of the service that are hosted in the public cloud infrastructure provider’s network will need to do it from within the confines of the corporate network. The service will not be available to users on the Internet “at large” and client systems will have to take a path through the corporate network to reach the services.  

That doesn’t mean that the client systems must be physically attached to the corporate network (or attached through the corporate wireless). A client system could be off-site, but connected to the corporate network over a remote-access VPN client connection or similar technology, such as Windows DirectAccess. The DNS configuration in this case would require just a single entry, because all access to the resources in the public cloud infrastructure service provider’s network will be to the IP address that is assigned to the virtual machine in the public cloud infrastructure service provider’s network. In an Azure Virtual Network, this would be the DIP that is assigned to the front-end virtual machines of the service. 

For more information on DirectAccess in Windows Server 2012, see Remote Access (DirectAccess, Routing and Remote Access) Overview

Access to Cloud Hosted Services Varies with Client Location
The third option allows for hosts that are not connected to the corporate network to connect through the Internet to the service that is hosted in the public cloud infrastructure service provider’s network. Clients that are connected to the corporate network can access the service by going through a site-to-site VPN or dedicated WAN link that connects the corporate network to the public cloud infrastructure service provider’s network.  

This option requires that you maintain a DNS record that client systems can use when they are not on the corporate network, which in Azure represents the VIP that is used to access the virtual machine. It also requires a DNS record that clients will use when they are connected to the corporate network, which in Azure represents the DIP that is assigned to the virtual machine. This design requires that you create a split DNS infrastructure.  

For more information on a split DNS infrastructure, see You Need A Split DNS! 
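The split DNS behavior described above can be sketched in a few lines: the same service name resolves to the public VIP for Internet clients and to the DIP for clients on the corporate network. This is an illustrative sketch only; the zone names, addresses, and corporate address range below are hypothetical.

```python
# Sketch of split ("split-brain") DNS resolution for a hybrid service.
# All names and addresses are hypothetical examples.
import ipaddress

CORPORATE_NETWORK = ipaddress.ip_network("10.0.0.0/8")

# The external zone answers with the Azure VIP; the internal zone
# answers with the DIP on the virtual network.
EXTERNAL_ZONE = {"app.contoso.com": "137.116.10.25"}   # VIP (public)
INTERNAL_ZONE = {"app.contoso.com": "10.4.2.10"}       # DIP (virtual network)

def resolve(name: str, client_ip: str) -> str:
    """Answer from the internal zone for on-network clients,
    otherwise from the external zone."""
    zone = (INTERNAL_ZONE
            if ipaddress.ip_address(client_ip) in CORPORATE_NETWORK
            else EXTERNAL_ZONE)
    return zone[name]

# A corporate client is steered over the site-to-site link via the DIP;
# an Internet client gets the public VIP.
print(resolve("app.contoso.com", "10.1.1.5"))     # -> 10.4.2.10
print(resolve("app.contoso.com", "203.0.113.7"))  # -> 137.116.10.25
```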

The following table describes some of the advantages and disadvantages of the three options for inbound connectivity. 

All inbound access is done over the Internet.

Advantages:
  • Requires only a single DNS entry per service.
  • Reduces the amount of traffic over the link that connects the corporate network to the public cloud infrastructure service provider’s network.
  • Doesn’t require clients to connect to the corporate network to access the service.
  • Can support both managed and unmanaged clients, depending on the service.

Disadvantages:
  • The service is potentially accessible to any Internet connected device.

All inbound access is done through the corporate network.

Advantages:
  • Requires only a single DNS entry per service.
  • Service is more secure because it is not exposed to all Internet connected devices.
  • Able to use corporate network access controls, choke points, and auditing for access to the service.

Disadvantages:
  • Requires all client systems to be on the corporate network or have remote access to the corporate network.
  • Connectivity requirements might cause some users of the service to avoid using it when they are out of the office.
  • Not a viable option when the service needs to be accessible to users who are not authorized to connect to the corporate network, or authorized users who are not using machines that are authorized to connect to the corporate network.
  • Client traffic moving through the inter-site interface (site-to-site VPN or WAN link) may adversely impact network performance of the service as a whole.

Some inbound access is over the Internet and some is over the corporate network.

Advantages:
  • Is the most flexible solution.
  • Supports both managed and unmanaged devices.
  • Supports users who are authorized and not authorized to connect to the corporate network.
  • Reduces the amount of client traffic pressure on the inter-site link.

Disadvantages:
  • Requires two DNS entries and a split DNS infrastructure, which is more complex to manage.

5.3.3  Load Balancing of Inbound Connections to Virtual Machines of a Public Infrastructure Service

Services that you place in the public cloud infrastructure service provider’s network may need to be load balanced to support the performance and availability characteristics that you require for a hybrid application running on a hybrid cloud infrastructure. There are several ways you can enable load balancing of connections to services that are hosted on the public cloud infrastructure service provider’s network. These include:

  • Use a load balancing mechanism that is provided by the public cloud infrastructure service provider that’s integrated with the service provider’s fabric management system.
  • Use some form of network load balancing that is enabled by the operating systems, or use an add-on product that runs on the virtual machines themselves.
  • Use an external network load balancer to perform load balancing of the incoming connections to the service components that are hosted in the public cloud infrastructure provider’s network.

Load Balancing Mechanism Provided by the Public Cloud Infrastructure Service Provider
The first option requires that the service provider has a built-in load balancing capability that is included with its service offering. In Windows Azure, external communication with virtual machines can occur through endpoints. These endpoints are used for various purposes, such as load-balanced traffic or direct virtual machine connectivity, like RDP or SSH.  

Windows Azure provides round-robin load balancing of network traffic to publicly defined ports of a cloud service that is represented by these endpoints. For virtual machines, you can set up load balancing by creating new virtual machines, connecting them under a cloud service, and then adding load-balanced endpoints to the virtual machines. 

For more information on load balancing for virtual machines in Windows Azure, see Load Balancing Virtual Machines 
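As a rough illustration of the round-robin behavior (a conceptual sketch, not Azure’s actual implementation), successive connections simply cycle through the instances behind a load-balanced endpoint. The instance names are hypothetical.

```python
# Minimal sketch of round-robin distribution across the virtual machines
# behind a single load-balanced endpoint.
from itertools import cycle

class RoundRobinEndpoint:
    def __init__(self, instances):
        self._instances = cycle(instances)

    def next_instance(self):
        """Return the instance that receives the next incoming connection."""
        return next(self._instances)

# Three hypothetical front-end VMs behind one load-balanced endpoint.
endpoint = RoundRobinEndpoint(["web-0", "web-1", "web-2"])
print([endpoint.next_instance() for _ in range(5)])
# -> ['web-0', 'web-1', 'web-2', 'web-0', 'web-1']
```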

Load Balancing Enabled on the Virtual Machines
The second option requires that the operating system running on the virtual machines in the public cloud infrastructure service provider’s network run some kind of software-based load-balancing system.  

For example, Windows Server 2012 includes the Network Load Balancing feature, which can be installed on any virtual machine that runs that operating system. There are other load balancing applications that can be installed on virtual machines. The service provider must be able to support these guest-based load balancing techniques, because they often change the characteristics of the MAC address that is exposed to the network. At the time this paper was written, Azure Virtual Networks did not support this type of load balancing. 

For more information about Windows Server Network Load Balancing, see Network Load Balancing Overview 

Use an External Network Load Balancer
The third option is a relatively specialized one because it requires that you can control the path between the client of the service that is hosted in the public cloud infrastructure service provider’s network and the destination virtual machines. The reason for this is that the clients must pass through the dedicated hardware load balancer so that the hardware load balancer can perform the load balancing for the client systems.  

This option is likely not going to be available from the public cloud infrastructure service provider’s side, because public providers in general do not allow you to place your own equipment on their network. This method would work if you are hosting an application in the service provider’s network that is accessible only to clients on the corporate network. Because you have control of what path internal clients will use to reach the service, you can easily put a load balancer in the path.  

For more information on external load balancers, see Load Balancing (computing) 

The following table describes the advantages and disadvantages of each of these three approaches. 

Public cloud infrastructure service provider load balancing solution

Advantages:
  • Potentially fully supported by the public cloud infrastructure provider’s fabric management system.
  • Managed by the service provider.
  • May be more cost effective than other solutions.

Disadvantages:
  • May have limited functionality compared to other solutions.

OS-based or add-on load-balancing solution

Advantages:
  • Mature technology.
  • IT department personnel have experience with these solutions that are typically used on premises.
  • Can be highly customizable.

Disadvantages:
  • Requires support by the public cloud infrastructure service provider.
  • May not work on all public cloud infrastructure service provider networks.
  • If used by the service when on premises, may need to be replaced with an alternate method if service components are moved into the public cloud infrastructure service provider network.

External load-balancing solution

Advantages:
  • High performance.
  • Network personnel are well acquainted with the technology.
  • Large number of options to allow for bandwidth control and load balancing methods.

Disadvantages:
  • Available only for a scenario where internal clients are accessing resources in the service provider’s cloud.
  • External load balancers may be cost prohibitive.

5.3.4  Name Resolution for the Public Infrastructure Service Network

Name resolution is a critical activity for any application in a hybrid cloud infrastructure. Applications that span on-premises components and those in the public cloud infrastructure provider’s network must be able to resolve names on both sides in order for all tiers of the application to work easily with one another.

There are several options for name resolution in a hybrid cloud infrastructure:

  • Name resolution support that is provided by the public cloud infrastructure service provider
  • Name resolution support that is based on your on-premises DNS infrastructure
  • Name resolution support that is based on external DNS infrastructure

Name Resolution Services Provided by the Public Cloud Infrastructure Service Provider
The public cloud infrastructure service provider may provide some type of DNS services as part of its service offering. The nature of the DNS services will vary. For example, Azure Virtual Networks provide basic DNS services for name resolution of virtual machines that are part of the same cloud service. Be aware that this is not the same as virtual machines that are on the same Azure Virtual Network. If two virtual machines are on the same Azure Virtual Network, but not part of the same cloud service, they will not be able to resolve each other’s names by using the Azure Virtual Network DNS service.  

For more information on Azure Virtual Network DNS services, see Windows Azure Name Resolution.  
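The scope limitation above can be expressed as a simple predicate: under the provider-supplied name resolution described here, sharing a virtual network is not sufficient; the virtual machines must share a cloud service. The VM, service, and network names below are hypothetical.

```python
# Predicate capturing the Azure Virtual Network DNS scope rule:
# two VMs resolve each other only when they share a cloud service,
# even if they share a virtual network.
def can_resolve(vm_a: dict, vm_b: dict) -> bool:
    return vm_a["cloud_service"] == vm_b["cloud_service"]

web = {"name": "web-0", "cloud_service": "svc-frontend", "vnet": "corp-vnet"}
sql = {"name": "sql-0", "cloud_service": "svc-backend",  "vnet": "corp-vnet"}

# Same virtual network, different cloud services: provider DNS cannot
# resolve between them.
print(can_resolve(web, sql))  # -> False
```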

Name Resolution Services Based on On-Premises DNS Infrastructure
The second option is the one you’ll typically use in a hybrid cloud infrastructure where applications span on-premises networks and cloud infrastructure service provider’s networks. You can configure the virtual machines in the service provider’s network to use DNS servers that are located on premises, or you can create virtual machines in the public cloud infrastructure service provider’s network that host corporate DNS services and are part of the corporate DNS replication topology. This makes name resolution for both on-premises and cloud-based resources available to all machines that support the hybrid application. 

Name Resolution Services External to Cloud and On-Premises Systems
The third option is less typical, as it would be used when there is no direct link, such as a site-to-site VPN or dedicated WAN link, between the corporate network and the public cloud infrastructure services network. However, in this scenario you still want to enable some components of the hybrid application to live in the public cloud and yet keep some components on premises. Communications between the public cloud infrastructure service provider’s components and those on premises can be done over the Internet. If on-premises components need to initiate connections to the off-premises components, they must use Internet host name resolution to reach those components. Likewise, if components in the public cloud infrastructure service provider’s network need to initiate connections to those that are located on premises, they would need to do so over the Internet by using a public IP address that can forward the connections to the components on the on-premises network. This means that you would need to publish the on-premises components to the Internet, although you could create access controls that limit the incoming connections to only those virtual machines that are located in the public cloud infrastructure services network. 

The following table describes some advantages and disadvantages of each of these approaches. 

Public cloud infrastructure service provider supplies DNS.

Advantages:
  • Simple to use and configure.
  • Cost effective.
  • Integrated with the public cloud infrastructure fabric management system.

Disadvantages:
  • Limited scope in some instances.
  • May not integrate with on-premises DNS infrastructure.
  • May not enable name resolution among all cloud services.

DNS is integrated with on-premises DNS infrastructure.

Advantages:
  • Enables name resolution across public and private tiers of the hybrid cloud infrastructure.
  • Mature technology.
  • IT staff has experience with current DNS infrastructure design and architecture.
  • Can be integrated with Active Directory domain controllers that you place on the service provider’s network.

Disadvantages:
  • Requires that you install one or more DNS servers in the public cloud infrastructure service provider’s network.

DNS is based on public/external DNS infrastructure.

Advantages:
  • Can be used when there is no direct link between the on-premises network and the service provider network.
  • Supports public host name resolution without requiring access to on-premises DNS resolvers.

Disadvantages:
  • Limited number of scenarios where this might be useful.
  • Requires that you publish on-premises resources that are part of the hybrid application.

5.4  Storage Design Considerations

When considering options for storage in a hybrid cloud infrastructure scenario, you will need to assess current storage practices and storage options that are available with your public cloud infrastructure service provider.  

Storage issues that you might consider include:

  • Storage tiering options
  • IaaS database options
  • PaaS database options

5.4.1  Storage Tiering

Storage tiering enables you to place workloads on storage that supports the IOPS requirements of a particular workload. For example, you might have a database-bound application that needs to handle a large number of transactions per second. You would want the public cloud infrastructure service provider to have an option for you to host your database on fast storage, perhaps Solid State Disk (SSD) storage. On the other hand, you may have other applications that do not require ultra-fast storage, in which case you could put those applications in a slower storage tier. The assumption is that the public cloud infrastructure service provider will charge more for the high-performance storage and less for the lower-performance storage. 

At the time this document was written, Azure Infrastructure Services did not provide an option for tiered storage. However, the service constantly evolves. Make sure to refer to the Windows Azure documentation pages on a regular basis during your design process.
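If your provider does offer tiers, the placement decision amounts to matching each workload’s IOPS requirement against the cheapest tier that satisfies it. The following sketch uses hypothetical tier names, IOPS ceilings, and relative costs, not any provider’s actual figures.

```python
# Sketch of matching workloads to storage tiers by IOPS requirement.
# Tier names, IOPS limits, and relative costs are hypothetical.
TIERS = [
    # (name, max_iops, relative cost per GB per month)
    ("standard-hdd", 500,   1.0),
    ("premium-ssd", 20000,  4.0),
]

def choose_tier(required_iops: int) -> str:
    """Pick the cheapest tier whose IOPS ceiling meets the requirement."""
    eligible = [t for t in TIERS if t[1] >= required_iops]
    if not eligible:
        raise ValueError("no tier satisfies the IOPS requirement")
    return min(eligible, key=lambda t: t[2])[0]

print(choose_tier(300))    # a low-churn app lands on the cheap tier
print(choose_tier(5000))   # a transaction-heavy database needs SSD
```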

5.4.2  IaaS Database

There are scenarios in a hybrid cloud infrastructure where the front-end and application tiers will be hosted in the public cloud infrastructure service provider’s network and the database tier is hosted on premises. Another possibility is that the front-end, application and database tiers are hosted on the public cloud infrastructure service provider’s network. In this scenario, you will need to investigate whether or not the public cloud infrastructure service provider supports running database applications on virtual machines hosted on its network. 

Windows Azure supports placing SQL Server on Azure Infrastructure Services. For applications that need full SQL Server functionality, Azure Infrastructure Services is a viable solution. SQL Server 2012 and SQL Server 2008 R2 images are available and they include standard, web and enterprise editions. If you have an existing SQL Server license with software assurance, you can move your existing license to Windows Azure and only pay for compute and storage.

Running SQL Server in Azure Infrastructure Services is a viable option in the following scenarios:

  • Developing and testing new SQL Server applications quickly
  • Hosting your existing Tier 2 and Tier 3 SQL Server applications
  • Backing up and restoring your on-premises databases
  • Extending on-premises applications
  • Creating multi-tiered cloud applications

5.4.3  PaaS Database and Storage

While the focus of this document is on core IaaS functionality and considerations in a hybrid cloud infrastructure, there may be scenarios where you will want to take advantage of a PaaS database offering provided by your public cloud infrastructure service provider. Sometimes referred to as “Database as a Service,” this option can simplify your design by allowing the service provider to manage the infrastructure that supports the database application so that you can focus on the front-end and application tiers. 

Windows Azure has a PaaS database as a service offering. For applications that need a full featured relational database-as-a-service, Windows Azure offers SQL Database, formerly known as SQL Azure Database. SQL Database offers a high-level of interoperability, enabling you to build applications using many of the major development frameworks.  

Table storage is another option that your public cloud service provider might offer. This can be used to store large amounts of unstructured data. Windows Azure offers table-based storage that is an ISO 27001-certified managed service, which can automatically scale to accommodate massive volumes of up to 200 terabytes. Tables are accessible from virtually anywhere via REST and managed APIs. 

Finally, your public cloud infrastructure service provider may offer blob storage for your applications and virtual machines. Blobs are an easy way to store large amounts of unstructured text or binary data, such as video, audio, and virtual machine images. Like table storage, Windows Azure Blobs are an ISO 27001-certified managed service, which can automatically scale to accommodate massive volumes of up to 200 terabytes. Blobs are accessible from virtually anywhere via REST and managed APIs. 

For more information on these storage options, please see Azure Data Management
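As a small illustration of the REST addressing model shared by blob and table storage, every blob lives at a predictable URL built from the storage account, container, and blob names. The account and container names below are hypothetical, and authentication headers are omitted for brevity.

```python
# Sketch of how a blob is addressed through the Azure Blob service REST
# interface. Account and container names are hypothetical placeholders.
def blob_url(account: str, container: str, blob: str) -> str:
    """Build the public URL at which a blob is addressed."""
    return "https://{0}.blob.core.windows.net/{1}/{2}".format(
        account, container, blob)

url = blob_url("contosostore", "vhds", "web-frontend.vhd")
print(url)
# -> https://contosostore.blob.core.windows.net/vhds/web-frontend.vhd
# A PUT to this URL (with an "x-ms-blob-type: BlockBlob" header and a
# Shared Key Authorization header) uploads the blob; a GET retrieves it.
```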

5.5 Compute Design Considerations

Compute design considerations center on the virtual machines that will be hosted on premises and in the public cloud service provider’s network. In some cases, the only virtual machines that are participating in a hybrid cloud infrastructure will be on the public cloud infrastructure service provider’s network, since the on-premises resources will be hosted on physical hardware instead of being virtualized. Whether current services are run on physical or virtualized hardware, you will need to take into account issues related to the virtual machine offering made available by the public cloud service provider.

Consider the following issues when designing the hybrid cloud infrastructure’s compute components:

  • Does your public infrastructure service provider make operating system images available?
  • Can you port on-premises images into the public cloud service provider’s network?
  • What types of disks does the public infrastructure service provider make available?
  • What level of customization for virtual machine virtual hardware is available?
  • How will you access the virtual machines on the public cloud infrastructure service provider’s network?
  • What virtual machine availability options does your public cloud infrastructure service provider support?
  • What are your backup and disaster recovery options?  

5.5.1  Operating System and Service Images

An image is a virtual disk file that you use as a template to create a new virtual machine. An image is a template because, unlike a running virtual machine, it doesn't have specific settings such as the computer name and user account settings. When you create a virtual machine from an image, an operating system disk is automatically created for the new virtual machine.

Some public cloud infrastructure service providers will provide images that not only contain operating systems, but also contain services that run on top of the operating system. These are sometimes referred to as “service templates,” and such templates can enable you to stand up services more quickly than you could if you had to first install the operating system and then install the services that you want to run.  

Windows Azure makes both operating system and service images available to you. You can either use an image provided by Windows Azure in the Image Gallery, or you can create your own image to use as a template. For example, you can create a virtual machine from an image in the Image Gallery. Windows Azure provides a selection of Windows and Linux images, as well as images that have BizTalk and other applications already installed.  

For more information on operating system and service images in Windows Azure, please see Manage Disks and Images

5.5.2 On-Premises Physical and Virtual Service Images and Disks

Another option available to you when designing a hybrid cloud infrastructure is to create your own images and post them to the public cloud infrastructure service provider’s network. This enables you to:
  • Create your own operating system images that contain your own customizations
  • Create your own service images that contain the services that you want to be ready to run
  • Perform physical to virtual conversions so that you can move applications running on physical hardware to virtual hardware on the public cloud infrastructure service provider’s network
  • Move virtual disks that host services in your own datacenter to the public cloud infrastructure service provider’s network

Windows Azure enables you to use not only images provided by Azure, but also images that you create on premises. To create a Windows Server image, you must run the Sysprep command on your development server to generalize the installation and shut it down before you can upload the .vhd file that contains the operating system.  

For more information about using Sysprep, see How to Use Sysprep: An Introduction.  

To create a Linux image, depending on the software distribution, you must run a set of commands that are specific to the distribution and you must run the Windows Azure Linux Agent. 

For more information on creating and moving on premises disk images, please see Manage Disks and Images.

5.5.3  Virtual Disk Formats and Types

Virtual Disk Formats
You will need to consider which virtual disk formats are supported by your public cloud infrastructure service provider. Each virtualization platform vendor typically supports its own virtual disk container format. If the provider you choose does not support the disk formats you currently have in production for the services you want to move to the infrastructure service provider’s network, then you will need to perform a disk format conversion before posting those disks into your public cloud infrastructure service provider’s network. 

For example, Windows Azure currently supports only the .vhd file format. If you have virtual machines running on a non-Hyper-V virtualization infrastructure, or if you have virtual machines running on a Windows Server 2012 virtualization infrastructure that use the .vhdx format, you will need to convert those disk formats to .vhd. There are a number of tools available for converting disk formats. For one example, please see How to Deploy a Virtual Machine by Converting a Virtual Machine (V2V).  
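Before converting, it can be useful to verify which format a disk file actually is. A fixed-format .vhd file ends with a 512-byte footer whose cookie is the ASCII string "conectix", while a .vhdx file begins with the identifier "vhdxfile". The following sketch distinguishes the two using synthetic in-memory files rather than real disks; note that dynamic VHDs also carry a footer copy at the start of the file, which this simple check does not cover.

```python
# Sketch of distinguishing the .vhd and .vhdx container formats by their
# published signatures. Demonstrated on synthetic in-memory files.
import io

def disk_format(f) -> str:
    header = f.read(8)
    if header == b"vhdxfile":           # VHDX identifier at offset 0
        return "vhdx"
    f.seek(-512, io.SEEK_END)           # fixed VHDs end with a 512-byte footer
    if f.read(8) == b"conectix":        # VHD footer cookie
        return "vhd"
    return "unknown"

fake_vhdx = io.BytesIO(b"vhdxfile" + b"\0" * 1024)
fake_vhd  = io.BytesIO(b"\0" * 1024 + b"conectix" + b"\0" * 504)
print(disk_format(fake_vhdx))  # -> vhdx
print(disk_format(fake_vhd))   # -> vhd
```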

Virtual Disk Types
Some public cloud infrastructure service providers will make different virtual disk types available to you that you can use in your hybrid cloud infrastructure. These virtual disk types might be useful in different scenarios, such as disks that can be used as operating system disks or storage disks. 

Azure Infrastructure Services supports an operating system disk VHD that you can boot and mount as a running version of an operating system. Any VHD that is attached to virtualized hardware and that is running as part of a service is an operating system disk. After an image is provisioned, it becomes an operating system disk. An operating system disk is always created when you use an image to create a virtual machine. The VHD that is intended to be used as an operating system disk contains the operating system, any operating system customizations, and your applications. Azure Infrastructure Services operating system disks are read-write cache enabled. 

Azure Infrastructure Services also supports using a VHD as a data disk, which enables a virtual machine to store application data. After you create a virtual machine, you can either attach an existing data disk to the machine, or you can create and attach a new data disk. Whenever you use a data-intensive application in a virtual machine, it’s highly recommended that you use a data disk to store application data, rather than using the operating system disk. Azure Infrastructure Services data disks by default have read-write caching disabled. 

A third type of disk, known as a "Caching Disk," is automatically included with any virtual machine created in Azure Infrastructure Services. This disk is used for the pagefile by default. If you have other temporary data that you want to save to local storage, you can place that data on the Caching Disk. The information on the Caching Disk is not persistent and does not survive reboots of the virtual machine.  

For more information about Azure Infrastructure Services operating system and data disks, please see Azure Virtual Machines

5.5.4  Virtual Machine Customization

Different public cloud infrastructure service providers will provide various levels of customization for your virtual machines. Typical customizations at the infrastructure layer include how much memory, how many processors and at what speeds, and how much storage you can make available to a virtual machine. In some cases the public cloud infrastructure service provider will allow granular options for provisioning memory, processors, and storage, and in some cases the provider will require you to select from a set of “t-shirt” sized virtual machines, with each size defining the amount of processing, memory, and storage resources available for that size. Windows Azure Infrastructure Services uses this “t-shirt” size model.  

For more information on the types of virtual hardware available to you, please see Virtual Machines

The amount you pay for virtual machines on the public cloud infrastructure service provider’s network is typically proportional to the size and number of virtual machines you choose. Consider in advance which virtual machines you require to support your hybrid cloud infrastructure, and investigate whether the public cloud infrastructure service provider has a price calculator that will assist you in estimating the costs of running them. 

Windows Azure Infrastructure Services has a pricing calculator to help you assess what your costs will be. Please see Windows Azure Pricing Calculator
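The “t-shirt” model reduces capacity planning to picking the smallest size that covers a workload’s requirements. The sketch below approximates the core and memory figures Azure offered at the time this guidance was written, but the hourly rates are hypothetical placeholders, not published prices.

```python
# Sketch of selecting a "t-shirt" size: choose the cheapest fixed bundle
# that covers the workload. Hourly rates are hypothetical placeholders.
SIZES = [
    # (name, cores, memory_gb, hourly_rate) ordered from cheapest up
    ("Small",       1, 1.75, 0.06),
    ("Medium",      2, 3.5,  0.12),
    ("Large",       4, 7.0,  0.24),
    ("Extra Large", 8, 14.0, 0.48),
]

def smallest_fit(cores_needed: int, memory_gb_needed: float) -> str:
    """Return the cheapest size that satisfies both requirements."""
    for name, cores, mem, _rate in SIZES:
        if cores >= cores_needed and mem >= memory_gb_needed:
            return name
    raise ValueError("no size is large enough")

print(smallest_fit(2, 3.0))   # -> Medium
print(smallest_fit(3, 4.0))   # -> Large (cores round up to the next bundle)
```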

5.5.5  Virtual Machine Access

You will need to consider how you will access the virtual machines running on the public cloud infrastructure service provider’s network. The method of access will vary with the operating system running within the virtual machine. For Windows based operating systems, you have the option to use the Remote Desktop Protocol (RDP) to connect to the virtual machine so that you can manage it. You also have the option of using remote PowerShell commands. If the virtual machine is running a Linux-based operating system, you can use the SSH protocol.  

For more information about logging on to a virtual machine running Windows Server in Azure Infrastructure Services, see How to Log on to a Virtual Machine Running Windows Server 2008 R2.  

For more information about logging on to a virtual machine running Linux in Azure Infrastructure Services, see How to Log on to a Virtual Machine Running Linux.

5.5.6  Virtual Machine and Service Availability

Service Availability
When designing a hybrid cloud infrastructure you will need to consider how you will make the virtual machines running in the public cloud infrastructure service provider’s network highly available. You will need to consider how to make the application highly available as well as the virtual machines that run the application. 

Load balancing incoming connections to the virtual machines running the application can help increase application availability. Incoming connections can be spread across multiple virtual machines. These virtual machines typically host the front-end stateless component of the application. If one of the virtual machines hosting the front-end component becomes disabled, connections can be load balanced to other front-end virtual machines. Different public cloud service providers will likely use different load balancing algorithms, so you will want to consider the load balancing algorithm used by the provider when designing application high availability into your hybrid cloud infrastructure. 

Azure Infrastructure Services supports load balancing connections to virtual machines on an Azure Virtual Network. For more information about this, please see Load Balancing Virtual Machines

Virtual Machine Availability
The hardware that supports the virtual machines needs to be maintained on a periodic basis. Your public cloud infrastructure service provider will need to schedule times when software and hardware is serviced and upgraded. In order to make sure that the services that run on those virtual machines continue to be available during maintenance and upgrade windows, you need to consider options that the public cloud service provider makes available to you to prevent downtime during these cycles. 

For example, Windows Azure periodically updates the operating system that hosts the virtual machines. A virtual machine is shut down when an update is applied to its host server. An update domain is used to ensure that not all of the virtual machine instances are updated at the same time. When you assign multiple virtual machines to an availability set, Windows Azure ensures that the machines are assigned to different update domains. The previous diagram shows two virtual machines running Internet Information Services (IIS) in separate update domains and two virtual machines running SQL Server also in separate update domains. 

For more information on availability for Azure Infrastructure Services virtual machines, please see Manage the Availability of Virtual Machines.  
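A rough model of how an availability set interacts with update domains: the members of a set are spread round-robin across the domains (five by default in Windows Azure), so a host update never reboots every instance of a tier at once. The VM names below are hypothetical.

```python
# Sketch of spreading the members of one availability set across update
# domains. The default of five update domains matches Windows Azure's
# documented behavior; VM names are hypothetical.
def assign_update_domains(vms, domain_count=5):
    """Round-robin the members of an availability set across domains."""
    return {vm: i % domain_count for i, vm in enumerate(vms)}

iis_set = assign_update_domains(["iis-0", "iis-1"])
sql_set = assign_update_domains(["sql-0", "sql-1"])
print(iis_set)  # -> {'iis-0': 0, 'iis-1': 1}
# The two IIS instances land in different update domains, so only one of
# them is taken down during any single update pass; likewise for SQL.
```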

5.6  Management and Support Design Considerations

From the perspective of core cloud infrastructure considerations, there are some basic issues around management and support design that you'll want to consider.  The primary areas include, but are not limited to, the following:

  • Consumer and provider portals
  • Usage and billing
  • Service reporting
  • Public cloud infrastructure service provider authentication and authorization
  • Application authentication and authorization
  • Backup services and disaster recovery

The remainder of this section discusses the options and considerations in these areas.

5.6.1 Consumer and Provider Portal

If you’re providing cloud services to your consumers today, then you already provide them with a consumer portal.  When your users interact with Windows Azure services, they use the Windows Azure Management Portal as a consumer portal.  How similar, or different, is the Windows Azure Management Portal experience from the consumer portal experience that you provide for your private cloud services?  Recall the Create a Seamless User Experience principle mentioned earlier in this document. 

A few options are available to you to provide a seamless user experience across both your private cloud services and the Windows Azure public cloud services.

  • System Center 2012 App Controller: App Controller provides a common self-service experience that can help you configure, deploy, and manage virtual machines and services across your private cloud, the Windows Azure public cloud, as well as public clouds provided by some public hosting service providers.  If your consumers use App Controller to provision new capacity, they could use it instead of the Windows Azure Management Portal for many of their tasks, though some tasks would still need to be completed through the Windows Azure Management Portal.
  • Windows Azure Services for Windows Server: Windows Azure Services for Windows Server includes a consumer portal that you can install on-premises.  It integrates with System Center 2012 Virtual Machine Manager, and provides an almost-identical experience to the Windows Azure Management Portal experience.  It does not, however, enable you to provision services both on-premises and on Windows Azure, as App Controller does.  So while it provides a similar experience to the Windows Azure Management Portal, your consumers will still have to use your on-premises portal to provision on-premises services, and the Windows Azure Management Portal to provision Windows Azure services.
Note
Windows Azure Services for Windows Server integrates with Windows Server 2012 and System Center 2012.  The next version of Windows Azure Services for Windows Server is the Windows Azure Pack for Windows Server.  It will integrate with Windows Server 2012 R2 and System Center 2012 R2. 

5.6.2  Usage and Billing

If you’re providing cloud services to your consumers today, then you are already able to track resource consumption by your consumers.  You use this data to either charge your customers for their consumption, or simply report back to them on their consumption.  Public cloud service providers each have their own pricing and billing options.   

Windows Azure Virtual Machines pricing is publicly available, and provides purchase options by credit card or invoicing. Purchase options are connected to a Microsoft Account.  When using Windows Azure services, you’ll need to determine which purchase option you’ll choose, and how those costs will be either charged back or shown back to the individuals within the organization who consumed the resources.  As of the writing of this document, Windows Azure billing is provided at the subscription level, and doesn’t provide much granularity for the individual resources consumed within a subscription.  Thus, you may decide to set up multiple subscriptions to track resource consumption, or develop strategies for tracking resource consumption through a single subscription.  
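
One showback strategy under these constraints is to export usage records from each per-team subscription and aggregate the costs yourself. A minimal sketch follows; the record fields and rates are hypothetical, not the actual Windows Azure billing schema.

```python
from collections import defaultdict

# Hypothetical usage records exported from per-team subscriptions.
# Field names and rates are illustrative only.
usage = [
    {"subscription": "team-web",  "resource": "vm-small",   "units": 300, "rate": 0.09},
    {"subscription": "team-web",  "resource": "storage-gb", "units": 1,   "rate": 12.00},
    {"subscription": "team-data", "resource": "vm-large",   "units": 150, "rate": 0.36},
]

# Roll costs up per subscription so each team can be shown (or charged)
# its own consumption, even though billing granularity stops there.
showback = defaultdict(float)
for record in usage:
    showback[record["subscription"]] += record["units"] * record["rate"]

# showback now holds roughly 39.00 for team-web and 54.00 for team-data
```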

5.6.3 Service Reporting

If you’re providing cloud services to your consumers today, then you already provide reports to your consumers as to whether services met their service level agreements (SLAs) in areas such as performance and availability.   Public service providers offer SLAs for the services they provide, as well as service reporting so you know whether or not they met their SLAs.   

Windows Azure provides availability SLAs for its various services.  An example of the availability SLA offered with the Windows Azure Virtual Machines service is defined in the Virtual Machines article. You’ll need to decide whether it’s possible to integrate the service reporting offered by the public provider with your own service reporting capability, and if it is, whether you want to do so.  If you’re providing a service to your consumers that has some components running on-premises and others running on a public provider’s cloud, you’ll have to integrate the service reporting capabilities so that you can provide service level reporting to your consumers.  

5.6.4  Public Cloud Infrastructure Service Provider Authentication and Authorization

When working with a public cloud infrastructure service provider’s system, you need to understand what authentication and authorization/access control options are available to you. In addition, you’ll need to understand how authentication and authorization come together to support your overall account management requirements. In this section we will cover these issues.

5.6.4.1  Authentication

Users need to authenticate to the provider’s system to gain access to system resources. When designing your hybrid cloud infrastructure, you need to determine what authentication options are available to you and what the advantages and disadvantages might be to each approach. 

There are several possible options for designing authentication to the service provider’s system:

  • The service provider maintains an authentication system completely separate from yours. The service provider requires you to create accounts on that system and those accounts are managed separately from accounts that you maintain on premises.
  • The service provider and the enterprise IT group can create a direct federation between their systems or use some method of directory synchronization.
  • The service provider and the enterprise IT group can create an indirect federation by leveraging a third-party federation service.  

The following describes the advantages and disadvantages of each of these options, using Active Directory Federation Services and Windows Azure Active Directory as examples of technologies for direct and indirect federation, respectively.  

Option: You authenticate to the service provider’s proprietary authentication mechanism, separately from any mechanism you already have on premises.

Advantages:
  • You don’t have to manage any federation or account synchronization.

Disadvantages:
  • You have to log on to the service provider’s systems and log on separately to your on-premises systems.
  • You need to manage accounts in both your system and the service provider’s system.

Option: You federate your on-premises authentication mechanism with the service provider’s, or use some form of directory synchronization.

Advantages:
  • You can log on to your on-premises systems and seamlessly manage your public cloud infrastructure without needing to log on separately to the service provider’s system.
  • You only need to manage accounts on your own system. If an employee joined or left the organization, you wouldn’t need to update both the on-premises and public cloud infrastructure systems separately.

Disadvantages:
  • You and your service provider must manage a federation with each other.
  • You must manage separate federations with each of your service providers, and each of your service providers needs to manage a separate federation with you. With directory synchronization, you need to manage the synchronization technologies.

Option: You federate your on-premises authentication mechanism with the service provider’s through a federation service such as Windows Azure Active Directory.

Advantages:
  • You can log on to your on-premises systems and seamlessly manage the public side of the hybrid cloud infrastructure without needing to log on separately to the service provider’s system. You only need to manage accounts in your own system. If an employee joined or left the organization, you wouldn’t need to update both your own system and the service provider’s.
  • You may already be federated with Windows Azure Active Directory, particularly if you use Microsoft Office 365 or Windows Azure today. This eliminates the need to set up direct federations with individual service providers that are also federated with Windows Azure Active Directory.

Disadvantages:
  • You must manage a federation with Windows Azure Active Directory.

For more information on how to integrate your on-premises Active Directory domain with Windows Azure Active Directory, see Windows Azure, now with more enterprise access management.

5.6.4.2  Authorization and Access Control

In a hybrid cloud infrastructure, you need to determine what authorization capabilities your public cloud infrastructure provider makes available to you and also what authorization capabilities that you already have, or plan to enable, in your on-premises components of the solution. Important issues that you need to consider include:
  • Have you enabled, or do you plan to enable, role-based administrative access control on the on-premises side of your hybrid cloud infrastructure? If so, how will you define the roles? Will you separate roles between service owners and account owners?
  • Does the public cloud infrastructure service provider enable role-based access control? If so, how does it define the roles? Does the public cloud infrastructure service provider enable you to separate service management and account management roles?
  • Are all employees in the company authorized to request resources from the hybrid cloud infrastructure? If not, how will you determine which employees will have access to the services acquisition portal? Will you add more granularity and allow certain groups of authorized users to have access to specific components of the hybrid cloud infrastructure?
  • How do you plan to reflect your current IT organizational structure on the public cloud infrastructure component of the hybrid cloud solution? Do you plan to mirror your IT organization the way it is now? Will you assign members of the current IT organization to the hybrid cloud infrastructure? Will you let the service owners who consume hybrid cloud resources have access to management of the infrastructure that their service runs on—in effect mirroring in the cloud components of their on-premises siloed IT infrastructure?

The following describes the advantages and disadvantages of several options for authorization and access control.

Option: On-premises role-based administrative access control.

Advantages:
  • Enables you to control which members of the IT organization will be able to manage specific components of the hybrid cloud infrastructure on premises.

Disadvantages:
  • Requires you to define the roles and implement them in the management systems that you use in the on-premises components of the hybrid cloud infrastructure.
  • May require you to reflect your on-premises access control to the public cloud infrastructure components of the hybrid cloud infrastructure.

Option: Public cloud infrastructure service role-based administrative access control.

Advantages:
  • Enables you to control which members of the IT organization will be able to manage specific components of the public cloud infrastructure portion of the hybrid cloud infrastructure.

Disadvantages:
  • Requires the public cloud infrastructure service provider to surface the roles that your IT organization is interested in.
  • You will need to determine how to integrate on-premises role-based administrative access controls with the public cloud infrastructure service provider’s role-based administrative access controls.

Option: Authorized employees are allowed to acquire hybrid cloud infrastructure resources.

Advantages:
  • Enables you to control which employees are allowed to requisition resources from the hybrid cloud infrastructure.
  • Helps reduce the chance that the hybrid cloud infrastructure becomes oversubscribed.

Disadvantages:
  • Requires that you determine which employees are authorized to acquire hybrid cloud infrastructure resources.
  • Requires that you perform account or group maintenance for authorized users.

Option: Dedicated hybrid cloud infrastructure group.

Advantages:
  • Specially trained members of the IT organization run the infrastructure components of the hybrid cloud infrastructure.
  • Hybrid cloud infrastructure team members can focus their attention solely on hybrid cloud infrastructure assets.

Disadvantages:
  • Requires certain members of the IT organization to be moved to a new role, creates additional duties for existing team members, or requires the organization to hire new employees to fill the role.
  • The hybrid cloud infrastructure team could become a bottleneck for service owners who want to acquire cloud services quickly.

Option: Reflect the IT organizational structure in the hybrid cloud infrastructure.

Advantages:
  • Allows the entire IT organization to work with the hybrid cloud infrastructure.
  • Enables members of each team within the IT organization to use the same core competencies that they use for the company’s on-premises components of the hybrid cloud infrastructure.

Disadvantages:
  • The natural division of infrastructure responsibilities for on-premises infrastructure (storage, compute, networking, and identity/access control) might not fit as well on the public cloud infrastructure service side of the hybrid cloud infrastructure.
  • Limiting various infrastructure teams to specific components of the public cloud infrastructure service may create unnecessary overhead.

Option: Allow consumers of the hybrid cloud infrastructure to mirror on-premises siloed infrastructure.

Advantages:
  • Allows for the division of responsibilities that the IT organization is accustomed to.
  • Reduces the overhead for the hybrid cloud team by allowing service owners to manage their own assets both on premises and in the public cloud infrastructure.

Disadvantages:
  • Replicates the siloed infrastructure that the IT organization currently has on premises.
  • Requires all service owners to have access to public cloud infrastructure management interfaces.

For more information on role-based access control in Hyper-V, see Configure Hyper-V for Role Based Access Control 

For more information on role-based access control in System Center Virtual Machine Manager, see Private Cloud in System Center Virtual Machine Manager 2012 - Part 2 – Delegate Control

At this time granular role-based access control is not available in Azure Infrastructure Services.

5.6.4.3  Account Management

You need to consider workflow issues regarding who has access to both the public cloud service account that is used for billing services and any sub-accounts that might be used for administration of the public infrastructure service components.

For example, suppose there is a manager who is responsible for the infrastructure service account. What might happen if that manager were released from the company? If the former manager left on bad terms, that person could potentially cancel the account and thereby block access to all the services. Similarly, what might happen if a member of the hybrid cloud infrastructure team were released from the company, and that person’s administrative account were still active? If the administrator left the company on bad terms, that person could delete virtual machines, leave an exploit on the service, or do any number of other things that a person with administrative access could achieve. 

For these reasons and more, it’s critical that you have a workflow or account provisioning and deprovisioning process that can prevent these problems from happening. You may already have a workflow and account management system in place that performs these actions for you for on-premises accounts. If that is the case, you can investigate the possibilities of connecting your on-premises account management system with the management system that is used by your public cloud infrastructure service provider.  

For example, as mentioned in the table in section 5.6.4.1 Authentication, you may have the option to federate your on-premises account system with the service provider’s system. If that is the case, user accounts that are provisioned and de-provisioned on premises will automatically be managed for access to the service provider’s system. You might consider an on-premises solution that is based on Forefront Identity Manager (FIM) to help you with this type of account management and tie it into the federated environment.  

At this time in Windows Azure, you have the option of assigning an account to be a Service Administrator or Service Co-Administrator. The difference between these two roles is that the Service Co-Administrator cannot delete the Service Administrator account for a subscription. Only the Windows Azure account owner can delete a Service Administrator. 

For more information on administrative roles in Windows Azure, see Provisioning Windows Azure for Web Applications.  

5.6.5 Application Authentication and Authorization

In a hybrid cloud infrastructure, you will need to consider the options available for authentication and authorization. While there are a number of authentication and authorization options available for the applications that you’ll run in the public cloud infrastructure service provider’s network, in the majority of cases those applications will be dependent to a certain degree on Active Directory. For this reason, it’s important to consider your design options for applications that run some or all of their components in the public cloud infrastructure service provider’s network. 

Key issues for consideration include:

  • Active Directory domain controllers in the public cloud infrastructure service provider’s network
  • Read-only domain controller considerations
  • Domain controller locator considerations
  • Domain, forest, and global catalog considerations
  • Active Directory name resolution and geo-location considerations
  • Active Directory Federation Services (ADFS) considerations
  • Windows Azure Active Directory considerations

The remainder of this section will detail considerations in each of these areas.

5.6.5.1  Active Directory Domain Controllers in the Public Cloud Infrastructure Provider's Network Considerations

Historically the recommendation has been not to virtualize domain controllers. Many virtualization infrastructure designers have virtualized domain controllers only to experience a failure related to a virtualized domain controller. 

For example, backing up and restoring domain controllers can roll back the state of the domain controller and lead to issues that are related to inconsistencies in the Active Directory database. Restoring snapshots from a virtualized domain controller would have the same effect as restoring from backup—the previous state would be restored and lead to Active Directory database inconsistencies. The same effects are seen when you use more advanced technologies to restore a domain controller, such as creating SAN snapshots and restoring those, or creating a disk mirror and then breaking the mirror and using the version on one side of the mirror at a later time as part of a restore process.

Update Sequence Number (USN) “bubbles” create the problems that are most commonly encountered with virtualized domain controllers. USN bubbles can lead to a number of problems, including:

  • Lingering objects in the Active Directory database
  • Inconsistent passwords
  • Inconsistent attribute values
  • Schema mismatch if the Schema Master is rolled back
  • Duplicated security principals

For these reasons and more, it is critical to avoid USN bubbles.  
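
The mechanics of a USN bubble can be sketched with a toy replication model: after an unsafe snapshot restore, the rolled-back domain controller re-uses a USN that its replication partner has already seen, so the new change is silently skipped. This is a deliberately minimal model, not the actual replication protocol.

```python
class DC:
    """Minimal model of USN-based replication (illustrative only)."""

    def __init__(self):
        self.usn = 0
        self.changes = {}                  # usn -> change description

    def write(self, change):
        self.usn += 1
        self.changes[self.usn] = change

    def snapshot(self):
        return (self.usn, dict(self.changes))

    def restore(self, snap):               # unsafe restore: rolls the USN back
        self.usn, self.changes = snap[0], dict(snap[1])

def pull(partner, watermark):
    """Return the partner's changes above our stored up-to-dateness watermark."""
    return [c for usn, c in partner.changes.items() if usn > watermark]

dc1 = DC()
dc1.write("pwd-A")                 # committed at USN 1
snap = dc1.snapshot()
dc1.write("pwd-B")                 # committed at USN 2
watermark = dc1.usn                # the partner has replicated up to USN 2

dc1.restore(snap)                  # rollback: USN counter back to 1
dc1.write("pwd-C")                 # new change re-uses USN 2
missed = pull(dc1, watermark)      # partner believes it already has USN 2
# missed is empty: "pwd-C" is never replicated -- a USN bubble
```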

For more information on USN bubbles, see How the Active Directory Replication Model Works

VM Generation ID
Virtualization makes it easier to create a USN bubble scenario, and therefore the recommendation in the past has been that you should not virtualize domain controllers. However, with Windows Server 2012, virtualizing domain controllers is now fully supported.  

Full support for virtualizing domain controllers is enabled by a feature in the hypervisor which is called the VM Generation ID. When a domain controller is virtualized on a supported virtualization platform, the domain controller will wait until replication takes place to be told what its state and role is. If the virtualized domain controller is one that was restored from a snapshot, it will wait to be told what the correct state is instead of replicating a previous state and causing a USN bubble. 
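
The boot-time safeguard can be sketched as follows. The function and parameter names here are illustrative, not the actual Directory Services implementation.

```python
import uuid

def new_guid():
    """Hypothetical helper: mint a fresh invocation ID."""
    return str(uuid.uuid4())

def on_boot(stored_gen_id, current_gen_id, invocation_id, rid_pool):
    """Sketch of what a VM-Generation-ID-aware domain controller does at
    boot: if the generation ID has changed (a snapshot was applied, or the
    VM was copied), it resets its invocation ID and discards its RID pool,
    then lets inbound replication tell it the correct state instead of
    replaying its rolled-back state and causing a USN bubble."""
    if stored_gen_id != current_gen_id:
        invocation_id = new_guid()   # start a fresh replication epoch
        rid_pool = None              # request a new pool from the RID master
    return invocation_id, rid_pool
```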

For more information on VM Generation IDs, see Introduction to Active Directory Domain Services Virtualization

Note:
VM Generation IDs must be supported by both the hypervisor and the guest operating system. Used together, Windows Server 2012 Hyper-V and the Windows Server 2012 operating system acting as a guest will support VM Generation IDs. VMware also supports VM Generation ID when running Windows Server 2012 domain controller guests. Windows Azure Infrastructure Services also supports VM Generation ID and therefore also supports virtualization of domain controllers.

When creating domain controllers in Azure Infrastructure Services, you have the option to create them new on an Azure Virtual Network, or to use one that you created on premises and move it to an Azure Virtual Network.  

Note:
Do not sysprep domain controllers—sysprep will generate an error when you try to run it on a domain controller.  

Instead of using sysprep, consider moving the VHD file to Azure storage and then create a new virtual machine by using that VHD file. If your on-premises domain controller is running on physical hardware, you have the option to do a physical to virtual conversion and move the resultant .vhd file to Azure storage. Then you can create the new virtual machine from that .vhd file.  

You also have the option to create a new domain controller in Azure Infrastructure Services and enable inbound replication to the domain controller. In this case, all the replication traffic is inbound, so there are no bandwidth charges due to egress traffic during the initial inbound replication, but there will be egress traffic costs for outbound replication. 

Active Directory Related File Placement
When creating an Active Directory design to support hybrid application authentication, you will need to consider the disk types that are available from the public cloud infrastructure service provider. Some disk types and caching schemes may be more or less favorable for specific types of Active Directory domain controller data.

For example, Windows Azure supports two disk types where you can store information for virtual machines:

  • Operating System Disks (OS Disks)—used to store the operating system files
  • Data Disks—used to store any other kind of data

As mentioned earlier in this paper, Windows Azure Infrastructure Services also supports a “temporary disk,” but you should avoid storing data on a temporary disk because the information on the temporary disk is not persistent across reboots of the virtual machine. In Windows Azure, the temporary disk is primarily used for the page file and it helps speed up the virtual machine boot process. 

In Windows Azure, the main difference between a data disk and an OS disk relates to their caching policies. The default caching policy for an OS disk is read/write. When read/write activity takes place, it will first be performed on a caching disk. After a period of time, it will be written to permanent blob storage. The reason for this is that for the OS disk, which should contain only the core operating system support files, the reads and writes will be small. This makes local caching a more efficient mechanism than making the multiple and frequent small writes directly to permanent storage.  

Note:
The OS Disk size limit at the time this was written was 127 GB. However, this might change in the future, so watch the support pages on the Windows Azure website for updates. 

The default caching policy for Data Disks is “none,” which means that no caching is performed and data is written directly to permanent storage. Unlike OS Disks, which are currently limited to 127 GB, Data Disks currently support up to 1 TB each. If you need more storage, you can span up to 16 disks for up to 16 TB, which is available with the current Extra Large, A6, and A7 virtual machine sizes.  

Note:
These are current maximum Data Disk sizes and numbers. This might change in the future. Please check the Windows Azure support pages for updates. 

With all this in mind, consider where you want to place the Active Directory database (DIT) and SYSVOL. Should they be on a disk where caching could lead to lost writes, or on a disk where Active Directory related information is immediately written to permanent storage? The latter is the preferred option.

The main reason for this is that write-behind disk caching invalidates some core assumptions made by Active Directory:

  • Domain controllers assert forced unit access (FUA) and expect the I/O infrastructure to honor that assumption.
  • FUA is intended to ensure that sensitive writes make it to permanent media (not temporary cache locations).
  • Active Directory seeks to prevent (or at least reduce the chances of) encountering a USN bubble.

For more information related to Active Directory and FUA, see Things to consider when you host Active Directory domain controllers in virtual hosting environments
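
The difference between the two caching behaviors, and why write-behind caching is risky for the DIT and SYSVOL, can be sketched with a toy disk model. This is an illustration of the failure mode only, not how Azure storage is implemented.

```python
class Disk:
    """Toy model contrasting an OS disk's read/write (write-behind) cache
    with a data disk's write-through behavior (illustrative only)."""

    def __init__(self, write_cache):
        self.write_cache = write_cache
        self.cache = []
        self.permanent = []

    def write(self, record):
        if self.write_cache:
            self.cache.append(record)      # acknowledged from the cache
        else:
            self.permanent.append(record)  # written straight to permanent storage

    def flush(self):
        self.permanent += self.cache
        self.cache = []

    def crash(self):                        # power loss before the flush
        self.cache = []

os_disk = Disk(write_cache=True)
data_disk = Disk(write_cache=False)
for disk in (os_disk, data_disk):
    disk.write("AD update #1")              # e.g., a sensitive FUA write
os_disk.crash()
data_disk.crash()
# os_disk.permanent is empty: the cached write was lost, violating the
# forced-unit-access assumption. data_disk.permanent retains the update.
```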

The following describes some of the advantages and disadvantages of Azure Infrastructure Services disk types in the context of Active Directory domain controllers.

Windows Azure disk type: OS Disk

Advantages in domain-controller scenario:
  • Caching enabled by default
  • Provides superior boot performance

Disadvantages in domain-controller scenario:
  • Caching is suboptimal for Active Directory related files due to possible data loss

Windows Azure disk type: Data Disk

Advantages in domain-controller scenario:
  • No caching enabled by default
  • Active Directory related data will be immediately written to disk

Disadvantages in domain-controller scenario:
  • Not appropriate for the operating system

Windows Azure disk type: Temporary Disk

Advantages in domain-controller scenario:
  • Increases available disk space for the operating system
  • Speeds operating system boot

Disadvantages in domain-controller scenario:
  • Data is temporary and is lost after a reboot
  • Risk of inadvertently placing important information on the temporary disk

5.6.5.2  Read-Only Domain Controller Considerations

There are several options available to you for putting Active Directory in the Azure Infrastructure Services cloud: 

  • Full read/write domain controllers in a production domain
  • Full read/write domain controllers in a trusting domain or forest
  • Read-only domain controllers in the production domain
  • Read-only domain controllers in a trusting domain or forest
  • No domain controllers at all, and you use Active Directory Federation Services  

In a hybrid cloud environment, you might consider the public cloud infrastructure service provider’s network as being similar to a branch office, or as an off-premises hosted data center. So it would make sense to take advantage of read-only domain controllers, because they were designed for a branch office deployment. 

However, while a public cloud infrastructure service provider’s network may be treated as similar to a branch office, there are some significant differences between the branch office environment that was envisioned by the creators of the read-only domain controller role and the environment seen in a public cloud infrastructure service provider’s network. The main difference is that the branch office scenario is considered a low-security environment, where the domain controller might not be in a physically secure location, which makes it vulnerable to theft or tampering. Because of this, the read-only domain controller was designed as a good alternative for branch offices, providing the following benefits:

  • Faster authentication.
  • Authentication even when the link between the branch office and main office goes down.
  • Limited damage if a read-only domain controller is compromised.

The following describes the advantages and disadvantages of deploying a read-only domain controller (RODC) in a public cloud infrastructure provider’s network, such as Windows Azure. 

Advantages:
  • RODCs support Windows integrated authentication.
  • RODCs do not replicate outbound, which is good because outbound (egress) traffic is what you pay for.
  • Replication to an RODC is one-way (inbound only), which reduces bandwidth on the site-to-site VPN link.
  • RODCs do not possess every secret—only those that are cached, if you choose to use that model.
  • RODCs enable you to filter attributes, so if personally identifiable information (PII) is included, you can filter the attributes that you do not want available on the RODC.

Disadvantages:
  • RODCs do not support all Active Directory dependent applications. That means you’ll need to do some testing to determine whether they are the right solution for the applications that you want the RODC to support. This could potentially influence the cost effectiveness of the RODC solution.
  • If the site-to-site VPN that connects Azure Virtual Machines and Virtual Networks to your on-premises network goes down, you will not be able to authenticate users.
  • There are costs related to running the Azure Infrastructure Services gateway. If the application always needs to be available and it depends on an RODC located on the Azure Virtual Network, then the gateway needs to be available too, and that will incur additional costs.

For more information on attribute filtering and credential caching, see RODC Filtered Attribute Set, Credential Caching, and the Authentication Process with an RODC.

5.6.5.3  Domain-Controller Locator Considerations

When putting Active Directory domain services in a public cloud infrastructure service provider’s network, you need to think about how to correctly define and connect Active Directory subnets and sites to the off-premises components—as the choices you make here will influence the cost of the overall solution.

Sites, site links, and subnets affect where authentication takes place and also the topology of domain controller replication. To begin with, here are some definitions:

  • A collection of subnets defines a site.
  • You connect the sites together using site links.
  • You can then create replication policies.

When creating replication policies, consider the following:

  • How frequently do you want replication to take place? If you put a read/write domain controller into Azure Virtual Machines and Virtual Networks, then inbound replication events are going to increase the cost because these are seen as egress traffic by Azure Infrastructure Services.
  • What days of the week do you want replication to take place? Fewer days mean lower costs that are related to egress traffic.
  • Consider creating a replication policy that is based on when you want them to take place, and not have them be event-driven, as this can also run up egress traffic costs.
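
The scheduling tradeoffs above can be sketched as a back-of-envelope cost estimate. This is only an illustration: the per-GB rate, event sizes, and schedules below are hypothetical placeholders, not actual Azure pricing, so substitute your provider's published egress rates before relying on any numbers.

```python
# Rough estimator for scheduled replication egress costs.
# All rates and traffic sizes are hypothetical placeholders.

def weekly_replication_egress_gb(events_per_day, days_per_week, avg_gb_per_event):
    """Approximate egress (in GB) generated per week by scheduled replication."""
    return events_per_day * days_per_week * avg_gb_per_event

def weekly_cost(egress_gb, rate_per_gb):
    return egress_gb * rate_per_gb

# Compare an aggressive hourly schedule against a nightly weekday schedule.
aggressive = weekly_replication_egress_gb(events_per_day=24, days_per_week=7, avg_gb_per_event=0.05)
nightly = weekly_replication_egress_gb(events_per_day=1, days_per_week=5, avg_gb_per_event=0.05)

print(weekly_cost(aggressive, rate_per_gb=0.12))  # hypothetical per-GB rate
print(weekly_cost(nightly, rate_per_gb=0.12))
```

Plugging in your own measured replication sizes and your provider's rates gives a quick sense of how much a less frequent schedule saves.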

One option is to define the Azure Virtual Network's network ID (or that of any other public cloud service provider's network) as a subnet in Active Directory; machines on that subnet will then use the local domain controllers for authentication (assuming they are available). This means that services situated in Azure Infrastructure Services won't have to reach out to on-premises domain controllers for authentication. It also reduces cost: if a service in Azure Infrastructure Services had to authenticate against on-premises domain controllers, it would generate egress traffic, which you must pay for.
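
The subnet-to-site mapping that drives this behavior can be sketched in a few lines. The subnets, site names, and domain controller names below are hypothetical examples, not values from any real deployment.

```python
# Sketch of how the DC locator's subnet-to-site mapping drives DC selection.
# Subnets, site names, and DC names are hypothetical.
import ipaddress

# Map Active Directory subnets to sites; the Azure Virtual Network's address
# space is registered as its own subnet and associated with its own site.
SUBNET_TO_SITE = {
    ipaddress.ip_network("10.0.0.0/16"): "OnPremises-HQ",
    ipaddress.ip_network("10.4.0.0/16"): "Azure-VNet",
}

SITE_TO_DCS = {
    "OnPremises-HQ": ["dc1.corp.example.com"],
    "Azure-VNet": ["azdc1.corp.example.com"],
}

def site_for(ip):
    addr = ipaddress.ip_address(ip)
    for net, site in SUBNET_TO_SITE.items():
        if addr in net:
            return site
    return None

def preferred_dcs(client_ip):
    """Clients in the Azure subnet authenticate against the local DC first."""
    return SITE_TO_DCS.get(site_for(client_ip), [])

print(preferred_dcs("10.4.1.25"))  # prints ['azdc1.corp.example.com']
```

Because the cloud-hosted client resolves to the Azure site, its authentication traffic stays inside the Virtual Network instead of crossing the billable link.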

For more information on Active Directory sites, see Active Directory Sites

Also consider what costs you want to set on the site links. For example, the Azure Infrastructure Services connection represents a much higher-cost link. You'll also want to ensure that when the issue of “next closest site” occurs, the domain controllers in Azure Infrastructure Services are not considered to be the next closest (unless that's what you intend, such as when remote offices use a domain controller in Azure Infrastructure Services as a backup).
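
The effect of site-link costs on the “next closest site” calculation can be illustrated with a minimal sketch. The site names and costs are hypothetical; the point is that assigning the Azure link a deliberately high cost keeps it from being chosen unless you intend it to be.

```python
# Sketch of "next closest site" selection driven by site-link costs.
# Site names and cost values are hypothetical examples.
SITE_LINK_COSTS = {
    ("Branch-A", "OnPremises-HQ"): 100,
    ("Branch-A", "OnPremises-DR"): 200,
    ("Branch-A", "Azure-VNet"): 1000,  # deliberately expensive link
}

def next_closest_site(current_site):
    """Return the lowest-cost remote site reachable from current_site."""
    candidates = [(cost, remote)
                  for (local, remote), cost in SITE_LINK_COSTS.items()
                  if local == current_site]
    return min(candidates)[1] if candidates else None

print(next_closest_site("Branch-A"))  # prints OnPremises-HQ
```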

For more information on this issue, see Enabling Clients to Locate the Next Closest Domain Controller

Active Directory replication also supports compression. The more compressed the data is, the lower the egress costs will be.

For more information on Active Directory compression, see Active Directory Replication Traffic

Finally, consider putting together your replication schedule based on anticipated latency issues. Remember that domain controllers replicate only the last state of a value, so slowing down replication saves cost if there's sufficient churn in your environment. 
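
The “last state only” behavior is why a slower schedule saves money under churn: intermediate writes within a replication cycle never cross the billable link. A minimal sketch of the idea:

```python
# Illustration of state-based replication: only the final value of each
# attribute within a replication cycle is transferred, so a longer interval
# collapses intermediate churn into a single billable update.

def updates_transferred(changes):
    """changes: list of (attribute, value) writes within one replication cycle.
    Only the last value per attribute is replicated."""
    last_state = {}
    for attr, value in changes:
        last_state[attr] = value
    return last_state

churn = [("telephoneNumber", "555-0100"),
         ("telephoneNumber", "555-0101"),
         ("telephoneNumber", "555-0102")]
# Three writes, but only one value crosses the (billable) wire:
print(updates_transferred(churn))  # prints {'telephoneNumber': '555-0102'}
```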

5.6.5.4  Domain, Forest, and Global Catalog Considerations

Domain and Forest Considerations
A read-only domain controller is not the only option for placing a domain controller into a public cloud infrastructure service provider’s network. Another viable option is to place full read/write domain controllers into the off-premises side of a hybrid cloud infrastructure.  

When considering putting a full read/write domain controller on the public cloud infrastructure service provider’s network, you’ll first want to ask about the provider’s security model and operational principles. Azure Infrastructure Services is a public cloud offering, which means that you’re using a shared compute, networking, and storage infrastructure. In such an environment, isolation is a key operating principle, and the Azure team has ensured that isolation is enforced to the extent that placing a domain controller in Azure Infrastructure Services is a supported and secure deployment model.

For more information on Azure security, see Windows Azure Security Overview

The next step is to consider what kind of domain/forest configuration you want to deploy. Some of the options are:

  • Deploy domain controllers that are part of the same domain in the same forest.
  • Deploy domain controllers that are part of a different domain in the same forest, and configure a one-way trust.
  • Deploy domain controllers that are part of a different domain in a different forest, and configure a one-way trust.

The first option might represent the least secure option of the three, because if the domain controller in the cloud is compromised, the entire production directory services infrastructure would be affected. The second and third options can be considered incrementally more secure, because there is only a one-way trust, but the overhead of maintaining trusts might not fit organizational requirements.  

The last option might be considered the most secure, but there is administrative overhead that you need to take into account, and not all deployment scenarios will support this kind of configuration. You need to consider these issues before deciding on a domain and forest model.

Given the Azure security model, the consensus is that the first option is the preferred option when you weigh the options for application compatibility, management overhead, and security.  

Another important consideration is regulatory and compliance issues. A lot of PII can be stored in these read/write domain controllers, and there may be regulatory issues that you need to consider. There are also cost considerations. You’ll end up generating a lot more egress traffic (depending on authentication load), and there will also be egress replication traffic that you’ll need to factor into the cost equation. 

For detailed information about Active Directory security considerations, see Best Practice Guide for Securing Active Directory Installations

The following table describes some of the advantages and disadvantages of each of the domain and forest models. 

Domain/forest model: Deploy domain controllers that are part of the same domain in the same forest.
  Advantages:
  • Easy to manage
  • Easy to deploy
  • Greatest application compatibility
  Disadvantages:
  • Potentially less secure

Domain/forest model: Deploy domain controllers that are part of a different domain in the same forest, and configure a one-way trust.
  Advantages:
  • Relatively easy to manage
  • Can limit the scope of damage in case of compromise
  Disadvantages:
  • Does not completely isolate the on-premises domain

Domain/forest model: Deploy domain controllers that are part of a different domain in a different forest, and configure a one-way trust.
  Advantages:
  • Probably the most secure option
  • Isolates the production forest from the off-premises forest
  Disadvantages:
  • Difficult to manage
  • May not be compatible with all applications

Global Catalog Considerations
When designing a hybrid cloud infrastructure, you need to consider whether you want to put a Global Catalog domain controller into the off-premises component of your infrastructure. A Global Catalog server is a domain controller that keeps information about all objects in its domain and partial information about objects in other domains.  

To learn more about Global Catalog servers, see What is the Global Catalog

A Global Catalog enables an application to ask a single domain controller one question that might refer to multiple domains, even though that domain controller is not a member of the domain for which the question is being asked. A Global Catalog server contains a partial copy of the rest of the forest, and this information is a defined attribute set that is filtered to a Global Catalog server. This is also known as the Partial Attribute Set or PAS. 

For more information on the Partial Attribute Set, see How the Global Catalog Works.

There are some reasons why you might not want your domain controller in the Azure Infrastructure Services to be a Global Catalog server. These reasons include:

  • Size—inbound replication will start and might contain information you don’t necessarily need. Inbound replication itself incurs no cost, so there’s no negative impact in that regard; however, depending on the size of your organization, you might need a larger VM that has the requisite storage, and the larger VM instance will cost more.
  • Global catalog servers replicate Partial Attribute Set content between themselves—which means that this information will be passed to Global Catalog servers that you put in the Azure Infrastructure Services cloud.
  • There’s a chance that the Global Catalog in the Azure Virtual Machines and Virtual Networks cloud will become a preferential source to another Global Catalog somewhere in the world. This is not optimal, as it results in sending updates from the cloud that happen in a domain that you don’t really care about. This subsequently requires a lot of outbound bandwidth, which you have to pay for.

Those are some reasons why you wouldn’t want to put a Global Catalog in the cloud. With those reasons in mind, when would you put a Global Catalog in the cloud? One answer would be when you have a single-domain forest.

What should you do if you have two domains in the same forest? For example, suppose that one domain is on premises and the second domain is in the Azure Infrastructure Services cloud. The answer is to make the domain controllers in the cloud into Global Catalogs. The reason for this is that authentication (as the user logs on) requires access to a group type in Windows Active Directory called a Universal Group, and Universal Groups require a Global Catalog in order to populate. This means that a Global Catalog is a required step in all authentication scenarios where you have more than a single domain.  

Also, consider whether you want the domain controllers in Azure Infrastructure Services to require a round trip to the on-premises network in order to access a Global Catalog at every single authentication attempt. This is a tradeoff, and the decision depends on what the replication requirements would be versus how many authentication attempts are made. You probably don’t think so much about these issues when Active Directory is on premises only, but when you design a hybrid cloud infrastructure in which egress traffic is billable, your design considerations must take this factor into account. 

Workloads in the cloud that authenticate against a domain controller in the cloud will still generate outbound authentication traffic if you don’t have a Global Catalog in the cloud. It’s difficult to provide hard-and-fast guidance because this scenario is fairly new; you will likely have to work out the relative costs of the different options (authentication traffic versus replication traffic) yourself, or wait for guidance based on our experiences that we may be able to share in the future.
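
One way to start that comparison is a rough traffic model. Everything in this sketch is an assumption: the per-authentication payload size, the authentication volume, and the outbound replication volume are hypothetical numbers you would replace with your own measurements.

```python
# Rough comparison of two billable traffic patterns: per-authentication round
# trips to an on-premises Global Catalog versus outbound replication from a
# cloud-hosted Global Catalog. All sizes and volumes are assumptions.

def monthly_auth_egress_mb(auths_per_day, kb_per_auth):
    """Outbound traffic generated by authentication round trips to on-premises GCs."""
    return auths_per_day * 30 * kb_per_auth / 1024

def monthly_gc_replication_egress_mb(outbound_mb_per_day):
    """Outbound traffic generated by a GC hosted in the cloud."""
    return outbound_mb_per_day * 30

def cheaper_option(auths_per_day, kb_per_auth, outbound_mb_per_day):
    auth = monthly_auth_egress_mb(auths_per_day, kb_per_auth)
    repl = monthly_gc_replication_egress_mb(outbound_mb_per_day)
    return "GC in cloud" if repl < auth else "GC on premises"

print(cheaper_option(auths_per_day=50000, kb_per_auth=8, outbound_mb_per_day=50))
# prints GC in cloud
```

At high authentication volume the replication traffic is the smaller of the two; at low volume the balance flips, which is exactly the tradeoff the text describes.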

What we do know is that the Global Catalogs are used to expand Universal Group membership, which is likely going to lead to even less predictable costs for Global Catalogs because they host every domain (in part). However, something that might complicate issues even more, or at least require more study, is the effect of creating an Internet-facing service that authenticates with Active Directory. 

One option is to take advantage of Universal Group Membership Caching, but there are issues with this solution that you will want to consider.

For more information on Universal Group Membership Caching, see Enabling Universal Group Caching for a Site

Finally, most replication for the Global Catalogs in the Azure Infrastructure Services cloud is going to be inbound, so cost is not an issue there. Outbound replication is possible, but this can be avoided by configuring the right site links.  

The following table summarizes some of the advantages and disadvantages of putting a Global Catalog server in the public cloud infrastructure service provider’s network.

Advantages of a Global Catalog in the cloud:

  • Enables log on for multi-domain forests
  • Reduces egress traffic costs
  • Can take advantage of Universal Group Membership Caching

Disadvantages of a Global Catalog in the cloud:

  • Outbound replication costs
  • Possible exposure of PII and other proprietary information
  • Possibility that the Global Catalog in the Azure Virtual Machines and Virtual Networks cloud will become a preferential source to another Global Catalog somewhere in the world

5.6.5.5  Active Directory Name Resolution and Geo-Distribution Considerations

You will need to consider both Active Directory name resolution and geo-distribution issues when designing your hybrid cloud infrastructure.
 
Active Directory Name Resolution Considerations
As mentioned earlier, Azure Virtual Networks has its own DNS services that it enables when you put new virtual machines in a Virtual Network. This is very basic name resolution that allows machines in the same cloud service to resolve each other’s names. While this is useful if all your machines and all the services running on them depend only on each other, it’s not enough to support an Active Directory environment. If your design includes any level of support for Active Directory authentication for services that are running on the Azure Virtual Network, you will need to deploy a name-resolution infrastructure that exceeds the capabilities of the Azure Virtual Network DNS services. The reason is that Azure Virtual Networks do not meet the complex name-resolution requirements of Active Directory domains (dynamic DNS registration, SRV record support, and others).

Domain controllers and their clients must be able to register and resolve resources within their own domains and forest, as well as across trusts. Because static IP addressing isn’t supported in Azure Virtual Networks, the DNS server settings must be configured within the Virtual Network definition.

There are several ways to approach the name resolution requirements for Active Directory in a hybrid cloud infrastructure. The following is one suggested approach: 

  • Create an Azure Virtual Network.
  • Use DHCP for IP addressing assignments for domain controllers you plan to put in a Virtual Network.
  • Install and configure Windows Server DNS on the domain controller(s) you’ve placed in Windows Azure.
  • Configure the domain controllers and the domain members’ DNS client resolver settings so that:
    • The on-premises DNS server is set as the primary preferred DNS server.
    • There is an alternate DNS server that has the IP address of a domain controller that is also a DNS server on the Azure Virtual Network.  
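
The intended resolver behavior from the steps above can be sketched as a simple ordered-fallback routine. The server names and addresses are hypothetical, and reachability is simulated with plain dictionaries rather than real DNS queries.

```python
# Sketch of the suggested DNS client resolver ordering: try the preferred
# on-premises DNS server first, then fall back to the alternate DNS server on
# the Azure Virtual Network. Names/addresses are hypothetical; "servers" is a
# list of (reachable, records) pairs simulating each DNS server.

ON_PREM_DNS = {"dc1.corp.example.com": "10.0.0.10"}
AZURE_DNS = {"dc1.corp.example.com": "10.0.0.10",
             "azdc1.corp.example.com": "10.4.0.4"}

def resolve(name, servers):
    for reachable, records in servers:
        if not reachable:
            continue          # e.g. the site-to-site VPN is down
        if name in records:
            return records[name]
    return None

# Normal operation: the on-premises preferred server answers first.
print(resolve("dc1.corp.example.com", [(True, ON_PREM_DNS), (True, AZURE_DNS)]))
# VPN outage: fall back to the alternate DNS server on the Azure Virtual Network.
print(resolve("azdc1.corp.example.com", [(False, ON_PREM_DNS), (True, AZURE_DNS)]))
```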

Geo-Distribution Considerations
Your hybrid cloud infrastructure design might include geo-distributed domain controllers hosted on Azure Virtual Networks. Azure Infrastructure Services can be an attractive option for geo-distributing domain controllers, which can provide:

  • Off-site fault tolerance
  • Lower latency for branch offices where you don’t want to house the domain controller on premises

However, keep in mind that Virtual Networks are isolated from one another. If you want different Virtual Networks to communicate, you must establish site-to-site links with each of them and then have traffic loop back through the corporate network to reach the other Azure Virtual Networks. This means that all replication traffic will route through your on-premises domain controllers, which will generate some egress traffic. You will want to consider piloting such a configuration to see what your egress numbers look like before deploying a full-blown geo-distributed architecture.

5.6.5.6  Active Directory Federation Service (ADFS) Considerations

Another Active Directory function that might be appropriate to consider when constructing a hybrid cloud infrastructure is Active Directory Federation Services (ADFS). While the scenarios might not be as broad as those for Active Directory Domain Services, there are some scenarios where you will want to consider this option.

The three primary advantages of deploying ADFS in a public cloud infrastructure services network are:

  • It enables you to provide high availability for ADFS by using the native server load-balancing capabilities of the public cloud infrastructure services network (if the provider makes them available, as Azure Infrastructure Services does).
  • It enables you to more simply deploy a set of federated applications to employees and partners without the complexities and requirements inherent in deploying ADFS in a perimeter network on your corporate network.
  • You can deploy corporate domain controllers alongside ADFS in a public cloud infrastructure service provider’s network, which provides additional guarantees of service availability in the event of unforeseen failures such as natural disasters. This is especially true for online services such as Microsoft Office 365, which can authenticate users directly from their on-premises corporate Active Directory.

Deploying Windows Server ADFS in a public cloud infrastructure service provider’s network is very similar to doing so on premises; however, differences do exist. Any Windows Server ADFS requirement to connect back to the on-premises network depends upon the relative placement of the roles. If Windows Server ADFS is running on a public cloud infrastructure service provider’s network and its domain controllers are deployed only on premises, then the off-premises side of the solution must connect the virtual machines back to the on-premises network by using the link that connects the public and private sides of the hybrid cloud solution.

Important issues to consider when designing a hybrid cloud infrastructure to support ADFS include:

  • If you deploy a Windows Server ADFS proxy server on a public cloud infrastructure services network, connectivity to the ADFS federation servers is needed. If they are on premises, you will need a connection between the on-premises and off-premises networks, by using a site-to-site VPN or dedicated WAN link.
  • If you deploy a Windows Server ADFS federation server on the public cloud infrastructure services network, then connectivity to Windows Server Active Directory domain controllers, Attribute Stores, and Configuration databases are required.
  • If you deploy Windows Server ADFS (or any other workload) on a virtual machine on the public cloud infrastructure service provider’s network so that it can be reached directly from the Internet, you must configure the cloud service to expose public-facing ports that map to the ADFS http (80 by default) and https (443 by default) ports.
  • If you deploy and configure Windows Server ADFS on a virtual machine in a public cloud infrastructure service provider’s network so that it can be reached directly from the Internet, it is also advisable to treat the cluster as though it were deployed on an on-premises perimeter network. This includes additional considerations such as server hardening or deploying Windows Server ADFS proxy instead of the Windows Server ADFS federation server role itself.
  • Charges may apply to all virtual-machine egress traffic, such as when the virtual machine is placed in Azure Infrastructure Services. If cost is the driving factor, it is advisable to deploy Windows Server ADFS proxy on Windows Azure, leaving the Windows Server ADFS federation servers on premises. If Windows Server ADFS federation is deployed on Windows Azure virtual machines instead of Windows Server ADFS proxy, it could create unnecessary costs to authenticate intranet users.
  • If you choose Azure Infrastructure Services for your public cloud infrastructure service provider, we recommend that you use Windows Azure native server load-balancing capabilities for high availability of Windows Server ADFS servers in your deployment. Windows Azure software load balancing is supported only by VIPs, not by DIPs. The load balancing provides probes that are used to determine the health of the virtual machines within the cloud service. In the case of Windows Azure Virtual Machines, you configure the type of probe you would like to use, such as TCP, UDP or ICMP. For simplicity, you might use a custom TCP probe. This requires only that a TCP connection (a SYN-ACK) be successfully established to determine virtual machine health. You can configure the custom probe to use any TCP port that is actively listening on your virtual machines.
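
A custom TCP probe of the kind described above amounts to a connection attempt: if the TCP handshake completes on the probed port, the virtual machine counts as healthy. This is a minimal sketch of that check, not the load balancer's actual implementation:

```python
# Minimal sketch of a custom TCP health probe: the virtual machine is
# considered healthy if a TCP connection (SYN / SYN-ACK handshake) can be
# established on the probed port within the timeout.
import socket

def tcp_probe(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True    # handshake completed; VM stays in rotation
    except OSError:
        return False       # refused or timed out; VM is taken out of rotation
```

Note that such a probe only verifies that something is listening on the port; it does not verify that the ADFS service behind it is actually functioning.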

Note:
Machines that need to expose the same set of ports directly to the Internet (such as port 80 and 443) cannot share the same cloud service. Therefore, we recommend that you create a dedicated cloud service for your Windows Server ADFS servers to avoid potential overlaps between port requirements for an application and for Windows Server Active Directory.

 For more information on Active Directory Federation Services and Active Directory Domain Services in Azure Infrastructure Services, see Guidelines for Deploying Windows Server Active Directory on Windows Azure Virtual Machines

5.6.5.7  Windows Azure Active Directory Considerations

This document does not discuss the use of Windows Azure Active Directory, which is a REST-based service that provides identity management and access control capabilities for cloud applications. Windows Azure Active Directory and Windows Server Active Directory Domain Services are designed to work together to provide an identity and access management solution for today’s hybrid cloud environments and modern cloud-based applications. The scope of this paper is the core infrastructure requirements for a hybrid cloud infrastructure that does not include cloud-based PaaS and SaaS applications, which is the key scenario for which Windows Azure Active Directory applies.

To help you understand the differences and relationships between Windows Server AD DS and Windows Azure AD, consider the following: 

  • You might run Windows Server AD DS in the cloud on Azure Infrastructure Services when you’re using Windows Azure to extend your on-premises datacenter into the cloud.
  • You might use Windows Azure Active Directory to give your users single sign-on to Software-as-a-Service (SaaS) applications. Microsoft’s Office 365 uses this technology, for example, and applications running on Windows Azure or other cloud platforms can also use it.
  • You might use Windows Azure Active Directory (its Access Control Service) to let users log in using identities from Facebook, Google, Microsoft, and other identity providers to applications that are hosted in the cloud or on-premises.  

For more information about Windows Azure Active Directory, please see Identity

5.6.6  Backup Service and Disaster Recovery

When designing your hybrid cloud infrastructure you will want to consider backup and disaster recovery options.

5.6.6.1  Backup Services

Consider asking your public cloud infrastructure service provider if it offers backup services so that you can use them as an off-site backup for on-premises data. This is a useful option because, in the event of a disaster at the primary datacenter, there will be a backup copy of the information on the public cloud service provider’s network.

Windows Azure offers a backup service that you can use to back up on-premises data. Backup can help you protect important server data offsite with automated backups to Windows Azure, where they are available for data restoration. 

You can manage cloud backups from the backup tools in Windows Server 2012, Windows Server 2012 Essentials, or System Center 2012 Data Protection Manager. These tools provide similar experiences when configuring, monitoring, and recovering backups whether to local disk or Windows Azure storage. After data is backed up to Windows Azure, authorized users can recover backups to any server.  

Windows Azure backup also supports incremental backups, where only changes to files are transferred to the cloud. This helps ensure efficient use of storage, reduced bandwidth consumption, and point-in-time recovery of multiple versions of the data. Configurable data retention policies, data compression and data transfer throttling also offer you added flexibility and help boost efficiency. Backups are stored in Windows Azure and are "offsite," which reduces the need to secure and protect onsite backup media. 
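
The incremental idea can be illustrated with a toy sketch: hash each file's contents and transfer only files whose hash changed since the last backup. This is purely conceptual; the real service tracks changes at a much finer granularity than whole files.

```python
# Toy illustration of incremental backup: transfer only files that are new or
# changed since the previous snapshot. Conceptual sketch only.
import hashlib

def snapshot(files):
    """files: dict of path -> bytes. Returns path -> content hash."""
    return {p: hashlib.sha256(data).hexdigest() for p, data in files.items()}

def incremental_transfer(files, previous_snapshot):
    """Return only the files that are new or changed since the last backup."""
    current = snapshot(files)
    return {p: files[p] for p, h in current.items()
            if previous_snapshot.get(p) != h}

base = {"a.txt": b"alpha", "b.txt": b"bravo"}
prev = snapshot(base)            # state recorded by the previous backup
base["b.txt"] = b"bravo v2"      # only one file changes
print(sorted(incremental_transfer(base, prev)))  # prints ['b.txt']
```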

For more information on Windows Azure Backup, please see Windows Azure Backup Overview

5.6.6.2  Disaster Recovery

Another important option to consider is the role a public cloud infrastructure service provider can play in disaster recovery and business continuity. Some public cloud infrastructure service providers will make various disaster recovery options available to you.  

For example, Windows Azure currently offers Recovery services. If you are using Hyper-V Recovery Manager, you create Hyper-V Recovery Manager vaults to orchestrate failover and recovery for virtual machines managed by System Center 2012 Virtual Machine Manager (VMM). You configure and store information about the VMM servers, clouds, and virtual machines in a source location that are protected by Windows Azure Recovery services, and about the VMM servers, clouds, and virtual machines in a target location that are used for failover and recovery. You can create recovery plans that specify the order in which virtual machines fail over, and customize these plans to run additional scripts or manual actions.

For more information about Windows Azure Recovery services, please see Recovery Services Overview

6.0   Summary

After identifying the requirements and constraints in your environment and then evaluating each of the design considerations that are detailed within this document, you can create a hybrid cloud infrastructure design that best meets your unique needs. Then, you can implement it in a test environment, test it, and deploy it into production. 

To complement this document, Microsoft has created reference implementation (RI) guidance sets for hybrid cloud infrastructure solutions that are designed for specific audiences. Each RI guidance set includes the following documents:

  • Scenario Definition: For a particular domain, different audiences generally have different requirements and constraints. This document describes a fictitious example organization that is implementing a hybrid cloud infrastructure solution. It provides answers to the questions in the Envisioning The Hybrid Cloud Solution section of this document—answers that relate to the fictitious organization. Many organizations within this audience type will find that they have requirements and constraints similar to those of the fictitious organization. This document is most helpful to people responsible for designing hybrid cloud infrastructure solutions at organizations similar to those that are defined by the RI’s audience type.  
  • Design: This document details which specific products, technologies, and configuration options were selected, out of the hundreds of individual available options, to meet the unique requirements for the example organization that is defined in the Scenario Definition document. This document also explains the rationale for why specific design decisions were made. For organizations that have requirements and constraints similar to the example organization, the lab-tested design and rationale in this document can help decrease both the implementation time and the risk of implementing a custom hybrid cloud solution. This document is most helpful to those responsible for designing a hybrid cloud infrastructure or implementing solutions within enterprise IT organizations, because it details an example design, and the rationale for the design.

Note:
The Design document within a Reference Implementation (RI) guidance set uses one combination of the almost infinite number of combinations of design and configuration options that are presented in this Hybrid Cloud Infrastructure Design Considerations article. The specific design options from this Hybrid Cloud Infrastructure Design Considerations document that are chosen in an RI Design document are based on the unique requirements from the Scenario Definition document in the RI guidance set. As a result, many people who read this Hybrid Cloud Infrastructure Design Considerations document will find it helpful to also read the RI guidance set for this domain that is targeted at an audience type similar to their own. The RI guidance set shows which design options from this document were chosen for the example organization, and helps the reader to better understand why those options were chosen. Other people will decide that reading an RI guidance set is unnecessary for them, and that this Hybrid Cloud Infrastructure Design Considerations document provides all the information they need to create their own custom design.

Although the Design document in an RI guidance set is related to this Hybrid Cloud Infrastructure Design Considerations document, there are no dependencies between the documents.

  • Implementation: This document provides a step-by-step approach to implementing the design in your environment. While this document lists implementation steps to install and configure the solution, the steps are written at a level that assumes you already have some familiarity with the technologies that are used in the design that is detailed in the Design document. In cases where new technologies are used, more detailed implementation steps are included in the document. To review lower-level implementation steps than those that are provided in this document, you’re encouraged to read the information found at the hyperlinks that are included throughout this document. This document is most helpful to those responsible for implementing hybrid cloud infrastructure solutions within types of organizations that are identified by the audience type for the RI.  

7.0 Technologies Discussed in this Article

Windows Server 2012 DNS services
Active Directory Domain Services
Windows Azure Active Directory
Windows Azure Virtual Machines
Windows Azure Cloud Services
Windows Azure Storage
Windows Azure Recovery Service
Windows Azure Virtual Network  

8.0 Authors and Reviewers

Authors:
Thomas W. Shinder - Microsoft
Jim Dial - Microsoft

Reviewers:
Yuri Diogenes - Microsoft
John Dawson - Microsoft
Cheryl McGuire - Microsoft
Kathy Davies - Microsoft
John Morello - Microsoft
Jamal Malik - Microsoft

This article is maintained by the Microsoft DDEC Solutions Team.

9.0 Change Log

Version

Date

Change Description

1.0

7/1/2013

Initial posting and editing complete.

1.1

8/22/2013

New hybrid cloud principles and patterns were added. Fixed table entries in multiple tables so that disadvantages are all moved to the disadvantages columns.