Introduction

For most enterprises, providing fault tolerance and disaster recovery (DR) is a challenge: it involves cost, complexity, and management overhead.

In this context, we would also like to discuss RTO and RPO as these are the two most important business parameters associated with DR and Fault Tolerance planning.

Recovery Time Objective


While planning for DR, one of the most important questions is: "In case of a disaster or outage, how long will it take to bring the service back to an operational state?"

Recovery Time Objective (RTO) is the target time within which the service must be brought back to its normal (or agreed) state after a disaster.

As you can imagine, the lower the RTO, the better for the application. An RTO of zero means there is no outage when the primary location goes down: the service or application is immediately available from the secondary location. At the same time, a lower RTO requires more planning, complexity, and investment.
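As a quick illustration of the arithmetic, the minimal Python sketch below measures the recovery time of a DR drill (from the start of the outage until the service is operational again) and compares it with an RTO target. The timestamps and the 2-hour target are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta

def recovery_time(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time from the start of the outage until the service is operational again."""
    if service_restored < outage_start:
        raise ValueError("service cannot be restored before the outage starts")
    return service_restored - outage_start

def meets_rto(outage_start: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """True if the measured recovery time is within the Recovery Time Objective."""
    return recovery_time(outage_start, service_restored) <= rto

# Hypothetical DR drill: outage at 10:00, service back at 11:30, RTO target of 2 hours.
outage = datetime(2018, 11, 1, 10, 0)
restored = datetime(2018, 11, 1, 11, 30)
print(recovery_time(outage, restored))                  # 1:30:00
print(meets_rto(outage, restored, timedelta(hours=2)))  # True
```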

Recovery Point Objective


Another important point related to DR is Recovery Point Objective or RPO.

Consider that the primary location is down and you are invoking DR. While invoking DR, you notice that the last backup for an application was taken 4 hours before the outage, and there is no backup after that. When you restore data from the latest backup, the last 4 hours of data are lost. In this case, the effective RPO for this application is 4 hours.

In other words, the RPO value indicates the maximum period (and therefore the amount) of data that can be lost in case of an outage.
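The 4-hour example is simple timestamp arithmetic. The sketch below reproduces it in Python; the outage time and the 15-minute RPO target it is compared against are hypothetical.

```python
from datetime import datetime, timedelta

def data_loss_window(last_recovery_point: datetime, outage_time: datetime) -> timedelta:
    """Everything written after the last usable backup or replica is lost."""
    if last_recovery_point > outage_time:
        raise ValueError("the recovery point cannot be later than the outage")
    return outage_time - last_recovery_point

def meets_rpo(last_recovery_point: datetime, outage_time: datetime, rpo: timedelta) -> bool:
    """True if the data loss window is within the Recovery Point Objective."""
    return data_loss_window(last_recovery_point, outage_time) <= rpo

# The example from the text: the last backup ran 4 hours before the outage.
outage = datetime(2018, 11, 1, 14, 0)  # hypothetical outage time
last_backup = outage - timedelta(hours=4)
print(data_loss_window(last_backup, outage))                  # 4:00:00
print(meets_rpo(last_backup, outage, timedelta(minutes=15)))  # False (hypothetical 15-minute target)
```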

The RPO value would depend on the application criticality and business need. For example, critical applications cannot afford to lose any data, so for them, the RPO value should be zero and the DR design should be done accordingly.

Different applications and services have different Service Level Agreements (SLAs), so their RPO and RTO values differ accordingly.

A general rule of thumb: the more critical the application, the lower the RTO and RPO, and the higher the cost and complexity of DR planning.


Introduction to Azure Site Recovery


Azure Site Recovery (ASR) is the BCDR (Business Continuity and Disaster Recovery) service offered by Microsoft Azure. In cloud computing terminology, this is referred to as Disaster Recovery as a Service (DRaaS).

Technically speaking, ASR is a software component which orchestrates DR.

There are numerous solutions and products on the market for designing DR. However, ASR has a few advantages, highlighted below:

  • Lower cost compared to other solutions available in the market.
  • Easy to deploy and configure. There is no need to install complex tools and software.
  • Being a managed service offering from Azure, it adds little management overhead.
  • Supports a wide variety of scenarios. You can replicate your workload from almost anywhere to Azure: the source location can be another Azure region, an on-premises datacenter, AWS, etc.
  • Supports replication of Azure VMs, Azure Stack VMs, AWS VMs, VMware VMs, Hyper-V VMs, and physical servers.
  • Supports multiple combinations of source and destination. For example, if you want to replicate between two on-premises datacenters, you can use ASR as the orchestrator. The destination location does not have to be Azure.
  • If the destination location is Azure, the organization does not need to invest a large amount of money in setting up and maintaining a DR site.
  • Supports a wide range of automation tools, for example PowerShell.


Features of ASR

As mentioned, ASR supports replication for:

  • Azure VMs replicating between Azure regions.
  • On-premises VMs, Azure Stack VMs, and physical servers.

ASR can be used in the following scenarios:
  • Configure and manage replication between the primary and secondary locations.
  • Test (simulate) failover during periodic DR tests.
  • On a real outage at the primary location, initiate failover to the secondary location.
  • When the secondary location becomes primary, configure replication in the opposite direction.
  • When the primary location is up again, initiate failback.

Another key use of ASR is server migration. ASR is Microsoft's recommended solution for migrating on-premises VMs and physical servers to Azure.
  

ASR vs Azure Backup


ASR and Azure Backup both offer recovery as a service, so it is important to understand the differences between the two services and the use cases each one is meant for.

  • Azure Site Recovery (ASR) is a Business Continuity and Disaster Recovery (BCDR) solution primarily focused on recovering from regional or datacenter-level outages. When one site is completely down due to an environmental issue, a network outage, or any other reason, we can use ASR to invoke failover, either manually or through automation.
  • Azure Backup, on the other hand, is a backup and restore solution that is not designed to survive regional or site-level outages. The most common use case for Azure Backup is restoring a folder or file after accidental deletion or modification. Another use case is restoring an entire server when the OS or application has become corrupted.
  • Azure Backup offers folder and file level recovery, which ASR does not support; ASR supports only VM-level recovery. However, specific data disks can be excluded from ASR replication.
  • ASR supports cross-region disaster recovery. Azure Backup does not support cross-region backup and restore.

As you can see, these two products are not substitutes for one another. While you can quickly restore a folder, file, or VM using Azure Backup, it should not be treated as your organization's disaster recovery solution. For disaster recovery, the recommended solution is Azure Site Recovery.

Likewise, Azure Site Recovery cannot replace Azure Backup, because you cannot restore a folder or file using ASR, nor can you use ASR to restore a VM within the same region.


Workload Support

One point that needs close attention is the role of the server being migrated. Although ASR can migrate almost any type of Hyper-V VM, a few workloads demand additional consideration.

Examples of such workloads include Active Directory, Microsoft Exchange, Microsoft SharePoint, and Microsoft SQL Server.

Microsoft has partnered with many organizations to ensure that ASR is compatible with most workloads widely used in the industry. You can refer to this article for a summary of the workloads that ASR supports. In addition, Microsoft has published separate whitepapers on migrating each type of supported workload using ASR. You must follow Microsoft's guidelines for these critical workloads.

There is another point to consider here: ASR is not necessarily the best solution for every kind of workload. Take Active Directory (AD) as an example. It is better to deploy a domain controller in the secondary datacenter and configure Active Directory replication with the primary site than to use ASR for AD replication. Similarly, many applications have their own replication and failover mechanisms, so it is always worth evaluating which approach suits each workload better.


ASR Pricing


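For each protected instance, the Site Recovery license is free for the first 31 days of protection; license billing starts after that.
This applies no matter how many instances you are protecting and where you are replicating to (on-premises or Azure). The storage, storage transaction, and outbound data transfer charges listed below are not covered by this free period.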

Site Recovery charges you for the following components:

• Site Recovery license
• Azure storage 
• Storage transactions 
• Outbound data transfer

The Site Recovery license is per protected instance, where an instance is a virtual machine or a physical server.

One of the key advantages of ASR is that you do not incur any compute cost in Azure until you initiate a failover. This is because ASR stores only the disks in Azure Storage, and at failover time it creates Azure VMs and attaches those disks. As soon as you fail over to Azure, you start incurring compute cost in addition to the existing charges.
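To make the 31-day rule concrete, here is a minimal sketch that counts the license days charged for each protected instance. It assumes a hypothetical flat daily rate (`daily_rate`) purely for illustration; real prices, billing granularity, and the separate storage, transaction, outbound data, and post-failover compute charges are not modeled, so check the pricing article before relying on any estimate.

```python
FREE_DAYS = 31  # license-free period for each protected instance

def billable_license_days(days_protected: int) -> int:
    """License days charged for one protected instance once the free period ends."""
    return max(0, days_protected - FREE_DAYS)

def estimate_license_cost(days_per_instance: list, daily_rate: float) -> float:
    """Sum the license charge across instances.

    daily_rate is a HYPOTHETICAL placeholder. Storage, storage transactions,
    outbound data transfer and post-failover compute are billed separately
    and are not modeled here.
    """
    return sum(billable_license_days(d) for d in days_per_instance) * daily_rate

# Three hypothetical VMs protected for 10, 31 and 45 days.
print([billable_license_days(d) for d in (10, 31, 45)])     # [0, 0, 14]
print(estimate_license_cost([10, 31, 45], daily_rate=1.0))  # 14.0
```

Because the free period applies per instance, adding a new VM later starts its own 31-day window rather than extending anyone else's.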

To get more information about ASR pricing, please refer to this article.


ASR SLA

ASR helps us meet our SLAs, but what is the SLA of ASR itself?

In other words:
  1. How can we be sure that ASR is always available to orchestrate replication, failover, and failback?
  2. If we are replicating to Azure, how can we ensure that the Azure region will be available during a DR scenario?
Regarding the first point, Microsoft offers the following SLA for ASR:

  • On-premises to on-premises: 99.9% availability of the Site Recovery service
  • On-premises to Azure, planned and unplanned failover: RTO of 2 hours per protected instance
  • Azure to Azure failover: RTO of 2 hours per protected instance
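To put the 99.9% availability figure in perspective, the sketch below converts an availability percentage into the maximum downtime it allows over a period. The 30-day month is an assumption for illustration; the exact measurement period and terms are defined in the SLA document itself.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Maximum downtime (in minutes) that an availability percentage permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

# 99.9% over an assumed 30-day month: 43,200 minutes * 0.1% = 43.2 minutes.
print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
```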

For more details on the ASR SLA, please refer to this article.

Regarding the second point, if we need to ensure that replicated data remains available even if an entire Azure region is down, we need a geo-redundant (GRS) storage account.


Security, Compliance and Data Privacy with ASR


As Microsoft claims:

  • Site Recovery is ISO 27001:2013, 27018, HIPAA, DPA certified and is in the process of SOC2 and FedRAMP JAB assessments.
  • Site Recovery doesn't intercept replicated data and has no information about what's running inside virtual machines or physical servers.
  • Only the metadata needed to orchestrate replication and failover is sent to the Site Recovery service.
  • For more details, please consult the ASR FAQ documentation.
  • ASR supports encryption in transit and encryption at rest when data is replicated to Azure. When data is replicated between two on-premises datacenters, with ASR acting as the orchestrator, only encryption in transit is supported, not encryption at rest.
  • Replication happens over port 443.
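Because replication traffic leaves the site over port 443, a quick pre-check is to confirm that the servers involved (for example, the Hyper-V host, VMM server, or configuration server) can open outbound TCP connections on that port. The sketch below only tests TCP reachability with Python's standard `socket` module; it does not validate TLS, proxies, or the specific Site Recovery endpoints, and the host name in the usage comment is a placeholder.

```python
import socket

def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout.

    This only proves TCP reachability; it does not check TLS, proxies, or
    whether the endpoint is one that Site Recovery actually uses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections and timeouts
        return False

# Usage (placeholder host name; substitute the endpoints required in your environment):
# print(can_reach("<your-endpoint>.example.com", 443))
```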


Permission and RBAC Roles


ASR provides the following built-in roles to configure, manage, and view replication:

  • Site Recovery Contributor: Has all permissions required to manage ASR operations in a Recovery Services vault, except deleting the vault.
  • Site Recovery Operator: Has permissions to execute and manage failover and failback operations. This role does not have administrative rights.
  • Site Recovery Reader: Has read access to view and monitor ASR-related configuration.
Apart from these built-in roles, you can create custom roles based on your organization's requirements.

To enable replication for a new virtual machine, a user must have:
  • Permission to create a virtual machine in the selected resource group
  • Permission to create a virtual machine in the selected virtual network
  • Permission to write to the selected Storage account
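One way to reason about these requirements is as a set check: every required action must be matched by some action granted to the user's role, where Azure RBAC allows `*` wildcards. The action strings in `REQUIRED_FOR_ENABLE_REPLICATION` are assumptions chosen to mirror the three bullets; confirm the exact list against the Site Recovery RBAC article before using anything like this. `NotActions`, deny assignments, and scope are deliberately not modeled.

```python
from fnmatch import fnmatchcase

# ASSUMED action strings mirroring the three bullets above; verify against the
# Site Recovery RBAC documentation before relying on them.
REQUIRED_FOR_ENABLE_REPLICATION = {
    "Microsoft.Compute/virtualMachines/write",                # create the VM in the target resource group
    "Microsoft.Network/virtualNetworks/subnets/join/action",  # attach the VM to the target virtual network
    "Microsoft.Storage/storageAccounts/write",                # (assumed) write access to the target storage account
}

def is_allowed(action: str, granted: set) -> bool:
    """True if any granted action pattern (which may contain '*') matches the action.

    Azure RBAC action names are case-insensitive, so both sides are lower-cased.
    """
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in granted)

def missing_permissions(granted: set) -> set:
    """Required actions that the granted patterns do not cover."""
    return {a for a in REQUIRED_FOR_ENABLE_REPLICATION if not is_allowed(a, granted)}

# Hypothetical custom role: full Compute access, nothing for Network or Storage.
print(sorted(missing_permissions({"Microsoft.Compute/*"})))
```

For the hypothetical role above, the Network and Storage actions are reported as missing, which is the kind of gap a custom role needs to close before it can enable replication.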

For more information on RBAC and permission, please refer to this article.


Connectivity

  • ASR replicates data to an Azure storage account over a public endpoint.
  • ExpressRoute can be used to replicate on-premises virtual machines to Azure.
  • Microsoft peering is the recommended routing domain for replication; replication is not supported over private peering. However, after the virtual machines have failed over to an Azure virtual network, you can access them using private peering set up with that Azure virtual network.
  • ASR replication does not support site-to-site VPN. However, a site-to-site VPN can be present in the environment without interfering with ASR.


Automation of ASR Workflows

We can automate Site Recovery workflows using any one of these tools:
  • REST API
  • PowerShell
  • Azure SDK


Recovery Services Vault


Recovery Services Vault (RSV) is one of the most important components related to ASR. As the name suggests, it is a vault, or storage entity, that holds data related to Site Recovery.

The RSV stores all the configuration data and metadata that is vital for ASR's functionality. The disks of instances replicating to Azure are not kept in the vault itself; they are stored in the target storage account (or as replica managed disks), as described in the Azure to Azure scenario.

Recovery Services vaults support System Center DPM, Windows Server, Azure Backup Server, and more.
While configuring ASR, you must create a Recovery Services Vault.

For more information on Recovery Services vaults, please refer to this article.


Disk Support


  • The maximum disk size supported by ASR is 4 TB (4095 GB). If the disk of an on-premises VM or physical server is up to 4 TB, ASR can migrate it or fail it over.
  • Depending on your needs, you can use either Standard or Premium storage for the replicated data.
  • You need an LRS or GRS storage account.
  • ASR can preserve drive letters after failover or migration. Some configuration is required to achieve this.
  • You can exclude specific disks from replication.
  • Until recently (as of November 2018), ASR did not support migration of encrypted disks. Microsoft has since announced that ASR now supports Azure encrypted disks.
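A simple pre-migration check follows from these limits: compare each disk that will be replicated against the 4095 GB maximum, after removing any disks you plan to exclude. The VM and its disk sizes are hypothetical, and the limit is the November 2018 value quoted in the text, so confirm the current support matrix before relying on it.

```python
MAX_DISK_GB = 4095  # per-disk limit quoted in the text (as of November 2018)

def disks_to_replicate(disks: dict, excluded=frozenset()) -> dict:
    """Drop the disks that will be excluded from replication."""
    return {name: size_gb for name, size_gb in disks.items() if name not in excluded}

def oversized(disks: dict) -> list:
    """Names of disks larger than the supported maximum, sorted for stable output."""
    return sorted(name for name, size_gb in disks.items() if size_gb > MAX_DISK_GB)

# Hypothetical VM: sizes in GB.
vm_disks = {"os": 127, "data1": 1024, "scratch": 5120}
print(oversized(disks_to_replicate(vm_disks)))                        # ['scratch']
print(oversized(disks_to_replicate(vm_disks, excluded={"scratch"})))  # []
```

Excluding a disk means its data is not protected, and exclusion support varies by scenario (see the summary table), so this is a planning aid rather than a fix.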


IP Address Retention


In general, the IP addresses of systems change after failover or migration to Azure. In some cases, however, IP addresses are hardcoded in configuration, so they need to stay the same after migration.

It is possible to retain same IP address while migrating a system through ASR, although some configuration is required to achieve that.

Let’s first consider the case of Azure to Azure failover. For Azure VMs configured with static IP addresses, Site Recovery tries to provision the same IP address for the target VM, if that IP address is available in the new subnet and not in use. 

If the IP address is not statically assigned, then after failover the VM receives an IP address from the Azure DHCP pool.
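The Azure to Azure behavior described above can be modeled as a small check: a static source IP can be kept only if it lies inside the target subnet and is not already in use. The sketch below is a simplified model, not ASR's actual logic; it additionally skips the five addresses Azure reserves in every subnet (the first four and the last), and returns `None` when a different address would have to be assigned.

```python
import ipaddress
from typing import Optional

def reusable_ip(source_ip: str, target_subnet: str, in_use: set) -> Optional[str]:
    """Return source_ip if it can be kept in the target subnet, else None."""
    ip = ipaddress.ip_address(source_ip)
    net = ipaddress.ip_network(target_subnet)
    if ip not in net:
        return None  # the address does not belong to the target subnet
    # Azure reserves the first four addresses and the last address of every subnet.
    reserved = {net.network_address + i for i in range(4)} | {net.broadcast_address}
    if ip in reserved or source_ip in in_use:
        return None
    return source_ip

print(reusable_ip("10.1.0.10", "10.1.0.0/24", in_use=set()))          # 10.1.0.10
print(reusable_ip("10.1.0.10", "10.2.0.0/24", in_use=set()))          # None
print(reusable_ip("10.1.0.10", "10.1.0.0/24", in_use={"10.1.0.10"}))  # None
```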

It is also important to note that when replication is configured for Azure VMs, ASR automatically configures network mapping from the source region to the target region, and from the target region back to the source region. You can also set up network mapping manually before configuring replication.

For more information on this topic, please refer to this article.

Now let us discuss how to retain IP addresses when we are migrating from on-premises to Azure, for example from VMware/Hyper-V/Physical Server to Azure.

ASR offers a feature called Subnet Failover, which gives the ability to retain the same statically configured IP address during a server failover.

With subnet failover, a specific subnet is present at Site 1 or Site 2, but never at both sites simultaneously.
Subnet failover must be configured on the router so that the subnet moves during failover.
Once configured properly, the subnet moves along with the associated VMs during failover. However, this may add additional complexity and time for an unplanned failover.

For more details, please refer to this article.


Replication Scenarios


As we mentioned before, ASR supports multiple scenarios. In this article, we are going to cover the four most common scenarios which are as follows:

• Azure to Azure Replication
• Hyper-V to Azure Replication
• VMware to Azure Replication
• Physical Server to Azure Replication 


Scenario 1: Azure to Azure Disaster Recovery


Using ASR, it is possible to configure disaster recovery, failover, and failback of Azure VMs to another region. The primary region and the DR region must be different; ASR does not allow selecting the primary region as the DR region.

The high-level approach for replication and failover is as follows:

Step 1: As soon as we enable replication, the following resources would be automatically created in the target (DR) region:

  • Target Resource Group: VMs will be placed under this Resource Group after failover
  • Target Virtual Network: VMs will be located in this VNET after failover. A network mapping is created from the primary VNET to the DR VNET, and vice versa.
  • Cache Storage Accounts (created in the source location): Replicated data is sent to this cache location first, and from there it goes to the DR location. This approach minimizes the impact on production VMs and the applications running on them.
  • Target Storage Account: Created in the target region only if the source VM does not use managed disks.
  • Replica Managed Disk: Created at target only if source VM is using managed disk.
  • Target Availability Sets: Availability sets in which the replicated VMs are located after failover.


Step 2: When we enable replication, Site Recovery extension Mobility service is automatically installed on that VM, and post installation the VM is registered with ASR.

Step 3: Data is replicated to the target storage account (or replica managed disks) through the cache storage account.

Step 4: Once data is written and processed in the target location, ASR creates recovery points. Recovery points are created every few minutes.

Please note that no VM is created at DR location during replication. It is only the disks which are replicated.

Step 5: During failover, VMs are created in the target (DR) location, within the pre-created VNET and storage account, and we can choose any of the recovery points that were created during replication.
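Choosing a recovery point at failover time is essentially a search over timestamps. For example, if data corruption is known to have started at a certain time, you would want the latest recovery point taken at or before that moment. The sketch below uses Python's `bisect` module on a sorted list; the 5-minute spacing and the times are hypothetical, and it models only the choice, not any ASR API.

```python
import bisect
from datetime import datetime, timedelta
from typing import List, Optional

def latest_point_before(points: List[datetime], cutoff: datetime) -> Optional[datetime]:
    """Return the latest recovery point at or before cutoff; points must be sorted ascending."""
    i = bisect.bisect_right(points, cutoff)  # index of the first point strictly after cutoff
    return points[i - 1] if i > 0 else None

# Hypothetical recovery points every 5 minutes from 09:00 to 09:55.
start = datetime(2018, 11, 1, 9, 0)
points = [start + timedelta(minutes=5 * k) for k in range(12)]

corruption_started = datetime(2018, 11, 1, 9, 17)
print(latest_point_before(points, corruption_started))  # 2018-11-01 09:15:00
```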



Scenario 2: Hyper-V to Azure Disaster Recovery


Azure Site Recovery uses the same underlying technology as Hyper-V Replica to replicate Hyper-V VMs.

  • ASR supports migration of VHD and VHDX disks.
  • Supports migration of Hyper-V Generation 1 and Generation 2 VMs. During failover, Site Recovery converts from generation 2 to generation 1. During failback, the machine is converted back to generation 2.
  • Supports migration of Dynamic Disks. The Operating System Disk must be basic.
  • You can run a planned or unplanned failover from on-premises Hyper-V VMs to Azure. If you run a planned failover, the source VMs are shut down to ensure no data loss. If the primary site is down, you have to run an unplanned failover.

  • You can also fail back to on-premises once the primary site is up. There are three approaches to failback, and you choose one of them:
  1. Minimize downtime: ASR synchronizes data before failback. This takes more time but minimizes downtime.
  2. Full download: Downloads the entire disk from Azure to Hyper-V. This is faster but increases downtime.
  3. Create VM: If the VM does not exist in Hyper-V, you can ask ASR to create a new VM during failback rather than restoring to the original VM.
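For the Hyper-V scenario, the summary table at the end of this article lists replication frequencies of 30 seconds (not available with premium storage), 5 minutes, or 15 minutes. Treating the replication interval as an approximate bound on data loss (transfer time is ignored), the sketch below picks the least frequent option that still fits a given RPO target, which keeps replication overhead lower. The function name and the selection rule are illustrative assumptions, not an ASR setting.

```python
from datetime import timedelta

# Replication frequencies listed in the summary table for Hyper-V to Azure.
THIRTY_SECONDS = timedelta(seconds=30)
HYPERV_FREQUENCIES = (THIRTY_SECONDS, timedelta(minutes=5), timedelta(minutes=15))

def pick_frequency(target_rpo: timedelta, premium_storage: bool) -> timedelta:
    """Least frequent replication option whose interval still fits within target_rpo.

    Treats the interval as an approximate upper bound on data loss (transfer
    time is ignored). 30 seconds is skipped for premium storage, per the table.
    """
    options = [f for f in HYPERV_FREQUENCIES
               if not (premium_storage and f == THIRTY_SECONDS)]
    fitting = [f for f in options if f <= target_rpo]
    if not fitting:
        raise ValueError("no supported replication frequency meets this RPO target")
    return max(fitting)

print(pick_frequency(timedelta(minutes=10), premium_storage=False))  # 0:05:00
```

With premium storage, a 1-minute target raises `ValueError`, which signals that the RPO cannot be met with this replication method alone.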

Architecture: Hyper-V to Azure Disaster Recovery


As you might be aware, Hyper-V hosts and VMs can be managed by System Center Virtual Machine Manager (VMM), just as vCenter is used to manage VMware hosts and VMs. However, deploying VMM is not mandatory for managing Hyper-V infrastructure.

If you want to replicate to a secondary datacenter, then Hyper-V VMs must be on Hyper-V host servers which are managed by VMM. But if you want to replicate to Azure, then you can replicate VMs with or without VMM clouds. 

In both cases (with or without VMM), no agent installation is required on the Hyper-V VMs for ASR to function.

There are two components which are essential here:
  • Site Recovery Provider: Orchestrates replication.
  • Recovery Services Agent: Handles data replication.
For Hyper-V hosts that are not managed by VMM, both components are installed together from a single package, which can be downloaded from the Azure portal.

  • If VMM is present, the Provider is installed on the VMM server and the Recovery Services agent on each Hyper-V host; nothing needs to be installed in the VMs. Once done, you register the VMM server with the Recovery Services vault so that they can communicate with each other over a secure channel.
  • If VMM is not present, the Provider and agent need to be installed on each Hyper-V host, but not in the VMs. Once done, you register each Hyper-V host with the Recovery Services vault so that they can communicate with each other over a secure channel.

At the Azure end, you need to set up the following three components:
  • An Azure Subscription
  • Storage Account and Recovery Services Vault
  • Azure Network
   
For more information on the Hyper-V to Azure disaster recovery architecture, please refer to this article.

I have published an article, which shows step by step process for Hyper-V to Azure VM migration.


Scenario 3: VMWare to Azure Disaster Recovery


To migrate VMware VMs to Azure, you need to deploy a configuration server on-premises. This configuration server is a VMware VM, which can be downloaded from Azure in OVF format and can be imported in a VMware environment.

This VM appliance holds 3 roles which are as follows:

Configuration Server: Coordinates communications between on-premises and Azure, and manages data replication.

Process Server: Receives replication data, optimizes it with caching, compression, and encryption, and sends it to Azure. The process server is also responsible for installing the ASR Mobility service on each VMware VM that needs to be replicated. In large environments, the process server role can be installed on a separate server rather than on the configuration server.

Master Target Server: Handles replication data during failback from Azure.

In the case of Hyper-V, no agent needs to be installed within the VM. For VMware VMs, however, the Mobility service agent must be installed in each VM that needs to be replicated.

Important points regarding VMware to Azure migration


  • By default, replication happens over the Internet. However, ExpressRoute with Microsoft peering can also be used. Replication over VPN is not supported.
  • Supports failover and failback. During failback, the master target server is used.
  • vCenter Server is recommended but not mandatory.
  • Data replicates to Azure storage. When you initiate a failover, ASR creates Azure VMs from the replicated data in the storage account.
  • Replication is continuous when replicating VMware VMs to Azure.
  • Dynamic disks are supported. The operating system disk must be basic.
  • Selected disks can be excluded from replication.



Scenario 4: Physical Server to Azure Disaster Recovery


ASR supports replication and failover of Physical Servers to Azure, however failback to physical server is not supported. If failback is required, that must be done to a VMware VM.

The architecture of physical server to Azure replication is very similar to that of VMware to Azure replication. As with VMware, we need an on-premises configuration server that hosts the configuration server, process server, and master target server roles. The configuration server can be a physical server or a VMware VM.

The inbound and outbound port requirements are the same as for the VMware to Azure architecture.

Please note that for Physical Server to Azure DR, Planned Failover is not supported. Failback is only possible to an on-premises VMware VM. If there is no VMware Infrastructure on-premises, Failback is not possible.

For more information regarding physical server to Azure failover, please refer to this article.


Summary

We have discussed a lot of points; now let's summarize them. The table below serves as a quick reference.

Item | Azure to Azure Failover | Hyper-V to Azure Failover | VMware to Azure Failover | Physical to Azure Failover
--- | --- | --- | --- | ---
Azure components required | 1) Target resource group; 2) Target storage account; 3) Target network; 4) Cache storage account; 5) Target availability sets | 1) Azure subscription; 2) Azure storage account; 3) Azure network | 1) Azure subscription; 2) Azure storage account; 3) Azure network | 1) Azure subscription; 2) Azure storage account; 3) Azure network
On-premises components required | Not applicable | 1) Hyper-V host or VMM server; 2) Azure Site Recovery Provider; 3) Microsoft Azure Recovery Services agent | 1) vCenter Server recommended, or at least an ESXi host; 2) Site Recovery configuration server (configuration server + process server + master target server) | 1) Site Recovery configuration server, set up manually (configuration server + process server + master target server)
Agent installed in replicated VM | Site Recovery extension Mobility service | No agent installed within replicated VMs | Mobility service | Mobility service
Supports failover | Yes | Yes | Yes | Yes
Supports failback | Yes | Yes | Yes | To a VMware VM only
Retain source IP address | If the target subnet contains the matching IP address and it is available | Yes | Yes | Yes
Data in transit is encrypted | Yes | Yes | Yes | Yes
Data at rest is encrypted | Yes | Yes | Yes | Yes
OS disk maximum size | 2 TB | 1 TB for Generation 1 VMs; 300 GB for Generation 2 VMs | 2 TB | 2 TB
Data disk maximum size | 4 TB | 4 TB | 4 TB | 4 TB
Supports dynamic disk for OS disk | Not applicable | No | No | No
Supports dynamic disk for non-OS disks | Not applicable | Yes | Yes | Yes
Maximum data disks supported | 64 | 16 | 64 | 64
Add disk on replicated VM | No | No | No | No
Can exclude specific disks from replication | No | Yes | Yes | Yes
Can retain drive letter after migration | Not applicable | Yes | Yes | Yes
Inbound connectivity requirement | No inbound connectivity required towards VMs | No inbound connectivity required towards VMs, Hyper-V hosts, or the VMM server | 1) Inbound port 443: VM to configuration server communication; 2) Inbound port 9443: VM to process server communication; 3) With multi-VM consistency, machines in the replication group communicate with each other over port 20004 | 1) Inbound port 443: VM to configuration server communication; 2) Inbound port 9443: VM to process server communication; 3) With multi-VM consistency, machines in the replication group communicate with each other over port 20004
Outbound connectivity requirement | 1) Site Recovery service URLs/IP addresses; 2) Office 365 authentication URLs/IP addresses; 3) Cache storage account IP addresses; 4) VMs in a replication group must communicate with each other over port 20004 for multi-VM consistency | Outbound port 443 from the Hyper-V host / VMM server where the Site Recovery Provider is installed | 1) Outbound 443: configuration server to Azure communication; 2) Outbound 443: process server to Azure communication | 1) Outbound 443: configuration server to Azure communication; 2) Outbound 443: process server to Azure communication
Default replication path | Public endpoint of the storage account, over the Internet | Public endpoint of the storage account, over the Internet | Public endpoint of the storage account, over the Internet | Public endpoint of the storage account, over the Internet
Supports ExpressRoute as an alternate replication path | Not applicable | Yes (public or Microsoft peering only) | Yes | Yes
Recovery point interval / replication interval | Every few minutes | Every 30 seconds (except for premium storage), 5 minutes, or 15 minutes | Continuous replication | Continuous replication
Supports site-to-site VPN for replication | Supported (site-to-site VPN with on-premises, with or without ExpressRoute) | No | No | No

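If you evaluate many servers against this matrix, it can help to keep a few rows as data. The sketch below encodes three rows copied from the summary table (maximum data disks, disk exclusion, and failback) and reports conflicts for a hypothetical workload profile. The function and parameter names are illustrative, and the values reflect the table as written, not necessarily the current support matrix.

```python
# Three rows of the summary table encoded as data (values copied from the table).
MAX_DATA_DISKS = {"Azure": 64, "Hyper-V": 16, "VMware": 64, "Physical": 64}
CAN_EXCLUDE_DISKS = {"Azure": False, "Hyper-V": True, "VMware": True, "Physical": True}
FAILBACK = {"Azure": "Yes", "Hyper-V": "Yes", "VMware": "Yes", "Physical": "To a VMware VM only"}

def blockers(scenario: str, data_disks: int, needs_disk_exclusion: bool,
             has_vmware_for_failback: bool = True) -> list:
    """List reasons a (hypothetical) workload profile conflicts with the table."""
    reasons = []
    limit = MAX_DATA_DISKS[scenario]
    if data_disks > limit:
        reasons.append(f"{data_disks} data disks exceed the limit of {limit}")
    if needs_disk_exclusion and not CAN_EXCLUDE_DISKS[scenario]:
        reasons.append("excluding disks from replication is not supported")
    if FAILBACK[scenario] != "Yes" and not has_vmware_for_failback:
        reasons.append("failback requires an on-premises VMware VM")
    return reasons

print(blockers("Hyper-V", data_disks=20, needs_disk_exclusion=True))
# ['20 data disks exceed the limit of 16']
print(blockers("Physical", data_disks=4, needs_disk_exclusion=False,
               has_vmware_for_failback=False))
# ['failback requires an on-premises VMware VM']
```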




See Also


I recommend referring to these Microsoft documents for additional reference and a better understanding: