Exchange CAS 2013 NLB High Availability in DR

  • Question

  • Just discussing: I am not finding much information online about Exchange 2013 CAS NLB high availability in a DR
    scenario, so I wanted to discuss it and get some ideas and guidance.

    Site-A

    Domain Controllers x2
    3 CAS servers with hardware LB
    3 mbx servers in DAG1
    FSWitness Server
    SAN Wildcard Certificate
    External URL: webmail.domain.com/owa
    Internal URL: autodiscover.domain.com

    Site-B  (DR) with Stretched DAG
    Domain Controllers x3
    3 CAS Servers with wNLB
    2 mbx server in the same DAG1  (Stretched)
    Alt FSWitness server
    External URL: webmail.domain.com/owa
    Internal URL: autodiscoverDR.domain.com

    Question:

    When Site-A goes down, I understand the other steps that need to be taken, but my question is:

    In the DR site, do I point my internal URLs to the mbx servers, using the same
    commands I normally use to set the virtual directories, and point them to the new mbx servers' dbs?

    And when the primary site comes back up, do I follow the same steps to revert?  I am a bit unclear on this part.

    Thank you


    • Edited by WildPacket Monday, June 3, 2019 1:03 PM
    Monday, June 3, 2019 1:02 PM

Answers

  • If you want high availability, the FSW should be in a third site that has independent network connectivity to the other two sites, or as a cloud instance, such as Azure.

    Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
    Celebrating 20 years of providing Exchange peer support!

    • Marked as answer by WildPacket Tuesday, June 11, 2019 12:50 PM
    Wednesday, June 5, 2019 10:24 PM
    Moderator

All replies

  • You're not finding much information because it doesn't exist.  There's no such thing.  Buy load balancers that have cross-site capabilities if you want true high availability.

    Basically, when one site goes down, any hostnames that point to the down site must be changed to point to the site that's up.  There are no URLs that point to "dbs", assuming you mean databases.  And, yes, when the down site comes back up, you change the URLs back.

    Instead of changing URLs, it's generally easier to change the host name in DNS.  That's only feasible if you don't use server names in virtual directory URLs.
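
    A minimal Python sketch of how one might confirm where a shared hostname currently points after such a DNS change. The site names and load balancer addresses are hypothetical placeholders (documentation-range IPs), not values from this thread, and the resolver is injectable so the logic can be checked without touching real DNS.

```python
import socket
from typing import Callable, Dict, List, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def sites_serving(
    hostname: str,
    site_vips: Dict[str, str],
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> List[str]:
    """List the sites whose load balancer address the hostname resolves to."""
    addresses = resolver(hostname)
    return sorted(site for site, vip in site_vips.items() if vip in addresses)


# Hypothetical virtual IPs for the load balancers in each site.
SITE_VIPS = {"Site-A": "192.0.2.10", "Site-B": "198.51.100.10"}
```

    Calling sites_serving("webmail.domain.com", SITE_VIPS) from an internal client and from an external one would show whether each side of split DNS has picked up the change. Cached records can keep returning the old address until their TTL expires.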


    Ed Crowley MVP

    Monday, June 3, 2019 5:57 PM
    Moderator
  • Hi.

    First of all, this is about your namespace.

    Namespace Planning in Exchange 2016

    Second, an enterprise hardware load balancer can detect when DC1 (the first datacenter) is down. For example, NetScaler Enterprise identifies very cleanly when a datacenter is down and can forward requests to DC2 (DR).

    Failback can be automatic or manual, depending on your situation.

    Datacenter switchovers


    MCITP, MCSE. Regards, Oleg

    Monday, June 3, 2019 6:03 PM
  • Thank you all.

    @Oleg - Thank you it was a good read "namespace planning in Exchange 2016" - I never came across this one before.

    I have read the Datacenter Switchover link before, but it does not cover "Activating Client Access Services" in detail.  I am just trying to understand a scenario for now.

    The other question I have is about the Alt FSW server in Site B.  Site A has 3 nodes and Site B has 2 nodes. If Site A goes down, will I need to activate the Alt FSW in Site-B?

    Tuesday, June 4, 2019 3:31 AM
  • You're going to have to force quorum to activate site B, which means telling the cluster that the only two nodes that count for the time being are those in site B.  It's a very good idea to activate the alternate FSW in case one of the two nodes goes down after that.

    Ed Crowley MVP

    Tuesday, June 4, 2019 7:34 AM
    Moderator
  • Hi.

    I would propose planning Site A with three nodes and the alternate witness, and Site B with two nodes and the witness.

    When Site A is down, you don't need to configure anything on Site B. You need to wait until Site A is back up and has synced. While Site A is down, you can't restart any node in Site B.

    When Site B is down, you can't restart any node in Site A until you reconfigure the alternate witness as the regular witness.

    Site A: three-node majority cluster. Site B: two nodes plus witness majority cluster.

    PS. Network latency for the Cluster service is best around 250 ms and should stay below 500 ms. If it exceeds 500 ms, the cluster goes down.

    PS. Microsoft has a Validation Architecture program for heavy Exchange infrastructure.
    PS. Remember, if you are planning Exchange without backups, every site must have at least three database copies.


    MCITP, MCSE. Regards, Oleg

    Tuesday, June 4, 2019 10:13 AM
  • If there are three nodes in site A and two in site B, it doesn't matter where the witness is because there will always be a node majority in site A.  Activating in site B will require you to force quorum.  This is not a high availability solution.

    Ed Crowley MVP

    Tuesday, June 4, 2019 3:46 PM
    Moderator
  • That's true, Ed. It needs 3 nodes + witness and 3 nodes + alternate witness, so that the cluster has 3, 5, or 7 voting members.

    We can manually change a node's vote, but that's not supported for Exchange.


    MCITP, MCSE. Regards, Oleg

    Tuesday, June 4, 2019 4:16 PM
  • And even that won't be high availability because the witness doesn't switch dynamically.  To get HA in this case, you'd need the witness in a third site or in the cloud.
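
    The quorum arithmetic behind this exchange can be sketched in Python. This is a simplified model under stated assumptions: one vote per DAG member, the witness voting only when the member count is even (the usual DAG rule), and static votes with no dynamic quorum adjustment. The site labels and the function name are illustrative only.

```python
from typing import Dict, Set


def keeps_quorum(
    nodes_by_site: Dict[str, int],
    reachable_sites: Set[str],
    witness_site: str,
) -> bool:
    """Return True if the reachable part of the cluster holds a strict majority of votes."""
    total_nodes = sum(nodes_by_site.values())
    votes_total = total_nodes
    votes_up = sum(n for site, n in nodes_by_site.items() if site in reachable_sites)

    # Assumption: the witness only votes when the member count is even.
    if total_nodes % 2 == 0:
        votes_total += 1
        if witness_site in reachable_sites:
            votes_up += 1

    return votes_up > votes_total // 2


# 3 nodes in A, 2 in B: the witness never votes, so its location is irrelevant.
print(keeps_quorum({"A": 3, "B": 2}, {"A"}, witness_site="B"))  # True
print(keeps_quorum({"A": 3, "B": 2}, {"B"}, witness_site="B"))  # False

# 3 + 3 with the witness in A: losing A still leaves B without a majority.
print(keeps_quorum({"A": 3, "B": 3}, {"B"}, witness_site="A"))  # False

# 3 + 3 with the witness in a third site C: either site can be lost.
print(keeps_quorum({"A": 3, "B": 3}, {"B", "C"}, witness_site="C"))  # True
print(keeps_quorum({"A": 3, "B": 3}, {"A", "C"}, witness_site="C"))  # True
```

    In the 3 + 2 layout, site B can only be brought up by forcing quorum, which is a manual recovery step rather than automatic failover.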

    Ed Crowley MVP

    Tuesday, June 4, 2019 4:17 PM
    Moderator
  • If this question is about Exchange 2016, then it's posted in the wrong forum as this one is for Exchange 2013, and I don't think your link applies to Exchange 2013.


    Ed Crowley MVP


    Tuesday, June 4, 2019 7:11 PM
    Moderator
  • Thank you guys.  This has been a very insightful conversation, and hopefully others can benefit from it too.

    @Oleg:   Is it not preferred that the FSW be in the site where most users reside?  In this case all the users are in SiteA/Primary Site and there are no users in SiteB/Secondary Site.

    Another question comes to mind: what if the link between siteA and siteB goes down for 30 minutes and the admins don't know (let's say there are no monitoring/alerting tools here)?  What happens then to the nodes in both sites?  (DAC is already enabled.)

    Cheers!


    • Edited by WildPacket Wednesday, June 5, 2019 4:44 PM
    Wednesday, June 5, 2019 2:49 PM
  • @WildPacket

    Is the primary site the connection point for users?

    Does the primary site publish CAS to the Internet?

    Does the secondary site publish CAS to the Internet?

    If the primary site is the main connection point for users and is published to the Internet, then the FSW should be local to that site.

    Secondary site: use it for backup database copies and prepare it for the alternate FSW. You can save disk space if the databases are split 50/50 between the Exchange servers, but you need to test cross-site connections from both external and internal clients.


    MCITP, MCSE. Regards, Oleg

    Wednesday, June 5, 2019 4:26 PM
  • Thank you Oleg

    Yes, the primary site is currently the connection point for users.

    Yes, the primary site currently publishes CAS to the Internet.

    Does the secondary site publish CAS to the Internet? Not yet. It will only when we cut over or test (that is, when the primary site and all of its Exchange servers are down and inaccessible).  At cutover we will take care of external and internal DNS, the secondary site firewall, the CAS URLs, and MX; the DNS TTL will be set to 5 minutes or less on internal and external DNS.   Anything else I need to cover?

    Questions:

    - What is the most effective way to test a DR scenario with the least downtime?

    - What if the link between siteA and siteB goes down for 30 minutes and the admins don't know (let's say there are no monitoring/alerting tools here)?  What happens then to the nodes in both sites?  (DAC is already enabled.)




    • Edited by WildPacket Wednesday, June 5, 2019 4:55 PM
    Wednesday, June 5, 2019 4:44 PM
  • Does the secondary site publish CAS to the Internet? Not yet. It will only when we cut over or test (that is, when the primary site and all of its Exchange servers are down and inaccessible).  At cutover we will take care of external and internal DNS, the secondary site firewall, the CAS URLs, and MX; the DNS TTL will be set to 5 minutes or less on internal and external DNS.   Anything else I need to cover?

    Internally, that's a split DNS setup. The secondary site must have two AD domain controllers (not RODCs).

    If you are planning full DR at the second site with real-time switching of user connections to it, you need two public/internal DNS records, one for each site. The databases can be active/passive, but user connections must go to both sites to get 30-second DR switchovers.

    DR raises many questions that you need to discuss with your business.

    Is user access primarily internal or external? How many users connect from outside?

    Do you use Exchange Online Protection, or something similar, to hold messages in the cloud for DR?

    How much does it cost to support and manage the firewall and Internet connection at the second site?

    DR for the complete loss of Site A; DR for the loss of storage while CAS stays available. You need to plan the different DR scenarios and the cost of implementing each one.

    :) What is the most effective way to test a DR scenario with the least downtime?

    That's an interesting question for your business. One bank tested with a real-time datacenter shutdown as the final test of a contract for DR Exchange with 25K users.

    What if the link between siteA and siteB goes down for 30 minutes and the admins don't know (let's say there are no monitoring/alerting tools here)?  What happens then to the nodes in both sites?  (DAC is already enabled.)

    Use a monitoring system, for example SCOM.

    Usually, when the link goes down, the site that loses majority has its databases locked for replication. After the cluster's connection to the primary site is restored, replication resumes and the delta is synced.

    - How can we check/test that the latency is under 500 ms?

    That's a question for the network engineers and the monitoring service that tracks connection quality.  I always ask the network team for a network connection status report when we have issues switching cluster services between datacenters.
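
    For a rough first check before involving the network team, a Python sketch like the following times TCP connections to a server in the other site and compares them with the 500 ms figure mentioned in this thread. TCP connect time only approximates one network round trip, the function names are illustrative, and the host and port in the usage note are hypothetical. It is not a substitute for proper network monitoring.

```python
import socket
import time
from typing import List

LIMIT_MS = 500.0  # round-trip figure quoted in this thread


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time a single TCP connection setup, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the handshake has completed once the connection object exists
    return (time.perf_counter() - start) * 1000.0


def sample_latency(host: str, port: int, count: int = 5) -> List[float]:
    """Collect several connection timings to smooth out one-off spikes."""
    return [tcp_connect_ms(host, port) for _ in range(count)]


def within_limit(samples_ms: List[float], limit_ms: float = LIMIT_MS) -> bool:
    """True only if there is at least one sample and the worst one is within the limit."""
    return bool(samples_ms) and max(samples_ms) <= limit_ms
```

    For example, within_limit(sample_latency("mbx1.siteb.example", 443)) run from a Site-A server gives a quick yes/no; the host name and port here are placeholders for any reachable TCP service on a server in the other site.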


    MCITP, MCSE. Regards, Oleg

    Wednesday, June 5, 2019 5:11 PM
  • Hello,

    Do you have any further questions? Please feel free to post back if you do.

    BTW, please also remember to mark the replies that helped as answers. This could help other community members to quickly find the valuable replies when they encounter the same issue and come across this post.

    Regards,
    Steve Fan



    Friday, June 7, 2019 9:04 AM
    Moderator