DAG Question

  • Question

  • I have been handed a project to deploy a stretched DAG between two data centres, with three Mailbox role servers at one and two at the other. The design has active databases at both locations, each with a passive copy held at the opposite site (hope that makes sense). The local active databases provide local client access.

     

    What I am struggling to work out is what happens to all 5 active databases if the communications link between the two sites goes down. This is a likely situation, and the business must continue to run at both sites. Can you advise, please?

     

     

    Site A: three Mailbox role servers, each with an active database and a passive copy at site B.

    Site B: two Mailbox role servers, each with an active database and a passive copy at site A.

     

    There is a LAN and DAG replication network but the scenario assumes both fail.

     

    Cheers

    Wednesday, March 31, 2010 9:58 AM


All replies

  • To handle this kind of split-brain syndrome between multiple datacenters, Microsoft developed Datacenter Activation Coordination (DAC). When a connection between two datacenters comes back after an outage, DAC controls database activation. Before a database can be activated, DAC uses the Datacenter Activation Coordination Protocol (DACP) to determine the current state of the DAG, making sure that there is no active copy on another server.

    I hope this makes sense. Let me know if you have any other questions!

    More information on this can be found here: http://technet.microsoft.com/en-us/library/dd979790.aspx

    Wednesday, March 31, 2010 10:11 AM
  • My question was about the active/active component: would the databases at both sites remain active if the lines between the sites disappeared? The configuration has a 3/2 split between the sites.

    Wednesday, March 31, 2010 3:02 PM
  • Not sure if I fully get your question, but if it's only the link between your sites that you lose, your active databases will stay active. They just won't replicate to the far site, which means no failover if you lose something on the active server.

    If you lose the links as well as the LAN, nobody can access anything, because you won't have a LAN.

    Give us more info if this isn't answering your question!

    • Proposed as answer by Anbu Selvan Wednesday, March 31, 2010 3:59 PM
    Wednesday, March 31, 2010 3:30 PM
  • As far as I know, if communication goes down, the DAG will not support a single database being active at both sites at the same time.

    If messaging is a prime requirement for the business, I would recommend looking at alternative connectivity (a different route) for the link between the sites. You could also suggest Microsoft Exchange Online options for the site that has two servers.

    Please reply if you require more details.

    Wednesday, March 31, 2010 4:06 PM
  • Do you mean that active copies of different databases are split between the two sites, and you want to know whether they will stay active if the link goes down? This should be handled by DAC, since a server must try to communicate with all other members of the DAG that it knows before activating a passive copy. If database1 is active on site1 when the link goes down, it will stay active; the same goes for database2 on site2.

    Does this answer your question? If not, I think I need more information...

    Thursday, April 1, 2010 10:02 AM
  • If the link between the two sites goes down, the side with 3 voters has majority. It will consider the other two machines as down and behave accordingly. The site with 2 voters will see that it has only 2 of the 5 votes and, without majority, will stop active service.

    So all of the databases will mount in the site with the 3 servers.

    • Proposed as answer by Xiu Zhang Friday, April 2, 2010 6:43 AM
    • Unproposed as answer by Haitham1977 Friday, April 2, 2010 9:57 AM
    Thursday, April 1, 2010 5:04 PM
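
The vote counting in the reply above can be sketched in a few lines. This is only a toy model of majority arithmetic, not anything Exchange or Windows failover clustering exposes; the function name `has_quorum` and its parameters are made up for illustration, and the witness handling assumes (as the documentation quoted later in this thread states) that the witness only votes when the DAG has an even number of members.

```python
def has_quorum(reachable_members: int, total_members: int,
               witness_reachable: bool = False) -> bool:
    """Return True if this partition holds a strict majority of votes.

    Assumption: the witness counts as a voter only when the DAG has an
    even number of members.
    """
    witness_votes = total_members % 2 == 0
    total_votes = total_members + (1 if witness_votes else 0)
    reachable_votes = reachable_members + (
        1 if witness_votes and witness_reachable else 0
    )
    return reachable_votes > total_votes // 2


# The 3/2 layout from the question: five members, so no witness vote.
site_a_has_quorum = has_quorum(reachable_members=3, total_members=5)
site_b_has_quorum = has_quorum(reachable_members=2, total_members=5)

print("Site A (3 servers) keeps quorum:", site_a_has_quorum)
print("Site B (2 servers) keeps quorum:", site_b_has_quorum)
```

With five voters a partition needs at least three votes, so only the three-server site can keep quorum once the link is cut.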
  • "So all of the databases will mount in the site with the 3 servers."


    In this kind of scenario, if you wish to prevent automatic activation in the other datacenter, you can put an activation block on the DB copies in the other datacenter. You can use the Suspend-MailboxDatabaseCopy cmdlet to do this.

    http://technet.microsoft.com/en-us/library/dd351074.aspx

    Example...

    Suspend-MailboxDatabaseCopy -Identity DB3\MBX2 -ActivationOnly


    Brian Day, Overall Exchange & AD Geek
    MCSA 2000/2003, CCNA
    MCITP: Enterprise Messaging Administrator 2010
    Microsoft MVP, Exchange Server
    Thursday, April 1, 2010 6:43 PM
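
To picture what the activation block in the reply above does, here is a rough Python sketch. It is not how Active Manager actually picks a copy (the real selection logic considers much more); it only shows the idea that a copy suspended for activation is skipped when looking for an automatic failover target. The class, the function, and the DB3 layout (MBX1, MBX2, MBX4) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatabaseCopy:
    server: str
    activation_suspended: bool = False  # stands in for the -ActivationOnly block


def automatic_activation_candidates(copies, failed_server):
    """Return servers whose copy could be activated automatically."""
    return [
        copy.server
        for copy in copies
        if copy.server != failed_server and not copy.activation_suspended
    ]


# Hypothetical layout: DB3 is active on MBX1, with passive copies on
# MBX2 (activation blocked, other datacenter) and MBX4 (not blocked).
db3_copies = [
    DatabaseCopy("MBX1"),
    DatabaseCopy("MBX2", activation_suspended=True),
    DatabaseCopy("MBX4"),
]

candidates = automatic_activation_candidates(db3_copies, failed_server="MBX1")
print(candidates)
```

In this sketch the blocked copy on MBX2 keeps replicating in principle but is never offered as an automatic target; an administrator would have to activate it deliberately.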
  • How does it work in a 5-node DAG with 3 nodes at one site and two at another, where each node has an active copy? Would the active copies stay active, and would any of the nodes mount a passive copy? If a passive copy goes active, how does the system know which of the two copies is the real one when the link is restored? Does this make sense?

    Friday, April 2, 2010 9:58 AM
  • Maybe this will help explain it. Please have a read:

    http://technet.microsoft.com/en-us/library/dd979790.aspx

    DAC mode is used to control the activation behavior of a DAG when a catastrophic failure occurs that affects the DAG (for example, a complete failure of one of the datacenters). When DAC mode isn't enabled, and a failure affecting multiple servers in the DAG occurs, when a majority of servers are restored after the failure, the DAG will restart and attempt to mount databases. In a multi-datacenter configuration, this behavior could cause split brain syndrome, a condition that occurs when all networks fail, and DAG members can't receive heartbeat signals from each other. Split brain syndrome also occurs when network connectivity is severed between the datacenters. Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of DAGs with an even number of members, the DAG's witness server) to be available and interacting for the DAG to be operational. When a majority of the members are communicating, the DAG is said to have a quorum.

    For example, consider a scenario where the first datacenter contains two DAG members and the witness server, and the second datacenter contains two other DAG members. If the first datacenter loses power and you activate the DAG in the second datacenter (for example, by activating the alternate file share witness in the second datacenter), if the first datacenter is restored without network connectivity to the second datacenter, the DAG may enter a split brain syndrome.

    DAC mode is designed to prevent split brain syndrome from occurring by including a protocol called Datacenter Activation Coordination Protocol (DACP). After a catastrophic failure, when the DAG recovers, it won't automatically mount databases even though the DAG has a quorum. Instead DACP is used to determine the current state of the DAG and whether Active Manager should try to mount the databases.

    How DAC Mode Works

    You might think of DAC mode as an application level of quorum for mounting databases. To understand the purpose of DACP and how it works, it's important to understand the primary scenario it's intended to deal with. Consider the two-datacenter scenario. Suppose there is a complete power failure in the primary datacenter. In this event, all of the servers and the WAN are down, so the organization makes the decision to activate the standby datacenter. In almost all such recovery scenarios, when power is restored to the primary datacenter, WAN connectivity is typically not immediately restored. This means that the DAG members in the primary datacenter will power up, but they won’t be able to communicate with the DAG members in the activated standby datacenter. The primary datacenter should always contain the majority of the DAG quorum voters, which means that when power is restored, even in the absence of WAN connectivity to the DAG members in the standby datacenter, the DAG members in the primary datacenter have a majority and therefore have quorum. This is a problem because with quorum, these servers may be able to mount their databases, which in turn would cause divergence from the actual active databases that are now mounted in the activated standby datacenter.

    DACP was created to address this issue. Active Manager stores a bit in memory (either a 0 or a 1) that tells the DAG whether it's allowed to mount local databases that are assigned as active on the server. When a DAG is running in DAC mode (which would be any DAG with three or more members), each time Active Manager starts up the bit is set to 0, meaning it isn't allowed to mount databases. Because it's in DAC mode, the server must try to communicate with all other members of the DAG that it knows to get another DAG member to give it an answer as to whether it can mount local databases that are assigned as active to it. The answer comes in the form of the bit setting for other Active Managers in the DAG. If another server responds that its bit is set to 1, it means servers are allowed to mount databases, so the server starting up sets its bit to 1 and mounts its databases.

    But when you recover from a primary datacenter power outage where the servers are recovered but WAN connectivity has not been restored, all of the DAG members in the primary datacenter will have a DACP bit value of 0; and therefore none of the servers starting back up in the recovered primary datacenter will mount databases, because none of them can communicate with a DAG member that has a DACP bit value of 1.

    Friday, April 2, 2010 10:22 AM
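
The DACP bit behaviour described in the quoted documentation can be simulated with a small toy model. This is only a sketch of the one rule quoted above (a starting member begins at 0 and flips to 1 when a reachable member reports 1); the `ActiveManager` class, its method, and the server names are invented for illustration and are not an Exchange API.

```python
class ActiveManager:
    """Toy stand-in for the DACP state held by one DAG member."""

    def __init__(self, name: str):
        self.name = name
        self.dacp_bit = 0  # in DAC mode the bit starts at 0 on every startup

    def try_enable_mount(self, reachable_members) -> bool:
        """Set the bit to 1 only if some reachable member already reports 1."""
        if any(member.dacp_bit == 1 for member in reachable_members):
            self.dacp_bit = 1
        return self.dacp_bit == 1


# The standby datacenter was activated during the outage, so its bits are 1.
standby = [ActiveManager("MBX4"), ActiveManager("MBX5")]
for member in standby:
    member.dacp_bit = 1

# The primary datacenter powers back up; every member restarts with bit 0.
primary = [ActiveManager("MBX1"), ActiveManager("MBX2"), ActiveManager("MBX3")]

# WAN still down: each primary member can reach only its primary peers.
wan_down = [
    m.try_enable_mount([peer for peer in primary if peer is not m])
    for m in primary
]

# WAN restored: primary members can now reach the standby members too.
wan_up = [m.try_enable_mount(standby) for m in primary]

print("May mount while WAN is down:", wan_down)
print("May mount after WAN returns:", wan_up)
```

While the WAN is down, the three primary members only see each other's 0 bits, so none of them mounts anything even though they hold a majority of votes.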
  • Databases will not be mounted on servers that do not have quorum. If a server notices that it no longer has quorum the Active Manager will dismount the database copies on itself, just to make sure that no split brain happens for the database.
    • Marked as answer by Haitham1977 Friday, April 2, 2010 2:37 PM
    Friday, April 2, 2010 2:34 PM
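
Applied to the layout in the original question, the accepted answer can be pictured like this. It is a minimal sketch with made-up database and server names (DB1 to DB5, MBX1 to MBX5), and it only models the dismount side: it does not show the surviving site activating its passive copies of the other site's databases.

```python
def mounted_after_split(active_on, partition, total_voters):
    """Return the databases that stay mounted on their current server.

    A database stays mounted only if its active server is in this
    partition and the partition holds a strict majority of the voters.
    """
    partition_has_quorum = len(partition) > total_voters // 2
    return sorted(
        db for db, server in active_on.items()
        if partition_has_quorum and server in partition
    )


# Hypothetical names: DB1-DB3 are active at site A, DB4-DB5 at site B.
active_on = {
    "DB1": "MBX1", "DB2": "MBX2", "DB3": "MBX3",
    "DB4": "MBX4", "DB5": "MBX5",
}
site_a = {"MBX1", "MBX2", "MBX3"}
site_b = {"MBX4", "MBX5"}

still_mounted_a = mounted_after_split(active_on, site_a, total_voters=5)
still_mounted_b = mounted_after_split(active_on, site_b, total_voters=5)

print("Site A keeps mounted:", still_mounted_a)
print("Site B keeps mounted:", still_mounted_b)
```

So in the 3/2 design, the two-server site cannot keep its local databases mounted during a link outage, which is the part of the design the original question needed to account for.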
  • Is this true even if there is an alternate witness file share in the second site? If the link goes down, will the server have access to the alternate WFS and still think it has a majority?
    Wednesday, March 23, 2011 6:03 PM
  • Sorry, it will not. The Alternate File Share Witness settings are only used during manual datacenter recovery tasks.

    Datacenter Switchovers: http://technet.microsoft.com/en-us/library/dd351049.aspx

    Restore-DatabaseAvailabilityGroup: http://technet.microsoft.com/en-us/library/dd351169.aspx


    Microsoft Premier Field Engineer, Exchange
    MCSA 2000/2003, CCNA
    MCITP: Enterprise Messaging Administrator 2010
    Former Microsoft MVP, Exchange Server
    My posts are provided “AS IS” with no guarantees, no warranties, and they confer no rights.
    Thursday, March 24, 2011 2:36 AM