The correct way to do failback after Disaster Recovery

  • Question

  • Hi there,

    I'm testing the DR failover scenario (Invoke-CsManagementServerFailover ... -Force), but at the end of the test, when I bring the failed site back up, the CMS ends up in a broken state.

    The test steps are:

    1. Stop the services (SQL, Front Ends, etc.) in site A (the site hosting the CMS).

    2. Perform the CMS failover from site A to site B, with -Force and the other required parameters.

    3. All looks good, CMS is running fine on site B.

    4. Then I start the services on site A again.

    When I now run Test-CsManagementServer, it fails with this error:

    WARNING: Backup Central Management Store state is Active, the expected status is Backup. Note that if the local replica is out of date, the topology document may be obsolete. Ensure that the local replica is up to date, and run Test Management Server cmdlet.

    And yes, this ends in a "split brain" scenario where both XDS databases have master status.
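    The sequence above can be sketched as a toy model in Python. This is a hypothetical illustration of the reasoning only, not how Skype for Business actually stores or checks CMS state: `CmsStore`, `forced_failover` and `active_sites` are names invented for this example. The one assumption it encodes is that a forced failover can only update the surviving store, so site A's database keeps its old "Active" record while it is offline.

```python
from dataclasses import dataclass


@dataclass
class CmsStore:
    """Hypothetical model of one site's CMS (XDS) database and the role it records for itself."""
    site: str
    role: str            # "Active" or "Backup"
    online: bool = True


def forced_failover(failed: CmsStore, survivor: CmsStore) -> None:
    """Promote the survivor. The failed store is offline, so its own record is NOT demoted."""
    if failed.online:
        raise ValueError("this model only covers failover away from an offline store")
    survivor.role = "Active"


def active_sites(stores: list[CmsStore]) -> list[str]:
    """Sites whose online store claims the Active role; more than one means split brain."""
    return [s.site for s in stores if s.online and s.role == "Active"]


site_a = CmsStore("A", "Active")
site_b = CmsStore("B", "Backup")

site_a.online = False                             # step 1: stop the services in site A
forced_failover(site_a, site_b)                   # step 2: forced CMS failover from A to B
after_failover = active_sites([site_a, site_b])   # step 3: only B is Active
site_a.online = True                              # step 4: start site A again, old data intact
after_restart = active_sites([site_a, site_b])    # both A and B now claim Active

print(after_failover)   # ['B']
print(after_restart)    # ['A', 'B']
```

    In this model the problem only appears at step 4, which is the same point at which Test-CsManagementServer reports that the store expected to be the backup is Active.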

    Is this a scenario that should not be done, i.e. if you lose the site, is a restore the only option to bring it back? Or have I missed some steps again? :)


    Petri

    Monday, November 5, 2018 1:20 AM

All replies

  • Hi Petri,

    For the steps of Skype for Business pool failover and failback, you could refer to this link.

    For the issue you are facing, you should update the CsDatabase whose state is Active, then change the backup to Active and update the other one. I also found a blog you could use for reference: Skype for Business – WARNING: Standard Edition Pool Failover Disaster. It describes an issue similar to yours and suggests that you need to have a backup and make sure you test failover in a controlled manner before having to rely on it for real.


    Best Regards,
    Evan Jiang


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.


    Click here to learn more. Visit the dedicated forum to share, explore and talk to experts about Microsoft Teams.

    • Proposed as answer by woshixiaobai Friday, December 14, 2018 6:37 AM
    Monday, November 5, 2018 7:18 AM
  • Hi Evan87

    Unfortunately, Shankary's document does not describe a site failover where the CMS was lost. He did the failover and failback cleanly, while no services were lost.

    And as for the blog, this part of course makes me a bit worried:

    I would like to thank Chris Hayward (@WeakestLync) for warning me of a potential issue during failover that screws up your CMS.


    Petri

    Monday, November 5, 2018 10:01 AM
  • Hi Petri,

    Yes, Shankary's document shows the steps of SFB pool failover and failback while the SFB environment is working fine. In fact, the steps are the same as those in Shankary's document when the services in the SFB environment are down. You could also refer to this blog for more details: Execute unplanned DR Fail Over and Fail back for Skype for Business.

    In addition, as the blog I provided in my last reply says, it is better to have a backup and make sure you test failover in a controlled manner.

    Best Regards,
    Evan Jiang



    Tuesday, November 6, 2018 7:14 AM
  • If you read the comments from other admins on the first blog you referred to, that procedure seems to be an issue for others as well.

    The latest blog does not describe how the main site came back. If that was done by recovery (from scratch), then that process is fine. If the main site simply comes back online normally, then you hit these problems.


    Petri

    Tuesday, November 6, 2018 8:27 AM
  • Hi Petri,

    In fact, to fail back to the main site, you just need to fail the CMS over and fail the FE pool back to the main site, as described in the blog I provided.

    If you only want the steps for testing, you could refer to the similar case.

    Best Regards,
    Evan Jiang



    Friday, November 9, 2018 3:14 AM
  • The blog says "...Once after the main site is back..." without saying how the site was lost and how it was brought back online. As I said, and as the other blog also describes, when the main site comes back, the SQL servers there still hold content that says "I have active CMS". When that happens, the failover to that site cannot be performed.

    Because of that, it is critical to know how the main site came back online in the blog article:

    a) by recovery (the SQL databases are empty)

    b) by just starting it (the SQL databases contain old information)
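    The difference between a) and b) can be illustrated with another hypothetical Python sketch, assuming (as described above) that a site whose SQL data still claims the active CMS cannot receive a CMS failover. `SiteDb`, `bring_site_back` and `can_fail_cms_back` are invented for illustration and do not correspond to any Skype for Business cmdlet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteDb:
    """Hypothetical model of site A's CMS SQL data when the site returns."""
    cms_role: Optional[str]  # None means the database is empty


def bring_site_back(how: str) -> SiteDb:
    """'recovery' rebuilds site A from scratch; 'restart' just starts it with its old data."""
    if how == "recovery":
        return SiteDb(cms_role=None)      # a) SQL is empty, no stale role
    if how == "restart":
        return SiteDb(cms_role="Active")  # b) SQL still says "I have active CMS"
    raise ValueError(f"unknown recovery path: {how}")


def can_fail_cms_back(site_a: SiteDb) -> bool:
    """Assumed rule: the CMS can only fail back to a site that does not already claim Active."""
    return site_a.cms_role != "Active"


print(can_fail_cms_back(bring_site_back("recovery")))  # True
print(can_fail_cms_back(bring_site_back("restart")))   # False
```

    Under that assumption, only path a) leaves site A in a state where the CMS can be failed back without first dealing with the stale Active record.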

    It could be that I'm still missing some steps, but reading the comments on the other blog article makes me believe I'm not alone with this problem.


    Petri

    Wednesday, November 14, 2018 12:13 AM