Exchange 2010 DAG: Force DB copy with status "Failed and Suspended" to mount?

    Question

  • Hi,

    I have a question for the following situation:

    - Server DAG01: Hosts active copy of DB001
    - Server DAG02: Hosts passive copy of DB001
    - Activation Policy on DAG02 is set to Blocked
    - I remove the storage on DAG01 that hosts DB001
    - I manually activate the passive copy of DB001 on DAG02 using the "Lossless" mount dial setting

    This produces an error message saying that logs are missing and sets the DB copy to "Failed and Suspended". After this, I cannot activate the DB, even when choosing a different mount dial setting.

    Is there a way to force the DB to become active?

    Thanks

    Friday, June 15, 2012 10:16 AM

All replies

  • Can you post the exact error and check the application log for any event IDs?
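
    One way to pull the relevant entries, as a rough sketch: the event source name "MSExchangeRepl" and the count of 20 are assumptions, so adjust them to whatever your Application log actually shows. Run it on the server that hosts the passive copy.

```powershell
# Sketch: list recent replication-related errors from the Application log.
# "MSExchangeRepl" as the event source is an assumption; change it if your
# log uses a different source for the failed activation.
Get-EventLog -LogName Application -Source "MSExchangeRepl" -EntryType Error -Newest 20 |
    Format-List TimeGenerated, EventID, Message
```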

    Sukh

    Friday, June 15, 2012 10:41 AM
  • I just performed another test to collect the logs, and this time the results are different. Once the storage of the active database becomes unavailable, the DB goes into the "Dismounted" state. The passive copy of the DB is "Healthy", and the copy and replay queues are both zero. When I then try to activate the passive copy with the mount dial setting set to "Lossless", I receive the following error message:

    Cannot activate database copy 'Activate Database Copy...'.

    Activate Database Copy...
    Failed
    Error:
    An Active Manager operation failed. Error The database action failed. Error: The database was not mounted because it has experienced data loss as a result of a switchover or failover, and the attempt to copy the last logs from the source server failed. Please check the event log for more detailed information. Specific error message: Attempt to copy remaining log files failed for database DB001\2010-MIG-EX02. Error: The log copier was unable to continue processing for database 'DB001\2010-MIG-EX02' because the source server '2010-MIG-EX01.test.local' returned an error: Invalid file path (-1023) [HResult: 0x80131501]. The copier will automatically retry after a short delay.

    . [Database: DB001, Server: 2010-mig-ex02.test.local]

    An Active Manager operation failed. Error The database was not mounted because it has experienced data loss as a result of a switchover or failover, and the attempt to copy the last logs from the source server failed. Please check the event log for more detailed information. Specific error message: Attempt to copy remaining log files failed for database DB001\2010-MIG-EX02. Error: The log copier was unable to continue processing for database 'DB001\2010-MIG-EX02' because the source server '2010-MIG-EX01.test.local' returned an error: Invalid file path (-1023) [HResult: 0x80131501]. The copier will automatically retry after a short delay.

    Now, unlike in my earlier test, the passive copy has the status "Disconnected and Resynchronizing". I'm not sure yet why I see a different state this time. However, in this state I can successfully activate the DB with another mount dial setting.
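
    For reference, the copy status and queue lengths mentioned here can also be read from EMS instead of EMC. A minimal sketch, assuming the database name DB001 from this thread:

```powershell
# Sketch: show the state of every copy of DB001 (run in the Exchange Management Shell).
# Passing only the database name returns the status of all of its copies.
Get-MailboxDatabaseCopyStatus -Identity "DB001" |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```

    The Status and ContentIndexState columns are the ones that matter when deciding which checks a later activation attempt would have to skip.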


    • Edited by sam.bell Friday, June 15, 2012 12:29 PM
    Friday, June 15, 2012 12:28 PM
  • How was the storage moved?

    How was the server/database taken offline? Was a switchover performed? Or a failover? Was it clean?


    Sukh

    Friday, June 15, 2012 12:32 PM
  • I went to Disk Management and set the disk that contains the DB to offline to simulate a storage failure. Since the Activation Policy on the DAG server that holds the passive copies is set to Blocked, I then went to EMC, right-clicked the passive copy of the DB and selected "Activate Database Copy" with a Database Mount Dial Setting of "Lossless".

    This generated the error I wrote above. After that, the passive DB either has a status of "Failed and Suspended" or "Disconnected and Resynchronizing". When it is "Disconnected and Resynchronizing" I can right click it again and successfully activate it using another Mount Dial Setting.

    When it is "Failed and Suspended", I cannot activate it using the GUI. However, I found that it can be activated from EMS using the SkipHealthChecks parameter of the Move-ActiveMailboxDatabase cmdlet, which the documentation describes as follows:

    The SkipHealthChecks parameter specifies whether to bypass passive copy health checks. With the SkipHealthChecks parameter, you can move the active copy to a database copy that's in the Failed state. This parameter should be used only if the initial attempt to move the active database has failed. This is because SkipHealthChecks performs additional validation to ensure that the log files are consistent, which can take a considerable amount of time.

    http://technet.microsoft.com/en-us/library/dd298068.aspx

    However, since the ContentIndex state was also Failed, I needed to specify the SkipClientExperienceChecks parameter as well. All in all, I got the DB activated again using the following command:

    [PS] C:\Windows\system32>Move-ActiveMailboxDatabase -Identity "DB001" -ActivateOnServer "2010-mig-ex02" -MountDialOverride "BestAvailability" -SkipActiveCopyChecks -SkipClientExperienceChecks -SkipHealthChecks

    After adding the storage again, I was able to successfully re-seed the copy of the DB that had been active before.
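
    The re-seed can also be done from EMS. This is only a sketch and not necessarily the exact steps I used: the copy identity "DB001\2010-MIG-EX01" is inferred from the server names in this thread, and the suspend comment is just an example.

```powershell
# Sketch: reseed the former active copy on 2010-MIG-EX01 once its storage is back.
# The identity "DB001\2010-MIG-EX01" is an assumption based on the names in this thread.

# A copy must be suspended before it can be seeded.
Suspend-MailboxDatabaseCopy -Identity "DB001\2010-MIG-EX01" -SuspendComment "Reseed after storage loss" -Confirm:$false

# -DeleteExistingFiles discards leftover database and log files on the target before seeding.
Update-MailboxDatabaseCopy -Identity "DB001\2010-MIG-EX01" -DeleteExistingFiles

# Check that the copy returns to a healthy state afterwards.
Get-MailboxDatabaseCopyStatus -Identity "DB001\2010-MIG-EX01"
```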


    • Edited by sam.bell Friday, June 15, 2012 1:17 PM
    • Proposed as answer by Sukh828 Friday, June 15, 2012 1:58 PM
    • Marked as answer by sam.bell Friday, June 15, 2012 2:04 PM
    Friday, June 15, 2012 1:17 PM
  • One question: for the Lossless value of the AutoDatabaseMountDial parameter, Microsoft states the following:

    If you specify this value, the database doesn't automatically mount until all logs that were generated on the active copy have been copied to the passive copy

    At first I thought this would be fine as long as the copy queue length is 0. However, since I always got the errors reported above in this case, I think this option means that, even when the copy queue length is 0, the server always checks with the source whether all logs have been processed.

    Is that correct? That would explain why I saw the error even though the queue was 0 in my case. In addition, Lossless works just fine as long as the source is online.
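
    For context, the configured values can be listed per server. This sketch only shows what is set on each DAG member; it does not by itself show whether the source server is contacted during activation.

```powershell
# Sketch: show the automatic mount dial and the activation policy on each mailbox server.
Get-MailboxServer |
    Format-Table Name, AutoDatabaseMountDial, DatabaseCopyAutoActivationPolicy -AutoSize
```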

    Friday, June 15, 2012 2:30 PM