Passive copies of DAG DBs periodically go into a disconnected state

    Question

  • Hi,

    Two weeks ago we were attacked by spammers. Our two Exchange 2013 servers (DAG + shadow redundancy) received about 5 million emails.
    Since then, every couple of hours a passive database copy goes into the "Disconnected and Healthy" state with a growing copy queue length, one DB at a time.

    There is no network lag between the Exchange nodes. I can confirm an RTT of 1-5 ms, and the network link is permanently monitored.
    All IP traffic is permitted, as it was before the issue.
    Test-Cluster didn’t show any relevant information.
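    An ICMP round-trip time does not necessarily show how the replication connection itself behaves. As a rough cross-check, a minimal sketch like the one below times repeated TCP connects from one node to the other. The host name `exch02.example.local` is hypothetical, and port 64327 is an assumption (the default DAG replication port unless it has been changed); neither comes from the original post.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the time in milliseconds needed to open a TCP connection."""
    start = time.perf_counter()
    # create_connection raises OSError (or a subclass) if the connect fails.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def sample_latency(host, port, attempts=5):
    """Try to connect several times; a failed attempt is recorded as None."""
    samples = []
    for _ in range(attempts):
        try:
            samples.append(tcp_connect_ms(host, port))
        except OSError:
            samples.append(None)
    return samples


# Example call (hypothetical host, assumed default replication port):
# print(sample_latency("exch02.example.local", 64327))
```

    Occasional `None` entries or large outliers would suggest that connections are being dropped or delayed somewhere between the nodes, even when ping looks clean.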

    I created a new database and moved a test mailbox to it. After about an hour, the passive copy of the new database also went into the "Disconnected and Healthy" state. Because of that, it seems to me that reseeding the DBs will not resolve the issue.
    I’m thinking about removing the DAG and recreating it with a new name.

    Any ideas?

    I see EventID 2153 every time the status changes to "Disconnected and Healthy":
    The log copier was unable to communicate with server <servername>. The copy of database <dbname> is in a disconnected state. The communication error was: An error occurred while communicating with server <servername>. Error: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine. The copier will automatically retry after a short delay.

    Any advice is appreciated. Thanks in advance.

    UPD: The frequency of EventID 2153 depends on the traffic volume required for seeding. If the traffic is heavy, EventID 2153 pops up often.
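
    One way to check that correlation is to bucket the event timestamps by hour and compare the counts with the traffic graphs. The sketch below assumes the EventID 2153 timestamps have already been exported as strings in `YYYY-MM-DD HH:MM:SS` format; the sample values are made up for illustration and are not taken from the affected servers.

```python
from collections import Counter
from datetime import datetime


def events_per_hour(timestamps):
    """Count events per clock hour from 'YYYY-MM-DD HH:MM:SS' strings."""
    counts = Counter()
    for raw in timestamps:
        ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    # Return the buckets in chronological order.
    return dict(sorted(counts.items()))


# Illustrative input only (hypothetical timestamps):
sample = [
    "2018-04-11 21:05:12",
    "2018-04-11 21:40:03",
    "2018-04-11 23:15:47",
]
hourly = events_per_hour(sample)
```

    Hours with many events that line up with seeding or heavy log shipping would support the observation above.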
    • Edited by IvanSergeev Thursday, April 12, 2018 12:19 AM EventID 2153
    Wednesday, April 11, 2018 11:34 PM

All replies

  • Hi Ivan,

    Based on my search, others have encountered similar problems caused by disk performance, so please check that as well.

    You can refer to the following article on using Windows Performance Monitor to check:

    Windows Performance Monitor Disk Counters Explained

    Hope it helps,


    Best Regards,
    Niko Cheng


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.



    Thursday, April 12, 2018 8:33 AM
    Moderator
  • Niko,

    Thanks for your reply.

    I collect a lot of metrics from the Exchange servers, including HDD/disk counters.

    From the disk subsystem's perspective, it looks like the server doesn’t have much work :)

    I don't believe disk performance is the cause.


    Thursday, April 12, 2018 2:48 PM
  • Hi IvanSergeev,

    I would recommend opening a support ticket to investigate this issue in depth.


    Best Regards,
    Niko Cheng



    Wednesday, April 18, 2018 2:00 AM
    Moderator
  • Alas, we do not have a support contract.
    Sunday, April 22, 2018 10:39 PM
  • Also, in the Microsoft-Exchange-HighAvailability/BlockReplication log I see:

    Event ID:      267

    Description:

    BlockModeReplication is terminated on the passive side for database <database> because the depth limit of 10485760 bytes was reached.

    And

    Event ID: 801

    Description:

    BlockModeReplication was unable to drain the write queue for passive database <database> after a timeout of '00:00:05'.
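
    To put the two numbers from these events in perspective, here is a small arithmetic sketch. It only converts the values quoted in the event text; treating the write queue as a simple fill/drain model, and applying the 5-second timeout to a full queue, are my own simplifying assumptions and not a description of how Exchange works internally.

```python
DEPTH_LIMIT_BYTES = 10_485_760  # depth limit quoted in Event ID 267
DRAIN_TIMEOUT_S = 5             # timeout quoted in Event ID 801 ('00:00:05')
MIB = 1024 * 1024


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / MIB


def min_drain_rate(limit_bytes, timeout_s):
    """Bytes per second needed to empty a full queue within the timeout."""
    return limit_bytes / timeout_s


def seconds_to_fill(limit_bytes, write_rate, drain_rate):
    """Seconds until the queue reaches the limit; None if it never does.

    Both rates are in bytes per second (simplified constant-rate model).
    """
    net = write_rate - drain_rate
    if net <= 0:
        return None
    return limit_bytes / net


limit_mib = to_mib(DEPTH_LIMIT_BYTES)                                  # 10.0
needed_mib_s = to_mib(min_drain_rate(DEPTH_LIMIT_BYTES, DRAIN_TIMEOUT_S))  # 2.0
```

    Under these assumptions the limit is 10 MiB, and the passive side would need to absorb roughly 2 MiB/s to drain a full queue within the timeout.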

    Sunday, April 22, 2018 10:45 PM
  • You need to open a case with Microsoft support. You don't need a contract to do that.
    Sunday, April 22, 2018 11:01 PM
    Moderator
  • Hi,

    Simple DB reseeding did the trick... hmm

    Friday, May 4, 2018 11:05 PM