Exchange 2013 copy queue is chronically high (help welcome as 3000 mailbox migration is blocked behind this)

  • Question

  • I have six Exchange 2013 CU13 servers: three CAS and three Mailbox servers in a DAG. After creating the DAG, I noticed that the CopyQueueLength values on various databases would begin to increase and never decrease. I tried to restart the MSExchangeRepl service on the affected mailbox servers and received the message "the service would not stop in a timely manner".
    I then killed the MSExchangeRepl process using Task Manager, at which point the service restarted automatically and the CopyQueueLength values all began to decrease until they reached zero. However, within a few hours the same issue occurred again with the same symptoms: I could not stop the replication service using the Services console, and killing the process made the CopyQueueLength drop, only for it to start climbing again a few hours later.

    I've repeated the process several times and the issue continuously comes back. I also tried the following:
    1. Ran Resume-MailboxDatabaseCopy on the copies with CopyQueueLength values above zero. When I did that, I received "warning: the Microsoft exchange replication service hasn't responded to the request to resume. The replication service might not be running...". I checked, and the service was still running on all 3 mailbox servers.
    2. Rebooted each mailbox server.
    3. Removed all database copies, removed each of the mailbox servers from the DAG, and deleted the DAG. Then created a new DAG, added each mailbox server, and created new mailbox database copies.

    None of these steps have helped; the issue keeps coming back. I welcome any thoughts on what to try next.

    • Edited by Bobby Dore Monday, July 11, 2016 12:52 AM
    Monday, July 11, 2016 12:51 AM

Answers

  • Hi,

    Are there any events in the Application log related to Exchange or the DAG? If you find Event ID 1009 or 1010, please follow the link below to create the ContentSubmitters security group: https://support.microsoft.com/en-us/kb/2807668

    The copy queue is basically the number of log files that still need to be shipped from the active copy to the passive copy, so it grows whenever logs are generated faster than they are copied. Please check the output of Get-MailboxDatabaseCopyStatus, in particular the content index state, LastReplayedLogTime, LastCopiedLogTime, and the copy queue length.

    Moreover, please use the commands below to check replication health:
    Test-ReplicationHealth -Identity ActiveMailboxServer | FL
    Test-ReplicationHealth -Identity PassiveMailboxServer | FL
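
    As a rough sketch of the checks described above, the following could be run from the Exchange Management Shell. The server name MBX1 is a placeholder (it does not come from this thread), and the event lookup simply assumes the IDs mentioned above would appear in the Application log.

```powershell
# Sketch only: run in the Exchange Management Shell on an Exchange 2013 server.
# 'MBX1' is a placeholder mailbox server name, not one taken from this thread.

# Show the copy status fields mentioned above, largest copy queue first.
Get-MailboxDatabaseCopyStatus -Server 'MBX1' |
    Sort-Object CopyQueueLength -Descending |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength,
        LastCopiedLogTime, LastReplayedLogTime, ContentIndexState -AutoSize

# Look for recent Event ID 1009 / 1010 entries in the Application log.
# -ErrorAction SilentlyContinue avoids an error when no matching events exist.
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; Id = 1009, 1010 } `
    -MaxEvents 20 -ErrorAction SilentlyContinue |
    Format-List TimeCreated, Id, ProviderName, Message
```

    If LastCopiedLogTime stops advancing while the active copy keeps generating logs, that would suggest the copy process itself is stalled rather than simply busy.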


    Please remember to mark the replies as answers if they help, and unmark the answers if they provide no help. If you have feedback for TechNet Support, contact tnmff@microsoft.com.

    Allen Wang
    TechNet Community Support

    • Proposed as answer by Allen_WangJF Sunday, July 24, 2016 9:11 AM
    • Marked as answer by Allen_WangJF Monday, July 25, 2016 12:55 PM
    Monday, July 11, 2016 9:48 AM

All replies

  • I would figure out why you're having so much replication.  Stopping the service isn't the answer.  Perhaps you have a mail loop or a spam bot or something like that sending a large amount of messages.  Or it could be that your network isn't sufficient to handle your load.

    Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
    Celebrating 20 years of providing Exchange peer support!


    Monday, July 11, 2016 2:39 AM
  • Hi Ed,

    Thanks for the response. I don't believe I'm having large amounts of replication. There are currently fewer than 50 mailboxes across the 3 mailbox servers, and the CopyQueueLength values are creeping up slowly. As an example, one is ~150, up from ~50 a couple of hours ago.

    I'm also not seeing large amounts of logs being generated in the log directories of these databases (it's commensurate with the numbers I mentioned above). Overall, there doesn't seem to be large amount of activity, it's just that the copy queues are continuing to grow slowly based on the activity that is there.

    Monday, July 11, 2016 2:49 AM
  • One thing you can try is to reset the TCP chimney.

    netsh int tcp set global chimney=disabled

    netsh int tcp set global chimney=automatic

    Then suspend/resume the copies. Do they start to keep up then?

    Other things to look at would be the performance counters on both the source and the target. Is there sufficient CPU? Are the disk counters maxed out on either the source or the target?
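
    A minimal sketch of the suspend/resume step and the counter check might look like this. The copy identity DB1\MBX2 and the server names MBX1 and MBX2 are placeholders, and the counters are only a suggested starting set, not a definitive list.

```powershell
# Sketch only: run in the Exchange Management Shell.
# 'DB1\MBX2', 'MBX1' and 'MBX2' are placeholder names, not taken from this thread.

# Suspend and then resume one passive copy after resetting the chimney setting.
Suspend-MailboxDatabaseCopy -Identity 'DB1\MBX2' -SuspendComment 'Testing after chimney reset' -Confirm:$false
Resume-MailboxDatabaseCopy -Identity 'DB1\MBX2'

# Sample CPU and disk latency on the source and target for about five minutes
# (20 samples, 15 seconds apart).
Get-Counter -ComputerName 'MBX1', 'MBX2' -Counter @(
    '\Processor(_Total)\% Processor Time',
    '\LogicalDisk(*)\Avg. Disk sec/Read',
    '\LogicalDisk(*)\Avg. Disk sec/Write'
) -SampleInterval 15 -MaxSamples 20 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Timestamp, Path, CookedValue
```

    Sustained high CPU or consistently high disk latency on either side would point toward a resource bottleneck rather than a replication service fault.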

    Monday, July 11, 2016 8:49 PM