Question about Copy Queue Length

    Question

  • I have a general question that I'm having trouble finding information about. I'd like to know how to get a database copy up to speed with the active database, with respect to Copy Queue Length. Here's my scenario, if it helps:

    I was using the Load Generator Beta against my Exchange 2010 servers (specifically against a database that did not have a DAG copy). While LoadGen was running, I created a copy of the database on the other DAG member server. The copy came up fine, but it started with a copy queue length of something like 200000. It also said the Copy Status was Healthy. I wanted to switch the active database over to the new copy, but I can't make it the active db while its copy queue length is so large. If I suspend the db copy and update it, the Copy Queue Length goes down while it is synchronizing, but it stops after dropping by a few thousand and then says it is Healthy again.

    My real question, though, is this: if I have a db copy sitting there with a very large copy queue length, is there a way to force it to "catch up" with the active copy? Is there a good blog post or something out there about this topic?

    Thanks,
    Paul
    Wednesday, February 03, 2010 7:11 PM

Answers

  • This is a test environment, right? Are you removing the databases and adding them back with the same names? This sounds like it could be a problem with stale data in the cluster database. If you create a new mailbox database and move the mailboxes to it (which is a good stress test in itself), does it still happen?

    • Marked as answer by paulbrown83 Monday, February 08, 2010 7:45 PM
    Friday, February 05, 2010 11:45 PM

All replies

  • Paul,

    Did you also check the copy queue length using Perfmon? If not, have a look and check whether the queue there is also large or zero. In Exchange 2007 there were some issues with this; have a look at Dale Hardin's blog:

    http://blogs.technet.com/dhardin/archive/2009/08/12/copyqueuelength-is-displayed-incorrectly-in-a-standby-continuous-replication-environment.aspx

    Regards,

    Johan
    Exchange-blog: www.johanveldhuis.nl
    Wednesday, February 03, 2010 9:05 PM
  • Hi Paul,

    I asked the product team this question a while back, as I was seeing the same thing when making new DB copies. If I remember the response correctly, what you are seeing is a CQL equal to the highest generated log file number at the time the DB copy is seeded. Usually, as soon as copying is unsuspended, the CQL drops to a normal value. So if 450,000 log files had been generated since the beginning of that DB's life, you'd see that in the CQL right after the initial seed, but it would drop to normal single-digit values once things start running.

    Have you checked to see if copying is in a suspended state?


    Brian Day, Overall Exchange & AD Geek
    MCSA 2000/2003, CCNA
    MCTS: Microsoft Enterprise Server 2010, Configuration
    MCITP: Enterprise Messaging Administrator 2010
    LMNOP
    Wednesday, February 03, 2010 9:20 PM
  • Very interesting stuff... It appears I am having the same issue that Dale Hardin describes in the blog post linked above. If I run the Get-MailboxDatabaseCopyStatus command for my db copy, I find the following numbers:

    CopyQueueLength: 183824
    LastLogGenerated: 188723
    LastLogCopyNotified: 4899
    LastLogInspected: 4899
    LastLogReplayed: 4899

    As you can see, it is calculating the CQL based on those numbers. However, if I look at PerfMon like you said, Brian, I find that the CQL is 0. Curiously, everything in the MSExchange Replication group is zero, except the following, which are all equal to 1,132: CopyGenerationNumber, CopyNotificationGenerationNumber, InspectorGenerationNumber, ReplayGenerationNumber, and ReplayNotificationGenerationNumber. I don't really know what to make of that.
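    The reported CopyQueueLength is consistent with a simple subtraction of the status values above. A minimal sketch in Python, assuming the queue is just the gap between LastLogGenerated and LastLogInspected; the function name is hypothetical and not part of any Exchange API:

```python
def copy_queue_length(last_log_generated: int, last_log_inspected: int) -> int:
    """Assumed formula: logs generated on the active copy minus logs the
    passive copy has already inspected. Never negative."""
    return max(last_log_generated - last_log_inspected, 0)


# Values taken from the Get-MailboxDatabaseCopyStatus output quoted above.
status = {
    "LastLogGenerated": 188723,
    "LastLogCopyNotified": 4899,
    "LastLogInspected": 4899,
    "LastLogReplayed": 4899,
}

cql = copy_queue_length(status["LastLogGenerated"], status["LastLogInspected"])
print(cql)  # 183824, the same value the cmdlet reports as CopyQueueLength
```

    If that assumption holds, the large CQL comes from the mismatch between the two generation counters rather than from a real backlog of uncopied logs, which would fit the Perfmon counter reading 0.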

    So with all of that said, is there a way to get Exchange to realize the CQL is actually zero? I can't really make this the active database until it "catches up", right?
    Thursday, February 04, 2010 1:41 PM
  • Paul,

    Have you tried Dale's suggestion?

    "To resolve the issue right away, you can restart the Microsoft Exchange Replication Service on the SCR target. The problem can be avoided altogether by deleting the database and logs from the SCR target so that the target is reseeded when you re-enable SCR."

    Regards,

    Johan
    Exchange-blog: www.johanveldhuis.nl
    Thursday, February 04, 2010 1:48 PM
  • If I stop and start the Replication service on the DAG target, then it of course temporarily goes into "ServiceDown" status, so that doesn't seem to have worked.

    I can delete the database copy from the DAG and just create a new copy from the original, but I'm at least trying to use this as an opportunity to learn the ins and outs of the DAG, and if there's some quirk like this that has a solid workaround to fix it, that'd be a great thing to take note of.
    Thursday, February 04, 2010 2:04 PM
  • Paul,

    I totally agree with the first part. As for the second, I think it will be the only solution, but maybe someone else has another suggestion.

    Regards,

    Johan
    Exchange-blog: www.johanveldhuis.nl
    Thursday, February 04, 2010 2:07 PM
  • Ok, I am going with the option of deleting the database copy and recreating it. When I recreate it, though, it is coming back up with a Copy Queue Length of roughly 183800. I can suspend it and update it, and it comes back as Healthy, but the CQL only decrements by a few numbers.

    This happened to one of my previous databases that I ran load generator on. This makes me wonder if loadgen does a number on the databases, or would this kind of thing happen in a production environment under normal everyday use?
    Friday, February 05, 2010 2:50 PM
  • Which LoadGen are you using, the one for 2010? I ask because it is still in beta ;-) but I did not find anything that points to a bug.

    Which rollup is installed on Exchange?

    Regards,

    Johan
    Exchange-blog: www.johanveldhuis.nl
    Friday, February 05, 2010 3:52 PM
  • Yep, I'm using the 2010 beta load generator. Overall I think it is a decent tool; it just shows lots of sloppiness and looks like it was thrown together as an afterthought.

    I don't remember installing any rollups on my Exchange 2010 environment, and I don't see any rollups anywhere in the update history. That said, is there a surefire way to check exactly what version I am running?
    Friday, February 05, 2010 4:21 PM
  • If you suspend/resume the copy and then generate one more log, does that flush all of the numbers to be correct? Is circular logging turned on? Did the target disk fill up?
    Friday, February 05, 2010 5:43 PM
  • If I suspend/resume the copy, the CQL number stays the same. If I suspend it, update it, and then resume it, I have found that the number goes down by 4 or 5.

    I just tried the circular logging thing. It did not seem to help. However... the other day when I ran a load test on Exchange, one of my mailbox servers did run out of log disk space. As expected, the db stopped functioning. At that point, I turned on circular logging and suspended/resumed the copy; it cleared the logs drive and continued on its merry way. It seemed to work exactly like it should have. This time, not so much.
    Friday, February 05, 2010 6:36 PM
  • This is a test environment, right? Are you removing the databases and adding them back with the same names? This sounds like it could be a problem with stale data in the cluster database. If you create a new mailbox database and move the mailboxes to it (which is a good stress test in itself), does it still happen?

    • Marked as answer by paulbrown83 Monday, February 08, 2010 7:45 PM
    Friday, February 05, 2010 11:45 PM
  • I ended up just migrating users over to a newly created database, like you mentioned, jader3rd. Even after the first db was empty (had no users on it), I tried making a copy of it and that copy had a crazy high CQL as well. The new database was created successfully, and I was able to make a copy of it.

    Seems like a pretty bad bug, if you ask me. I just fear this happening to production databases and having to move users around. With the new Exchange 2010 client access architecture, this shouldn't affect users, but it sure would be a pain to administer.
    Monday, February 08, 2010 7:44 PM
  • I am testing out DAG in a lab environment and just came across this same issue. This lab is for a client, as I am showing them how wonderful DAG is. We are testing failovers and switchovers, along with how to seed databases. After we reseeded the databases, we cannot switch over to the "seeded" server because of this issue. Has any resolution been found yet? Currently I have 2 databases with copy queue lengths of 153 and 30750 respectively.

    Thanks.
    Tuesday, March 16, 2010 5:33 PM
  • Hmmmm...well I went onto the active server, dismounted all databases, went into the folders and deleted all log files, folders and everything else I saw except for the database EDB file. I then did the same thing on the passive server but also deleted the EDB file. After an update of the mailbox database copy (reseed), the CQL went down to 0!!
    • Proposed as answer by WynnLu Wednesday, November 03, 2010 7:54 PM
    Tuesday, March 16, 2010 6:35 PM
  • I was having problems on a remote multi-site DAG getting mailbox databases to copy. The copy would come up Failed, then get stuck in Resynchronizing after suspending and updating.

    Followed ibenna's experience... I dismounted the db on the host server and removed all of the mailbox database log files (not the folders). After I remounted the db, the logs were rebuilt. I re-created the database copy on the remote server and it came up as Healthy after a few minutes!

    Monday, June 10, 2013 2:14 PM