none
merge subscription replication error

    Question

  • Hi,

    Sorry for the long post - just trying to be as detailed as possible!

    I have a merge replication setup, with 1 publisher and 6 subscribers.  The publisher is running SQL server 2008 R2, 5 of the 6 subscribers are still SQL Server 2005 and one is SQL Server 2008 R2 (running on Win Server 2008 R2, 64 bit).

    The subscription running SQL Server 2008 R2 has stopped replicating.  It was working fine (and has been for months), up until 2 days ago.  Nothing has changed regarding the database side of the replication set up and my network support guys inform me that nothing was changed on the server/network side of it.  The server (subscriber) has been rebooted since replication has been failing.

    When you click on “View Synchronisation Status”, in  the replication part of SSMS connected to the subscriber, it says “The agent is not running”.  Stopping and re-starting the agent from within SSMS has no effect when you try to run the job.  In services on the windows server, the SQL Server Agent is running and with SSMS it also says it is running.

    The windows Event log gives a warning when the replication job unsuccessfully runs:

    SQL Server Scheduled Job 'PIGFM2-COR-pigfmcor-pigfmcor-FS-BUNGO-pigfmcor- 0' (0x5A7113D7FF70594DA8CDE589B3F38ACF) - Status: Failed - Invoked on: 2012-02-22 09:52:09 - Message: The job failed.  The Job was invoked by User sa.  The last step to run was step 1 (Run agent.).

    Further info from the log - Source: SQLSERVERAGENT EventID: 208

    The replication job is running as the administrator on the local domain.  This hasn’t changed and has been the same username/password combination that has been used successfully up until now.

    The subscription job log file view indicates that the job has failed, with the last error message logged being:

    The replication agent encountered a failure. See the previous job step history message or Replication Monitor for more information.  The step failed.

    The previous job step indicates no error message and the first job step says that “The replication agent has been successfully started. See the Replication Monitor for more information.”

    The replication monitor at the publisher just simply says:

    Error messages:

    The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active.

    I have enabled verbose logging at the subscriber, and can paste the whole log file in if required.  However these are the last few lines, which don’t give much of a clue:

    2012-02-21 22:52:14.701 OLE DB Subscriber 'FS-SUB': {call sys.sp_MSadd_merge_history90 (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)}

    2012-02-21 22:52:14.763 Percent Complete: 0

    2012-02-21 22:52:14.763 OLE DB Subscriber 'FS-SUB': {call sys.sp_MSadd_merge_history90 (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)}

    2012-02-21 22:52:14.810 Retrieving publication information

    2012-02-21 22:52:14.903 OLE DB Distributor 'PUB-COR': {call sys.sp_MSadd_merge_history90 (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)}

    2012-02-21 22:52:15.013 Percent Complete: 0

    2012-02-21 22:52:15.091 Retrieving subscription information.

    2012-02-21 22:52:15.122 OLE DB Distributor 'PUB-COR': {call sys.sp_MSadd_merge_history90 (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)}

    So it looks like it connects, but after that – throws an error for some reason at the subscriber.  Replication has been started numerous times over the last couple of days and all errors have been the same.

    Does anybody have any clues on what else I could look for or do to get this replicating again?

    Thanks, Julie

    Wednesday, February 22, 2012 5:20 AM

All replies

  • Can you post a bit more information from the verbose logging? or upload the whole verbose logging somewhere and post the link

    Do you see any errors in the SQL Server error logs at the subscriber end?


    Regards, Ashwin Menon My Blog - http:\\sqllearnings.wordpress.com

    Wednesday, February 22, 2012 9:19 AM
  • Here's a link to the replication log file (everything since logging was turned on after the subscriber was restarted): https://docs.google.com/document/d/1fCsEoJJGAP7yJEHHVZ7U9vMMiJ-NAqz74dVyDmmRWSo/edit - replication log

    The sql server logs: https://docs.google.com/document/d/1KNggj8QlErzvs9zPuA-N0AJyzyUT4wNY9OcjLQxfARE/edit - first log

    https://docs.google.com/document/d/1m1EnDOrQrawG-DdLowarIaJhDRoWHrQT_fQTgIt-LSw/edit second log (after restart)

    SQLAgent log: https://docs.google.com/document/d/1kGOzws_LekqA3jwJ5h9BuhsLpbl13q1BY9Ex88vXLvs/edit - first one

    https://docs.google.com/document/d/1yMbRGpgK1F5vnECp25iBCcP565tQZcXq_ZN46vGOAxg/edit - second one

    Neither of the sql agent logs contain any errors.  The server logs look ok too, but somebody else may notice something.


    • Edited by bgothumper Thursday, February 23, 2012 12:31 AM
    Thursday, February 23, 2012 12:29 AM
  • Sometimes I don't trust the View Synchronization Status dialog when the Merge Agent is under a heavy load.

    According to your Merge Agent log I believe it is processing changes as normal.  It could be processing a large batch or something else.

    The error message you're seeing in Replication Monitor:

    The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active.

    This error is usually transitory and should clear up when the Merge Agent finishes processing the current batch.

    How long are you letting this sync before you stop the agent?  You might want to try giving it some more time.


    Brandon Williams (blog | twitter)

    Thursday, February 23, 2012 3:24 AM
    Moderator
  • Thanks for your reply.

    I'm actually not stopping the agent at all.  We have the sync set to run twice a day, 8 hours apart, after hours.  So at 5pm and 1am it will kick off.  On occasions we have manually started a sync when we need data across to the remote sites straight away, but normally it starts and we just leave it do its stuff.

    Thursday, February 23, 2012 3:54 AM
  • Thats not good  Can you increase the verbosity of your Merge Agent log and post the results back here?  Use the parameters -OutputVerboseLevel 4 -Output C:\TEMP\mergeagent.log for the Merge Agent job.


    Brandon Williams (blog | twitter)

    Thursday, February 23, 2012 4:09 AM
    Moderator
  • https://docs.google.com/document/d/1HpWBHnOAb-rlwq-ILIGySY3aUWiDw1rtxfBPU7SPl4Q/edit - mergeagent log, verbose level 4.

    Not sure if this is helpful or not, but this is an extract of the trace files for the server.

    https://docs.google.com/spreadsheet/ccc?key=0AtI06Fm76EzJdEowTTZPLUJvclc1UW5aZ1MyMnVrOXc  The first group is the last successful replication that was run.  The second group is the first unsuccesful replication and the others are examples of a couple of replications that have run and failed since.

    Basically the last successful replication had an entry for the SQL Agent, then the replication merge agent.  After that, the replication merge agent hasn't started and all replications have failed.

    SQLAgent - Job Manager 20/02/2012 14:21:14.143
    Replication Merge Agent 20/02/2012 14:21:43.240
    Replication Merge Agent 20/02/2012 14:21:43.577
    Replication Merge Agent 20/02/2012 14:21:43.580
    Replication Merge Agent 20/02/2012 14:21:43.590
     
    SQLAgent - Job Manager 20/02/2012 15:00:14.670
     

    • Edited by bgothumper Thursday, February 23, 2012 5:08 AM
    Thursday, February 23, 2012 5:07 AM
  • There doesn't appear to be any errors in the Merge Agent log.

    Are there any interrupted generations at the publisher or subscriber?

    SELECT 
    	COUNT(*) 
    FROM dbo.MSmerge_genhistory 
    WHERE genstatus = 4

    What is the changecount sum for open generations at the subscriber?

    SELECT 
    	SUM(changecount) AS [open gen changecount sum] 
    FROM dbo.MSmerge_genhistory
    WHERE genstatus = 0

    Brandon Williams (blog | twitter)

    Thursday, February 23, 2012 6:33 AM
    Moderator
  • The result for the first query at both the publisher and subscriber is 0.

    The result for the open gen changecount sum at the subscriber is 363.

    Thursday, February 23, 2012 11:39 AM