Site replication degraded then a minute later, changes back to active

    Question

  • So I'm not sure if I have a "real" problem here because I can't find anything actually broken. In the component status, site replication for both my CAS and primary fails, then a minute later the status changes back to active and the error is cleared. Looking at rcmctrl.log during the time of the error, I can't find a single error explaining why the site was degraded (log below). I've also taken a look at replmgr.log and couldn't find anything either. However, the boss is positive there is a problem, so my question is: is there anywhere else I should be looking to see why the site was degraded? Or is there something I'm missing?

    rcmctrl.log:

    DRS change application started. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    Launching 7 sprocs on queue ConfigMgrDRSQueue and 7 sprocs on queue ConfigMgrDRSSiteQueue. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:20:19 AM. End execute query finished at 1/8/2013 8:20:19 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:20:20 AM. End execute query finished at 1/8/2013 8:20:20 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:20:31 AM. End execute query finished at 1/8/2013 8:20:31 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:20:19 AM. End execute query finished at 1/8/2013 8:20:19 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:20:31 AM. End execute query finished at 1/8/2013 8:20:31 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:20:19 AM. End execute query finished at 1/8/2013 8:20:19 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:20:31 AM. End execute query finished at 1/8/2013 8:20:31 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    There are 14 Drs Activations sprocs running. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:19 AM 7720 (0x1E28)
    InvokeRcmMonitor thread wait one more minute for incoming event... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:31 AM 7728 (0x1E30)
    InvokeRcmConfigure thread wait one more minute for incoming event... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7724 (0x1E2C)
    Wait for inbox notification timed out. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7604 (0x1DB4)
    Cleaning the RCM inbox if there are any *.RCM files for further change notifications.... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7604 (0x1DB4)
    Initializing RCM. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7604 (0x1DB4)
    Processing Replication Configure SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7724 (0x1E2C)
    Processing Replication Monitor SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7728 (0x1E30)
    Summarizing all replication links for monitoring UI. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7728 (0x1E30)
    Running configuration EnsureServiceBrokerEnabled. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7724 (0x1E2C)
    Running configuration EnsureServiceBrokerQueuesAreEnabled. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7724 (0x1E2C)
    STATMSG: ID=7816 SEV=E LEV=M SOURCE="SMS Server" COMP="SMS_REPLICATION_CONFIGURATION_MONITOR" SYS=Server.lab.pri SITE=KAS PID=3860 TID=7728 GMTDATE=Tue Jan 08 14:21:33.435 2013 ISTR0="NAL" ISTR1="" ISTR2="" ISTR3="" ISTR4="" ISTR5="" ISTR6="" ISTR7="" ISTR8="" ISTR9="" NUMATTRS=0 SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7728 (0x1E30)
    The current site status: ReplicationActive. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7728 (0x1E30)
    Processing replication pattern global. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7724 (0x1E2C)
    Processing replication pattern site. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:33 AM 7724 (0x1E2C)
    Processing Replication success. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:34 AM 7728 (0x1E30)
    Rcm control is waiting for file change notification or timeout after 60 seconds. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:34 AM 7604 (0x1DB4)
    Cleaning the RCM inbox if there are any *.RCM files for further change notifications.... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:34 AM 7604 (0x1DB4)
    Rcm control is waiting for file change notification or timeout after 60 seconds. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:34 AM 7604 (0x1DB4)
    Processing Replication success. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:35 AM 7724 (0x1E2C)
    Cleaning the RCM inbox if there are any *.RCM files for further change notifications.... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:35 AM 7604 (0x1DB4)
    Rcm control is waiting for file change notification or timeout after 60 seconds. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:35 AM 7604 (0x1DB4)
    DRS sync started. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:21:46 AM 7716 (0x1E24)
    DRS change application started. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:19 AM 7720 (0x1E28)
    Launching 7 sprocs on queue ConfigMgrDRSQueue and 7 sprocs on queue ConfigMgrDRSSiteQueue. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:21:30 AM. End execute query finished at 1/8/2013 8:21:30 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:21:31 AM. End execute query finished at 1/8/2013 8:21:31 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:19 AM 7720 (0x1E28)
    The asynchronous command finished with return message: [spDRSActivation finished at 1/8/2013 8:21:31 AM. End execute query finished at 1/8/2013 8:21:31 AM.]. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:19 AM 7720 (0x1E28)
    There are 14 Drs Activations sprocs running. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:19 AM 7720 (0x1E28)
    InvokeRcmMonitor thread wait one more minute for incoming event... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:34 AM 7728 (0x1E30)
    InvokeRcmConfigure thread wait one more minute for incoming event... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7724 (0x1E2C)
    Wait for inbox notification timed out. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7604 (0x1DB4)
    Cleaning the RCM inbox if there are any *.RCM files for further change notifications.... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7604 (0x1DB4)
    Initializing RCM. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7604 (0x1DB4)
    Processing Replication Configure SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7724 (0x1E2C)
    Running configuration EnsureServiceBrokerEnabled. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7724 (0x1E2C)
    Running configuration EnsureServiceBrokerQueuesAreEnabled. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7724 (0x1E2C)
    Processing Replication Monitor SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7728 (0x1E30)
    Summarizing all replication links for monitoring UI. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7728 (0x1E30)
    STATMSG: ID=7828 SEV=I LEV=M SOURCE="SMS Server" COMP="SMS_REPLICATION_CONFIGURATION_MONITOR" SYS=Server.lab.pri SITE=KAS PID=3860 TID=7728 GMTDATE=Tue Jan 08 14:22:35.569 2013 ISTR0="NAL" ISTR1="" ISTR2="" ISTR3="" ISTR4="" ISTR5="" ISTR6="" ISTR7="" ISTR8="" ISTR9="" NUMATTRS=0 SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7728 (0x1E30)
    Processing replication pattern global. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7724 (0x1E2C)
    The current site status: ReplicationActive. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7728 (0x1E30)
    Processing replication pattern site. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:35 AM 7724 (0x1E2C)
    Processing Replication success. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:36 AM 7728 (0x1E30)
    Rcm control is waiting for file change notification or timeout after 60 seconds. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:36 AM 7604 (0x1DB4)
    Cleaning the RCM inbox if there are any *.RCM files for further change notifications.... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:36 AM 7604 (0x1DB4)
    Rcm control is waiting for file change notification or timeout after 60 seconds. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:36 AM 7604 (0x1DB4)
    Processing Replication success. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:37 AM 7724 (0x1E2C)
    Cleaning the RCM inbox if there are any *.RCM files for further change notifications.... SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:37 AM 7604 (0x1DB4)
    Rcm control is waiting for file change notification or timeout after 60 seconds. SMS_REPLICATION_CONFIGURATION_MONITOR 1/8/2013 8:22:37 AM 7604 (0x1DB4)
    Tuesday, January 08, 2013 4:05 PM
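
    The rcmctrl.log excerpt in the question does contain one error-severity status message (STATMSG ID=7816 SEV=E), followed about a minute later by an informational one (ID=7828 SEV=I). A minimal Python sketch for pulling such entries out of a log is shown here. The regular expression is an assumption based only on the fields visible in this excerpt, the function name is hypothetical, and the thread does not state what either message ID means.

```python
import re

# Pattern built from the STATMSG fields visible in the excerpt;
# other log versions may format these entries differently.
STATMSG_RE = re.compile(
    r'STATMSG: ID=(?P<id>\d+) SEV=(?P<sev>[A-Z]) .*?'
    r'GMTDATE=(?P<gmt>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3} \d{4})'
)

def find_status_messages(log_text):
    """Return (message_id, severity, gmt_date) for every STATMSG entry."""
    return [
        (int(m.group("id")), m.group("sev"), m.group("gmt"))
        for m in STATMSG_RE.finditer(log_text)
    ]

# Two entries copied (shortened) from the log in the question.
sample = (
    'STATMSG: ID=7816 SEV=E LEV=M SOURCE="SMS Server" '
    'COMP="SMS_REPLICATION_CONFIGURATION_MONITOR" SYS=Server.lab.pri SITE=KAS '
    'PID=3860 TID=7728 GMTDATE=Tue Jan 08 14:21:33.435 2013 ISTR0="NAL"\n'
    'STATMSG: ID=7828 SEV=I LEV=M SOURCE="SMS Server" '
    'COMP="SMS_REPLICATION_CONFIGURATION_MONITOR" SYS=Server.lab.pri SITE=KAS '
    'PID=3860 TID=7728 GMTDATE=Tue Jan 08 14:22:35.569 2013 ISTR0="NAL"\n'
)

for message_id, severity, gmt in find_status_messages(sample):
    print(message_id, severity, gmt)
```

    Filtering the result on severity "E" gives the timestamps at which the component raised an error, which can then be lined up against the status changes seen in the console.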

Answers

  • A Degraded link means that at least one replication group is processing messages more slowly than the defined threshold. An admin can set this threshold in the console to suit their requirements:

    Configuration Manager 2012 Console -> Administration WunderBar -> Hierarchy Configuration -> Database Replication -> Alt site database -> Link Properties

    For example, if the threshold for downgrading the link status to Degraded is 20 and the sync interval for the replication group "Status_Messages" is 5, the link is downgraded to Degraded when a message has been processing for more than 100 minutes (20 * 5).
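
    The arithmetic in that example can be sketched in Python as follows. The function name is hypothetical, and the rule itself (threshold multiplied by sync interval) is taken from this post rather than from product documentation.

```python
def minutes_until_degraded(threshold, sync_interval_minutes):
    """Minutes a message may be processing before the link is marked Degraded,
    assuming the rule described in this post: threshold * sync interval."""
    return threshold * sync_interval_minutes

# A threshold of 20 combined with each sync interval mentioned in the thread.
for interval in (1, 2, 5):
    print(interval, minutes_until_degraded(20, interval))
```

    With the same threshold, a group on a 1-minute interval would trip the Degraded state much sooner than one on a 5-minute interval.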

    You can find the sync interval of a replication group by running the query below (sync intervals are either 1, 2 or 5).

    SELECT * FROM ReplicationData

    You can find the replication group that is in a degraded state with the query below.

    SELECT RLS.SiteOwner, RLS.SiteSending, RLS.SiteReceiving, RLS.ReplicationID, rep.ReplicationGroup,
           RLS.UpdateTime-GETUTCDATE()+GETDATE() [UpdateTime], RLS.StatusName, rep.ReplicationPattern,
           RLS.LastSyncFinishTime-GETUTCDATE()+GETDATE() [LastSyncFinishTime], rep.ReplicationPriority
    FROM dbo.RCM_ReplicationLinkStatus RLS
    INNER JOIN dbo.ReplicationData rep ON RLS.ReplicationID = rep.ID
        AND RLS.Status = 8
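
    The expression value-GETUTCDATE()+GETDATE() in that query shifts a stored UTC timestamp to the server's local time by adding the current difference between local time and UTC. A small Python sketch of the same arithmetic is shown here. The function name is hypothetical, and the sample values use the six-hour offset visible in the log in the question (GMTDATE 14:21:33 against a local 8:21:33 AM).

```python
from datetime import datetime

def utc_to_local(utc_value, now_utc, now_local):
    """Mirror of value - GETUTCDATE() + GETDATE(): shift a UTC timestamp
    by the current offset between local time and UTC."""
    return utc_value - now_utc + now_local

now_utc = datetime(2013, 1, 8, 14, 21, 33)    # what GETUTCDATE() would return
now_local = datetime(2013, 1, 8, 8, 21, 33)   # what GETDATE() would return
stored_utc = datetime(2013, 1, 8, 14, 20, 19)

print(utc_to_local(stored_utc, now_utc, now_local))
```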

    --

    Things to check when troubleshooting a Degraded link status:

    1. Check the backlog count (replication backlogs).

    2. Check the message count and message data size sent by the CAS and received at the primary site. You can find this in DRSSendHistory and DRSRecieveHistory. (Normally the degraded state occurs when the CAS has sent a huge chunk of data to the primary.)

    3. Check SQL Server performance.

    4. Check disk I/O while the link is in the degraded state.


    Regards, Rajul






    Wednesday, February 06, 2013 5:12 AM
  • We've had confirmation from Premier Support that this is known by the Product Group and is expected behavior.

    Since replication does indeed replicate all objects, and the status goes back to Active within a minute, it is not treated as a problem. The link would say Failed if there were a problem with link replication. The degraded state is based on the number of retries your site system uses when replicating all the replication groups, and that in turn depends on performance.

    Based on Rajul OS's post, we were able to tune our retry value to the point where the link no longer gets degraded, and replication still completes 100% OK, without any delay.

    Thursday, February 07, 2013 11:43 AM

All replies

  • Have you examined rcmctrl.log on both sites?

    Torsten Meringer | http://www.mssccmfaq.de

    Tuesday, January 08, 2013 5:20 PM
  • Degraded is most often caused by a replication backlog, which is typically caused by poor performance. Have you done any performance baselining of the systems hosting your sites, specifically the ones hosting SQL Server?

    Jason | http://blog.configmgrftw.com

    Tuesday, January 08, 2013 7:26 PM
    Moderator
  • Yes, both logs look exactly the same. Yet both CAS and Primary have the same errors (just at different times.) The primary will degrade then a minute later, be back to normal. 
    Tuesday, January 08, 2013 7:49 PM
  • This definitely sounds like a perf issue. As mentioned, have you done any performance baselining or troubleshooting?

    Jason | http://blog.configmgrftw.com

    Tuesday, January 08, 2013 8:09 PM
    Moderator
  • I've done some basic troubleshooting. The problem is that nothing else is broken; app deployment, inventory, reporting, imaging, etc. are working fine. The only reason this came to light is that we were having issues with a task sequence, but after re-creating the task sequence everything worked fine. Do you have any recommendations as to what to look for? We have a lab and a prod server and both were set up exactly the same. The only difference is that we deployed SP1 to lab last week (yeah, I know it's kind of a big difference), but we have had no "real" issues since. I'm just having a hard time finding a starting point, because an actual error message is what is eluding me.
    Tuesday, January 08, 2013 9:33 PM
  • It's not that anything is broken, it's that the system is running into a backlog during its SQL replication activities that it considers beyond a normal/acceptable backlog. This is indicative of some type of performance issue that could be caused by many things, including under-powered systems. You can't really ever compare performance between a lab and production because your lab doesn't have production levels or masses of activity in it, whereas your production environment is constantly under the pressure of day-to-day business as usual. Until you do some performance baselining and discovery, you won't be able to tell what is causing this, as it could also be mis-configuration, bad drivers, slow network, etc.

    Jason | http://blog.configmgrftw.com

    Tuesday, January 08, 2013 10:25 PM
    Moderator
  • It may be worth running the spdiagdrs stored procedure on each site database.

    • Open SQL Management Studio
    • Connect to the site database
    • Run "exec spdiagdrs"

    In the many results that are returned you will see an "OutgoingMessagesInQueue" and "IncomingMessagesInQueue". This will confirm if you are seeing replication backlogs at the time the link state is degraded. You can also look for the "Status" of each individual "Replication Group" for further detail on which data type is degraded.

    Wednesday, January 09, 2013 2:23 AM
  • Both 'IncomingMessagesInQueue' and 'OutgoingMessagesInQueue' are at 0. All of the ReplicationGroup statuses are Active. And I ran the query right as the site was degraded. A minute later, I ran the query again, and from what I could see, nothing changed. Granted, there's a lot of data there, so I could have missed something.

    And to reply to what Jason was saying, the servers are built to the specifications that MS required. Both lab and corp are, like I said, exactly the same. Our lab environment is, at the moment, getting the same load as our production environment, as we have not officially gone live with 2012 in prod. The company I work for is VERY large, so we need lab to mimic prod to a T. I'm not currently thinking it's the performance of the server that's causing it.

    Wednesday, January 09, 2013 2:52 PM
  • There are no specifications from Microsoft. There are minimum requirements, and even those do not take into account scale, availability, and the many outside factors like network latency, storage IOPS, over-committed VMs, mis-configuration, etc. Meeting those in no way guarantees acceptable performance, as there are so many factors involved. Until you actually do some performance monitoring and baselining, there is no way you can possibly say it's not a performance issue.


    Jason | http://blog.configmgrftw.com

    Wednesday, January 09, 2013 8:31 PM
    Moderator
  • When I say it was set up to Microsoft's specifications, I mean we had someone from Microsoft come in and design the server infrastructure. Sorry for the confusion. But after upgrading our prod environment to SP1, we're seeing the same behavior that I was seeing in our lab environment. Not sure what that means exactly, but as you suggested I'm going to start doing some performance monitoring. If I can't find anything wrong, I'm going to open a case with Microsoft. Thanks for your help.

    Scott

    Monday, January 14, 2013 8:30 PM
  • Yes, I agree with Scott. After installing SP1 we have the same problem, and the situation in my environment is identical. I checked performance and everything is fine. Any ideas how to solve it? Scott, please let me know if there is a solution. Many thanks!

    This was very interesting: http://bbca.ru/2012/06/21/sccm-configmgr-2012-how-to-check-for-backlog/ but it only helps with analyzing and understanding SCCM replication; the problem is still there.





    Friday, January 18, 2013 7:48 AM
  • Any ideas?

    Tuesday, January 22, 2013 8:41 AM
  • We are seeing the same problem after upgrading to SP1 as well: the replication link goes from active to degraded and then back to active every 2-3 hours. We are seeing this in 2 separate environments, one of them on virtual machines and the other on brand new Gen8 HP servers.

    After searching the web, it seems a lot of people are seeing this problem after SP1, and it makes me wonder if this is indeed a bug that needs fixing.

    Despite the error messages, the replication does its job, everything works like it should.

    Tuesday, January 22, 2013 2:29 PM
  • Just contact Microsoft CSS if you can reproduce this problem.

    Torsten Meringer | http://www.mssccmfaq.de

    Tuesday, January 22, 2013 2:42 PM
  • I concur with Torsten. If you are confident that this is not a performance issue, then it could possibly be a bug (or configuration issue) and is probably best handled with CSS directly where more hands-on troubleshooting and investigation can be done.

    Jason | http://blog.configmgrftw.com

    Tuesday, January 22, 2013 3:53 PM
    Moderator
  • I agree, I have made a support case and will post here what the result is.
    Tuesday, January 22, 2013 6:34 PM
  • I'd be interested to hear how you get along, we've had the same issue since upgrading to SP1 too.
    Sunday, January 27, 2013 11:49 PM
  • Status on the support case:

    They have received logs from both our CAS and primary site servers, and are now analyzing them. Expecting a follow-up tomorrow.

    Monday, January 28, 2013 12:28 AM
  • We've been working a sev c case for almost two weeks with eerily similar symptoms.  Our environment is a side-by-side migration from 07 so I haven't escalated, but at this rate, if we don't have a fix in place tomorrow, we may scrap the whole thing.  Good luck with your case.
    Monday, January 28, 2013 3:07 AM
  • My current customer is experiencing this same issue after we installed SP1, replication degraded status messages followed by active status messages a minute later exactly as described in the opening post.  In this scenario we have migrated data from an old SCCM environment, however the new CAS and 7 primary sites are currently dormant - there are no active clients yet in this environment.

    All databases are hosted locally on the CAS and primaries, all server specs far exceed the minimum recommendations, and all servers and databases show next to no performance load.

    Despite the messages, replication appears to be functioning properly.

    The frequency of folks seeing this same issue points to a bug.


    SCCM\SCOM Aficionado

    Monday, January 28, 2013 9:43 PM
  • Have you opened a support case with Microsoft?

    Jason | http://blog.configmgrftw.com

    Monday, January 28, 2013 9:48 PM
    Moderator
  • Yes we have, they are following up and analyzing logs for us.
    Monday, January 28, 2013 10:34 PM
  • I have opened service request too.
    Tuesday, January 29, 2013 1:35 PM
  • Hi,

    Have you guys found a solution for this?

    Thanks

    Wednesday, February 06, 2013 4:08 AM
  • Hi Rajul! Do you know what these settings are by default in the CM 2012 RTM version? If I understand correctly, we don't have the Database Replication --> Replication Link Properties options in CM12 RTM. Do you have any idea where this is configured in the RTM version? The only configurable option I can see is under Monitoring --> Database Replication --> Replication Status Properties:

    Link :

    Configure Alerts For this Replication Link.

    Generate an Alert when this replication link is not working for a specific period of time.

    Number of Minutes 30 ::


    Anoop C Nair - @anoopmannur :: MY Site:  www.AnoopCNair.com :: FaceBook:  ConfigMgr(SCCM) Page :: Linkedin:  Linkedin

    Wednesday, February 06, 2013 6:16 AM
  • Hi Anoop,

    Sorry for the confusion here... Probably this might be a new feature added in to SP1 if you are not able to see it in RTM version.

    I am currently looking in to the CM2012 SP1 console and I can see Database Replication in Administration Wunderbar as well as Monitoring wunderbar.


    Regards, Rajul

    Wednesday, February 06, 2013 6:45 AM
  • NP at all ! Yes, you're correct. This is the new feature included in SP1. I just wanted to know whether this setting was there in DB or WMI for RTM version. Probably that info will help us to check whether there is/was any change in the default values.

    Anoop C Nair - @anoopmannur :: MY Site:  www.AnoopCNair.com :: FaceBook:  ConfigMgr(SCCM) Page :: Linkedin:  Linkedin

    Wednesday, February 06, 2013 7:00 AM
  • Rajul OS, thanks a lot for your answer. But I don't understand why you are proposing it as an answer. This really is a problem! And yes, we checked everything and it all works fine. CSS said that it's an error in the threshold code.
    Thursday, February 07, 2013 11:27 AM
  • SELECT RLS.SiteOwner,RLS.SiteSending,RLS.SiteReceiving,RLS.ReplicationID,rep.ReplicationGroup,RLS.UpdateTime-GETUTCDATE()+GETDATE() [UpdateTime],RLS.StatusName ,rep.ReplicationPattern,RLS.LastSyncFinishTime-GETUTCDATE()+GETDATE() [LastSyncFinishTime],rep.ReplicationPriority FROM dbo.RCM_ReplicationLinkStatus RLS INNER JOIN dbo.ReplicationData rep ON RLS.ReplicationID = rep.ID
    
    AND RLS.Status = 8
    OK, so if I run it, it shows the degraded state? The result of my query is "Null"! What should I analyze? I checked the state with exec spdiagdrs and all replication groups are without errors. After analyzing my environment, CSS said nothing about system performance and said it was a code error.
    Thursday, February 07, 2013 12:12 PM
  • You will get "null" if you run the query while the link is active. You must wait until it becomes degraded, and then run the query. It will then show you which of the replication groups is retrying. Based on this, you can do two things:

    1 - Change the number of retries before the link is marked degraded to a value higher than the number of retries X minutes. (Do this under Alerts in the properties of the link in the console.) We ended up doing this.

    2 - Change the minutes before each retry on the specified replication group (either 1, 2 or 5). (This is done directly in the SQL DB and might not be supported.)
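
    As a rough aid for the first option, here is a Python sketch that estimates the smallest threshold that would keep a link from being marked Degraded. It assumes the rule from Rajul's post (Degraded once processing takes more than threshold * sync interval minutes). The function name and the observed processing time are hypothetical inputs you would measure yourself.

```python
import math

def minimum_threshold(observed_minutes, sync_interval_minutes):
    """Smallest threshold for which threshold * interval is not exceeded
    by the observed processing time (assumed rule, see lead-in)."""
    return math.ceil(observed_minutes / sync_interval_minutes)

# Example: a group on a 5-minute interval that was seen processing for 12 minutes.
print(minimum_threshold(12, 5))
```

    Leaving some headroom above the computed value is probably sensible, since the observed processing time will vary from one cycle to the next.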

    Premier Support did not say this was a code error, so you should ask your CSS to provide background for why they said that.

    Thursday, February 07, 2013 1:55 PM
  • Ola Holtberget,  Thanks!

    But I'll wait for the next response from MS. Since I can't find any official message from MS, I've decided to just wait; it's not critical.

    CSS said that they have the same trouble and they think it's a code error.

    Thursday, February 07, 2013 4:00 PM
  • Any update on this from MS ?

    I'm having the same issue and cannot seem to tweak my settings to get rid of the replication errors.

    Wednesday, February 20, 2013 3:19 PM
  • Hi, JohnyLuky! I'm still waiting; there has been no answer to my service request for 3 weeks.

    Thursday, February 21, 2013 10:03 PM
  • Answer from CSS: "It will be fixed in CU1 for SP1."
    • Proposed as answer by Ivan Kirianov Wednesday, March 13, 2013 12:11 PM
    Wednesday, March 13, 2013 12:11 PM
  • Thanks for the followup Ivan, I guess they have not said anything about a release date for the CU1...?
    Thursday, March 14, 2013 1:38 PM
  • Unfortunately, they did not give a release date for the CU.
    Friday, March 15, 2013 7:41 AM
  • Site systems

    • Replication Configuration Manager incorrectly reports the link status as Degraded and then reports the status as Active one minute later.

    http://support.microsoft.com/kb/2817245/en-us

    • Proposed as answer by Ivan Kirianov Tuesday, April 02, 2013 6:56 AM
    Monday, March 25, 2013 8:05 PM
  • I still see the same issue in ConfigMgr 2012 R2 CU3. I guess this hotfix should be included already, shouldn't it?
    Wednesday, January 14, 2015 1:44 AM