Windows services and CMS replication stopping on SFB FE servers

  • Question

  • Hello, 

    I have a topology with 3x SFB FE servers, 2x SFB Edge servers and various other server roles. All servers are running on Windows Server 2012 R2. All SFB features and services are running as expected with the exception of the following issue:

    Occasionally on all FE servers, the following Windows services stop working: 

    1) Server

    2) Task Scheduler

    3) Themes

    As a result, the SFB CMS replication stops working as the SFB scheduled tasks are not working. Is this a known issue? Do you have any ideas or thoughts as to what might be wrong? The issue occurs randomly and there are no specific event log errors in the SFB FE servers to help pinpoint the root cause.


    Stefanos Evangelou

    Tuesday, July 4, 2017 1:43 PM

All replies

  • Are any related events logged in Event Viewer? I have never seen this happen. There should be some indication of what's going on in the event logs.

    http://thamaraw.com

    Wednesday, July 5, 2017 12:48 AM
  • Hi Stefanos Evangelou,

    Based on my research, no documentation describes this as a known issue.

    In addition to the suggestions provided by Thamara.Wijesinghe, we suggest you install the latest update on your SFB FE servers.

    Here is an article on troubleshooting SFB CMS replication:

    https://ocsguy.com/2011/09/07/troubleshooting-cms-replication/

    Note: Microsoft is providing this information as a convenience to you. The sites are not controlled by Microsoft. Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. Please make sure that you completely understand the risk before retrieving any suggestions from the above link.


    Regards,

    Alice Wang


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    • Proposed as answer by Alice-Wang Monday, July 10, 2017 10:00 AM
    Wednesday, July 5, 2017 2:57 AM
  • SMB should be enabled on the SFB servers, and TCP port 445 should be reachable between them. You could also check the replication source for each server to find out which server is causing the issue. Enabling logging may help you find out more. Also check the Lync Server event logs.

    Jayakumar K

    Wednesday, July 5, 2017 6:59 AM
  • Hello, 

    Thank you all for the feedback.

    I have not yet been able to pinpoint an obvious issue by checking the Windows Server event logs. I have updated the installation to the latest SFB version and will use SFB logging/tracing while I keep monitoring the behaviour of the topology throughout the week. I will share any updates in this thread. 


    Stefanos Evangelou

    Monday, July 24, 2017 6:59 AM
  • Hi Stefanos,

    We will wait for your response.


    Regards,

    Alice Wang



    Monday, July 24, 2017 7:53 AM
  • Hello, 

    The issue persists even after all SFB servers had the latest SFB server patch and Windows updates installed and were rebooted. The following services occasionally (every 3-4 days or every week) stop on all SFB FE servers:
    1) Server service
    2) Task scheduler service
    3) Themes service

    The following event is persistently logged in all SFB FE servers "Lync Server" event log: 
    --------------------------------------------------------------------------------------------
    1) Error 3036 (LS Replica Replicator Agent Service): Failed to initialize Windows Task Scheduler task for replication of certificates from the central management store to the local machine. Skype for Business Server 2015, Replica Replicator Agent will continuously attempt to re-initialize the task. While this condition persists, no replication of the certificates from the central management store to the local machine will be done. Exception: System.Runtime.InteropServices.COMException (0x800706AB): The network address is invalid. (Exception from HRESULT: 0x800706AB)    at Microsoft.Rtc.Internal.JobsInterop.ITaskService.Connect(Object serverName, Object user, Object domain, Object password)    at Microsoft.Rtc.Xds.Replication.Replicator.Replica.CMSCertificateReplicator.InitializeTaskScheduler()

    Cause: Windows Task Scheduler may not be running or certificate replication task may have been deleted or disabled.
    Resolution:Ensure that Windows Task Scheduler service is running and certificate replication task is enabled.
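
As a rough illustration of how to read the exception in the event above, the sketch below splits the HRESULT `0x800706AB` into its standard bit fields. The function name `decode_hresult` is hypothetical. The result is facility 7 (Win32) with code 1707, the Win32 error `RPC_S_INVALID_NET_ADDR`, which matches the "network address is invalid" text and is consistent with the Task Scheduler service not being reachable at that moment.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility, and code."""
    value = hresult & 0xFFFFFFFF          # normalise to an unsigned 32-bit value
    return {
        "failure": bool(value >> 31),     # bit 31: severity (1 = failure)
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility
        "code": value & 0xFFFF,           # bits 0-15: error code
    }


FACILITY_WIN32 = 7  # facility used when an HRESULT wraps a Win32 error code

parts = decode_hresult(0x800706AB)
print(parts)
if parts["facility"] == FACILITY_WIN32:
    print("Wrapped Win32 error code:", parts["code"])
```

The decoding only says what kind of error the COM call returned; it does not by itself explain why the service was unavailable.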


    The above event seems to be the symptom of the issue (CMS replication failing). 

    The root cause of the issue seems to be related to Windows services stopping without apparent reason on all SFB FE servers. 


    The following event is persistently logged in the Application Event log around the time that the Windows services fail on all SFB FE servers: 
    -------------------------------------------------------------------------------------------------------------------------
    1) Error 1000: Application Error:  Faulting application name: svchost.exe_DsmSvc, version: 6.3.9600.17415, time stamp: 0x54504177
    Faulting module name: DeviceDriverRetrievalClient.dll, version: 6.3.9600.17415, time stamp: 0x54504c0f
    Exception code: 0xc0000005
    Fault offset: 0x0000000000004aba
    Faulting process id: 0x191e4
    Faulting application start time: 0x01d308cb92fb2ecb
    Faulting application path: C:\Windows\system32\svchost.exe
    Faulting module path: C:\Windows\System32\DeviceDriverRetrievalClient.dll
    Report Id: 3d1ce318-74bf-11e7-80e7-001dd8b71c70
    Faulting package full name: 
    Faulting package-relative application ID: 
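
The fields in the Application Error event above can be pulled out programmatically when comparing crashes across several servers. The following is a minimal sketch, assuming the event message has been copied as plain text; the helper names `extract_field` and `filetime_to_datetime` are hypothetical. It also converts the "Faulting application start time" value, a Windows FILETIME (100-nanosecond intervals since 1601-01-01 UTC), into a readable timestamp. The exception code `0xc0000005` is an access violation, and the `_DsmSvc` suffix suggests the crash occurred in the svchost instance hosting the Device Setup Manager service.

```python
import re
from datetime import datetime, timedelta, timezone

# Excerpt of the event text quoted in this thread.
EVENT_TEXT = """Faulting application name: svchost.exe_DsmSvc, version: 6.3.9600.17415, time stamp: 0x54504177
Faulting module name: DeviceDriverRetrievalClient.dll, version: 6.3.9600.17415, time stamp: 0x54504c0f
Exception code: 0xc0000005
Faulting application start time: 0x01d308cb92fb2ecb
"""


def extract_field(message: str, label: str):
    """Return the value after 'label:' up to the next comma or line end."""
    match = re.search(re.escape(label) + r":\s*([^,\r\n]*)", message)
    return match.group(1).strip() if match else None


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)


module = extract_field(EVENT_TEXT, "Faulting module name")
process = extract_field(EVENT_TEXT, "Faulting application name")
started = filetime_to_datetime(
    int(extract_field(EVENT_TEXT, "Faulting application start time"), 16)
)
print(process, module, started.isoformat())
```

Comparing the start time with the time of the crash shows how long the shared svchost process had been running before it failed.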


    The following event is persistently logged in the System Event log around the time that the Windows services fail on all SFB FE servers: 
    -------------------------------------------------------------------------------------------------------------------------
    1) Error 7031: Service Control Manager:    The Task Scheduler service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 60000 milliseconds: Restart the service.


    I need to determine why the Windows services are stopping, which is the root cause of the above events. There are no other relevant entries in the Windows Server event logs. 


    I will continue my investigation with more detailed analysis of the Skype for Business centralized logging service and get back with the results. I will also run sfc /scannow on all SFB FE servers.

    In the meantime, do you have any ideas or suggestions based on the above findings up to this point?


    Stefanos Evangelou

    Monday, July 31, 2017 9:23 AM
  • I also ran sfc /scannow on all SFB FE servers, and the result on each was "Windows Resource Protection did not find any integrity violations". 

    I will continue with SFB CLS logs and get back with an update. Please review previous post for any possible thoughts or feedback. Thank you.


    Stefanos Evangelou

    Monday, July 31, 2017 10:00 AM
  • I will do some research on it.
    Thursday, August 3, 2017 8:23 AM
  • This looks like a certificate, firewall or service-unavailable (down) issue to me. Are all your Lync FE certificates valid? Launch the deployment wizard, check the certificates, and publish the topology. If there is an issue, it will prompt you. 
    Thursday, August 3, 2017 10:01 AM
  • Have you had any success in finding the root cause of this?  I spun up a new 2012 R2 server on my VM host (qemu kvm on CentOS) a couple of nights ago and after getting it all up-to-date I have started having these exact same symptoms.  I am not using SFB, however.

    The same services frequently crash as a result of svchost.exe crashing, which is caused by DeviceDriverRetrievalClient.dll.  From what I can tell, this is a Windows module that gathers metadata for devices.  It continues to crash every few minutes, taking the same services (Server, Themes, IPHelper, and sometimes additional services) down with it.  Windows will usually restart them after a few minutes, but then they crash again.  Task Scheduler is unreliable since it doesn't run tasks that were missed while the service was stopped.

    I have three other 2012 R2 servers running on this same host that were installed with the same ISO and use the same device drivers and are all up-to-date.  I thought maybe something went wrong during install/update, so I created a new disk image and rebuilt the server only to have the same results.

    I found your post as it's the only thing I've been able to find related to this problem on the internet.  Hoping you've made some progress in tracking down the cause of this.

    Monday, September 18, 2017 2:36 AM
  • Hello jecal22

    Unfortunately I am still having this behavior and I need to manually run a script to check the status of the failing services and start them at predefined time intervals. I have double checked the status of my SFB topology and there are no relevant issues there. The problem seems to be Windows Server related and your own case seems to be confirming this. 
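
The workaround script itself is not shown in this thread. A minimal sketch of the idea might look like the following, assuming the service short names `LanmanServer` (Server), `Schedule` (Task Scheduler) and `Themes`, English-language `sc.exe query` output, and an elevated session. The function names are hypothetical, and starting a service may still be refused depending on its permissions.

```python
import subprocess
from typing import Callable, List

# Assumed short names for the Server, Task Scheduler and Themes services.
SERVICES = ["LanmanServer", "Schedule", "Themes"]


def run_sc(args: List[str]) -> str:
    """Run sc.exe with the given arguments and return its standard output."""
    result = subprocess.run(["sc.exe", *args], capture_output=True, text=True)
    return result.stdout


def is_running(sc_output: str) -> bool:
    """Return True if the STATE line of 'sc query' output reports RUNNING."""
    for line in sc_output.splitlines():
        if line.strip().startswith("STATE"):
            return "RUNNING" in line
    return False


def restart_stopped(services: List[str],
                    runner: Callable[[List[str]], str] = run_sc) -> List[str]:
    """Issue a start request for every service not reported as running.

    Returns the names of the services for which a start was requested.
    """
    requested = []
    for name in services:
        if not is_running(runner(["query", name])):
            runner(["start", name])
            requested.append(name)
    return requested
```

On a Windows server, `restart_stopped(SERVICES)` could be called in a loop with a sleep between passes. The `runner` parameter exists only so the logic can be exercised without touching real services. This only masks the symptom; it does not address why the services stop.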

    I will perform a couple more tests and event log investigation and send you an update in due time.

    Based on the history of this thread, I would like to ask the Microsoft community if this is a known case with Windows server 2012 R2 and if there is a known fix/patch to resolve it?


    Stefanos Evangelou

    Monday, September 18, 2017 11:07 AM
  • Hmm.  I think I may have fixed it on mine.  I had a GPO applied to my non-domain controller servers that enabled using Windows Update for automatic device driver installation.  This was left over from something I was doing quite a few years ago.  I removed the policy and set the device installation setting to never use Windows Update automatically, which should be the default when no policy is set to override it. Now, after a couple of reboots, I haven't seen the Application Error, and all my services start up and keep running. 

    It's only been a few minutes, but I'll report back tomorrow and confirm whether it remains stable.  You might go ahead and check what yours is set to.  



    • Edited by jecal22 Monday, September 18, 2017 10:53 PM
    • Proposed as answer by Philip ElderMVP Wednesday, November 1, 2017 5:00 PM
    Monday, September 18, 2017 10:52 PM
  • Hello, 

    In my case the services tend to stop at random times but not necessarily on a daily basis. There are cases in which the services are up and running for 7 or 10 days and then stop and need to be restarted manually. I apply Windows updates to the SFB servers manually only, during scheduled server maintenance windows, in parallel with SFB server updates. 

    Please let us know if you discover anything after your further investigation in your environment.


    Stefanos Evangelou

    Tuesday, September 19, 2017 9:30 AM
  • We've had GPO settings in place for years now telling client systems what to do and where to go for device drivers and in what order.

    It's only recently that a brand new greenfield network on 2012 R2 and an existing 2012 R2 network we set up years ago started seeing the Server and other SVCHost.exe based services crashing.

    I blogged about it and the fix here: Error Fix: Event 7034 Service Control Manager.


    Philip Elder Microsoft Cluster MVP Blog: http://blog.mpecsinc.ca

    Thursday, November 2, 2017 1:50 AM