locked
Fresh WSUS 2016 core server crashes on synchronization RRS feed

  • Question

  • I am at wit's end with this currently. I have completely redone a fresh WSUS server on 2016 core 4 times now. The WSUS content folder is on a separate drive, we are using the WID-Db. I have tested a lot of scenarios and WSUS is fine until I do my firs synchronization. At that point I see streaming app service failures like in this post https://social.technet.microsoft.com/Forums/en-US/67c08a8a-f798-45ca-be37-560dbdc8d7e4/the-wsus-content-directory-is-not-accessible?forum=winserverwsus. I have also verified the permissions of all registry keys, shares, and files as referenced in these articles https://technet.microsoft.com/en-us/library/cc708545%28v=ws.10%29.aspx?f=255&MSPPError=-2147217396 and the 2 appendices. an IIS restart brings the site back up but it will simply crash again after a time even if I no longer attempt a synchronization. All the documentation I have read says WSUS should be able to support much larger environments than ours. Any help would be much appreciated.

    Success is a lousy teacher. It seduces smart people into thinking they can't lose. -Bill Gates

    Monday, May 8, 2017 8:51 PM

Answers

  • ok, so what have you got your private bytes mem limits set to be?

    (the defaults, would be bad, according to lots of recent discussion :(


    Don [doesn't work for MSFT, and they're probably glad about that ;]

    • Marked as answer by MicroRich Thursday, May 11, 2017 10:08 PM
    Wednesday, May 10, 2017 8:37 AM

All replies

  • after you installed the Feature for WSUS, (something like this)

    Install-WindowsFeature -Name UpdateServices -IncludeManagementTools

    , did you then create a content_dir on a "local" drive;

    New
    -Item -Path C: -Name WSUS -ItemType Directory


    ,and did you then perform postinstall?

    CD "C:\Program Files\Update Services\Tools"

    .\wsusutil.exe postinstall CONTENT_DIR=C:\WSUS

    what is being recorded in the event logs, from the checkhealth threads of UpdateServices?

    what is being recorded in the logfiles for WSUS c:\program files\update services\logfiles\

    are you getting http503 errors as per the articles you linked, or, some other errors logged/returned?

    PS: there are no mentions of "streaming" in any of the discussions/articles you linked ?


    Don [doesn't work for MSFT, and they're probably glad about that ;]

    Monday, May 8, 2017 9:28 PM
  • Thanks for the reply. Yes I installed WSUS via powershell and did run the post install. in this case I used E:\WSUS. The final result is a 503 but I am not sure what is causing it. I think the 503 is a symptom of IIS crashing more that the files actually being unreachable.

    It starts with EventID 5002 "Application pool 'WsusPool' is being automatically disabled due to a series of failures in the process(es) serving that application pool." About 1 minute later I see rather generic failures for each of the web services such as self-update, API Remoting, Client Web... ( this is what I meant by streaming App Service failures) These events simply state the failure. Time stamps show that all of the events occur the same second along with EventID 12072 - "The WSUS Content directory is inaccessible" (503

    The Change.log file in c:\program files\update services\logfiles\ shows all success. I can see where I initiated Synchronization and underneath is all success. The final entry is that it did retry a download

    2017-05-08 21:35:17.512 UTC     Successfully deployed deployment(Decline) of Security Update for Adobe Flash Player for Windows Server 2012 R2 (KB3144756) UpdateID:5A4558CD-006F-481A-AFDE-EC1E9575F80F Revision Number:202
    2017-05-08 21:35:19.080 UTC     Downloading retried

    Note I have not approved any updates for any servers yet so the decline is expected.

    SoftwareDistribution Log shows all good until the download retry. Then the first 503 occurs

    2017-05-08 21:35:19.043 UTC     Info    WsusService.35  CatalogSyncAgentCore.StoreUpdateDecryptionData  Successfully imported DecryptionData for UpdateId: cfb44dca-6a28-46ec-ab1e-eee3613521c7/201
    2017-05-08 21:35:19.043 UTC     Info    WsusService.35  CatalogSyncAgentCore.GetAndSaveUpdateMetadata   Total actually inserted updates: 20396
    2017-05-08 21:35:19.080 UTC     Change  WsusService.35  AdminDataAccess.ExecuteSPResumeAllDownloads     Downloading retried
    2017-05-08 21:35:19.106 UTC     Info    WsusService.35  CatalogSyncAgentCore.UpdateSyncResultAndGenerateReportingEvent  CatalogSyncThreadProcess: report subscription succeeded
    2017-05-08 21:35:19.106 UTC     Info    WsusService.35  EventLogEventReporter.ReportEvent       EventId=384,Type=Information,Category=Synchronization,Message=Synchronization completed successfully.
    2017-05-08 21:35:19.122 UTC     Info    WsusService.22  ThreadEntry     ThreadHelper.ThreadStart
    2017-05-08 21:35:19.122 UTC     Info    WsusService.22  CatalogSyncAgent.WaitUntilSyncFinishedOrCancelled       Agent signalled done.
    2017-05-08 21:35:19.137 UTC     Info    WsusService.22  CatalogSyncAgent.SetSubscriptionStateWithRetry  Firing event SyncFinish...
    2017-05-08 21:35:19.153 UTC     Info    WsusService.22  CatalogSyncAgent.WakeUpWorkerThreadProc Found no more jobs. CatalogSyncAgent quits but will run rollup before terminating ...
    2017-05-08 21:35:19.153 UTC     Info    WsusService.22  CatalogSyncAgent.UpdateServerHealthStatusBasedOnError   ServerHealth: Updating Server Health for Component: CatalogSyncAgent, Marking as Not Running
    2017-05-08 21:35:22.647 UTC     Info    WsusService.8   SusEventDispatcher.TriggerEvent TriggerEvent called for NotificationEventName: RollupAgent, EventInfo:
    2017-05-08 21:35:22.647 UTC     Info    WsusService.20  SusEventDispatcher.DispatchManagerWorkerThreadProc      DispatchManager Worker Thread Processing NotificationEvent: RollupAgent
    2017-05-08 21:35:22.647 UTC     Info    WsusService.8   SusEventDispatcher.TriggerEvent TriggerEvent called for NotificationEventName: ConfigurationChange, EventInfo: ConfigurationChange
    2017-05-08 21:35:22.662 UTC     Info    WsusService.34  SusEventDispatcher.DispatchManagerWorkerThreadProc      DispatchManager Worker Thread Processing NotificationEvent: ConfigurationChange
    2017-05-08 21:35:22.678 UTC     Info    WsusService.34  ServerCertificateValidator.ConfigChangedHandler Update server configuration has changed. Sync against MU: True
    2017-05-08 21:35:23.907 UTC     Info    WsusService.3   RollupEventReporter.BuildReportingServiceUrl    Found config says USS is MU site
    2017-05-08 21:35:24.204 UTC     Info    WsusService.3   MicrosoftUpdateRollup.ReportEvents      Rolling up 1152 events.
    2017-05-08 21:35:24.376 UTC     Info    WsusService.3   SusService.ValidateServerCertificate    CheckValidationResult Succeeds: CertOK
    2017-05-08 21:35:24.376 UTC     Info    WsusService.3   ServerCertificateValidator.IsHostAllowedException       Requested host: statsfe2.update.microsoft.com
    2017-05-08 21:35:24.376 UTC     Info    WsusService.3   ServerCertificateValidator.IsHostAllowedException       Ignoring SSL validation for a well-know host.
    2017-05-08 21:35:42.492 UTC     Info    WsusService.3   RollupEventReader.SetLastRollupComplete Finished Rollup.  Setting time anchor in local DB as 5/8/2017 9:35:19 PM
    2017-05-08 21:43:34.341 UTC     Error   WsusService.9   HmtWebServices.CheckReportingWebService Reporting WebService WebException:System.Net.WebException: The request failed with HTTP status 503: Service Unavailable.
       at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
       at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
       at Microsoft.UpdateServices.Reporting.WebService.Ping(Int32 pingLevel)
       at Microsoft.UpdateServices.Internal.HealthMonitoring.HmtWebServices.CheckReportingWebService(EventLoggingType type, HealthEventLogger logger)
       at Microsoft.UpdateServices.Internal.HealthMonitoring.HmtWebServices.CheckReportingWebService(EventLoggingType type, HealthEventLogger logger)
       at Microsoft.UpdateServices.Internal.HealthMonitoring.HealthMonitoringTasks.ExecuteSubtask(HealthMonitoringSubtask subtask, EventLoggingType type, HealthEventLogger logger)
       at Microsoft.UpdateServices.Internal.HealthMonitoring.HmtWebServices.Execute(EventLoggingType type)
       at Microsoft.UpdateServices.Internal.HealthMonitoring.HealthMonitoringTasks.Execute(EventLoggingType type)
       at Microsoft.UpdateServices.Internal.HealthMonitoring.HealthMonitoringThreadManager.Execute(Boolean waitIfNecessary, EventLoggingType loggingType)
       at Microsoft.UpdateServices.Internal.HealthMonitoring.HealthMonitoringThreadManager.PeriodicTask()
       at Microsoft.UpdateServices.Internal.HealthMonitoring.PeriodicTaskExecuter.ThreadStart()
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()


    Success is a lousy teacher. It seduces smart people into thinking they can't lose. -Bill Gates

    Monday, May 8, 2017 9:57 PM
  • UPDATE:

    Server ran fine before initial sync for multiple days. Sync was finally able to complete after after IIS reset and launching from powershell "(Get-WsusServer).GetSubscription().StartSynchronization()" This took a few attempts but I can see it did complete a sync in the WSUS GUI.

    The IIS site will crash now after a few minutes with no interaction via cli or MMC console. I watched the logs in realtime via splunk. I can paste in a CSV but I think this summary should provide enough info if anyone else has seen something similar. I scrubbed host names and any usernames. Basically minutes after the reset is complete a worker process for 'WSUSPool' requests a recycle. until it finally just gets disabled. 

    Timeline starts from the bottom and moves up the list

    LogName=System
    SourceName=Microsoft-Windows-WAS
    EventCode=5002
    EventType=2
    Type=Error
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1887
    Keywords=Classic
    Message=Application pool 'WsusPool' is being automatically disabled due to a series of failures in the process(es) serving that application pool.","2017-05-09T14:01:22.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-WAS
    EventCode=5117
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1886
    Keywords=Classic
    Message=A worker process serving application pool 'WsusPool' has requested a recycle because it reached its private bytes memory limit.","2017-05-09T14:00:22.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-WAS
    EventCode=5117
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1885
    Keywords=Classic
    Message=A worker process serving application pool 'WsusPool' has requested a recycle because it reached its private bytes memory limit.","2017-05-09T13:59:22.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-WAS
    EventCode=5117
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1884
    Keywords=Classic
    Message=A worker process serving application pool 'WsusPool' has requested a recycle because it reached its private bytes memory limit.","2017-05-09T13:58:22.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-WAS
    EventCode=5117
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1883
    Keywords=Classic
    Message=A worker process serving application pool 'WsusPool' has requested a recycle because it reached its private bytes memory limit.","2017-05-09T13:57:22.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-IIS-IISReset
    EventCode=3201
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1881
    Keywords=Classic
    Message=IIS start command received from user . The logged data is the status code.","2017-05-09T13:56:58.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-Service Control Manager
    EventCode=7036
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1882
    Keywords=Classic
    Message=The IIS Admin Service service entered the running state.","2017-05-09T13:56:57.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-Service Control Manager
    EventCode=7036
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1880
    Keywords=Classic
    Message=The World Wide Web Publishing Service service entered the running state.","2017-05-09T13:56:56.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-Service Control Manager
    EventCode=7036
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1879
    Keywords=Classic
    Message=The Windows Process Activation Service service entered the running state.","2017-05-09T13:56:56.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-WAS
    EventCode=5211
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1878
    Keywords=Classic
    Message=The Windows Process Activation Service (WAS) started with 'Classic' mode using 'ConfigurationSystem'","2017-05-09T13:56:56.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-IIS-IISReset
    EventCode=3202
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1877
    Keywords=Classic
    Message=IIS stop command received from user . The logged data is the status code.","2017-05-09T13:56:56.000-0700"


    LogName=System
    SourceName=Microsoft-Windows-Service Control Manager
    EventCode=7036
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1876
    Keywords=Classic
    Message=The IIS Admin Service service entered the stopped state.","2017-05-09T13:56:54.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-Service Control Manager
    EventCode=7036
    EventType=4
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    RecordNumber=1875
    Keywords=Classic
    Message=The Windows Process Activation Service service entered the stopped state.","2017-05-09T13:56:51.000-0700"

    LogName=System
    SourceName=Microsoft-Windows-Service Control Manager
    EventCode=7036
    Type=Information
    TaskCategory=The operation completed successfully.
    OpCode=The operation completed successfully.
    Message=The World Wide Web Publishing Service service entered the stopped state.","2017-05-09T13:56:49.000-0700"


    Success is a lousy teacher. It seduces smart people into thinking they can't lose. -Bill Gates

    Tuesday, May 9, 2017 9:55 PM
  • ok, so what have you got your private bytes mem limits set to be?

    (the defaults, would be bad, according to lots of recent discussion :(


    Don [doesn't work for MSFT, and they're probably glad about that ;]

    • Marked as answer by MicroRich Thursday, May 11, 2017 10:08 PM
    Wednesday, May 10, 2017 8:37 AM
  • This did help the issue. I checked the Content folder and that was set correctly. I had trouble finding how to set the private bytes with powershell so I enabled remote IIS management. I set the byte limit to 6GB and the site will still crash. Setting it to 0 (unlimited) has made it so the site does not crash but I did have to up to RAM for the this machine to 16GB in order for it to not max out.The VM currently has 8 vCpus that max out at 100% during many queries causing a very slow response.

    I am reluctant to change the CPU Limit property from unlimited. Have you had any experience with this in the past? All of this seems like a bug if one server should be able to support 10,000 clients. We have roughly 500 and they only check in once ever 24 hours so the load should not be that high.


    Success is a lousy teacher. It seduces smart people into thinking they can't lose. -Bill Gates

    Wednesday, May 10, 2017 5:55 PM
  • I've no experience with WSUS on WS2016, but yes we've had to increase memory on servers and memory limits in the past.

    WSUS itself (as a product) can handle more than 30000 clients, well, it could comfortably do that back when the update catalogs weren't quite so bloated with as many products, each product having many less updates in each product.

    The catalog has grown enormously in recent years. This has caused lots of issues especially at the client-side, leading MSFT to re-tune the WUAgent many times in the last 2 years so that clients like 32bit Win7 can handle the huge catalog size.

    We've found that adding vCPU didn't help a lot, but adding vRAM and tuning the IIS settings made the most improvement.

    Also note that when you reconfigure clients to be on a different WSUS, the 'switch' to the new WSUS causes those clients to perform a full re-scan of the catalog. This places significant load on both client and WSUS until everything settles down and then the clients go back to doing delta scans (which is much less effort for both client and server), so if you 'switch' a lot of clients at once, you increase the load massively, bottlenecks etc occur.

    The 10000 client capacity, is an older figure, from back when the catalog was much smaller. It also dates back to before WSUS was doing complex things like handling encrypted ESD payloads for Win10 feature upgrades etc.

    These days, it pays to keep a tight rein on the products and classifications you are syncing to your WSUS so as to keep your catalog size down.

    Both IIS and SQL/WID are memory-hungry beasts.

    Also, regular (monthly) WSUS cleanups and reindexing/defragging of the SUSDB are almost mandatory these days due to the quantity of updates churning through. This should ease in the near future due to adoption of rollups/cumulative updates, retiring of Vista/WS2008, etc (well the growth in quantity of updates should drop off anyway ;)


    Don [doesn't work for MSFT, and they're probably glad about that ;]

    Wednesday, May 10, 2017 9:05 PM
  • Thanks Don! I ran a few different test scenarios on the server and in our case using 4 worker processes instead of one helped to keep things more stable. We are using dynamically allocated RAM and I set the max to 32GB. It is currently using 20GB of that. There is definitely a longer process happening as I will approve updates and then the net day come in to see others that are needed but not yet approved.

    I can understand how the catalog is huge now. Even our small company still has workstations running windows 7 and Office 2010 etc... so I still want to grab those updates. IMO Microsoft really needs to overhaul their entire update mechanism, which I think they are trying to do going forward. I would like to see a better push architecture implemented instead of us having to script solutions. Also, I think going form a catalog to separate repository model could help lighten some of the load. </EndRant>

    Just wondering, do you think that setting up downstream servers and using an external SQL DB would help performance?


    Success is a lousy teacher. It seduces smart people into thinking they can't lose. -Bill Gates

    Thursday, May 11, 2017 5:35 PM
  • DSS can be a different type of anguish - I don't think it's a good solution to your apparent problem.

    I think you'll get more mileage from tinkering with your memory aspects, assuming that you aren't suffering threads/cores limitations nor network/IO limitations nor storage IOPS limitations ?

    remote SQL adds cost and complexity and actually introduces extra strain on network IO.

    for a smallish estate (less than 2000 machines) I can't imagine remote SQL would ever be needed.

    one of our environments has a single WS2008R2 WSUS running 8GB vRAM and 2vCPU. this serves 35,000 Win7 32bit clients fairly well, almost all of the time.

    I'm increasingly distressed by the sad stories being told about WSUS on WS2016... :(


    Don [doesn't work for MSFT, and they're probably glad about that ;]

    Thursday, May 11, 2017 9:34 PM
  • After a few days now resource usage has stabilized. I think it was all of the clients re-scanning the full catalog that was causing dramatic spikes. Still running around 20GB of RAM but CPU usage has gone back to normal ~ %20 when deploying updates. Thanks for your help!

    Success is a lousy teacher. It seduces smart people into thinking they can't lose. -Bill Gates

    Thursday, May 11, 2017 10:10 PM