wsusutil.exe reset on replica server causes it to be stuck at max CPU in SQL indefinitely, help!

    Question

  • We noticed that our WSUS content folder was growing out of control on the master and downstream replicas.  The decision was made to delete everything in the content folders on all WSUS servers and issue a wsusutil.exe RESET command on all WSUS servers as well as change every patch to UNAPPROVED on the master server.

    Everything worked great very quickly on the master, which appears to be functioning fine, but the downstream replicas are stuck.  The CPU is maxed on a core and WSUS is running queries indefinitely.  There is absolutely nothing in the WSUS content folder on these replica servers now and we have not approved a single patch after setting them all back to unapproved.  What is going on and how do I fix this?

    Here is what is running in SQL on SUSDB:

    SUSDB.dbo.spResetStateMachineAndReEvaluate;1 This query is running

    SUSDB.dbo.spGetRevisionInfo;1 This query is blocked by the first one

    SUSDB.dbo.spGetRevisionIdListForCache;1 This query is blocked by the first one

    The downstream replicas are also not able to sync with the master while this is going on now.

    How do I fix the downstream replicas?


    We are running WSUS fully patched on 2008 R2 for both master and downstream servers.  The SUSDB is running on SQL Server 2008 R2, fully up to date, not SQL Express or Windows Internal Database.
    • Edited by davidb1234 Friday, August 01, 2014 3:33 PM
    Friday, August 01, 2014 2:20 PM

Answers

  • I unapproved the content BEFORE deleting the updates and performing the WSUSUTIL.EXE RESET on all of the wsus servers.

    No, you didn't do this on *ALL* of the WSUS servers; you removed the approvals on the UPSTREAM server only. You *cannot* remove approvals from a replica server; you must *synchronize* the downstream replica server to get the removal of those approvals.

    Furthermore, I promise you, if you had tried to synchronize the removal of thousands of approvals, the synchronization would have failed miserably. (This is also a known issue.)

    I still have no idea what the downstream servers were doing and why only the 2008 R2 downstream servers were impacted

    I told you exactly what the downstream servers were doing and exactly why they were impacted. You didn't remove the approvals, the servers still had thousands of approvals, and then you launched a WSUSUTIL RESET, which will take *DAYS* to complete on a server with thousands of approvals.

    The 2012 downstream replica was just fine with this procedure.

    Because.. no doubt... the WS2012 downstream server is NEWER, probably does not have all of the JUNK that the WS2008R2 servers have, and you just got lucky that the RESET ran a boatload faster on the WS2012 server than it did on the WS2008R2 servers. (Maybe even they have better hardware?)

    Also we had WSUS set to decline superseded updates, etc as part of best practices.

    I'm not quite sure what you mean by this. There is no option to automatically decline superseded updates. The only way you can get superseded updates declined is to run the Server Cleanup Wizard, and even then those updates will only be declined if you have manually removed the APPROVALS from those updates (or the updates are EXPIRED).

    Superseded updates with approvals are IGNORED by the Server Cleanup Wizard!
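
    The rule described above can be sketched in a few lines of Python. This is only an illustrative model, not WSUS code: the `Update` class, its fields, and `would_be_declined` are hypothetical names, and the real wizard may apply further conditions that are not modeled here. The sketch assumes that an update is only ever considered once it has no approvals left.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """Hypothetical, simplified view of an update record."""
    title: str
    approved: bool           # True while any approval remains on the update
    superseded: bool = False
    expired: bool = False


def would_be_declined(update: Update) -> bool:
    """Simplified rule: only not-approved updates are ever considered."""
    if update.approved:
        return False         # approved updates are skipped, superseded or not
    return update.superseded or update.expired


updates = [
    Update("old rollup, still approved", approved=True, superseded=True),
    Update("old rollup, approval removed", approved=False, superseded=True),
    Update("current update, approved", approved=True),
    Update("expired, not approved", approved=False, expired=True),
]

declined = [u.title for u in updates if would_be_declined(u)]
print(declined)
```

    In this model the superseded update that still carries an approval is never declined, so its files would stay in the content folder.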

    we ran those utilities and reindex every few months.

    And the Server Cleanup Wizard did virtually nothing for you because you used it improperly.

    As for the database maintenance, the recommended interval is monthly, and even then, you'll only get marginal improvement from the reindexing if you're not also defragmenting the filesystem hosting the database files. If the database is spread across hundreds of disk clusters, the best you can ever hope for a database index is that it, too, will be spread across hundreds of disk clusters.

    I suggest the following maintenance procedure to be run monthly following your patch deployment activities:

    1. Removing approvals.
    2. Running the Server Cleanup Wizard.
    3. Defragmenting the filesystem (with the database service STOPPED).
    4. Reindexing the database.

    I don't go back and manually DECLINE updates that I have previously approved. I let wsus do it as part of maintenance.

    And I'm telling you that WSUS does not do that! Unless you have manually REMOVED APPROVALS from those updates, the Server Cleanup Wizard DOES NOT decline those updates.

    The content folder still got to be over 70GB over a few years of service

    The content folder got to be 70GB for exactly ONE reason, and ONE reason only:

    The SCW could not DELETE FILES, because those FILES were associated with NOT-DECLINED updates that are still approved. It's just that simple.

    but I have a feeling in a couple years I'll be right back where I started.

    Yes, you will, unless you change your current procedures. I absolutely promise you that will happen, and not "in a couple of years", but probably before Christmas 2014!


    Lawrence Garvin, M.S., MCSA, MCITP:EA, MCDBA
    SolarWinds Head Geek
    Microsoft MVP - Software Packaging, Deployment & Servicing (2005-2014)
    My MVP Profile: http://mvp.microsoft.com/en-us/mvp/Lawrence%20R%20Garvin-32101
    http://www.solarwinds.com/gotmicrosoft
    The views expressed on this post are mine and do not necessarily reflect the views of SolarWinds.


    Wednesday, August 06, 2014 10:14 PM
  • I am still unclear on what my mistake was.

    Fair enough. Let me try to explain further based on my understanding:

    You removed approvals from the upstream server and, apparently, assumed that action would be immediately effective on the downstream servers. It's not. Adding and removing approvals are events just like anything else. Events must be synchronized. Unfortunately, as we've all learned from this forum over the past year, downstream servers (especially downstream servers lacking good maintenance) do not do well with large amounts of events to synchronize. So had you attempted to sync the removal of hundreds (thousands) of approvals, the downstream server would have choked all over that attempt. So, given that fact, and knowing that you cannot remove approvals from a replica server, the simple conclusion is that the approvals were not removed from the downstream server.

    Now, combine that with what a WSUSUTIL RESET does. That utility scans and reconciles every approved update with the content of the filesystem, to delete files no longer needed, or re-queue for download any files that are missing. The key here is the issue of scanning every approved update, and if there are thousands of approved updates, that task (particularly so on a machine lacking good maintenance) takes a Very Long Time. Then, to further complicate things, you purged all of the file content from those servers, which means none of it was found, which means every one of those approved updates scanned, also triggered an external call to create a queue item in the BITS task queue (taking even more time from the reset process).

    In the end, doing these actions in this manner does absolutely nothing of value, and simply ties up the server for several days, unable to service clients at all.
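
    The reconciliation described above can be modeled roughly as a set comparison. This is a minimal sketch under stated assumptions, not how WSUSUTIL is actually implemented: the `reconcile` function, the file names, and the figure of 3000 approvals are all hypothetical, chosen only to mirror "thousands of approvals" and an emptied content folder.

```python
def reconcile(approved_updates, files_on_disk):
    """Return (files_to_delete, files_to_queue) for a simplified reset pass.

    approved_updates: dict mapping update name -> set of file names it needs
    files_on_disk:    set of file names present in the content folder
    """
    needed = set()
    for files in approved_updates.values():
        needed |= files                      # every approved update is scanned
    to_delete = files_on_disk - needed       # files no approved update needs
    to_queue = needed - files_on_disk        # missing files to re-download
    return to_delete, to_queue


# Hypothetical replica: approvals never synced away, content folder emptied.
approvals = {f"update-{i}": {f"file-{i}.cab"} for i in range(3000)}
to_delete, to_queue = reconcile(approvals, files_on_disk=set())
print(len(to_delete), len(to_queue))         # 0 3000

# Same server if the approval removals had been synchronized first.
print(reconcile({}, files_on_disk=set()))    # (set(), set())
```

    In this model the emptied folder turns every approved update into a download request, while a server with no approvals has nothing to scan or queue.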

    I think you are saying there is no automated way to remove old updates and that it is a manual procedure and that the cleanup wizard will not do this.

    What I'm saying, which is exactly the same thing stated in the Server Cleanup Wizard dialog -- the Server Cleanup Wizard only processes NOT APPROVED updates.

    Can you point me to documentation on the correct procedure?

    I don't know that the "correct procedure" is explicitly documented, but the documentation does say the same thing as the dialog: the update must be "Not Approved".

    http://technet.microsoft.com/en-us/library/dd939856(v=ws.10).aspx

    It's pretty much a Boolean thing. If the update has not been set back to Not Approved, nothing does anything to it.

    The only things that get declined "automatically" are superseded updates that were *never* approved to begin with (which also, btw, would have never had file content associated with them), and thus that activity (declining never approved updates) results in exactly *zero* files being deleted from the filesystem.

    I've never seen any WSUS documentation that said to do anything other than run the cleanup wizard and reindex the database but that's not your fault.

    Yeah, it doesn't say a lot of things that actually need to be done in the course of using a WSUS system. Yes it says you should run the SCW; yes it says you should reindex the database.

    I agree, it probably doesn't say you should remove approvals from approved updates in order to successfully decline those updates. It also doesn't say you should stop the database and defragment the filesystem in order to get maximum effectiveness from reindexing the database, but you should.


    Lawrence Garvin, M.S., MCSA, MCITP:EA, MCDBA

    Thursday, August 07, 2014 9:30 PM

All replies

  • The decision was made to delete everything in the content folders on all WSUS servers and issue a wsusutil.exe RESET command on all WSUS servers

    Probably the single worst thing you can do in this scenario -- as you're now observing.

    as well as change every patch to UNAPPROVED on the master server.

    Curiosity: Did you remove the approvals before, or after, you launched the RESET task? I'm guessing before, as evidenced by your success on the upstream server... although you've now created yourself another couple hours of work sorting out which updates should still be approved.

    but the downstream replicas are stuck.

    Yep. Because you can't remove approvals on the replicas until after they sync the removal of the approvals from the upstream server.. so, in fact, you're now running a RESET on the replicas with hundreds (maybe thousands???) of approved updates to sift through and reconcile...

    Which will then generate *GIGABYTES* (maybe TENS of gigabytes) of downloads from the upstream WSUS server.. which of course, it cannot provide because those files have been deleted... which luckily should resolve itself fairly quickly on the downstream server as they should get HTTP 404 errors for the missing files.... or maybe not.... :-//

    An *autonomous* server (wherein this scenario would be a normal state) actually causes the upstream server to queue a download request from Microsoft, and then the downstream (autonomous) server checks again for that file at the next sync and downloads it if it's available.

    But, being as this is a totally abnormal state for a replica server, I don't know what the actual behavior of the upstream server will be. As I said, it could simply return a '404' for the missing file... or it could pass on the request for the file to the local BITS queue and now your USS is downloading gigabytes (or tens of gigabytes) of files from Microsoft.

    The CPU is maxed on a core and WSUS is running queries indefinitely.

    What is going on and how do I fix this?

    That's what's known as a WSUSUTIL RESET.

    How you "fix" it is you WAIT. That is all you can do. There is no known way to terminate a WSUSUTIL RESET (well, you could format the hard drive and reinstall a fresh replica server).... which... actually.. might take less time than waiting on the RESET to finish.

    Luckily it's still 11 days until Patch Tuesday. :-)


    Lawrence Garvin, M.S., MCSA, MCITP:EA, MCDBA

    Friday, August 01, 2014 9:34 PM
  • Hello,

    It has been several days; I want to confirm whether the issue has been resolved.

    Please feel free to send your feedback.

    Tuesday, August 05, 2014 8:28 AM
  • You were correct. It went on for 4 or 5 days before I had enough. I can't believe how inefficient and stupid WSUS behaves.... I uninstalled WSUS on the downstream replicas (which also had errors that I had to work through) and reinstalled with a fresh database. They then would not sync 1 update with the master due to some kind of dependency, but it wouldn't tell me enough info to fix it. So I flipped them to syncing with Microsoft first, then flipped them back to syncing with the master WSUS server, and now all is well. I put in place proper maintenance scripts now to prevent this from hopefully happening again but I don't see how. It seems to me that after running long enough this will always happen. We don't go back and change patches back to not approved or declined and I am not sure what Microsoft's recommendation is on this...
    Tuesday, August 05, 2014 11:19 AM
  • Wow, I am constantly declining superseded patches. I thought that is what you are supposed to do when maintaining a WSUS server.
    Tuesday, August 05, 2014 5:31 PM
  • Wow, I am constantly declining superseded patches. I thought that is what you are supposed to do when maintaining a WSUS server.

    It is.

    Sadly, it seems, these days, some people just don't want to do the job.


    Lawrence Garvin, M.S., MCSA, MCITP:EA, MCDBA

    Tuesday, August 05, 2014 11:55 PM
  • I can't believe how inefficient and stupid WSUS behaves....

    Let's be fair... you launched a utility for the wrong reasons, without knowing what the utility was for, nor the potential impact of running that utility. Don't be blaming the software because you made a bad decision.

    I put in place proper maintenance scripts now to prevent this from hopefully happening again but I don't see how.

    Because the whole situation was caused by failing to perform that maintenance in the first place.

    We don't go back and change patches back to not approved or declined

    Then you **WILL** continue to have problems because what you choose to refuse to do is a REQUIRED step of administering and maintaining a WSUS server.


    Lawrence Garvin, M.S., MCSA, MCITP:EA, MCDBA


    Your assumptions are all incorrect which makes you look like a jerk.  Your comments are also rude Lawrence.  You are completely unhelpful and it seems your purpose on this forum is just to be rude to people seeking help for a buggy product by blaming people.

    I unapproved the content BEFORE deleting the updates and performing the WSUSUTIL.EXE RESET on all of the wsus servers.  So your assumption is incorrect.  I still have no idea what the downstream servers were doing and why only the 2008 R2 downstream servers were impacted.  The 2012 downstream replica was just fine with this procedure.  Seems like a bug in WSUS to me....  

    Also we had WSUS set to decline superseded updates, etc as part of best practices.  What I meant by maintenance is that we don't reindex the databases like daily or anything.  we ran those utilities and reindex every few months.  The issue, it appears, is that over time (years) WSUS gets too big for itself and cleanup/maintenance operations take forever.  Eventually the maintenance operations would just time out or cause errors and never complete.

    I don't go back and manually DECLINE updates that I have previously approved.  I let wsus do it as part of maintenance.  The content folder still got to be over 70GB over a few years of service and the whole WSUS infrastructure became sluggish, and maintenance operations would time out even with quarterly maintenance, which also times out and errors.

    I am now running maintenance daily instead of quarterly but I have a feeling in a couple years I'll be right back where I started.

    Wednesday, August 06, 2014 2:32 PM

  • I am still unclear on what my mistake was.  I think you are saying there is no automated way to remove old updates and that it is a manual procedure and that the cleanup wizard will not do this.  Can you point me to documentation on the correct procedure?  I've never seen any WSUS documentation that said to do anything other than run the cleanup wizard and reindex the database but that's not your fault.
    • Edited by davidb1234 Wednesday, August 06, 2014 10:53 PM
    Wednesday, August 06, 2014 10:53 PM