Changing collection limiting is taking between 5 and 10 minutes, scripted or using GUI

  • Question

  • Hi everyone,

    Environment is SCCM 2012 R2 SP1 CU1, CAS + 10+ Primaries in separate domains. 

    I am migrating content from 2 x 2007 Environments into 1x SCCM 2012 Environment.

    I migrated approx. 9000 collections (collections only, no content), at which point everything began to perform extremely poorly: console crashes, my script to set limiting freezing, WMI/WmiPrvSE exceeding 500MB and crashing, SMSProv.log going nuts, etc. Basically it was very hard to do anything without timeouts or crashes. I discovered that incremental updates were enabled for all 9000 imported collections and that scheduled updates were also enabled for every 2 hours. - MY BAD.

    I fixed the WmiPrvSE problem by increasing allowed memory, as per: https://blogs.technet.microsoft.com/askperf/2014/08/11/wmi-high-memory-usage-by-wmi-service-or-wmiprvse-exe/ Performance was better but still sluggish.

    I disabled all incremental updates and started setting Scheduled updates to every Sat at midnight. This has brought everything back to a workable state, EXCEPT for setting Collection Limiting and Updating Collections. 

    Collection Limiting is taking between 5 and 20 minutes, using PowerShell cmdlets or using GUI. And Updating a collection can take up to half an hour for the hourglass to disappear. 

    Colleval.log on the primaries is not showing any errors, nor is SMSProv.log on the CAS. I cannot figure out what is causing the slow setting of Limiting.

    I have 5000 collections left to migrate and update the limiting on. Note that changing incremental updates or scheduled updates using PowerShell or the GUI is still working quite quickly.

    Each Primary has its own SQL box, these all seem fine, however the CAS SQL is doing something a little strange. The Primary SQL boxes have 4 x CPU and seem to all perform fairly evenly. The CAS SQL on the other hand has 8 CPUs and only CPU2 is high, the other CPUs seem quite low, as in possibly not being used. 

    What is best practice for CAS SQL?

    Friday, February 5, 2016 2:31 AM

All replies


    Hi all,

    I have just applied the WMI fix described above on all the primaries. It appears to be settling down.

    About 10,000 .UDC files were generated on one of the primaries that I was running the Set-Limiting script from. Apparently they couldn't be generated before because WMI (WmiPrvSE) kept falling over, so it must have been starting from the beginning again each time. :( It has been a painful two weeks.

    I will report back again in an hour with final update on whether I can now set limiting on Collections as per normal. I want to process all these .UDC files before doing anything else. 

    Hopefully this helps someone else.

    Friday, February 5, 2016 5:59 AM
  • The Primary SQL boxes have 4 x CPU and seem to all perform fairly evenly.

    How many clients will there be in total?
    CPUs do not matter too much here. More important is the amount of RAM and disk performance.

    Torsten Meringer | http://www.mssccmfaq.de

    Friday, February 5, 2016 6:57 AM
  • "Environment is SCCM 2012 R2 SP1 CU1, CAS + 10+ Primaries in separate domains."

    This also matters. Each and every domain you add to a forest *will* cause additional overhead and latency in *every* administrative action you do. Why do you have 10 primary sites? If it's because you had 10 primary sites in 2007, that's a bad answer and honestly you should stop and start over. Primary sites in 2007 are not the same as primary sites in 2012 -- in fact they are quite different on many levels.

    Jason | http://blog.configmgrftw.com | @jasonsandys

    Friday, February 5, 2016 2:04 PM
  • 300,000 Clients.

    CAS SQL has 64GB, Primary SQLs have 8GB and at times are getting up to about 80-90% utilization. I think our design requested 16GB for each Primary SQL box. So I will look at getting these increased.

    I will check out the disk IOPS and report back. What is recommended?

    Friday, February 5, 2016 11:46 PM
  • Hi Jason, 

    The reason for the 10 Primaries is a one-way trust with the other domains. From memory, the domain that the CAS and one Primary are in is trusted by the other 9 domains, but not the other way around.

    Friday, February 5, 2016 11:48 PM
  • UPDATE 2:

    After applying WMI hotfixes and increasing MaxPerHost (1GB) and MaxAllHost (2GB) in Wbemtest yesterday, I can update one collection's limiting and it is processed quite quickly, in < 30 seconds. It then replicates back to the CAS within 1 min.

    But doing a second one took ~2 mins. The dialog box did close, though, which wasn't happening before. Previously the change was *eventually* processed, but the dialog box and the console would freeze and never close, on the CAS or the Primaries.

    I decided to wait 10 mins and try a script with 2 collections to update. Again the first went through in < 30 seconds, then the second task was executed. It has been 25 mins now and it has just finished the second item. I was connected to the Primary PSDrive, not the CAS.
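
For anyone repeating this step: the WMI provider-host quotas are stored as byte counts, so the 1GB and 2GB figures above have to be converted before they are entered in Wbemtest. A minimal sketch of that arithmetic follows. It assumes the two settings are the `MemoryPerHost` and `MemoryAllHosts` properties of the `__ProviderHostQuotaConfiguration` class (the post calls them MaxPerHost and MaxAllHost, so that mapping is an assumption), and that 1GB is taken as 1024^3 bytes.

```python
# Convert the quota sizes quoted in the post into byte values.
# Assumption: the property names below correspond to what the post
# calls "MaxPerHost" and "MaxAllHost".

BYTES_PER_GB = 1024 ** 3

# Sizes in GB as stated in the post.
quotas_gb = {
    "MemoryPerHost": 1,
    "MemoryAllHosts": 2,
}

# Byte values to enter for each property.
quotas_bytes = {name: gb * BYTES_PER_GB for name, gb in quotas_gb.items()}

for name, value in quotas_bytes.items():
    print(f"{name} = {value}")
```

The linked AskPerf article is the place to confirm the exact property names and the restart needed for the change to take effect.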

    Unfortunately I have about 4000 collections left to import and set limiting on, so I am at a loss how best to proceed. 

    Perhaps I could split up the work load across all the primaries. But that would be still 400 each and I fear this is only going to get worse once I import more collections and again worse when I bring across deployments, packages etc.

    Does anyone think increasing the RAM on Primaries from 8GB to 16GB will assist in processing time?

    Saturday, February 6, 2016 12:27 AM
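
To put the remaining work in perspective, the per-change times reported in this post can be projected over the 4000 outstanding collections. This is only a rough sketch: it assumes each change takes a fixed time, that changes run one after another, and that splitting the work across 10 primaries (the "400 each" figure) lets each primary proceed independently, which nothing in this thread confirms.

```python
# Rough projection of the remaining limiting changes.
# All inputs are figures quoted in the post; the fixed per-change time
# and the independence of the primaries are simplifying assumptions.

REMAINING = 4000   # collections still to import and set limiting on
PRIMARIES = 10     # "10+" in the thread; 10 matches "400 each"

# Per-change durations reported in the post, in minutes.
observed_minutes = {
    "first change (<30 s)": 0.5,
    "second change (~2 min)": 2,
    "scripted second change (~25 min)": 25,
}


def estimate_hours(collections, minutes_per_change):
    """Total hours if every change takes the same time and runs serially."""
    return collections * minutes_per_change / 60


for label, minutes in observed_minutes.items():
    serial = estimate_hours(REMAINING, minutes)
    split = estimate_hours(REMAINING // PRIMARIES, minutes)
    print(f"{label}: {serial:.1f} h serial, {split:.1f} h per primary if split")
```

Even under the optimistic split, the slowest observed rate leaves each primary with days of work, which is why finding the cause of the slowdown matters more than parallelising the script.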
  • That's not a valid reason to have multiple primaries. ConfigMgr simply does not care about AD domains and trusts when it comes to managing clients. Why would it? It doesn't rely on AD authentication for anything.

    At most, to support 300,000 clients, all you need is 3 primaries (really 2 as of 2012 SP2 / R2 SP1).

    Evaluating collections happens on the primary. Thus, every time a collection needs to be updated, the CAS must communicate with each and every primary, each primary then evaluates the collection, and returns its results to the CAS which then compiles them. Thus, the more primaries, the longer things will take.

    Jason | http://blog.configmgrftw.com | @jasonsandys

    Saturday, February 6, 2016 1:06 AM
  • Thanks Jason. I will clarify the rationale behind the design. I know it was done for a reason. However that is good to know re: 2 or 3 PRI's required to manage 300,000 clients. 

    It just seems strange that one collection limiting change takes 1 min, plus 1 min replication. Then the second one takes about 20 minutes to complete the limiting. 

    Changing Incremental Updates or Scheduled updates happens very quickly. I can do 700 in ~15 mins. Wouldn't a change like this also cause a Collection Evaluation? I can monitor a CollEval inbox to check, I guess.

    Saturday, February 6, 2016 4:03 AM
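
The gap between the two kinds of change is easier to see as a per-change cost. A small sketch using only the figures from this post (700 schedule changes in about 15 minutes, versus limiting changes taking about 1 minute in the good case and about 20 minutes in the bad case):

```python
# Compare per-change cost of schedule changes and limiting changes,
# using the approximate figures quoted in the post.

def seconds_per_change(changes, total_minutes):
    """Average seconds spent on each change."""
    return total_minutes * 60 / changes


schedule_change_s = seconds_per_change(700, 15)  # incremental/scheduled update settings
limiting_fast_s = seconds_per_change(1, 1)       # first limiting change
limiting_slow_s = seconds_per_change(1, 20)      # second limiting change

print(f"schedule change: {schedule_change_s:.1f} s each")
print(f"limiting change: {limiting_fast_s:.0f} s to {limiting_slow_s:.0f} s each")
print(f"slowdown factor: {limiting_fast_s / schedule_change_s:.0f}x "
      f"to {limiting_slow_s / schedule_change_s:.0f}x")
```

A difference of this size suggests the two operations do different work behind the scenes rather than the same work at different speeds.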
  • "Changing Incremental Updates or Scheduled updates happens very quickly."

    No, these don't trigger collection eval to my knowledge -- why would they?

    As for the varying collection eval times, well, there's always variability and now you've increased the variables 10 fold.

    For your design specs, 8GB is very low for a non-trivial primary site.

    Are your SQL instances co-located? If not, that's causing additional latency.

    Once again, honestly, the design is terrible and not based on what 2012 requires (just being blunt here). You will experience many, many more problems in the future here. 

    Jason | http://blog.configmgrftw.com | @jasonsandys

    Saturday, February 6, 2016 3:52 PM
  • Yes Jason, good point regarding the coll eval trigger for Incremental / Scheduled updates. That wouldn't make sense.

    Unfortunately my hands are well and truly tied in regard to the design.

    "As for the varying collection eval times, well, there's always variability and now you've increased the variables 10 fold." Yes, I can attest to this. Today I was able to update 3 manually and it took less than a minute. I tried three more and they are still not complete 15 mins later.

    No, SQL is not co-located on the primaries. All Sites are split into PSS, SQL, SSS, SUP. Is that what you mean? Or are you talking DataCentres?

    I appreciate your directness. I am going to up the RAM to 16GB on each Primary and see if it makes any difference. If not we are going to have to call a meeting and reassess the plan going forward. This isn't working and it is only going to get worse.

    • Edited by Ewan Monro Sunday, February 7, 2016 10:42 PM
    Sunday, February 7, 2016 10:42 PM
  • Correction: the SQL boxes for the Primaries DO have 16GB, but the design asked for 32GB. I will be chasing this up today and will report back on whether or not it improves anything.

    All Primary SQL boxes are currently using ~15.5GB. And all were restarted last Friday post WMI hotfixes/wbemtest adjustment. Also, only one Primary is managing clients.

    Sunday, February 7, 2016 10:53 PM