none
Publishing a Project Plan causes a full crawl of all Project sites in the collection and pegs memory at 100% utilization

  • Question

  • Publishing a Project Plan causes a full crawl of all Project sites in the collection.  This causes memory on the WFE running the crawl to creep up slowly (over about 15 minutes) until all server memory is used.  This in turn causes PWA client timeouts/errors, as there is no memory available to process other requests.  I've checked ULS while this is occurring, and the following entry is logged thousands of times: "GetSecurityChangesFromSharePoint : Site Id = '77415162-5a9c-461e-9c36-6f5aded60c4b'. db id = '4fed8194-d804-47aa-ba81-58e92a168c03'. Group membership/security policy changes detected. All furls in the site collection will be crawled."
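
    For anyone who wants to confirm the same pattern, the matching entries can be pulled farm-wide with Merge-SPLogFile instead of reading the raw ULS files; a minimal sketch (the output path and one-hour window are just examples):

        # Run from the SharePoint Management Shell on any farm server.
        Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

        # Merge every ULS entry from the last hour that matches the
        # security-crawl message quoted above, across all farm servers.
        Merge-SPLogFile -Path "C:\Temp\SecurityCrawl.log" `
            -Message "*GetSecurityChangesFromSharePoint*" `
            -StartTime (Get-Date).AddHours(-1) `
            -Overwrite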

    Even though we are load balanced, users connecting to the second WFE may still experience slow responses while the crawl is running.  To make matters worse, if another project is published while the first crawl is running, the whole process may be kicked off again; imagine what happens on Friday afternoons when everyone is trying to update their plans before the weekend.  I've seen memory peak for over an hour at a time as multiple projects are published, adversely affecting PWA and publish performance.

    I've checked the "Project Server: Synchronization of SharePoint Server permissions to Project Web App permissions job for Project Services" timer job; it is scheduled to run every minute (the out-of-the-box setting).  It obviously isn't running every minute, otherwise this process would never end, as a full crawl takes about 15-17 minutes to complete.
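
    For reference, the job's schedule and last run time can be checked from the SharePoint Management Shell; a rough sketch (the wildcard filter is an assumption, adjust it to the exact display name in your farm):

        Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

        # Locate the Project Server permission sync timer job and show
        # when it is scheduled and when it actually last ran.
        Get-SPTimerJob |
            Where-Object { $_.DisplayName -like "*Synchronization of SharePoint Server permissions*" } |
            Select-Object DisplayName, Schedule, LastRunTime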

    Why is a full crawl being done with nearly every project plan publish?  This makes no sense.  It should only crawl the specific site for the project you just published rather than the whole collection each time.  Complete site collection crawls should only run after midnight when user traffic is low.

    Is anyone else seeing this issue?  I've been searching the web and forums to see if anyone else is experiencing it and I haven't found a single hit.  I started noticing this behavior about two weeks ago, however it could have been going on for some time, as PWA performance isn't affected every time the memory peak occurs.

    I don't recall seeing this behavior prior to installing the September 2014 CU.  I've checked all the CUs published after September 2014 and it isn't mentioned as a fix in any of them.  The only way I can see to prevent this is to disable the Project site sync feature and then do manual syncs as needed.  This is obviously not a fix, as I have no desire to be a sync monkey.

    If I don't get any helpful responses from this forum I will open a ticket with Microsoft as this is not desirable behavior.


    Frank Miranda Florida Hospital MIS

    Tuesday, March 10, 2015 3:54 PM

All replies

  • Frank,

    I have a theory on this, and have not had the chance to confirm.

    To start with, take a look at this article: http://blogs.msdn.com/b/kaevans/archive/2013/05/06/clarifying-guidance-on-sharepoint-security-groups-versus-active-directory-domain-services-groups.aspx

    I think this is what is happening: the Project Publish action (with sync turned on) removes the users and adds them back to the project sites, user by user, in the synced groups, which according to the article prompts a search crawl.

    I think a better way is to disable the project site sync and then just add an AD group to each site (this could be done with a workflow); see the sketch below.
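
    For what it's worth, a minimal sketch of granting an AD group to a single project site from PowerShell (the site URL, AD group, and SharePoint group name are all hypothetical):

        Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

        # Grant one AD security group to a project site's members group
        # instead of letting the sync add users one by one.
        $web = Get-SPWeb "http://server/pwa/ProjectSiteA"
        New-SPUser -UserAlias "CONTOSO\ProjectA-Team" -Web $web -Group "Project Site A Members"
        $web.Dispose()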

    Anyway, I will test this when I get a chance, but if you confirm this, let me know.


    Cheers,

    Prasanna Adavi, Project MVP



    Tuesday, March 10, 2015 4:09 PM
    Moderator
  • Can you say major bug?

    Yes, the way project site syncs occur was completely revamped in the August CU to fix an issue that was preventing certain users from being synced to certain sites.  That problem was resolved, but it looks like the complete removal and re-addition of every user to the site may very well be triggering a full crawl.

    HOWEVER, if I only publish plan A then the crawl should only be happening on that plan.  It doesn't make sense to crawl the whole site collection, since I didn't republish or make changes to every project plan or site in the collection.  The logic behind the full crawl makes no sense.

    Having to run a workflow with every publish is just silly, since the function to sync site security is already built into the product.  Microsoft simply needs to fix this bug.

    This is precisely why I don't like installing CUs: they fix one problem and break another 20 things.


    Frank Miranda Florida Hospital MIS

    Tuesday, March 10, 2015 4:20 PM
  • This week I've been doing further testing on this issue and have discovered that my initial assumption was wrong.  It isn't the publishing of projects that triggers the full site security crawl; project publishing and the security sync are working correctly.  It just so happened that, as I was doing my initial testing, a lot of projects were being published, and it looked as if they were triggering the full crawl.

    Here is what I have found.  We have two separate farms: a Commons farm for all things not Project, and a Project farm.  We run our enterprise search on the Commons farm, and it crawls itself, our Drupal environment, and the Project farm.  The Project search crawl was set to run at the 55-minute mark every hour.  As soon as that crawl begins I start seeing the messages listed above in the ULS; in addition, a GetChangedFriendlyUrlBasedWebs request is issued.  Researching that request, I found that it does a full enumeration of all sites that have changed: "The protocol server MUST enumerate all the sites (2) containing friendly URLs and that have changed since given changeToken and MUST create a FriendlyUrlBasedWeb object for each site and add it to the ArrayOfFriendlyUrlBasedWeb which is returned."  https://msdn.microsoft.com/en-us/library/hh642870(v=office.12).aspx

    That is ugly, because this enumeration is likely what is chewing up all available memory on the server for the duration of the crawl.  The process usually releases the memory when it terminates.  However, sometimes it takes another 10-15 minutes for the process to time out, terminate, and release the memory; by then it's just a few minutes until the next scheduled search crawl, and the whole cycle starts up again.
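
    If anyone wants to correlate the memory behavior with the crawl window, a simple performance counter sample on the WFE works; a minimal sketch (the 30-second interval and 30-minute window are arbitrary):

        # Sample available memory every 30 seconds for 30 minutes,
        # starting just before the scheduled crawl kicks off.
        Get-Counter -Counter '\Memory\Available MBytes' -SampleInterval 30 -MaxSamples 60 |
            ForEach-Object { $_.CounterSamples | Select-Object TimeStamp, CookedValue }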

    To me this is a major bug; I don't understand why the enumeration should require ALL available memory.  I suppose if you had 10 web front ends you probably wouldn't notice the issue, but when you only have 2 it is very noticeable and essentially brings SharePoint and Project to a halt until the crawl completes.

    To mitigate this issue we have set the crawl to run every three hours so that it has less impact on users.  So far this has limited the impact on the server to 15 minutes every 3 hours.
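
    For anyone who wants to make the same change from PowerShell rather than through Central Administration, a minimal sketch (the content source name "Project Farm" is an assumption):

        Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

        # Re-schedule the incremental crawl of the Project content source
        # to repeat every 180 minutes throughout the day.
        $ssa = Get-SPEnterpriseSearchServiceApplication
        Set-SPEnterpriseSearchCrawlContentSource -Identity "Project Farm" `
            -SearchApplication $ssa `
            -ScheduleType Incremental `
            -DailyCrawlSchedule `
            -CrawlScheduleRunEveryInterval 1 `
            -CrawlScheduleRepeatInterval 180 `
            -CrawlScheduleRepeatDuration 1440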

    If anyone else is seeing this issue and has a better solution, I'd love to hear it.  We have also tweaked our search crawl settings to only read 2 objects at a time, and this hasn't made a difference.


    Frank Miranda Florida Hospital MIS

    Friday, March 13, 2015 4:30 PM
  • I'm experiencing the same issue on a Project Server 2013 farm.

    During non-crawl time the server memory averages around 14 GB used; when the crawl runs it increases to 23 of the 24 GB and never goes down until an app pool recycle ... CPU is also super high during the crawl (99-100%) but goes down once the crawl is completed.

    I had continuous crawl enabled.  I've now switched it back to an incremental crawl every 6 hours.  I've also set the search service performance level to PartlyReduced.  So I hope it'll minimize the impact on the end users.
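
    For reference, the performance level change can also be made from the management shell; a minimal sketch:

        Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

        # Throttle how aggressively the crawl component uses resources,
        # then verify the change took effect.
        Set-SPEnterpriseSearchService -PerformanceLevel PartlyReduced
        Get-SPEnterpriseSearchService | Select-Object PerformanceLevel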

    Yet this doesn't resolve the memory usage, which is never released.

    The farm is made of 2 VMs with 24 GB of fixed-size RAM and 8 vCPUs each.
    SQL is on physical machines.
    Performance is great before the crawl kicks in.

    Did you eventually find anything else about the issue described above?

    Thanks

    Wednesday, November 4, 2015 12:44 PM
  • Unfortunately I have not.  I basically did the same thing you did and changed the crawl schedule to twice a day, 6 AM and 7 PM, which has very little impact on users since those times are outside their normal work hours.

    I will be updating my SharePoint farm to SP1 and the October 2015 PU to see if the behavior changes.  I have little faith it will, since no one from MS support has responded to this post.  I've been monitoring the CUs for some time and this issue has never been listed or addressed in any of them.

    I may open a ticket with MS if I don't see any more responses to this post.


    Frank Miranda Florida Hospital MIS

    Wednesday, November 4, 2015 3:04 PM
  • OK, thanks for the prompt reply.

    I'm tempted to auto-recycle the EPM app pool every 6 hours after the crawl in order to release the memory on the server, but this is a dirty way of hiding the real problem.  Anyway, thanks again and have a nice day.
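
    If anyone does go that route, the scheduled recycles can be configured from PowerShell; a minimal sketch using the WebAdministration module (the app pool name is hypothetical, and as noted this only masks the leak rather than fixing it):

        Import-Module WebAdministration

        # Hypothetical pool name; point this at the PWA/EPM application pool.
        $pool = "IIS:\AppPools\SharePoint - 80"

        # Schedule a recycle shortly after each 6-hour crawl window.
        foreach ($time in "00:30:00", "06:30:00", "12:30:00", "18:30:00") {
            New-ItemProperty -Path $pool -Name recycling.periodicRestart.schedule -Value @{value=$time}
        }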

    Wednesday, November 4, 2015 4:47 PM
  • Interesting; the memory on my server is usually released within 5-10 minutes after the crawl completes.

    Frank Miranda Florida Hospital MIS

    Wednesday, November 4, 2015 6:07 PM
  • Hi Frank,

    I have faced this situation myself as well, and we need to accept the fact that the system behaves like this.  What you can do is, first, check that your SQL cluster or DB server has ample CPU cores and memory.  Second, you can optimize the queue job service in the farm.  Refer to the link below:

    https://technet.microsoft.com/en-us/library/fp161228.aspx
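
    For what it's worth, assuming the Project Server 2013 cmdlets are available in your farm, the current queue configuration for a PWA instance can at least be inspected before tuning anything; a minimal sketch (the PWA URL is a placeholder):

        Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

        # Dump the current queue settings for the PWA instance so any
        # changes made per the article above can be compared to a baseline.
        Get-SPProjectQueueSettings -Url "http://server/pwa"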

    Regards,

    Faizan.


    Regards, Syed Faizan ur Rehman, CBPM®, PRINCE2®, MCTS

    Tuesday, November 10, 2015 5:55 AM
  • I disagree; this is poorly written code.  The specs of the server don't matter: the crawl will consume all available memory on the server for its duration and for up to 15 minutes after it completes.  Our web front ends have 16 GB of RAM each, which is far more than needed for the current number of users.  The daily operational memory load on these servers (excluding the crawls) rarely climbs above 8.6 GB, well below any level that would impact performance.  Doubling the RAM would only mean that the crawl would consume all 32 GB, which would not resolve the problem.  Fluidetom above states that his servers have 24 GB of RAM and the crawl consumes 23 GB for the duration of the crawl.  This is clearly broken code.

    There is no need for a single process to consume 95% of available RAM just to complete a crawl.  Remember, this is occurring on the WFE, not the SQL back-end cluster.  This is abnormal behavior and can't be fixed by optimizing your queue jobs or adding more resources to the server.  The code needs to be fixed.

    The only way to minimize the impact is to schedule the crawls outside of normal working hours; otherwise you will have a huge negative impact on users while the crawl runs.


    Frank Miranda Florida Hospital MIS

    Tuesday, November 10, 2015 5:45 PM
  • Hi Frank,

    Any luck on resolving this issue? I'm experiencing the same thing.


    Paul Lor

    Thursday, November 10, 2016 7:47 PM