SharePoint 2007 Random Freezes\Timeouts SPUser.Notes, SPUser.Groups, SPGroup.Users, & Quick Launch Navigation?

  • Question

  • We have a production SharePoint 2007 farm with about 10 site collections, each with 15 sub-sites on average. Each site collection has between 100 and 200 groups, and we have a little under 4k users in total. Our DB and web servers are 32-bit servers with 4 GB of RAM and 4 cores each. During peak load the web server has between 90 and 300 MB of available physical memory. That may be part of the problem, but we are not seeing any out-of-memory reports.

    Issues:

    1. Randomly, our entire SP web app will freeze (little CPU utilization, and all requests time out). The site resumes normal functionality for a while if we recycle the app pool. NOTE: We have a .NET 4 web application that shares the same IIS web app and keeps working even while SP is locked up (they run in separate app pools).

    2. Randomly, an individual site collection will lock up. It will not load again until we go through the API for the site and slowly hide random Quick Launch nodes on random child sites.

    2A. We can sometimes 'fix' this navigation issue by adding a new navigation node to the top of a Quick Launch.

    2B. Sometimes this can be fixed by recreating the SPGroups that control the item on the Quick Launch menu.

    2C. We 'fixed' this once by dropping some or all users' Notes ('tp_notes' / SPUser.Notes).

    3. When calling getGroupCollectionFromUser() for a select group of specific users (repeatable until we make a change to their groups), the request times out and never responds.

    3A. We have worked around this by adding that user to, or removing that user from, random SPGroups.

    Also, when the entire SP web app was locked up, we ran a built-in SQL Server Management Studio report on deadlocks. It reported 0 deadlocks.


    Thoughts?

    Thank you for reading this.

    Zero

    Tuesday, August 27, 2013 2:03 AM

All replies

  • Hi,
    For this issue, I’m trying to involve someone familiar with this topic to further look at it.
    Thanks,
    Qiao
    Forum Support
    Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Qiao Wei
    TechNet Community Support

    Wednesday, August 28, 2013 2:23 AM
    Moderator
  • Hi Zero,

    I think I had this kind of issue in the past. In my case the root cause was the ACL, which had grown to more than 1500 entries, so performance degraded, and timeouts usually happened when it got too slow.

    Please check:

    http://msdn.microsoft.com/en-us/library/ms457294(v=office.12).aspx

    http://www.sharepointjoel.com/Lists/Posts/Post.aspx?ID=123

    http://social.technet.microsoft.com/Forums/sharepoint/en-US/bcee1b6c-f95b-4df0-ad12-2d25989b7e79/sharepoint-2007-acl-2000-64kb-limitations

    In my testing I created a dummy environment and tried it there. The dummy environment does not have that many ACLs, and its performance is better.
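
    A minimal sketch of how such a check could be scripted once the ACL sizes have been collected. It assumes the counts have already been exported by some other means into a plain mapping; the function name, the sample paths, and the sample numbers are all hypothetical, and the 1500 threshold is simply the figure mentioned above.

```python
def find_oversized_acls(acl_entry_counts, limit=1500):
    """Return (name, count) pairs whose count exceeds `limit`, largest first.

    `acl_entry_counts` is a hypothetical mapping of a site/scope name to the
    number of entries in its ACL, gathered however your environment allows.
    """
    oversized = [(name, count) for name, count in acl_entry_counts.items()
                 if count > limit]
    return sorted(oversized, key=lambda pair: pair[1], reverse=True)


# Made-up sample data, for illustration only.
sample = {
    "/sites/hr": 420,
    "/sites/finance": 1730,
    "/sites/ops/reports": 2100,
}

for name, count in find_oversized_acls(sample):
    print(f"{name}: {count} ACL entries (over the limit)")
```

    Anything this reports would be a candidate for simplifying permissions before looking elsewhere.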


    Regards,
    Aries
    Microsoft Online Community Support


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.

    • Marked as answer by Zero0Day2 Thursday, September 5, 2013 1:41 PM
    Wednesday, August 28, 2013 8:49 AM
  • Aries:

    Thank you for the response. We are not close to that 1500 marker. Also, it doesn't seem to be a performance issue, as the site will never load after it goes down.

    Do you have any other suggestions?

    Thank you

    Zero

    Wednesday, August 28, 2013 8:22 PM
  • +1 @Zero

    We have a similar farm setup and have been experiencing these symptoms. All collections will completely fail under heavy load until an iisreset is issued. The event log will show deadlocks detected. Also, we'll have random sites/child sites completely lock up after security/navigation changes and can only be fixed by touching quick launch settings or modifying group membership.

    Thanks.

    Wednesday, August 28, 2013 8:35 PM
  • New development...

    Even though the site is locked up, I am still able to go directly to at least CSS files that are stored in a doc lib. 

    Thank you for reading this,

    Zero

    Wednesday, August 28, 2013 9:16 PM
  • Hi Zero,

    It's quite strange: if the site is not accessible and SQL doesn't show any deadlocks, then the bottleneck is most probably in IIS.

    If this only happens on one site, you could try a browser other than IE and see whether the site loads.

    Usually, to get to the root cause of an application hang, we grab a memory dump and check the trace to see in which state the process is hung.

    http://support.microsoft.com/kb/919790

    http://blogs.msdn.com/b/johan/archive/2007/01/11/how-to-install-windbg-and-get-your-first-memory-dump.aspx


    Regards,
    Aries
    Microsoft Online Community Support




    Thursday, August 29, 2013 2:35 AM
  • Aries:

    Thank you for your response. I have not tried the memory dump and I will work that angle next.

    However, at this point I'm hesitant to say it's a SQL\IIS deadlock issue, because of the following things I have noticed:

        1. I can load other site collections (when the issue is isolated to 1 site collection).

        2. When it impacts all site collections, I can load non-SP resources that are in the same web app (these non-SP resources require DB connections to different databases, but on the same DB server).

    Thursday, August 29, 2013 3:31 AM
  • Hi Zero,

    As far as I know, IIS also has worker threads.

    For example, for an application process, IIS assigns a set of worker threads to that application. When the worker threads run out and no further requests can be received, timeouts or an inaccessible site may result.

    If you reset or recycle IIS, the issue may go away temporarily, but when the worker threads run out again, it may happen again.
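
    To illustrate the general idea of worker exhaustion (this is only an analogy in Python, not how IIS or SharePoint is actually implemented), the sketch below uses a small fixed pool. Every worker is tied up by a request that is waiting on something, so a new, trivial request times out even though the CPU is idle. The function names, pool size, and timeout are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def demo_exhaustion(pool_size=2, request_timeout=0.2):
    """Show a trivial request timing out while all workers are blocked."""
    # Stands in for whatever the stuck requests are waiting on.
    release = threading.Event()

    def stuck_request():
        release.wait()  # blocks without using any CPU
        return "stuck request finished"

    def quick_request():
        return "ok"

    results = {}
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Occupy every worker with a blocked request.
        for _ in range(pool_size):
            pool.submit(stuck_request)

        # The new request is queued, but no worker is free to run it.
        new_request = pool.submit(quick_request)
        try:
            results["while_exhausted"] = new_request.result(timeout=request_timeout)
        except TimeoutError:
            results["while_exhausted"] = "timed out"

        # Unblock the workers; the queued request can now be served.
        release.set()
        results["after_release"] = new_request.result(timeout=5)
    return results


print(demo_exhaustion())
```

    The symptom matches what is described in this thread: low CPU utilization together with requests that never return, which clears once the blocked workers are freed.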

    With the memory dump, we can check which application is causing it. In most of my experience the cause has been a garbage collection issue: garbage collection is triggered far too often, so the workers run out.

    Please check this best practice, which seems quite relevant to your needs:

    http://msdn.microsoft.com/en-us/library/aa973248(v=office.12).aspx


    Regards,
    Aries
    Microsoft Online Community Support



    Thursday, August 29, 2013 4:13 AM
  • Aries:


    I continued to work this issue last night. A couple more interesting aspects: when an individual site collection is not loading, I cannot export the site collection (stsadm -o export).

    Also, the downed site collection I'm testing with seemed to start working last night (without intervention). Then I went to bed and woke up to it not responding again.

    Thank you for reading this,

    Zero

    Thursday, August 29, 2013 2:31 PM
  • Hi Zero,

    Regarding this, if I may ask, do you have any content deployment process running overnight?

    If the content deployment is too large, it may cause the workers to run out.

    Perhaps you can reduce the amount of content being deployed and try again:

    http://technet.microsoft.com/en-us/library/cc263428(v=office.12).aspx

    http://blogs.technet.com/b/stefan_gossner/archive/2009/01/16/content-deployment-best-practices.aspx?PageIndex=2


    Regards,
    Aries
    Microsoft Online Community Support



    • Marked as answer by Qiao Wei (Moderator) Thursday, September 5, 2013 3:22 AM
    • Unmarked as answer by Zero0Day2 Thursday, September 5, 2013 1:41 PM
    Friday, August 30, 2013 1:41 AM
  • Aries:

    We started digging more into the ACL issue you initially mentioned. It's not an exact match to our issue, but it seems to be a close one. We are trying to figure out how to make those changes work for our environment. Thank you for your time.

    Zero

    Thursday, September 5, 2013 1:47 PM