Trying to locate a course of action to take when user response time is terrible

  • Question

  • I'm trying to find useful resources to teach myself how to handle situations like this as the admin of a MOSS 2007 farm. The farm runs on a single machine. We have 1000+ users and about 900 site collections (most of which are My Sites). The search query and crawler processes run on the same machine as the rest of the SharePoint processes.

    Yesterday I was on the phone with a couple of users, trying to figure out one of those crazy problems SP users have, when suddenly we found we could no longer move around the site or even refresh the browser. I RDP'd to the server, and it showed about 99% idle. I contacted the SQL Server admins, and they didn't see anything major, only a small write delay on tempdb. They also looked at the server's hardware stats and found nothing there.

    The incident lasted 5-10 minutes. After that, things returned to normal.

    I don't know what else to look at. When I look at the SharePoint logs, I seldom see anything discernible as a problem. What can I do while an event like this is in progress to identify and resolve the problem?

    Thursday, July 21, 2011 5:41 PM

Answers

  • Hi,

    You do not have a load balancer, which means all of your traffic is directed to a single server.

    This is a major reason for the outage you faced: the server was so heavily loaded that it could not serve more requests.

    Also, a crawl was running at the same time, so it was consuming resources as well.

    I suggest scheduling your crawls during periods of low user activity to avoid slowness or outages for your users. From your description, it seems that user activity was at its peak at that time.
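    One way to find a low-activity window is to count requests per hour over several days of IIS logs. The minimal Python sketch below assumes W3C Extended logs whose #Fields: directive includes time and cs-username; the function names, the sample lines, and the idea of excluding the crawl account by user name are illustrative assumptions, not part of the original advice. Hours come out as recorded in the log, which for the W3C format is UTC, so convert them to local time before setting the crawl schedule.

```python
from collections import Counter


def parse_w3c(lines):
    """Yield one dict per request, keyed by the names in the latest #Fields: directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


def hits_per_hour(lines, ignore_users=()):
    """Count requests per hour of day (0-23), skipping the given user names."""
    ignored = {user.lower() for user in ignore_users}
    counts = Counter()
    for entry in parse_w3c(lines):
        if entry.get("cs-username", "-").lower() in ignored:
            continue
        counts[int(entry["time"][:2])] += 1  # "HH:MM:SS" -> HH
    return counts


def quietest_hours(lines, n=3, ignore_users=()):
    """Return the n hours with the fewest requests; ties go to the earlier hour."""
    counts = hits_per_hour(lines, ignore_users)
    return sorted(range(24), key=lambda hour: (counts[hour], hour))[:n]


# Hypothetical sample; real input would be the lines of several days of IIS logs.
SAMPLE_LOG = [
    "#Fields: date time cs-method cs-uri-stem cs-username c-ip sc-status",
    "2011-07-20 14:31:02 GET /default.aspx DOMAIN\\alice 10.0.1.20 200",
    "2011-07-20 14:36:40 GET /default.aspx DOMAIN\\bob 10.0.1.21 200",
    "2011-07-20 14:52:13 GET /_layouts/viewlsts.aspx DOMAIN\\alice 10.0.1.20 200",
    "2011-07-20 15:05:00 GET /default.aspx DOMAIN\\carol 10.0.1.22 200",
]

print(hits_per_hour(SAMPLE_LOG)[14])  # 3
print(quietest_hours(SAMPLE_LOG))  # [0, 1, 2]
```

    Passing the crawl (content access) account in ignore_users keeps the crawler's own requests out of the user-activity counts.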

     

    Do you have Log Parser installed? If so, you can use the following query to get the number of requests logged during that time window:

     "<Log Parser installation directory>\LogParser.exe" -i:IISW3C -o:CSV "SELECT COUNT(*) INTO IIS.CSV FROM <full path and file name of the IIS log> WHERE TO_TIME(time) BETWEEN TIMESTAMP('<StartTime>','hh:mm:ss') AND TIMESTAMP('<EndTime>','hh:mm:ss')"
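    If Log Parser is not available, roughly the same count can be taken by reading the W3C log directly. This is a minimal Python sketch, assuming the log's #Fields: directive includes a time field; the function name and sample lines are hypothetical. Keep in mind that IIS writes W3C log times in UTC, while the SharePoint ULS logs use local server time, so shift the start and end times accordingly.

```python
from datetime import time


def parse_w3c(lines):
    """Yield one dict per request, keyed by the names in the latest #Fields: directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


def count_requests(lines, start, end):
    """Count entries whose 'time' lies in [start, end], inclusive like SQL BETWEEN."""
    total = 0
    for entry in parse_w3c(lines):
        if start <= time.fromisoformat(entry["time"]) <= end:
            total += 1
    return total


# Hypothetical sample lines around a 14:35-14:40 window.
SAMPLE_LOG = [
    "#Fields: date time cs-method cs-uri-stem cs-username c-ip sc-status",
    "2011-07-20 14:34:59 GET /default.aspx DOMAIN\\alice 10.0.1.20 200",
    "2011-07-20 14:35:00 GET /default.aspx DOMAIN\\bob 10.0.1.21 200",
    "2011-07-20 14:39:59 GET /default.aspx DOMAIN\\alice 10.0.1.20 200",
    "2011-07-20 14:40:01 GET /default.aspx DOMAIN\\carol 10.0.1.22 200",
]

print(count_requests(SAMPLE_LOG, time(14, 35), time(14, 40)))  # 2
```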

    I hope this will help you out.

    Thanks,

    Rahul Rashu

    Thursday, July 21, 2011 7:27 PM

All replies

  • Hi,

    Are you also getting the log entries mentioned in this thread:

    http://social.technet.microsoft.com/Forums/en-US/sharepointadmin/thread/8c9dcb8e-838f-438e-9461-5a510522ad67/

    Have you traced it from the network end?

    Have you checked whether the load balancer is configured correctly?

    When the outage happened, were any processes running on the system that caused high CPU utilization?

    Have you checked the IIS connection limit and compared it with the traffic at that time? If not, I suggest you look at the IIS logs and use Log Parser to count the users working at that time; that will give you a clearer idea.
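    A COUNT(*) over the IIS log returns the number of requests, not the number of people. A rough count of active users can come from the distinct cs-username values, with distinct c-ip values as a cross-check. This is a minimal Python sketch, assuming the cs-username and c-ip fields are enabled in the site's W3C logging; the function name and sample lines are hypothetical.

```python
from datetime import time


def parse_w3c(lines):
    """Yield one dict per request, keyed by the names in the latest #Fields: directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


def active_clients(lines, start, end):
    """Return (distinct authenticated users, distinct client IPs) seen in [start, end]."""
    users, ips = set(), set()
    for entry in parse_w3c(lines):
        if not start <= time.fromisoformat(entry["time"]) <= end:
            continue
        user = entry.get("cs-username", "-")
        if user != "-":  # "-" is anonymous, e.g. the 401 leg of an NTLM handshake
            users.add(user.lower())
        ips.add(entry.get("c-ip", "-"))
    return len(users), len(ips)


# Hypothetical sample lines.
SAMPLE_LOG = [
    "#Fields: date time cs-method cs-uri-stem cs-username c-ip sc-status",
    "2011-07-20 14:35:10 GET /default.aspx - 10.0.1.20 401",
    "2011-07-20 14:35:10 GET /default.aspx DOMAIN\\alice 10.0.1.20 200",
    "2011-07-20 14:36:02 GET /default.aspx DOMAIN\\Alice 10.0.1.20 200",
    "2011-07-20 14:38:45 GET /default.aspx DOMAIN\\bob 10.0.1.21 200",
    "2011-07-20 14:41:00 GET /default.aspx DOMAIN\\carol 10.0.1.22 200",
]

print(active_clients(SAMPLE_LOG, time(14, 35), time(14, 40)))  # (2, 2)
```

    Comparing these numbers for the outage window against a normal window of the same length shows whether the slowdown coincided with a real jump in users.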

     

    I hope this will help you out.

    Thanks,

    Rahul Rashu

     

     

    Thursday, July 21, 2011 5:55 PM
  • 1. No, I am not getting the log entries from that other thread.

    2. I have not traced packets between the desktop, the SharePoint server, and the SQL Server.

    3. As far as I am aware, we have no load balancer.

    4. There were no processes on the system causing high CPU utilization. HOWEVER, what I do see is a lot of activity from the search server during that time. The last of that activity shows up 5 minutes before we start getting server timeouts from SharePoint.

    From the 12 hive logs (our outage was approximately 14:35-14:40):

    07/20/2011 14:30:49.30 mssearch.exe (0x079C)                   0x0228 Search Server Common           GatherStatus                   0 Monitorable Remove crawl 71153 from inprogress queue - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:6651  

    07/20/2011 14:30:49.30 mssearch.exe (0x079C)                   0x0228 Search Server Common           GatherStatus                   0 Monitorable Unlock Queue - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2879  

    07/20/2011 14:30:49.00 mssearch.exe (0x079C)                   0x2534 Search Server Common           GathererSql                   0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4871  

    07/20/2011 14:30:49.02 mssearch.exe (0x079C)                   0x2384 Search Server Common           GatherStatus                   0 Monitorable Advise status change 12, project AnchorProject, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4781  

    07/20/2011 14:30:49.11 mssearch.exe (0x079C)                   0x1BF0 Search Server Common           GathererSql                   0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4871  

    07/20/2011 14:30:49.30 mssearch.exe (0x079C)                   0x1BF0 Search Server Common           GatherStatus                   0 Monitorable Advise status change 4, project AnchorProject, crawl 71153 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4781  

    07/20/2011 14:31:18.45 wsstracing.exe (0x0940)                 0x1354 ULS Logging                   Unified Logging Service       uls1 Monitorable Tracing Service lost trace events.  Current value 5.  

    07/20/2011 14:32:27.74 w3wp.exe (0x18C0)                       0x1D18 Windows SharePoint Services   Topology                       88gs Monitorable hostHeaderSiteInfo is null  

    07/20/2011 14:32:27.74 w3wp.exe (0x18C0)                       0x1D18 Windows SharePoint Services   Topology                       88gv Monitorable hostHeaderSiteInfo is null  

    07/20/2011 14:32:27.74 w3wp.exe (0x18C0)                       0x1D18 Windows SharePoint Services   Topology                       88gr Monitorable alternateUrl is null  

    07/20/2011 14:43:43.88 w3wp.exe (0x18C0)                       0x248C Windows SharePoint Services   Topology                       88gs Monitorable hostHeaderSiteInfo is null  

    07/20/2011 14:43:43.88 w3wp.exe (0x18C0)                       0x248C Windows SharePoint Services   Topology                       88gv Monitorable hostHeaderSiteInfo is null  

    07/20/2011 14:44:18.66 wsstracing.exe (0x0940)                 0x1354 ULS Logging                   Unified Logging Service       uls1 Monitorable Tracing Service lost trace events.  Current value 1.  

    I see many mssearch.exe entries like these in the log leading up to this time.
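    To check whether the mssearch.exe activity really tapers off just before the timeouts, one option is to count ULS entries per minute for a given process. This is a minimal Python sketch, assuming the line layout shown in the log excerpt above (a MM/DD/YYYY HH:MM:SS.ff timestamp, then the process name, separated by whitespace); the function name is hypothetical and the sample lines are shortened copies of that excerpt.

```python
import re
from collections import Counter

# Start of a ULS trace line: "MM/DD/YYYY HH:MM:SS.ff <process> ...".
# Group 1 is the timestamp truncated to the minute; group 2 is the process column.
ULS_LINE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}):\d{2}\.\d{2}\s+(\S+)")


def entries_per_minute(lines, process):
    """Count trace lines per minute for one process name (case-insensitive)."""
    counts = Counter()
    for line in lines:
        match = ULS_LINE.match(line.strip())
        if match and match.group(2).lower() == process.lower():
            counts[match.group(1)] += 1
    return counts


# Shortened copies of lines from the excerpt above.
EXCERPT = [
    "07/20/2011 14:30:49.30 mssearch.exe (0x079C)  0x0228 Search Server Common  GatherStatus  0 Monitorable Remove crawl 71153 from inprogress queue",
    "07/20/2011 14:30:49.30 mssearch.exe (0x079C)  0x0228 Search Server Common  GatherStatus  0 Monitorable Unlock Queue",
    "07/20/2011 14:31:18.45 wsstracing.exe (0x0940)  0x1354 ULS Logging  Unified Logging Service  uls1 Monitorable Tracing Service lost trace events.  Current value 5.",
    "07/20/2011 14:32:27.74 w3wp.exe (0x18C0)  0x1D18 Windows SharePoint Services  Topology  88gs Monitorable hostHeaderSiteInfo is null",
]

for minute, count in sorted(entries_per_minute(EXCERPT, "mssearch.exe").items()):
    print(minute, count)  # prints: 07/20/2011 14:30 2
```

    Run over the full log file, the per-minute series makes it easy to see where the search activity stops relative to the 14:35 timeouts.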

    5. The idea about checking the IIS connection limit is a great one. Here's the preliminary info I found; I am not certain it's what you were asking about.

    I open IIS Manager, open the "Web Sites" folder, select the default web site_pair, and open its properties.

    The connection timeout is 120 seconds, and HTTP Keep-Alives are enabled. There is no bandwidth throttling, and web site connections are unlimited.

    In the 12hive/logs/guid/usage/ folder I open the 14.log file. There are 4 entries during the time period in question. I don't know how to use Log Parser to get the count of users working at that time, though it would be very useful to know how to do that.

     

    Thursday, July 21, 2011 6:15 PM
  • Thanks. I have one more question before I can get the data and look at it.

    Which specific log file are you referring to?

    Are you talking about the 12 hive/Logs files named {server name}-{date stamp}-{time stamp}.log?

    Are you talking about the 12 hive/Logs/Usage files?

    Or some other log files? If something else, what is the path on your system? (I will have to hunt for it, but seeing the name may let me search for something unique.)

    Thank you for your patience.

    Friday, July 22, 2011 2:08 PM
  • Hi,

    I was referring to the IIS logs.

    In IIS Manager, select your site and open its properties; this will show you the location of the IIS logs on your server.

    I hope this will help you out.

    Thanks,

    Rahul Rashu

    Friday, July 22, 2011 2:31 PM