SharePoint 2010 - The start address cannot be crawled

    Question

  • Hi Guys,

     

    I'm a bit stuck trying to sort out crawling issues on a brand-new three-tier SP2010 farm.

    I do not know whether the problem is related to SharePoint configuration or to the F5 load balancer in front of the farm.

    On the load balancer, we've created a rule that sends all traffic for https://site.domain.com to the SP2007 farm and all traffic for https://site.domain.com/sites/awebsite to the SP2010 farm. This is working fine.

    I'm trying to get https://site.domain.com/sites/awebsite crawled, but each time I start a full crawl I get this event in the Event Viewer:

    Log Name:      Application
    Source:        Microsoft-SharePoint Products-SharePoint Server Search
    Date:          9/8/2011 9:32:12 PM
    Event ID:      14
    Task Category: Gatherer
    Level:         Warning
    Keywords:      
    User:          DOMAIN\sp_farm
    Computer:      SERVER.domain.corp
    Description:
    The start address https://site.domain.com/sites/awebsite cannot be crawled.
    
    Context: Application 'Search_Service_Application', Catalog 'Portal_Content'
    
    Details:
    	This item could not be crawled because the repository did not respond within the specified timeout period. Try to crawl the repository at a later time, or increase the timeout value on the Proxy and Timeout page in search administration. You might also want to crawl this repository during off-peak usage times.   (0x80040d7b)
     


    In the ULS logs I can see a different error message for the same start address:

    The start address https://site.domain.com/sites/awebsite cannot be crawled.  Context: Application 'Search_Service_Application', Catalog 'Portal_Content'  Details:  Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has "Full Read" permissions on the SharePoint Web Application being crawled.   (0x80041205)
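
    The two messages above carry different result codes, so it can help to separate them when reading through many log lines. The following is a minimal sketch in Python, not part of any SharePoint tooling: the function name `explain_crawl_error` and the table `CRAWL_ERRORS` are hypothetical, and the table holds only the two codes quoted in this post, with meanings paraphrased from their own text.

```python
import re

# Only the two codes quoted in this thread; meanings are paraphrased from
# the log text itself. This is not an exhaustive list of crawler codes.
CRAWL_ERRORS = {
    0x80040D7B: "timeout: the repository did not respond within the timeout period",
    0x80041205: "access denied: check the content access account or crawl rules",
}

# Matches an 8-digit hex code in parentheses, e.g. "(0x80041205)".
_CODE_RE = re.compile(r"\((0x[0-9a-fA-F]{8})\)")


def explain_crawl_error(message):
    """Return (code, meaning) for the first hex code in a log message, or None."""
    match = _CODE_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, CRAWL_ERRORS.get(code, "unknown code")
```

    Seeing both a timeout and an access-denied code for the same start address suggests checking connectivity and permissions separately rather than assuming a single cause.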


    I don't have much experience with SP2010, so any help is welcome.

     

    Thanks in advance

     

    Thursday, September 08, 2011 8:45 PM

All replies

  • Hi,

    You gave the crawl account 'full read' access to this affected web application, right?

    I also came across a similar issue with sites that use a host header. My solution was to disable the loopback check.
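
    To confirm on each server whether the loopback check is actually disabled, the registry value can be read back. The sketch below assumes Python is available on the server, which is not a given on a SharePoint box, and uses the standard `winreg` module. It only reads the `DisableLoopbackCheck` value under `HKLM\SYSTEM\CurrentControlSet\Control\Lsa` and changes nothing. The function name `loopback_check_state` is hypothetical.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

LSA_KEY = r"SYSTEM\CurrentControlSet\Control\Lsa"


def loopback_check_state():
    """Return the DisableLoopbackCheck DWORD, or None if it is absent or this is not Windows."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, LSA_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "DisableLoopbackCheck")
            return value
    except FileNotFoundError:
        # Key or value does not exist, so the check has not been disabled this way.
        return None


if __name__ == "__main__":
    print("DisableLoopbackCheck =", loopback_check_state())
```

    A result of 1 means the check is disabled; None means the value is not set (or the script is not running on Windows).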

    Hope this helps!


    BlueSky2010
    Thursday, September 08, 2011 9:04 PM
  • Hi BlueSky,

     

    Thanks for the quick reply.

    Yes indeed, I've disabled the loopback check and restarted all 3 servers.

    The crawl is running with the sp_farm account.

    In the permissions of the application I can see that "Local farm" has Full Control.

     

    Thursday, September 08, 2011 9:27 PM
  • Can you check if you can browse your site from the SP server? 

    If not, you will need to disable the loopback check on the SP 2010 server so the site can be browsed locally. Once the site can be browsed locally, the crawler will have access to it.

    See Method 2 in http://support.microsoft.com/kb/926642 to disable the loopback check.
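
    Besides opening the site in a browser, a scripted request from the crawl server shows whether the start address answers at all and how long it takes, which helps tell a timeout apart from an authentication problem. This is a rough sketch using only the Python standard library. The function name `probe` and its `opener` parameter are hypothetical, and the request is anonymous (the standard library does not do NTLM), so a 401 here only means the front end responded and asked for credentials.

```python
import socket
import time
import urllib.error
import urllib.request


def probe(url, timeout=30.0, opener=urllib.request.urlopen):
    """Request url once; return (outcome, seconds).

    outcome is an HTTP status code, "timeout", or "unreachable".
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            outcome = response.status
    except urllib.error.HTTPError as err:
        outcome = err.code  # e.g. 401 when the server demands credentials
    except (socket.timeout, TimeoutError):
        outcome = "timeout"
    except urllib.error.URLError as err:
        timed_out = isinstance(err.reason, (socket.timeout, TimeoutError))
        outcome = "timeout" if timed_out else "unreachable"
    return outcome, time.monotonic() - start
```

    Run from the server that hosts the crawl component, `probe("https://site.domain.com/sites/awebsite")` returning a quick 401 would point towards permissions, while "timeout" would point towards the network path, for example the load balancer rule.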


    • Edited by JohnFL_B Thursday, September 08, 2011 9:36 PM
    Thursday, September 08, 2011 9:31 PM
  • Hi,

    I can browse the site locally from the SP server running CA and the index (the other 2 are WFEs).

     

    Thursday, September 08, 2011 9:43 PM