Crawling SharePoint publishing site


  • Hi,

    I want to set up search (using FS4SP) on my SharePoint 2010 Authoring and Publishing site. The publishing site is meant to be an Internet-facing site (configured for anonymous access).

    A lot of questions come to mind, for example:

    Which crawler should I use: the default crawler or the Enterprise Web Crawler?

    If I use the default crawler, which content source type should I choose: SharePoint Sites or Web Sites? When crawling as a SharePoint site, does that mean the crawler can pick up all changes from SharePoint during an incremental crawl? Or can the site not actually be crawled as a SharePoint site? I noticed that the crawler does not use sitedata.asmx when I replaced the default HTTP header with something else; the site was then crawled as a normal website (I could not see any site column values).

    Next, should I use a crawl rule, either to crawl complex URLs or to crawl SharePoint content as HTTP pages, and specify the default content access account? Is this content access account related to the crawl type selected above (SharePoint)? Can I crawl as an anonymous user instead, perhaps using a cookie? And if the crawl type is SharePoint, can an anonymous crawler use sitedata.asmx?

    Or should I crawl as normal and use a search scope that returns only specific content classes (e.g. only content from the publishing feature)?

    And if I use the Enterprise Web Crawler, I assume I need to list all the supported file types and the maximum file size that the default SharePoint crawler would otherwise handle, right? Is there a template that at least mimics the default settings of the SharePoint crawler?

    In short, is there a reference or best practice for crawling a SharePoint 2010 Internet-facing site?


    Monday, May 20, 2013 6:55 PM

All replies

  • 1. Use the SharePoint Sites crawler.

    2. You can schedule full crawls, so that a full crawl runs instead of an incremental one when desired.

    3. Make sure that your robots.txt file allows your crawler to run.

    4. I do not recommend using the Enterprise Web Crawler.

    5. Run a crawl, test search, and then determine which crawl rules you should put in place.
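To check point 3 before starting a crawl, a minimal sketch using Python's standard `urllib.robotparser` can test whether a robots.txt file lets the crawler fetch a given page. The robots.txt content, the example.com URLs, and the `crawler_allowed` helper are hypothetical. The `MS Search 6.0 Robot` token is an assumption about what the SharePoint 2010 crawler (which FS4SP uses for SharePoint content sources) matches in robots.txt; confirm the actual User-Agent in your IIS logs.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: robots.txt token for the SharePoint 2010 crawler.
# Verify the real User-Agent string in your IIS logs before relying on it.
CRAWLER_AGENT = "MS Search 6.0 Robot"

# Hypothetical robots.txt served by the public site: the SharePoint crawler
# may fetch everything except /_layouts/, while all other robots are blocked.
ROBOTS_TXT = """\
User-agent: MS Search 6.0 Robot
Disallow: /_layouts/

User-agent: *
Disallow: /
"""


def crawler_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for page in ("http://www.example.com/Pages/default.aspx",
                 "http://www.example.com/_layouts/viewlsts.aspx"):
        print(page, crawler_allowed(ROBOTS_TXT, CRAWLER_AGENT, page))
```

Testing against the site's real robots.txt this way catches an over-broad `Disallow` rule before a crawl silently returns nothing.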

    Hope it helps.

    SharePoint MVP, Microsoft VTSP,

    Tuesday, May 28, 2013 1:31 PM