Crawling root URL address instead of the given URL RRS feed

  • Question

  • Good afternoon,

    I am new to FAST Search. I have successfully set up a content source in the FAST Content Service Application, as well as FAST rules. The system crawls well and returns items on my FAST Search SharePoint web page.

    The issue I am having, and it's probably me not understanding something, is that items are returned from addresses I have not specified to be crawled.

    Example:

    I have a test.doc file which is found at http://moss/sites/xxxx/SitePages/Home.aspx, yet it shouldn't be displayed, because the only address I crawled was http://moss/sites/yyyy/SitePages/Home.aspx, which is a different address.

    I guess I could have created rules to specifically ignore these locations, but it seems a little odd to me that it would crawl an adjacent address in addition to the specified one.

    May I ask what configuration step I missed?

    Thank you,



    Monday, July 2, 2012 4:41 PM

All replies

  • When you set up the content source, what did you specify in the Crawl Settings section? The options are:

    'Crawl everything under the hostname for each start address'

    OR

    'Only crawl the Site Collection of each start address'

    The first option will crawl everything under http://moss in your case.

    Also, I noticed that you referenced .aspx pages in your URLs. Where does the doc actually reside?

    Tuesday, July 3, 2012 1:38 PM
  • Thank you for your help Chris,

    When I set up the Content Source, under the Content Source Type section, where it states "Select the type of content to be crawled:", I selected "web sites". Then I typed this address: http://moss/sites/xxxx/SitePages/Home.aspx

    Then, in the section right below that, where it says "Select crawling behavior for all start addresses in this content source:", I selected custom.

    Down below is a screenshot of my current content source:

    My documents reside throughout the SharePoint site located at the URL I copied from the browser, which in this case is the home page and does have an .aspx extension. Would that be a problem?

    Thank you,

    olivier

    Thursday, July 5, 2012 11:54 AM
  • I found the answer in a Microsoft article:

    Crawl only the SharePoint site of each start address

    Note:

    This option accepts any URL, but will start the crawl from the top-level site of the site collection that is specified in the URL you enter. For example, if you enter http://contoso/sites/sales/car but http://contoso/sites/sales is the top-level site of the site collection, the site collection http://contoso/sites/sales and all of its subsites are crawled.


    http://technet.microsoft.com/en-us/library/cc160648.aspx

    Thursday, July 5, 2012 1:18 PM
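
To make the difference between the two crawl settings discussed in this thread concrete, here is a minimal Python sketch. It is not SharePoint or FAST code and does not call any real search API; the function names, the list of site collections, and the http://contoso/sites/hr URL are assumptions made up for illustration. It only models the rule quoted above: the hostname option scopes the crawl to everything under the host, while the site collection option starts from the top-level site of the site collection that contains the start address.

```python
from urllib.parse import urlsplit


def hostname_root(start_url):
    """Scope root for 'crawl everything under the hostname'."""
    parts = urlsplit(start_url)
    return f"{parts.scheme}://{parts.netloc}"


def site_collection_root(start_url, known_site_collections):
    """Scope root for the site collection option.

    known_site_collections is a hypothetical list of top-level site URLs;
    the longest one that contains start_url is chosen.
    """
    matches = [
        root for root in known_site_collections
        if start_url == root or start_url.startswith(root.rstrip("/") + "/")
    ]
    if not matches:
        raise ValueError(f"no site collection contains {start_url}")
    return max(matches, key=len)


def in_scope(candidate_url, scope_root):
    """True if candidate_url is the scope root or lies beneath it."""
    root = scope_root.rstrip("/")
    return candidate_url == root or candidate_url.startswith(root + "/")


# Hypothetical farm layout, using the contoso example from the quoted note.
site_collections = [
    "http://contoso",
    "http://contoso/sites/sales",
    "http://contoso/sites/hr",
]
start = "http://contoso/sites/sales/car"

sc_root = site_collection_root(start, site_collections)
host_root = hostname_root(start)

other_doc = "http://contoso/sites/hr/test.doc"
print(sc_root, in_scope(other_doc, sc_root))
print(host_root, in_scope(other_doc, host_root))
```

Under these assumptions, a document in a different site collection is out of scope for the site collection option but in scope for the hostname option, which matches the behavior described in the replies above.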