Guidelines sought on crawling rules and accounts used

  • Question

  • These questions relate to providing credentials for various crawl operations

    When SharePoint 2010 and FAST Search were set up, dedicated service accounts were used, e.g. wssservice, sqlservice.

    These accounts do not have general network file access because their passwords don't expire, so a general domain account is needed for crawling.

    1. For network crawls, are read permissions effectively all that the crawl account needs?
    2. Will using a general domain account mean the password needs to be re-entered every password expiry period?
    3. Does SharePoint offer clean "password expired" messages or email alerts?
    4. How far up the crawl setup (like a new service...) do you need to go when indexing different network shares or other SharePoint sites that need different users? How is this done?
    5. I also want to crawl our existing corporate MOSS 2007 SharePoint (our IT don't know how to enable search in it, as the former SharePoint expert left the company). This may require a different user account.

    6. Do I need crawl rules if I am happy to see all the files in the crawl? The crawl has found no files, yet manually I have access to all the files in the crawl directory.
    It would be really useful if someone could provide a real example; the help talks in the abstract. E.g. I have a file share and am happy (at first) to show all file type matches: what is my rule?
    I have asked to crawl my own site collection that the FAST Search site lives under: what is my crawl rule?
    Currently on the FAST Search server, my 'Get-FASTSearchContentCollection -Name "sp"' only shows 1 item, the one from 7, so I don't understand what you need to do to make it work. E.g. crawl rules seem not to be tied to content sources, yet the rule for each could be quite different.
    7. On the FAST Search server I manually added \\abc-file01.thecompany.com\volatile\Common\readme.txt
      using "docpush". The SP2010 search result shows:
    http://cohowinery.com/\\abc-file01.thecompany.com\volatile\Common\readme.txt
    Will the normal crawl, once it works, not have the http://cohowinery.com prefix?
    Obviously when I click on this, it does not resolve.


    Thanks in advance

    (I owe whoever can answer all these a few beers)





    Sunday, October 30, 2011 5:54 AM

All replies

  • Hi,

    First off I recommend you take a look at the following excellent posts by Corey Roth

    How to: Configure Enterprise Search to index a file share

    How to: Properly set permissions on your Search crawl account in SharePoint 2010

    Corey’s Guide to SharePoint Service Accounts

    Then over to your questions :)

    1. Yes (but read the first link)

    2. Related to 3, but yes :(

    3. Take a look at using "managed accounts", but this should not be used for crawl accounts (unfortunately)

    4. See 6.

    5. See 6 (and you have to make sure your crawler user has full read policy on the MOSS farm)

    6. If your default crawl account does not have read access to the source you are indexing, crawl rules are the way to make that particular source use different credentials during indexing. And crawl rules are tied to the server (file://server, http://sharepoint, http://webserver, bcs://lob) you are indexing, not the content source itself.

    Also you have to make sure the crawler user has the correct permissions in the system you are indexing. For SharePoint this is the "full read" policy. The default crawler user is given this right automatically.

    7. docpush is for testing, and it has a parameter which allows you to use something other than cohowinery. And yes, when you get indexing to work you will get proper URLs.
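
    The point in answer 6, that a crawl rule is matched against the address being crawled rather than attached to a content source, can be illustrated with a small Python sketch. This is only a model of the idea, not how SharePoint implements it: the rule paths, the account names and the first-match-wins ordering are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical crawl rules: (path pattern, account to crawl with).
# Patterns are keyed on the server address, not on any content source.
RULES = [
    ("file://abc-file01.thecompany.com/*", "THECOMPANY\\filecrawl"),
    ("http://moss2007/*", "THECOMPANY\\mosscrawl"),
]

# Hypothetical default content access account.
DEFAULT_ACCOUNT = "THECOMPANY\\defaultcrawl"


def account_for(url, rules=RULES, default=DEFAULT_ACCOUNT):
    """Return the account of the first rule whose pattern matches the URL,
    falling back to the default account when no rule matches."""
    for pattern, account in rules:
        # Compare case-insensitively; '*' matches any run of characters.
        if fnmatchcase(url.lower(), pattern.lower()):
            return account
    return default
```

    With these made-up rules, a file on the share is crawled with the file account whichever content source lists it, while any address without a matching rule falls back to the default account.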

    From reading this I gather you have not been able to index (any source?). Have you checked your crawl logs for errors?

    I think I answered all your questions, but please follow up if I didn't. And the lucky part is that crawling is the same for FAST as for SharePoint when using the SharePoint connectors. This makes it a tiny bit easier to find information on it :)

    Regards,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Sunday, October 30, 2011 6:56 PM
  • No joy

    1. On the crawl of my web sites I get an error about no crawl flag set for the sub site, my "Fast Search" one. OK, it isn't set. The top level site has a description "Top level O&O site"

    I entered "top level" in the search and it found nothing. Is it simply that the description of a web site is not included in the crawl? Are crawls only for web part content, so any data not fitting into this is not found?

    Is it possible text in html or web parts is not crawled ? Only documents are crawled ? (see http://social.technet.microsoft.com/Forums/en-US/sharepoint2010general/thread/04a4c5d2-b0d2-4356-b0f8-8433497e5d39 for the text)

    2. Crawl network files - The error logs show 2 errors and 2 warnings: errors for 2 files locked for read access, and warnings for 2 files larger than 16 MB (max download).
    I am not sure what should be in the success category: every file indexed?
    Because there are none in the logs. Success has 0 entries. Note there are 100s of files in this network area.
    >> I assume an error would not cause a crawl to stop, especially a file-in-use error?
    >> Is there any way to check which files have been considered, and does that need to be done on the FAST Search server? Why would the system be happy to report issues but not index the other files? I had added all the Office files + pdf, +txt to the file types.
    I was surprised how many normal files are NOT in the file types list.

    3. Central Administration : FASTContent: Manage Crawl Rules

    Unfortunately your URLs had nothing on crawl rules, so I am still lost. I deleted what I had there in case it was in the wrong format. I don't understand what the crawl rule test is trying to do, because when I enter the server name it finds nothing. Also I assume at the test stage it uses the default account. My guess is that because the FAST Search side is showing no entries, I will never get a match anyway...
    i.e. until I can get files indexed my crawl rules are academic.

    4. The "Policy for Web Application" showed my account with full control; however, the administrators for FASTContent was empty. Should it be? I added the general account I am using but this made no difference.

     

    I assume there is hope, this stuff does work but you have to find that obscure setting which is causing the issue...

    Thanks


    Monday, October 31, 2011 6:53 AM
  • Hi Greg,

    I'll answer real short...

    Quote: "I had added all the Office files + pdf, +txt to the file types"

    That's the error right there. If you read the text at the top of "File Types" in the Content SSA closely, it says it will "exclude" all files listed, not "include" them as the SharePoint SSA does. So by adding all your known types you actually prevented them from being indexed. Leave it as default, and all should be well.
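
    To make the difference concrete, here is a minimal Python sketch of the two list semantics described above. It is an illustration only: the function name, the sample extensions and the sample file names are made up, and it does not reproduce the actual filtering code of either SSA.

```python
import os


def is_crawled(filename, listed_extensions, list_is_exclusion):
    """Decide whether a file is crawled, given a file types list.

    list_is_exclusion=True models a list that excludes what it names;
    list_is_exclusion=False models a list that includes only what it names.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    listed = ext in listed_extensions
    return not listed if list_is_exclusion else listed


# The same (hypothetical) list interpreted both ways.
listed = {"docx", "pdf", "txt"}
files = ["report.docx", "notes.txt", "drawing.dwg"]

as_exclusion = [f for f in files if is_crawled(f, listed, list_is_exclusion=True)]
as_inclusion = [f for f in files if is_crawled(f, listed, list_is_exclusion=False)]
```

    Read as an exclusion list, only drawing.dwg survives; read as an inclusion list, only the two Office/text files do.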

    Regards,
    Mikael Svenson 


    Monday, October 31, 2011 7:41 AM
  • Thanks Mikael

    Yes it was exclude, silly me. OK, after I fixed this, cleared the index, rebooted both servers and re-ran the crawl, I get similar errors to before:

    SP2010 side - Can't find any files, can't find some keywords i added to my topsite page in a rich text box.

    On the FAST server side - http://localhost:13280/ finds nothing, and the FS4SP PowerShell command Get-FASTSearchContentCollection -Name "sp" shows 0 entries.

    >> So how do I track the issue down? Is there some verbose logging mode to tell me what is happening?

    I know the query system works (as above) because if I manually add a file on the FS4SP side, I can find it.

    >> Note, the network area is only 1.2G in size. I deliberately chose something small. The top SharePoint site has about 10 words in it. However the current crawl duration is 1:15m.
    This suggests to me that it is stuck?

    Very frustrating, thanks again

    p.s. I found there are some reports. What do the retries mean?

    http://abc-sp2010:30000/AdminReports/Search%20administration%20reports/CrawlRatePerType.aspx


    Monday, October 31, 2011 11:01 AM
  • Hi,

    My best guess is you have not set up the certificates correctly for the Content SSA as per http://technet.microsoft.com/en-us/library/ff381261.aspx#BKMK_Configure_ssl_enabled_communication

    That or a firewall blocking access to the content distributor from the SharePoint farm, or you didn't add the correct content distributor address/port for the Content SSA.
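
    To rule out the firewall case, one option is to test from the SharePoint server whether the content distributor port accepts a TCP connection at all. The following is a minimal Python sketch; the function name is made up, and the host and port you pass in must be whatever your Content SSA is configured with. It only checks TCP reachability, so a successful result says nothing about whether the certificates are set up correctly.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened
    within the timeout, and False on refusal, timeout or DNS failure."""
    try:
        # create_connection resolves the host and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution errors.
        return False
```

    For example, port_is_open("fs4sp-server", 13391) uses a hypothetical host name and a placeholder port; a False result from the SharePoint server combined with True on the FAST server itself would point towards a firewall or address problem.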

    Regards,
    Mikael Svenson


    • Marked as answer by Greg B Roberts Monday, October 31, 2011 11:36 PM
    Monday, October 31, 2011 1:06 PM
  • Success ...

    I redid the certificate on SharePoint from FS4SP and it came good. I did this earlier (I have a screen capture) but somehow something must have reset it.

    SharePoint site indexing is not working, so I will raise a fresh post.

    Thanks again

    Monday, October 31, 2011 11:37 PM