Limit indexed content to a particular date

  • Question

  • Hello all.
     
    I have a FAST installation for a SharePoint 2010 server that crawls public
    websites. However, the crawled information should not remain in the
    database for more than a week. That is, if a URL is indexed today, it should
    remain available until next week, but after that it should be removed.
     
    Is there any way to configure a rule like this one: after x amount of
    time, the crawled URL is deleted from the FAST database?
     
    Thanks in advance!
     
     
     


    Fernando A. Gómez F.
    fermasmas.wordpress.com
    Sample gallery
    Thursday, February 2, 2012 8:00 PM

All replies

  • Hi Fernando,

    Could you provide a more specific example of what you want to achieve, especially your schedule of full and incremental crawls? At the moment it is not quite clear what you are after.

    For example, say you create a new Content Source, specify "Web Sites" as the type and http://www.microsoft.com as the Start Address. Do you run a single full crawl and want the data to be removed from the index automatically after a week? Or do you recrawl this content regularly? If you recrawl it the next day, for example, do you consider the http://www.microsoft.com URL to be one day old or zero days old? And once it has been deleted after a week, should it be indexed again on the next crawl or not?
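
    The "one day old or zero days old" question comes down to which timestamp the age is measured from. A minimal sketch contrasting the two interpretations; the CrawlRecord type and its fields are hypothetical and not part of the FAST or SharePoint API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record of one URL's crawl history (not a FAST/SharePoint type).
@dataclass
class CrawlRecord:
    url: str
    first_indexed: datetime  # when the URL first entered the index
    last_crawled: datetime   # when the most recent crawl fetched it

def age_since_first_index(rec: CrawlRecord, now: datetime) -> timedelta:
    # Interpretation 1: a recrawl does NOT reset the clock.
    return now - rec.first_indexed

def age_since_last_crawl(rec: CrawlRecord, now: datetime) -> timedelta:
    # Interpretation 2: every recrawl resets the clock to zero.
    return now - rec.last_crawled

# Indexed on Feb 1, recrawled on Feb 2, age checked on Feb 2.
rec = CrawlRecord("http://www.microsoft.com",
                  first_indexed=datetime(2012, 2, 1),
                  last_crawled=datetime(2012, 2, 2))
now = datetime(2012, 2, 2)
print(age_since_first_index(rec, now).days)  # 1
print(age_since_last_crawl(rec, now).days)   # 0
```

    Note that with daily recrawls, the second interpretation never reaches seven days, so under that definition the item would never expire.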

    The procedure for removing URLs from search results is described here: http://technet.microsoft.com/en-us/library/ff191226.aspx. It requires knowing the Item ID, however, so it may not be directly applicable in your case, but it at least describes which steps are needed.
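
    Because that removal procedure works on item IDs, an external job would first have to decide which IDs have passed the retention window. A minimal sketch, assuming you can export (item ID, indexed timestamp) pairs from somewhere; how you obtain that list and the sample IDs are hypothetical, and the actual removal would still follow the TechNet procedure.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

RETENTION = timedelta(days=7)  # "available up to the next week"

def expired_item_ids(items: Iterable[Tuple[str, datetime]],
                     now: datetime,
                     retention: timedelta = RETENTION) -> List[str]:
    """Return the IDs of items indexed more than `retention` ago.

    `items` is a hypothetical export of (item_id, indexed_at) pairs;
    FAST does not hand you this list in this form out of the box.
    """
    cutoff = now - retention
    return [item_id for item_id, indexed_at in items if indexed_at < cutoff]

# Example with made-up IDs.
items = [
    ("item-001", datetime(2012, 1, 25)),  # 12 days old: expired
    ("item-002", datetime(2012, 2, 3)),   # 3 days old: kept
]
print(expired_item_ids(items, now=datetime(2012, 2, 6)))  # ['item-001']
```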

    I suspect you would need a fairly deep customization here. It's unlikely you'll be able to achieve this with the out-of-the-box configuration.

    Regards,

    Nikolay.

    Monday, February 6, 2012 12:26 PM
    Moderator
  • Hi,

    There is no easy way to prevent an item from being indexed during a web crawl as long as other pages link to it.

    You could do something similar to my blog post about using the offensive content filter to drop documents, but a better approach might be to pick up a date from within the HTML (meta tags) and write it out as a new crawled/managed property pair, "expiredate". Then create a search scope on your search page that filters out expired pages. The items will still be in the index, but at least they will not be displayed.
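
    The core logic of this suggestion (read a date from a meta tag, derive an expiry date, hide items past it) can be sketched as follows. This assumes the page publishes its date as <meta name="date" content="YYYY-MM-DD">; that tag name and format are hypothetical. Registering the code as a stage in FAST's document processing pipeline, defining the crawled/managed property pair, and building the search scope are not shown.

```python
from datetime import date, timedelta
from html.parser import HTMLParser
from typing import Optional

META_NAME = "date"             # hypothetical meta tag name
RETENTION = timedelta(days=7)  # keep items visible for one week

class MetaDateParser(HTMLParser):
    """Collects the content of the first <meta name="date"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        attr_map = dict(attrs)  # attribute values may be None
        if (attr_map.get("name") or "").lower() == META_NAME:
            self.value = attr_map.get("content")

def compute_expiredate(html: str) -> Optional[date]:
    """Value to write to the 'expiredate' property, or None if no usable date."""
    parser = MetaDateParser()
    parser.feed(html)
    if not parser.value:
        return None
    try:
        published = date.fromisoformat(parser.value.strip())
    except ValueError:
        return None  # unparseable date: leave the property empty
    return published + RETENTION

def is_visible(expiredate: Optional[date], today: date) -> bool:
    """Mimics the scope filter: hide items whose expiredate has passed."""
    return expiredate is None or expiredate >= today

page = '<html><head><meta name="date" content="2012-02-02"></head></html>'
exp = compute_expiredate(page)
print(exp)                               # 2012-02-09
print(is_visible(exp, date(2012, 2, 6)))   # True
print(is_visible(exp, date(2012, 2, 10)))  # False
```

    Pages without a usable date stay visible in this sketch; falling back to the crawl time instead would be an equally reasonable choice.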

    Regards,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Monday, February 6, 2012 12:46 PM