Web Sites content source does not remove deleted pages from index

  • Question

  • We have configured a Web Sites content source for testing purposes and it indexes a small web site (5 pages) successfully.

    The problem is that deleted pages (the pages and all links to them were removed) are not removed from the index when an INCREMENTAL crawl runs, and they keep showing up in search results. Running a FULL crawl does remove them.

    The doclog tool indicates that a DEL operation was decided for the removed content, but the incremental crawl never deletes it. The page is still present in FixML too.

    What is going on?

    Thanks!

    Thursday, May 31, 2012 10:04 PM

All replies

  • Hi Andre,

    Did you try to run incremental crawl 3 times?

    Per Microsoft (http://technet.microsoft.com/en-us/library/ff621096.aspx):

    "When a crawler cannot find an item that exists in the index because the URL is obsolete or it cannot be accessed due to a network outage, the crawler reports an error for that item in that crawl. If this continues during the next three crawls, the item is deleted from the index".
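
    The quoted rule can be pictured with a small toy model. This is only a sketch of the wording above, not SharePoint or crawler code: the class name `ToyIndex`, its fields, and the reading that "the next three crawls" means four consecutive misses in total (the first error plus three more) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model of the quoted deletion rule (hypothetical, for illustration)."""

    # Assumption: the first error plus "the next three crawls" = 4 consecutive misses.
    misses_before_delete: int = 4
    items: set = field(default_factory=set)    # URLs currently in the index
    misses: dict = field(default_factory=dict)  # URL -> consecutive failed crawls

    def crawl(self, reachable):
        """Run one crawl; return the URLs deleted from the index in this crawl."""
        deleted = []
        for url in sorted(self.items):
            if url in reachable:
                # A successful fetch resets the consecutive-miss count.
                self.misses.pop(url, None)
                continue
            # The item could not be found: record one more consecutive miss.
            self.misses[url] = self.misses.get(url, 0) + 1
            if self.misses[url] >= self.misses_before_delete:
                deleted.append(url)
        for url in deleted:
            self.items.discard(url)
            self.misses.pop(url, None)
        return deleted


index = ToyIndex(items={"/a", "/b"})
live_pages = {"/a"}  # "/b" was removed from the web server
results = [index.crawl(live_pages) for _ in range(4)]
```

    Under this reading, a removed page stays searchable for the first few crawls and only drops out once the miss count reaches the threshold, so a single incremental crawl would not be expected to remove it.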

    Hope this helps.

    Friday, June 1, 2012 5:49 PM
  • No. The incremental crawl ran "n" (n > 3) times in a row and the content was never deleted from the index. I assume there is no difference between a scheduled incremental crawl and a manually started one.

    Whenever I delete a file from the web server *and* remove every link to it, each incremental crawl logs an error for the deleted file in the crawl log ("The object was not found."). The crawler knows the file is missing, but it is not deleted from the index.

    Contrary to what I stated before (I was wrong), there is NO delete operation for this file ID in the doclog output.


    Friday, June 1, 2012 7:32 PM