How does one know how far along a crawl is?

  • Question

  • I have two crawls that seem to be running forever. My document count does go up, so they are doing something.

    I am crawling CIFS shares which are very large (> 10 TB) with many millions of files; the crawls have been going for over 1,000 hours.

    My question really is: how do I tell how far along a crawl is? Is there somewhere I can see that it has traversed xx TB, or some other indicator, so that I can estimate a time to completion?


    Tuesday, March 27, 2012 10:00 PM

All replies

  • The only way is to compare the total number of items in the content source vs the number of items that have been crawled.
    Tuesday, March 27, 2012 10:19 PM
  • Let's use some numbers...

    I know that my CIFS shares contain 18,269,236 files of all types; FAST reports a document count in the collection of 5,897,292.

    When starting the crawls I also added many file extensions to ignore.

    So just comparing the two values is not a valid estimate as I do not know how many files were excluded.

    It would be cool to know, when scanning a CIFS share, how far through that share the crawl has traversed.

    Tuesday, March 27, 2012 10:35 PM
  • In that case, have a look through the Crawl Logs in the Content Service Application. They will show you the last files that were crawled. Based on that, you can see roughly where the crawl is in the filesystem.
    Tuesday, March 27, 2012 10:47 PM
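
The counting approach discussed above can be sketched roughly in Python. This is not a FAST or SharePoint tool, just an illustration under assumptions: the helper names (`count_eligible_files`, `estimate_progress`) are hypothetical, the share is assumed to be reachable as an ordinary path, each indexed document is assumed to correspond to one file, and the crawl rate is assumed to stay roughly constant. Counting only files whose extensions are not excluded gives a fairer denominator than the raw file count.

```python
import os
from pathlib import Path


def count_eligible_files(root, excluded_extensions):
    """Count files under root whose extension is not in excluded_extensions."""
    # Normalise extensions so ".TMP", "tmp" and ".tmp" all compare equal.
    excluded = {ext.lower().lstrip(".") for ext in excluded_extensions}
    eligible = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = Path(name).suffix.lower().lstrip(".")
            if ext not in excluded:
                eligible += 1
    return eligible


def estimate_progress(indexed_docs, eligible_files, elapsed_hours):
    """Return (fraction_done, remaining_hours), assuming a constant crawl rate."""
    if indexed_docs <= 0 or eligible_files <= 0:
        raise ValueError("counts must be positive")
    # Cap at 1.0 in case the index holds more documents than files counted.
    fraction = min(indexed_docs / eligible_files, 1.0)
    remaining_hours = elapsed_hours * (1.0 - fraction) / fraction
    return fraction, remaining_hours


if __name__ == "__main__":
    # Figures quoted in the thread. 18,269,236 still includes the excluded
    # files, so this fraction is only a lower bound on the real progress.
    fraction, remaining = estimate_progress(5_897_292, 18_269_236, 1000)
    print(f"at least {fraction:.1%} done, at most ~{remaining:.0f} hours left")
```

Replacing 18,269,236 with the result of `count_eligible_files` run against the share, using the same ignore list as the crawl, should tighten the estimate. Walking a share of this size will itself take a while.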