
Question: index only partially works

  • July 2, 2009 10:58 Saaffy
     
    Hi,

    we have two sites with one doclib per site. One holds about 150,000 PDF docs and the other holds about 350,000 PDFs. I did not install the PDF IFilter; for now we only need to index metadata. I created two content sources, one for each doclib. If I do a reset and a new full crawl, the log says it finds about 112,000 documents for the first doclib, but only 50,000 documents for the second one. I am not sure if it is related, but there is one error per content source:
    The item may be too large or corrupt. You may also verify that you have the latest version of this IFilter.

    Questions:
    1) How would I go about determining which items caused the above error?
    2) Why does the crawler stop? Can I configure it NOT to stop on an (or this) error?

    Sander
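
As a rough check on the figures quoted in the post above, the share of each doclib that the crawl actually reached can be worked out as follows. This is only a sketch of the arithmetic: the document counts are the approximate numbers from the post, and the names `libraries` and `coverage` are hypothetical, not part of any SharePoint API.

```python
# Approximate figures quoted in the post: documents stored in each doclib
# versus documents the crawl log reports after a full crawl.
libraries = {
    "doclib 1": {"stored": 150_000, "crawled": 112_000},
    "doclib 2": {"stored": 350_000, "crawled": 50_000},
}


def coverage(stored, crawled):
    """Return the fraction of stored documents that the crawl reached."""
    return crawled / stored


for name, counts in libraries.items():
    missing = counts["stored"] - counts["crawled"]
    print(f"{name}: {coverage(**counts):.1%} crawled, about {missing:,} missing")
```

On these numbers the first doclib is roughly three-quarters indexed, while the second is only about one-seventh indexed.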



All replies

  • July 2, 2009 14:30 Serge Luca [MVP]
     
    - I don't fully understand: you didn't install the PDF IFilter and you still get this error?
    - Your error reminds me of an old post with the same error.


    Serge Luca; blog: http://www.redwood.be
  • July 2, 2009 15:18 Mike Walsh MVP, Moderator
     
    I don't understand why anyone with 150,000 + 350,000 PDF files doesn't install a PDF filter ...
    WSS FAQ sites: http://wssv2faq.mindsharp.com and http://wssv3faq.mindsharp.com
    Total list of WSS 3.0 / MOSS 2007 Books (including foreign language) http://wssv3faq.mindsharp.com/Lists/v3%20WSS%20FAQ/V%20Books.aspx
  • July 2, 2009 17:44 Saaffy
     

    Mike, though your question is a bit off topic, I would still like to answer it. If my customer searches for 12345, he wants to find only documents where the customer number, order number or product number equals 12345, and not any telephone numbers, addresses or other unstructured information. If you have about 3 million documents in total, I would say this is at least a valid strategy. Besides that, the customer has a very bad history with 2003 and indexing. They have made themselves very dependent on the accuracy of the index, and though the causes vary, they have had days of lost production because of rebuilding indices. Rebuilding without the content takes about 45 minutes per 500,000 docs. I would guess that in the future the IFilter might come back, but for now....

    Serge, yes, I have NOT installed the IFilter (to be precise, I did install it (Adobe 9 64-bit) but uninstalled it again; hopefully that does not make any difference). I have now created 3 sites, each with its own content DB and content source, and all of them stop at a certain point, well before all items are indexed. All show this error message....


    Sander 
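
To put the rebuild figure from the post above in perspective, here is a small sketch of the arithmetic. It assumes, purely for illustration, that the metadata-only rebuild time scales linearly with document count; the post only gives the rate of about 45 minutes per 500,000 docs and a total of about 3 million documents. The function name `estimated_rebuild_minutes` is hypothetical.

```python
# Rate quoted in the post: about 45 minutes per 500,000 documents
# when rebuilding the index without document content.
MINUTES_PER_BATCH = 45
DOCS_PER_BATCH = 500_000


def estimated_rebuild_minutes(total_docs):
    """Estimate rebuild time in minutes, ASSUMING linear scaling with document count."""
    return total_docs / DOCS_PER_BATCH * MINUTES_PER_BATCH


total = 3_000_000  # approximate total mentioned in the post
minutes = estimated_rebuild_minutes(total)
print(f"{total:,} docs: about {minutes:.0f} minutes ({minutes / 60:.1f} hours)")
```

Under that linearity assumption, a metadata-only rebuild of roughly 3 million documents comes to about 4.5 hours.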

  • July 2, 2009 18:28 Serge Luca [MVP]
     
    Sander,

    1° Not using the PDF IFilter makes sense in your case; obviously you don't need the content, just the metadata and a link to the document...
    2° I'm a bit surprised by the error and the lack of performance. What is your farm topology?
    Are you using SP2?
    Some advice in the following links:
    http://sharepointsearch.com/cs/blogs/notorioustech/archive/2009/03/06/sharepoint-indexing-performance-tips.aspx
    http://technet.microsoft.com/en-us/library/cc262574.aspx ---> here they crawl through 50 million documents...


    Serge Luca; blog: http://www.redwood.be
  • July 3, 2009 6:50 Saaffy
     
    Hi Serge,

    thanks for your response. We have a medium farm: 2 WFEs, a SQL 2008 cluster (Dell SAN), and a dedicated app server for Central Administration, search and index, all W2K8 64-bit with MOSS SP2. Our topology has one extra app server (W2K3 32-bit) dedicated to PDF conversion (we convert everything (scans, mail, Office files, TIFFs, etc.) to PDF, but unfortunately the 3rd-party software is not yet fully 64-bit). The servers have 4 quad-cores and 12 GB of memory, so the farm is very capable of doing the job (while indexing, SAN throughput seems to be the bottleneck). That is why I am so in the dark here. It just stops and does nothing: a new incremental crawl brings nothing, and a new full crawl does nothing more... I will check out the links.

    Sander