Does a pipeline extension NEED to write the output if the resulting Document node has no children?

  • Question

  • I'm trying to shave every millisecond off my custom pipeline extension. I've got it to an average of 200 ms per item. It would probably be faster still if I could skip writing the output to disk when the resulting Document node has no children.

    Is FS4SP required to always get an output document from the extension?

    Wednesday, August 8, 2012 7:25 AM

All replies

  • Hi,

    It's not required, but you will see an error in the crawl log if you don't write one. And who knows how many milliseconds the error handling takes compared to processing an empty output file.

    The time taken to start your code (if it's .NET) is probably greater than writing an empty output file. Another option is to have an empty output file template in some folder, and then just copy it to the output file parameter for your stage. A file copy should be quicker than opening and writing an empty file.
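
To make the two options above concrete, here is a minimal sketch in Python (the thread itself concerns a .NET extension, so this is only an illustration). The function names, the template file, and the exact shape of the empty document (`<Document />` with an XML declaration) are assumptions for the example, not something confirmed in this thread.

```python
import shutil
from pathlib import Path

# Assumption: an output with no children is just an empty Document root element.
EMPTY_DOCUMENT = b'<?xml version="1.0" encoding="utf-8"?>\n<Document />\n'


def write_empty_output(output_path: str) -> None:
    """Option 1: open the output file and write the empty document."""
    Path(output_path).write_bytes(EMPTY_DOCUMENT)


def copy_empty_template(template_path: str, output_path: str) -> None:
    """Option 2: copy a pre-made empty document to the output path."""
    shutil.copyfile(template_path, output_path)
```

Both functions leave the same bytes at the output path, so which one is quicker is purely a measurement question on the target server; timing each over a few thousand calls would settle it.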

    Also, if you are developing in .NET, use ngen.exe to precompile your application, and build it as 32-bit (unless you need 64-bit). 32-bit .NET apps are a fraction faster as well... as we are talking milliseconds here ;)

    What kind of content volume are you working with? You might be better off adding more document processors to handle the crawl queue instead of pinching a millisecond here and there.

    Thanks,
    Mikael Svenson


    Search Enthusiast - SharePoint MVP/MCT/MCPD
    http://techmikael.blogspot.com/
    Author of Working with FAST Search Server 2010 for SharePoint

    Thursday, August 9, 2012 7:44 PM
  • Hi Mikael, thanks for the reply.

    I'll leave the empty output file writing as is. I might check whether a file copy is indeed faster, but I'm guessing the difference won't be much. NGEN didn't shave anything off the average loading time for me, but I will try the 32-bit setting (using AnyCPU now).

    We added up to 6 document processors. Tests with 8 and 7 were slower due to too high CPU usage. 6 seems to be the sweet spot with the current setup. We're crawling medium volumes (around 3 million documents), so we really need an additional server, but that's not an option currently :)
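
As a rough back-of-the-envelope check on the numbers in this thread (about 200 ms per item, 6 document processors, around 3 million documents), the sketch below estimates the ceiling the extension alone would put on throughput. It assumes each document processor handles one item at a time and that the extension is the only cost, which is certainly optimistic; the function names are made up for the example.

```python
def extension_bound_rate(processors: int, ms_per_item: int) -> float:
    """Items per second if every processor only ever runs the extension."""
    return processors * 1000 / ms_per_item


def hours_for(items: int, items_per_second: float) -> float:
    """Hours needed to push the given number of items at a steady rate."""
    return items / items_per_second / 3600


rate = extension_bound_rate(processors=6, ms_per_item=200)
hours = hours_for(items=3_000_000, items_per_second=rate)
print(f"{rate:.0f} items/s ceiling, {hours:.1f} hours")  # 30 items/s ceiling, 27.8 hours
```

Under those assumptions the extension by itself accounts for a bit over a day of a full crawl, so halving its run time would save roughly half a day at best.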

    I think we've tweaked the custom pipeline extensions as much as possible, without resorting to a different programming language like C++, so we'll see how crawling goes and work our way from there.

    Cheers!

    S

    Friday, August 10, 2012 7:40 AM
  • Hi,

    Adding more docprocs is sometimes a matter of adding more servers to get more CPU ;) But of course, that involves hardware and license costs.

    You should also make sure all network settings are in order, e.g. turn off TCP offloading (http://social.technet.microsoft.com/Forums/en-US/fastsharepoint/thread/6907ee55-02ac-4706-ad29-ff2fd420cab2/).

    Thanks,
    Mikael Svenson



    Saturday, August 11, 2012 5:20 AM
  • I turned off TCP offloading on the FAST Server already but not on the SP server. Not sure if both are required.

    The "items crawled per second" figure on the Content SSA admin page fluctuates between 5 and 15. That doesn't seem fast, but I'm not sure how reliable the number is. If it is reliable, then it seems low. I've been monitoring some of the counters mentioned by you and others and they look fine, but then again I might be interpreting them wrong :)

    It feels like crawling is going about as fast as it can with the current setup. I have nothing to compare it against, though.
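
For a sense of scale, the sketch below turns the 5 to 15 items per second shown on the admin page into a full-crawl duration for roughly 3 million documents. It assumes the displayed rate is accurate and stays constant for the whole crawl, neither of which this thread confirms; `crawl_days` is a name made up for the example.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(items: int, items_per_second: float) -> float:
    """Days needed to crawl the given number of items at a steady rate."""
    return items / items_per_second / SECONDS_PER_DAY


for rate in (5, 15):
    print(f"{rate} items/s -> {crawl_days(3_000_000, rate):.1f} days")
# 5 items/s -> 6.9 days
# 15 items/s -> 2.3 days
```

Comparing an estimate like this with how long a full crawl actually takes is one way to judge whether the displayed rate is believable.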

    Saturday, August 11, 2012 9:57 AM
  • Hi,

    If you want to increase crawling speed, then remove the IFilter registration on the SharePoint server. It seems all items are run through IFilters on the SharePoint server even though they are converted once more on the FAST servers.

    Check this thread: http://social.technet.microsoft.com/Forums/en-US/fastsharepoint/thread/cc130ec2-fc86-4a8f-936e-9b00f5403d85#648c128e-7dc1-4b31-8b93-fbd96e8e4fdf

    You can also experiment with adding crawler impact rules to use more threads while crawling.

    Thanks,
    Mikael Svenson



    Saturday, August 11, 2012 7:03 PM
  • Thanks, I had already applied the iFilter fix and crawler impact rules.

    Any idea whether the "items crawled per second" figure can be used as a reference, or should I rely on perfmon for that?

    Regards,

    Steven


    Sunday, August 12, 2012 11:38 AM