Search indexing of BDC information containing database fields and PDF files

  • Question

  • We are building a new application using SharePoint including a customized search interface. The information that we want to make available through SharePoint Search comes from two sources. We have many PDF files that we want to be indexed and made available for full text search. For each of the PDF files, we have related metadata information that is stored separately in an Oracle database table. We would like to pass this information to the search crawler so that the PDF file along with its corresponding metadata information is treated as a single item.

    I have created a web service that cycles through the Oracle database for each item to be indexed. The metadata information in Oracle contains a pointer to the PDF file on the file system. Within the web service, I have been able to package together the required metadata along with the binary stream of the PDF document. All works well up to this point.
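    To make the packaging step concrete, here is a minimal sketch of the idea, not the actual service code. It is written in Python purely for illustration; the field names (`id`, `title`, `author`, `pdf_path`), the `PackagedItem` type, and the payload keys are all hypothetical, and a plain dictionary stands in for one row of the Oracle metadata table.

```python
import base64
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PackagedItem:
    """One unit to be crawled: database metadata plus the PDF's raw bytes."""
    item_id: int
    title: str
    author: str
    pdf_bytes: bytes

    def to_payload(self) -> dict:
        # Binary content is base64-encoded so that it survives a
        # text-based (XML/SOAP-style) response.
        return {
            "ItemId": self.item_id,
            "Title": self.title,
            "Author": self.author,
            "PdfContent": base64.b64encode(self.pdf_bytes).decode("ascii"),
        }


def package_item(row: dict, base_dir: Path) -> PackagedItem:
    # `row` stands in for one metadata record (hypothetical column names);
    # its "pdf_path" value is the pointer to the file on the file system.
    pdf_bytes = (base_dir / row["pdf_path"]).read_bytes()
    return PackagedItem(row["id"], row["title"], row["author"], pdf_bytes)
```

    The point of the sketch is only that each returned item carries both the metadata values and the complete file content; how the crawler should interpret the byte array is the open question.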

    I have also successfully created an application definition file (ADF) that maps the metadata information and the binary stream for the BDC. The problem arises when the search crawler begins its work: it essentially does not know how to handle the PDF file (byte array).

    The reasoning behind this approach is that a full-text search might return several thousand matches, so it would be helpful for the user to refine the search results by selecting values that match certain metadata fields from the database.

    1. Is it even possible to pass this type of packaged information into the BDC in a way that the crawler can index both the metadata and the PDF file content as a single unit? If so, where can I find an example of how the ADF should be formatted for the search crawler to properly interpret the file and content?

    2. Is this a job for a custom protocol handler? Where can I find a good code example of what a custom protocol handler should look like, preferably in C#?

    3. Are there other options to consider that would make it easier to link the database content to the PDF file when searching?

    4. Our environment is MOSS 2007. Will this type of data merging work in this environment?

    5. Are there other technical articles or resources that would help to complete the task? I have seen products and tools from other vendors that sound like they provide this functionality, so it seems possible to do. I am just having a hard time finding the right resources online.

    Thanks in advance for your help. If more detailed information is necessary, please let me know.


    • Edited by Mike Walsh FIN Wednesday, January 6, 2010 4:52 PM Reference to 2010 removed These are forms for pre-2010 products.
    Wednesday, January 6, 2010 4:47 PM