Importing crawled document metadata via XML RSS feed

  • Question

  • For crawled documents residing on a file share (i.e. not in SharePoint document libraries), how can I import metadata relating to each document? For example:

    • Category: 'Sales Report', 'Expenditure Request', 'Travel Authorization' ...
    • Budget Code: ABC-123, ABC-124, DEF-123 ...
    • ...

    The metadata above are not stored as properties of the document itself. The key field linking the crawled file share document with the corresponding metadata would be its full path\filename.extension.

    We need to be able to search for, say, Category='Sales Report' AND Budget Code='DEF-123' and have \\server\share\folder\JBLOGGS-SR.DOCX returned, irrespective of the contents of the document itself.

    The metadata can be stored in whatever format is needed (CSV, DB, XML, ...).

    Thanks, Shane

    Wednesday, July 20, 2011 12:58 PM

All replies

  • You can extend the pipeline with a stage that fetches the metadata from a file or DB and adds it to each document.
    SharePoint MVP, Microsoft VTSP, http://www.arcovis.com
    Thursday, July 21, 2011 8:24 PM
  • Hi Shane

    The best way to implement this is to build a BCS connector and merge the metadata with the document binaries there. You would then use the StreamAccessor method (some details here: http://msdn.microsoft.com/en-us/library/ff634782(office.14).aspx) to "stream" the document binaries across. This has multiple benefits, e.g. a better ability to run incremental crawls when only the metadata is updated.

    Regards


    Thomas Svensen | Microsoft Enterprise Search Practice
    Saturday, July 23, 2011 8:44 AM
    Moderator
  • Hi Natalya,

    Are there any examples of how to do this? I haven't worked with extending the pipeline before.

    Thanks, Shane

    Monday, July 25, 2011 7:02 AM
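
For what it's worth, the lookup that such a pipeline stage (or a connector) would need to perform can be sketched independently of the product. The Python below is only an illustration under stated assumptions: the CSV column names (path, category, budget_code) and the helper functions are hypothetical, and none of this is the actual pipeline or BCS API. It shows the one idea from the question, namely using the full path as the key that joins a crawled document to its externally stored metadata.

```python
import csv
import io

# Hypothetical sample data using the values from the question;
# a real stage would read a CSV file or query a database instead.
SAMPLE_CSV = r"""path,category,budget_code
\\server\share\folder\JBLOGGS-SR.DOCX,Sales Report,DEF-123
"""

def normalize_path(path):
    """Build a lookup key that ignores case, slash direction and stray whitespace."""
    return path.strip().replace("/", "\\").lower()

def load_metadata(csv_file):
    """Index the metadata rows by the normalized full path of each document."""
    index = {}
    for row in csv.DictReader(csv_file):
        index[normalize_path(row["path"])] = {
            "Category": row["category"],
            "Budget Code": row["budget_code"],
        }
    return index

def metadata_for(index, document_path):
    """Return the metadata for one crawled document, or an empty dict if none is known."""
    return index.get(normalize_path(document_path), {})

index = load_metadata(io.StringIO(SAMPLE_CSV))
print(metadata_for(index, r"\\SERVER\share\folder\jbloggs-sr.docx"))
```

Normalizing the path before the lookup is a design choice made here because file share paths are usually case-insensitive and the crawler may not report them in the same casing as the metadata file. How the returned values are attached to the document and made searchable depends on the product and is not shown.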