Crawling fileshare with metadata of type datetime

  • Question

  • I have created a content source that crawls a file share of HTML files. Each of these files has a metadata tag like this:

      <meta name="WT_Date" content="08-12-2006" />

    I have also tried writing it in this format:

      <meta name="WT_Date" content="2006-12-08 00:00:00" />

    To be able to filter on this metadata, I would like to make SharePoint recognize it as a datetime data type.

    Is that possible?


    \Martin
    Thursday, February 17, 2011 10:35 AM

All replies

  • There are a couple of steps you need to take to get this working. Meta tags are treated as strings, and it seems it is not possible to successfully map a crawled property of type string to a managed property of type datetime. The only solution I have found is to create a custom pipeline module. If anyone has a smarter solution, I'd love to hear it as well.

    First of all, the best date format to use is:

    yyyy-MM-ddTHH:mm:ssZ (for example 2006-12-08T00:00:00Z)
    

    This is a format FAST can work with. Either prepare it like this in your content, or add code in the pipeline module to rewrite it.
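    If the HTML files cannot be changed, the rewrite could be done in the pipeline module. Below is a minimal C# sketch of such a conversion. It assumes "08-12-2006" means day-month-year (the second example, 2006-12-08, suggests so) and treats the value as UTC because the meta tag carries no time zone. WtDateNormalizer and ToFastDate are hypothetical names, not part of FAST or SharePoint.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of FAST or SharePoint): rewrites the
// WT_Date meta values from the question into yyyy-MM-ddTHH:mm:ssZ.
public static class WtDateNormalizer
{
    // Assumed input formats: day-month-year, and the sortable form
    // from the second example in the question.
    private static readonly string[] InputFormats =
    {
        "dd-MM-yyyy",
        "yyyy-MM-dd HH:mm:ss"
    };

    // Returns the FAST-friendly string, or null if the value matches
    // none of the assumed formats.
    public static string ToFastDate(string raw)
    {
        if (raw == null) return null;
        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            raw.Trim(),
            InputFormats,
            CultureInfo.InvariantCulture,
            // Assumption: the meta value has no time zone, so treat it as UTC.
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out parsed);
        return ok
            ? parsed.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture)
            : null;
    }
}
```

    With these assumptions, both ToFastDate("08-12-2006") and ToFastDate("2006-12-08 00:00:00") produce "2006-12-08T00:00:00Z". If the dates are actually month-day-year, swap the first format for "MM-dd-yyyy".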

    The other point to notice is that when your meta field is crawled, it is output as a crawled property of type string (VariantType 31). To use it as a date, you have to create your own crawled property of type date (VariantType 64) and copy the value over.

    For your date, the crawled property should look like this:

    Category: d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 (Web)

    Name: WT_DATE (all names are upper-cased in the pipeline)

    VariantType: 31
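    A crawled property is identified by its property set (metadata group), its name and its variant type together, which is why a second WT_DATE property with a different type does not collide with the original. The C# sketch below is purely illustrative: CrawledPropertyKey is a hypothetical type, not a FAST or SharePoint API. It assumes the variant types are the standard COM VARENUM codes (31 = VT_LPWSTR, a string; 64 = VT_FILETIME, a date/time) and compares names exactly as given.

```csharp
using System;

// Hypothetical model of a crawled property's identity. A C# record gives
// value equality over all three components: property set, name and type.
public sealed record CrawledPropertyKey(Guid PropertySet, string PropertyName, int VarType)
{
    // Web category GUID from the post.
    public static readonly Guid Web = new Guid("d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1");

    // COM VARENUM codes (assumed to be what FAST's VariantType refers to).
    public const int VtLpwstr = 31;   // string
    public const int VtFiletime = 64; // date/time
}
```

    Two keys that differ only in VarType (31 versus 64) compare as different, so the string property and the new date property can coexist.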

    Here's a short snippet of code from a custom pipeline stage that retrieves the date value from your "WT_Date" meta tag and copies it to a new crawled property with the same name, but with the correct data type. This works because the unique key of a crawled property consists of the metadata group, the property name and the data type.

    using System;
    using System.Linq;
    using System.Xml.Linq;

    internal class DateFixer
    {
      // Property set (category) GUID for Web crawled properties
      private static readonly Guid _crawledCategoryWeb = new Guid("d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1");

      // FAST runs the stage as: DateFixer.exe <input file> <output file>
      public static void Main(string[] args)
      {
        new DateFixer().DoProcessing(args[0], args[1]);
      }

      // Actual processing
      public void DoProcessing(string inputFile, string outputFile)
      {
        XDocument inputDoc = XDocument.Load(inputFile);
        // Fetch the string-typed (varType 31) WT_DATE property from the input item
        string value = (from cp in inputDoc.Descendants("CrawledProperty")
                        where new Guid(cp.Attribute("propertySet").Value).Equals(_crawledCategoryWeb) &&
                              cp.Attribute("propertyName").Value.ToLowerInvariant() == "wt_date" &&
                              cp.Attribute("varType").Value == "31"
                        select cp.Value).FirstOrDefault();

        if (value == null) return;
        // Create the output item
        XElement outputElement = new XElement("Document");
        if (value.Length > 0)
        {
          // Same property set and name, but varType 64 (date)
          outputElement.Add(
            new XElement("CrawledProperty",
              new XAttribute("propertySet", _crawledCategoryWeb),
              new XAttribute("propertyName", "wt_date"),
              new XAttribute("varType", 64),
              value));
        }
        outputElement.Save(outputFile);
      }
    }
    

    The corresponding pipelineextensibility.xml would look like this:

    <PipelineExtensibility>
     <Run command="C:\pathtomodule\DateFixer.exe %(input)s %(output)s">
      <Input>
       <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="WT_DATE"/>
      </Input>
      <Output>
       <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="64" propertyName="wt_date"/>
      </Output>
     </Run>
    </PipelineExtensibility>

    The last step, which I'll leave for you to complete, is mapping the new crawled property to a managed property of type Datetime.

    Regards,
    Mikael Svenson

     


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    Thursday, February 17, 2011 3:05 PM
  • Thanks for the answer. It amazes me that there is no out-of-the-box way to do this.

    I have never created a pipeline module before. Do you have a short guide on how this is done?


    \Martin
    Friday, February 18, 2011 7:27 AM
  • I agree about the date parsing; it should have been possible to change the data type of a crawled property.

    I wrote a blog post in December with a short tutorial in C#. It also has a link to a code sample at the bottom (which I modified for my answer). It is also possible to create pipeline stages in other languages, such as PowerShell. The idea is that the stage is passed two file references: one it reads data from, and one it writes data to.
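    The two-file contract can be sketched as a bare C# stage skeleton. PassThroughStage, Run and Transform are hypothetical names; the executable's Main would simply forward args[0] and args[1] (the %(input)s and %(output)s placeholders from pipelineextensibility.xml) to Run. The placeholder Transform writes an empty Document, meaning no new crawled properties; a real stage would add CrawledProperty elements there.

```csharp
using System.Xml.Linq;

// Hypothetical skeleton of an external pipeline stage.
public static class PassThroughStage
{
    // Main would call Run(args[0], args[1]).
    public static void Run(string inputPath, string outputPath)
    {
        // Read the crawled properties handed to the stage...
        XDocument input = XDocument.Load(inputPath);
        // ...and write back whatever properties the stage produces.
        Transform(input).Save(outputPath);
    }

    // Placeholder: emits an empty Document (no new crawled properties).
    public static XDocument Transform(XDocument input)
    {
        return new XDocument(new XElement("Document"));
    }
}
```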

    http://techmikael.blogspot.com/2010/12/how-to-debug-and-log-fast-search.html

    -m


    Friday, February 18, 2011 8:16 AM
  • This would be a great solution if the client actually had a FAST license. They use standard SharePoint, which does not offer this option.

    Any bright ideas? Other than telling them to upgrade to FAST ;-)

    By the way, I just realised that I posted this in the wrong forum, since they do not have FAST. Sorry.


    \Martin
    Thursday, March 3, 2011 9:23 AM
  • Ask in the general SP forum. Maybe someone there has a brilliant idea that doesn't involve creating a custom protocol handler and crawler. Ideally, you should be able to just map the crawled property to the managed one and have SP handle the conversion between string and datetime.

    Have you tried this btw?

    -m


    Thursday, March 3, 2011 10:23 AM
  • I did try that.

    I'm not allowed to map a crawled property of type string to a managed property of type date.


    \Martin
    Thursday, March 3, 2011 11:13 AM
  • OK, so the same holds for SP as for FAST. What if you store your date in the form yyyyMMdd, map it to a managed string property, and sort on this string instead of the document date? Would that help you out? For the web items you would then also use this property when displaying the date.
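    The reason the yyyyMMdd form works for sorting is that a fixed-width, most-significant-first date string orders the same way as the dates themselves under a plain ordinal string comparison. A small C# sketch of that property; SortableDate, ToSortKey and SortKeys are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Sketch: yyyyMMdd strings sort chronologically with ordinal comparison.
public static class SortableDate
{
    // Fixed-width key: year, then month, then day, zero-padded.
    public static string ToSortKey(DateTime date) =>
        date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);

    // Sorts as plain strings, the way a string managed property would.
    public static string[] SortKeys(params DateTime[] dates) =>
        dates.Select(ToSortKey)
             .OrderBy(key => key, StringComparer.Ordinal)
             .ToArray();
}
```

    For example, 2011-03-03, 2006-12-08 and 2010-12-01 come out as "20061208", "20101201", "20110303", which is chronological order.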

    -m


    Thursday, March 3, 2011 1:42 PM
  • Nice idea. But the problem is not sorting. The problem is selecting a filter such as "Results within the last 6 months".

    Furthermore, the results are a mix of SharePoint results with correct dates and my results with text dates. Naturally, I would like them all to behave the same way.

    I have asked the same question in the general SP forum. Hoping for lots of replies there :)

    Thanks for your effort, Mikael!


    \Martin
    Friday, March 4, 2011 7:46 AM