Managed Property for non text columns

  • Question

  • Installation: FAST for SharePoint 2010

    We want to add managed properties for the following SharePoint field types:

    • Hyperlink
    • Publishing Image

    Our crawl is returning null for these types, since they are essentially HTML field types underneath. We found that it works if we convert these fields to a text type.

    Obviously, this is not the right approach, as we are changing our design to cater for FAST's shortcomings.

    My question is: Is anyone aware of a way to keep the field types and still be able to create managed properties?

    Thanks in advance.

     


    -- With Regards Shailen Sukul Entrepreneur/Software Architect/Developer/Consultant/Trainer (BSc | Mct | Mcpd (.Net 2/3.5/SharePoint2010) | Mcts (Sharepoint 2010/MOSS/WSS), Biztalk, Web, Win, Dist Apps) | Mcitp(SharePoint) | Mcsd.NET | Mcsd | Mcad) MSN | Skype | GTalk Id: shailensukul Twitter: http://twitter.com/shailensukul Ph: +1 916 359-9557 Website: http://sukul.org Blog: http://shailen.sukul.org/ http://www.linkedin.com/in/shailensukul
    Friday, June 17, 2011 5:00 PM

Answers

  • Here's the reply from our FAST expert:

    Unfortunately, the value for the Publishing Image column is stored as HTML (e.g. <img alt="" src="/SiteCollectionImages/PR.gif" style=" BORDER: 0px solid; ">) and the SharePoint crawler ignores HTML content.
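
    As a minimal sketch of the workaround discussed in this thread (copying the value into a plain-text column), the image URL can be pulled out of the stored HTML with Python's standard-library HTML parser. The names `ImgSrcParser` and `extract_image_url` are made up for illustration, and where the extracted value is then stored is left open.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Remembers the src attribute of the first <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lower-cased by the parser.
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def extract_image_url(field_html):
    """Return the image URL from a Publishing Image field value, or None."""
    parser = ImgSrcParser()
    parser.feed(field_html or "")
    parser.close()
    return parser.src


if __name__ == "__main__":
    sample = '<img alt="" src="/SiteCollectionImages/PR.gif" style=" BORDER: 0px solid; ">'
    print(extract_image_url(sample))
```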

     

    Here are two posts discussing the issue and options:

    ·         http://sjoere.blogspot.com/2009/12/publishing-image-does-not-get-indexed.html

    ·         http://stefan-stanev-sharepoint-blog.blogspot.com/2009/11/sharepoint-search-and-html-meta-tags.html

    So in short, it is not possible.


    • Marked as answer by Shailen Sukul Friday, June 17, 2011 10:45 PM
    Friday, June 17, 2011 10:44 PM

All replies

  • Hi Shailen,

    If you don't like the approach you mention of using meta tags (which is a good solution in my opinion), you could solve this using the FAST content processing pipeline instead.

    Create a custom document processor that checks for your content types and, if an item matches, calls SharePoint through the web service API to retrieve the values of those two fields. Then write them out to two new crawled properties, which you map to managed properties that can be used for searching and in search results.

    The content type is specified in this crawled property:

    Property Set Guid: 00130329-0000-0130-C000-000000131346
    Name: ows_ContentType
    Variant Type: 31 (string)

    So it is indeed possible to solve this, but it requires custom coding either on the SharePoint side or on the FAST side :)
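
    The stage described above could be sketched roughly as follows. This is not a tested FS4SP component. It assumes the stage is started with an input file path and an output file path as its two command-line arguments, and that both files use a `<Document>` root with `<CrawledProperty propertySet=... propertyName=... varType=...>` children; check both assumptions against the pipeline extensibility documentation. `CUSTOM_PROPSET`, `TARGET_CONTENT_TYPES` and `lookup_field_values` are hypothetical placeholders, and Python is used only for brevity.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical values: replace with your own property set GUID and content types.
CUSTOM_PROPSET = "11111111-2222-3333-4444-555555555555"
TARGET_CONTENT_TYPES = {"Article Page"}


def lookup_field_values(props):
    """Placeholder for the SharePoint web service call.

    Receives the input crawled properties as a name -> value dict and should
    return a dict of new crawled property names -> string values.
    """
    return {}


def process(in_root, lookup):
    """Build the output document for one item."""
    props = {cp.get("propertyName"): (cp.text or "")
             for cp in in_root.iter("CrawledProperty")}
    out_root = ET.Element("Document")
    # Only do the (expensive) lookup for the content types we care about.
    if props.get("ows_ContentType") in TARGET_CONTENT_TYPES:
        for name, value in lookup(props).items():
            cp = ET.SubElement(out_root, "CrawledProperty", {
                "propertySet": CUSTOM_PROPSET,
                "propertyName": name,
                "varType": "31",  # 31 = string, as in the thread
            })
            cp.text = value
    return out_root


def main(argv):
    in_path, out_path = argv[1], argv[2]
    out_root = process(ET.parse(in_path).getroot(), lookup_field_values)
    ET.ElementTree(out_root).write(out_path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

    Passing the lookup in as a function keeps the XML handling testable without a SharePoint server; items of other content types produce an empty output document and cost no web service call.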

    Regards,
    Mikael Svenson 


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Saturday, June 18, 2011 6:52 PM
  • Hi Mikael,

    Thanks for the reply.

    For practical purposes, the extra column approach will solve the issue for us in the short term due to time constraints.

    However, from an architectural perspective, it feels just plain wrong to have to add extra columns to work around the crawl issue, and this could become a problem when there is a large number of HTML columns.

     

    My 2c.

     

    Thanks,

    Shailen 



    Sunday, June 19, 2011 3:47 AM
  • Hi Shailen,

    I certainly agree with your architectural view on this. My experience over the years with search solutions has shown me that you always have to create a workaround at some point, as something does not work the way you expect out of the box.

    Workarounds increase complexity, but at least you have different choices for how this can be solved. I think checking the content type in a custom stage is the most flexible option, as you don't have to add extra columns, but indexing time will be affected because you have to do a web service lookup. Introducing some caching could make lookups more efficient, but would increase complexity ;)
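
    As an illustration of the caching trade-off, here is a minimal time-to-live cache sketch in Python. `TtlCache` and its parameters are made up for this example. The cache key is whatever the lookup is keyed on (for instance a list URL, if a whole list is fetched at once). Note the assumption that the cache lives in a long-running process: if the custom stage is started as a new process for every item, an in-memory cache like this would have to sit in a separate service or be persisted to disk.

```python
import time


class TtlCache:
    """Caches fetch(key) results for ttl_seconds to avoid repeated lookups."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # function doing the real (slow) lookup
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no web service call
        value = self._fetch(key)     # missing or expired: look it up again
        self._entries[key] = (now + self._ttl, value)
        return value
```

    The expiry time is the price of the speed-up: values changed in SharePoint within the TTL window will be indexed stale until the next crawl picks them up.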

    I guess when it comes to indexing, most things are possible, just not out of the box. And this is where the custom pipeline stages in FAST help a lot compared to the built-in SharePoint search.

    Regards,
    Mikael Svenson 


    Sunday, June 19, 2011 11:45 AM
  • Couple of comments for those reading this thread.

    When we talk about custom stages in the FAST for SharePoint world, we mean using pipeline extensibility (http://support.microsoft.com/kb/2455437). Basically, it enables you to output the document before it has been indexed, modify it (e.g. add meta tagging or a taxonomy), and then push the modified document back into FAST, where it continues to be processed and is finally indexed.

    Your own custom code is called by pipeline extensibility, so you can do just about anything you want. However, this sort of freedom has a price: it's very easy to do architecturally unsound things if you don't think it through. So I'd always recommend that an experienced architect design how this is going to work before it is handed over to an inexperienced developer.

    For example, you could use pipeline extensibility to call a web service, which sounds fine. But you need to consider latency, failover of the web service, and high document throughput scenarios: can the web service scale to 10, 20, 30, 40 or more calls per second?
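
    One way to handle the failure side of these concerns is to bound the retries and fall back to indexing the item without the extra properties. The Python sketch below shows that idea; `call_with_retry` and its parameters are hypothetical, and it does nothing about throughput, which still has to be sized on the web service side.

```python
import time


def call_with_retry(fetch, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() up to `attempts` times with exponential backoff.

    Returns the result of fetch(), or None if every attempt raised, so the
    caller can let the item through without the extra properties.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Broad catch is deliberate in this sketch; narrow it to the
            # errors your web service client actually raises.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return None
```

    Keep in mind that every retry adds to the per-item processing time, so the attempt count and delays should be chosen against the crawl rate you need to sustain.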

    Regards

    Peter Petley

    National Competency Lead Enterprise Search
    EMC Consulting

    Monday, June 20, 2011 4:34 PM
  • Thanks Peter :)

    I agree 100% and should have stated this as well. Sometimes I get a bit caught up in my own bubble, and forget to write about the architectural consequences.

    I've been teaching some FS4SP courses this spring/summer and I always try to emphasize that you have to consider the total cost of any solution you create, as I mentioned with the complexity.

    Every extra module you create in a solution adds to the complexity and cost of a project, and you should always balance the benefits against the costs. My experience in FAST ESP projects leads me to think most consultants just hack away on custom pipeline stages without concern for the future.

    -m


    Monday, June 20, 2011 7:21 PM