Issue with multi-valued crawled property (Managed Metadata type) mapped to a managed property

  • Question

  • Hi,

    The problem we are facing is this: we have a multi-valued crawled property (Managed Metadata type) that we are mapping to a managed property.

    The managed property was created with MergeCrawledProperties set to false.
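    For reference, a hedged sketch of how such a property is typically created and configured from the FS4SP shell (the property name here is a placeholder):

```powershell
# Hypothetical names; run inside the FS4SP PowerShell shell
$mp = New-FASTSearchMetadataManagedProperty -Name "IntranetLocations" -Type 1  # 1 = Text
$mp.MergeCrawledProperties = $false  # store only the first crawled element
$mp.Update()
```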

     

    The page http://technet.microsoft.com/en-us/library/ff393811.aspx states that if MergeCrawledProperties is not set, only the first element is stored in the managed property. We are seeing this behaviour in our environment.

     

    That causes an issue with refinement results based on this managed property, because the second element is not included as a result.

    For example, if we crawl the following content:

     

    Doc1.doc           Property1;Property2

    Doc2.doc           Property1;Property2

    Doc3.doc           Property2

     

    Refinement results look like this:

    Property 1           RefinementCount = 2

    Property 2           RefinementCount = 1

     

    Ideally we would like to see Property 2’s RefinementCount = 3 to accurately reflect our source data.

     

    We also tried setting MergeCrawledProperties to true; however, this merges the values together into a single refinement entry.

     

    For example, with our scenario above, refinement results would look like:

    Property 1 Property2     RefinementCount = 2

    Property 2                     RefinementCount = 1

     

     

    The following article, http://social.technet.microsoft.com/wiki/contents/articles/multi-value-property-support-in-fast-search-server-for-sharepoint.aspx , suggests that refinement aggregation will count each value of a multi-value property separately, so we are not sure whether we need to configure something else.

     

    Kindly help us out, as this is a critical requirement for our project.

     

    Regards,

    Ash

    Thursday, September 9, 2010 12:51 AM

All replies

  • Hi Ash,

    The issue you are observing is probably due to FAST seeing your crawled property as a simple string that happens to contain a separator (";" in this case) that isn't being interpreted.

    One option you can follow to get this working is to create a custom processing component using the Pipeline Extensibility (as described here http://msdn.microsoft.com/en-us/library/ff795801.aspx) configured to replace your current separator ";" with the internal multi-value character separator "\u2029".

    And yes, you will need MergeCrawledProperties set to true to get all the multiple values mapped to your managed property.
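    For context, the custom stage is registered in %FASTSEARCH%\etc\pipelineextensibility.xml. A minimal sketch, where the executable path and the property identifiers are placeholders for your own (the GUID shown is the SharePoint property set used elsewhere in this thread):

```xml
<!-- Sketch only: exe path, property set GUID and names are placeholders -->
<PipelineExtensibility>
  <Run command="C:\FASTSearch\bin\MultiValueFixer.exe %(input)s %(output)s">
    <Input>
      <CrawledProperty propertySet="00130329-0000-0130-c000-000000131346"
                       varType="31" propertyName="ows_IntranetLocations"/>
    </Input>
    <Output>
      <CrawledProperty propertySet="00130329-0000-0130-c000-000000131346"
                       varType="31" propertyName="ows_IntranetLocations"/>
    </Output>
  </Run>
</PipelineExtensibility>
```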

    Hope that helps.

    Best,
    Leo

    • Proposed as answer by leonardocsouza Thursday, September 16, 2010 4:04 AM
    Thursday, September 16, 2010 4:04 AM
  • Were you able to get this working? I have been trying to get FAST to recognise my string as a multi-value by adding in '\u2029' but I am not having any luck.
    Friday, September 17, 2010 1:16 PM
  • Yes, I was able to get this working with code like this inside my Pipeline Extensibility custom component:

            string currentSeparator = ";";
            string multivalueSeparator = new string(new char[] { '\u2029' });
    
            currentValue = currentValue.Replace(currentSeparator, multivalueSeparator);
    

    Where "currentValue" is the string variable that contains the value of the crawled property where I want to replace the ";" separator with the special multivalue separator.

    Also, as mentioned above, don't forget to make sure you configure your Managed Property (the one that will receive the contents of this multivalued crawled property) with MergeCrawledProperties=true.
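    Put together, a minimal console-app sketch of such a stage (a sketch only; the class name is a placeholder and error handling is omitted):

```csharp
// Sketch of a pipeline extensibility stage: FS4SP invokes the exe with
// args[0] = input XML file, args[1] = output XML file.
using System;
using System.Xml.Linq;

class MultiValueFixer
{
    static void Main(string[] args)
    {
        XDocument doc = XDocument.Load(args[0]);
        string multivalueSeparator = "\u2029";

        // Replace ";" with the internal multi-value separator in every
        // crawled property value passed in by the pipeline.
        foreach (XElement prop in doc.Descendants("CrawledProperty"))
        {
            prop.Value = prop.Value.Replace(";", multivalueSeparator);
        }

        doc.Save(args[1]);
    }
}
```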

    What is the issue you are having when trying this? If you provide additional details I may be able to help.

    Best,
    Leo

    Friday, September 17, 2010 4:02 PM
  • I managed to get this working using ";", but not intentionally : ) I went to "C:\FASTSearch\index-profiles\deployment-ready-index-profile.xml" and changed the "separator" attribute on my field from "no" to "yes". Then I ran an import as before, and during this process FAST updated all of the files below, configured my field to support multi-value, and set the separator value to ";". When I noticed the changes it had made, I set the separator value (in my data) back to ";", recrawled, and it split my field into multiple values based on ";", as I would have expected. I will look into this more to try to fully understand what is going on, but if someone from MS can answer, that would be great.

    C:\FASTSearch\var\qrserver\webcluster\13280\cache_cs\etc\qrserver\tango\configuration.attributes.xml
    C:\FASTSearch\etc\config_data\QRServer\webcluster\etc\qrserver\tango\configuration.attributes.xml
    C:\FASTSearch\var\qrserver\webcluster\13280\cache_cs\FieldProperties.xml
    C:\FASTSearch\etc\config_data\Schema\webcluster\FieldProperties.xml
    C:\FASTSearch\var\etc\indexConfig.xml
    C:\FASTSearch\etc\config_data\RTSearch\webcluster\indexConfig.xml
    C:\FASTSearch\etc\config_data\RTSearch\webcluster\fixml_mappings.xml

    Monday, September 20, 2010 7:39 PM
  • Even if it does work this way, I would just like to point to this important reminder from TechNet: http://technet.microsoft.com/en-us/library/ff354943.aspx

    "Modifying other configuration files than those listed in this table is not supported and could lead to system inconsistencies during re-configuration and software updates."

    This is the reason why I explained above how to achieve this in a supported way using the Pipeline Extensibility.

    --Leo

    Monday, September 20, 2010 8:12 PM
  • Good point. This is pretty unfortunate.
    Monday, September 20, 2010 8:46 PM
  • Hi Leo,

    Is there a way to assign this paragraph separator in the Extensibility stage using PowerShell (that's what I wrote mine in)?

    I was thinking that something like [char]$separator = '\u2029' would work, but it seems assigning Unicode values this way is not supported in PowerShell.

    Thank you,

    Mike

    Wednesday, February 9, 2011 1:08 AM
  • Hi Mike,

    I haven't tested this inside a PowerShell script to see if it would work, so I wouldn't know for sure. Best way to figure that out is by testing it.

    In any case, just out of curiosity, why do you want to do this using PowerShell instead of VB.NET or C#? :)

    Best,

    Leo

    Wednesday, February 9, 2011 1:16 AM
  • In PowerShell do it like this:

    [char]$sep = 0x2029
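    Applied to the replacement itself, a hedged PowerShell sketch (variable names are illustrative, matching Leo's C# snippet above):

```powershell
# Build the multi-value separator and swap it in for ";"
$sep = [string][char]0x2029
$currentValue = $currentValue.Replace(";", $sep)
```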
    

    Regards,
    Mikael Svenson


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    Wednesday, February 9, 2011 9:04 AM
  • Hi Leo,

     

    My main reasons for that -

    It's faster to prototype for development purposes: changes made in a script are picked up right away, with no need to recompile/redeploy (the 'debug' switch alone is worth mentioning).

    It's been a pain to make dev-level logging work in C#, while in a script it's pretty much straightforward.

     

    Lesser reasons -

    The client didn't have Visual Studio installed, so to avoid copying executables between my local machine and the client environment I picked PowerShell.

    And the last, but not least - I'm just more comfortable with PowerShell than with C# or .NET    :-)

     

    Thank you,

     

    Mike

    Wednesday, February 9, 2011 4:34 PM
  • Thank you Mikael.

    That did the trick!

     

    Mike

    Wednesday, February 9, 2011 6:13 PM
  • Thank you very much for the explanation, Mike!

    That seems a pretty good way to get it quickly deployed indeed. If you can share your PowerShell script here, I think it would be great for others to have a quick example like this that they can try.

    Thanks again,

    Leo

    Wednesday, February 9, 2011 9:23 PM
  • Will do as soon as I get some breathing time from a project (it needs some editing).
    Thursday, February 10, 2011 11:40 PM
  • I did a small test PowerShell script on my blog - http://techmikael.blogspot.com/2011/03/prototyping-pipeline-stages-in.html

    -m


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    Wednesday, March 2, 2011 11:01 AM
  • Hi Leo,

    I'm just wondering how you were able to get access to the data of your multi-valued managed property? In my XML input file, the crawled property is always empty. Below is my entry in the pipeline extensibility XML file:

    <CrawledProperty propertySet="00130329-0000-0130-c000-000000131346" varType="4127" propertyName="ows_IntranetLocations"/>

    I really need to be able to set the multi-value separator for this managed property and I'm stuck! Would love any help/advice you could provide.

     

    Thanks,

     

    Sam

    Thursday, May 19, 2011 3:43 AM
  • Hi Sam!

    The first thing I would do is enable the FFDDumper stage (http://msdn.microsoft.com/en-us/library/ff795826.aspx) to have a look at what is being received by FAST, including the actual name of this property. Once you enable this stage and recrawl your content, FS4SP will dump an entry at %FASTSEARCH%\data\ffd\ for each document processed.
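    FFDDumper is one of the optional document processors; as far as I recall it is switched on in optionalprocessing.xml (a sketch; the exact path may vary by install):

```xml
<!-- %FASTSEARCH%\etc\config_data\DocumentProcessor\optionalprocessing.xml -->
<!-- Set active="yes" for FFDDumper, then run "psctrl reset" to reload the pipeline -->
<optionalprocessing>
  <processor name="FFDDumper" active="yes" />
</optionalprocessing>
```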

    I've struggled a few times with situations where the name of my property in the pipeline extensibility file wasn't *exactly* the name of my crawled property; this configuration is case sensitive.

    Hope that helps!

    Best,

    Leo

    Thursday, May 19, 2011 5:16 AM
  • Using FFDDumper is a great idea, as Leo says. Also, try this Google search to find good posts on debugging the FS4SP pipeline:

    http://www.google.com/search?hl=en&q=fast+sharepoint+spy+pipeline

    Regards,
    Mikael Svenson 


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Thursday, May 19, 2011 6:33 AM
  • Ah, thanks for the quick responses. I will try this now. Am I correct in listing the variant type as 4127 for taxonomy? I've noticed that whenever a taxonomy managed property is created, two crawled properties are generated (one with varType 31 and one with varType 4127). I map the managed property only to the 4127 crawled property. I've tried mapping to the 31 crawled property, but that doesn't work either. Or should I map to both of them?

    Thanks again.

    Thursday, May 19, 2011 7:08 AM
  • Hi Leo and Mikael,

    I’ve delved a bit deeper and found a few things.

    I enabled FFDDumper, which enables processing-pipeline debugging. After looking through the files, I found the ows_intranetlocations crawled property entries (here are a few):

    63 00130329-0000-0130-C000-000000131346:ows_IntranetLocations:4127 L3 s9 St George s7 Jackson s8 Adelaide

    61 00130329-0000-0130-C000-000000131346:ows_IntranetLocations:31 s11 Kyrgyszstan

    63 00130329-0000-0130-C000-000000131346:ows_IntranetLocations:4127 L2 s9 St George s9 Gladstone

     

    So the property ID appears correct (although I had a lowercase "c"). I updated the ID in the pipeline XML file to use the uppercase "C", but I'm still not getting any values. I'm not sure why the debug file contains entries with a variant type of 31; I guess this could be old data from when I tinkered with changing mappings.

     

    I had a look at the propertycategories.xml file and found the following entry for the SharePoint category:

    <category name="sharepoint" indexed="yes" discover="yes">
      <propset name="00020329-0000-0000-c000-000000000046" />
      <propset name="00130329-0000-0130-c000-000000131346" />
      <propset name="00140329-0000-0140-c000-000000141446" />
    </category>
    


    It has a few different property IDs; the middle one is the one I have been using. I'm wondering if I should try another? There are values in the FFDDumper output, which is a good sign, I guess, but still nothing is written to the input XML file.

    Thursday, May 19, 2011 2:26 PM
  • Hi gents,

    I’ve had some success! I’m getting crawled values inserted into the input XML now. Phew, I think it was a naming issue like you said, which is handy :) A couple of issues remain:

     

    1. One thing I’ve noticed is that the crawled property value within the XML is a concatenated string (with spaces as the delimiter). Is the delimiter character hidden? There needs to be a delimiter so I can process the items and output them with a FAST delimiter in a new crawled property.

    2. We are still having that FAST search issue. In summary:

    • We have a SharePoint publishing page (and hence a related content type) that allows multiple terms (i.e. a “Managed Metadata” field) to be specified. Specifically, those fields are called “Intranet Keywords” and “Intranet Locations”.

    • We have created two managed properties on the FAST server:

      ◦ “IntranetKeywords” is mapped to “ows_IntranetKeywords”

      ◦ “IntranetLocations” is mapped to “ows_IntranetLocations”

    • When pages have single terms in these fields, search works fine. However, when pages have multiple Intranet Keywords specified, search does not return any results when looking for a page with a specific location. For example, if a page has “Intranet Keywords” of “Apple” and “Tomato”, and “Intranet Locations” of “Adelaide”, a search such as “intranetlocation:Adelaide” does not return the page. Only pages with individual keywords come back; the page with “Apple” and “Tomato” doesn’t, for some reason.

     Thanks again for listening,

    Monday, May 23, 2011 5:45 AM
  • Hi,

    We were doing this replacement of ";" with "\u2029", and interestingly, what we figured out is that we have to put the default FS4SP separator at the start and end of the field value. What I mean is that the content in the multi-valued field should look like the following when it goes past the CustomExtensibility stage, assuming at the source it was "mike;svenson;Ashwani;Bothra":

    \u2029Mike\u2029Svenson\u2029Ashwani\u2029Bothra\u2029

    If it is not, and assuming I am crawling just one web page, then in the refiner I get:

    mike - count 2

    svenson - count 1

    Ashwani - count 1

    Bothra - count 2

    We are using the SharePoint web crawler here to crawl a website with HTML pages on it.

    We have to prefix and suffix the content of the multi-valued field with "\u2029" to get the refinement counts for "mike" and "Bothra" down to 1.
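    In C#, that wrapping amounts to something like the following sketch (variable names as in Leo's snippet above):

```csharp
// Wrap the value in separators so the boundary elements also land
// in their own refinement buckets
string sep = "\u2029";
currentValue = sep + currentValue.Replace(";", sep) + sep;
```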

    We are going to test this with other connectors, like the SharePoint connector and the Enterprise crawler, to see if this is something specific to the SharePoint web crawler.

    Interestingly, if we use docpush for the same web page (downloaded locally), we do not have to do the prefixing and suffixing of "\u2029". It works fine.

    Has anyone else observed this behaviour, and are we missing some configuration here?

     

    Additionally, we also observed that we need the Deep Refiner to be enabled to ensure that refiners are generated per value. Otherwise the refiner would look like the following:

    Mike Svenson Ashwani Bothra - count 1

    Generally, deep refinement is enabled when we want to generate the refiners from the whole result set. I'm not sure why it impacts the way refiners are generated; ideally, only the count should vary.

    Thanks,

    Ashwani



    Thursday, October 27, 2011 2:06 PM
  • Interesting, Ashwani. I haven't tested this with content coming from HTML pages, but I would expect it to behave the same. I would start by checking the FIXML to see what is actually being reported there. You can check the FIXML using this script here.

    Hope that helps!

    --Leo

    Thursday, October 27, 2011 2:59 PM
  • Hi Ashwani,

    Could you post a sample of the HTML you use for testing? I have seen issues before where data from <meta> tags is duplicated: the values are sent in by the crawler and also detected inside the pipeline, thus doubling them (this was an acknowledged bug at the time, and I haven't tested it since SP1 came out).

    Regards,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Thursday, October 27, 2011 6:04 PM
  • It's a simple HTML page:

    <html>
    <head>
    <meta name="typeofcontent" value="abc;xyz;lkj" />
    </head>
    <body>
    BLAH BLAH BLAH
    </body>
    </html>
    


    If I update the meta tag value to ";abc;xyz;lkj;" and replace the ";" with "\u2029", everything works smoothly when crawling this page via the SharePoint-based web crawler.

    For a docpush, the above code works like a charm.

    Also, any thoughts on the deep navigator configuration? Why would that make the content look different in the navigator?

    Thanks,

    Ashwani

     



    Thursday, October 27, 2011 6:24 PM
  • And BTW, we are already running SP1.

    Thursday, October 27, 2011 6:31 PM
  • Hi, I have a similar issue and don't know how to fix the problem.

     

    We have a crawled property with multiple values joined by the \u2029 character, but FS4SP does not recognize them and only sets the whole string as the value of the managed property. We also set the MergeCrawledProperties attribute to true.

     

    The crawled property is also not marked as multi-valued, but I don't know why.

     

    Any idea?

    Wednesday, November 2, 2011 9:04 AM
  • Have you selected the "Deep Refiner" property under Refiners?

    It may sound weird, but if deep refinement is not enabled, the refiner is created as a single string of all the values.

    HTH,
    Ashwani

     

    Wednesday, November 2, 2011 9:47 AM
  • RefinementType   : DeepRefinementEnabled
    Divisor          :
    Intervals        :
    Resolution       :
    Algorithm        :
    DefaultValue     :
    CutoffMaxBuckets : 1000
    Anchoring        : Auto

     

    This is the refiner configuration of the property, is that correct?

    Wednesday, November 2, 2011 9:55 AM
  • Yes this looks good.

    If you are still not getting the refiners appropriately, can you please share the output of a spy-kind-of stage? You can use the following code to get that:

                // Requires: using System; using System.IO; using System.Xml.Linq;
                XDocument inputDoc = XDocument.Load(args[0]);
                String pipelineInputData = @"c:\users\" + Environment.UserName + @"\appdata\LocalLow\PipelineLog";

                File.AppendAllText(pipelineInputData + @"\SPY.txt", inputDoc.ToString() + "\r\n");
    


    You might have to create the PipelineLog folder manually.

    thanks,

    Ashwani

    Wednesday, November 2, 2011 10:39 AM
  • Currently I am not using any custom pipeline processing, because I think FS4SP automatically detects the \u2029 character as a delimiter. Or does it not?
    Wednesday, November 2, 2011 3:12 PM
  • I was able to use the AttributeSplitter to split the ";"-separated values into separate refiners in a different/alternative way. I shall provide the steps in a new thread.

    Freddie


    Freddie Maize ..A story with Glory is History. Doesn’t matter whether Glory rest in the world of Demon or God. Lets create History..
    • Edited by freddieMaize Tuesday, November 15, 2011 11:56 AM
    Tuesday, November 15, 2011 11:55 AM