Import terms to FAST from other system

  • Question

  • Our customer has a Terminology Server (TS).

    The Terminology Server is a software application that supports creation, maintenance and dissemination of terminology and provides terminology services. The backbone of the terminology system is an ontology: an ordered set of concepts and their relations. The terminology content of the TS consists of sets of concepts that are represented as categories of a value set. All categories in the different value sets are mapped to the ontology. The value sets represent a certain subject field, domain, or other set of terms. Value sets are created externally and imported into the TS in SKOS or ClaML formats.

    We would like to import the ontology into FAST and use it to filter the hit list (make the selection broader or narrower). For example, if the search term is "car", it should be possible to make the hit list broader by selecting the concept "vehicle", which is the level above car, or to make it narrower by selecting "sportscar". Forgive my simplistic example.

    My problem is that I don't know how to do this. We can export the ontology in either SKOS or CSV format. We would of course like to preserve the relations between all the terms when importing, so that the ontology is kept intact as much as possible. I assume that the terms in the ontology will somehow show up as "managed properties" if we manage to import it.

    If there are other ways to solve this, I would be grateful for any pointers.

    Thanks // Ulf

    Wednesday, February 23, 2011 3:30 PM

Answers

  • Guess I have enough information for a possible solution.

    For each term you index I would add the parent term in one field and the children terms as a multi value field, and the complete parent path in a third field. Basically you are de-normalizing your ontology to fit a search engine scenario.

    term: car
    preferredterm: automobile
    parent: vehicle
    parents: vehicle ; transportation
    children: sports car ; electric car ; beach buggy
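
    As a rough illustration of the de-normalization, here is a minimal Python sketch. It assumes the ontology export can be reduced to a simple child-to-parent mapping and that the hierarchy is a tree; the `ONTOLOGY` dictionary and the `denormalize` function are made up for this example and are not part of FAST. Synonyms (the preferredterm field) are left out to keep it short.

```python
from collections import defaultdict


def denormalize(parent_of):
    """Turn a child -> parent mapping into one flat record per term."""
    # Invert the mapping so each term knows its direct children.
    children_of = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children_of[parent].append(child)

    records = {}
    for term, parent in parent_of.items():
        # Walk up the tree to collect the complete parent path.
        # Assumes the hierarchy has no cycles.
        path = []
        node = parent
        while node is not None:
            path.append(node)
            node = parent_of.get(node)
        records[term] = {
            "term": term,
            "parent": parent,
            "parents": path,
            "children": sorted(children_of[term]),
        }
    return records


# Hypothetical ontology fragment, using the terms from the example above.
ONTOLOGY = {
    "transportation": None,
    "vehicle": "transportation",
    "car": "vehicle",
    "sports car": "car",
    "electric car": "car",
    "beach buggy": "car",
}

if __name__ == "__main__":
    for record in denormalize(ONTOLOGY).values():
        print(record)
```

    Each record could then be written out as one row or item to be indexed, with "parents" and "children" as multi-value fields.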
    

    If you search for "cars" you would get a hit for the term "car" (enable lemmatization for the field), and you would have enough information in the hit itself to create links which executes new searches with those terms as the query.

    The "term" field would be the field to search against, as the others are only used to build up new queries for your UI.
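
    To show how the extra fields could drive the UI, here is a small Python sketch that turns one ontology hit into query strings for a "broader" link and a set of "narrower" links. The hit is assumed to arrive as a plain dictionary with the field names used above; `term_query` and `navigation_queries` are hypothetical helpers, not FAST APIs.

```python
def term_query(term):
    """Build a query string against the searchable "term" field."""
    # Assumes the term contains no double quotes; escaping is left out.
    return 'term:"{}"'.format(term)


def navigation_queries(hit):
    """Return query strings for moving one level up or down from a hit."""
    # The root of the ontology has no parent, so no broader link.
    broader = term_query(hit["parent"]) if hit.get("parent") else None
    narrower = [term_query(child) for child in hit.get("children", [])]
    return {"broader": broader, "narrower": narrower}


# Hypothetical hit, shaped like the record above.
hit = {
    "term": "car",
    "parent": "vehicle",
    "children": ["sports car", "electric car", "beach buggy"],
}
links = navigation_queries(hit)

if __name__ == "__main__":
    print(links["broader"])
    for query in links["narrower"]:
        print(query)
```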

    So over to the case with "cars sweden". You would need a bit of logic in your code to pull this off. First you search with the whole query, which would yield zero hits from the ontology result items. Then you would have to split the query into multiple words, and execute it either as an OR query or as multiple queries. But once the result comes back, you have to identify which word was an ontology term, and which should be used to query the text of documents.

    Example:

    1. "cars sweden" = 0 ontology hits

    2. "cars" = 1 ontology hit, "sweden" = 0 ontology hits

    3. and(or(term:"car",parents:"car"),content:"sweden")

    By adding the "parents" field with all parents you would always be able to search on a term for a complete ontology path. You should also add a field noting that the item is an ontology item, in order to filter on it for ontology searches.
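
    The three steps in the example could be sketched roughly like this in Python. The `lookup_term` callable stands in for a search restricted to ontology items with lemmatization enabled; `naive_lookup` and `KNOWN_TERMS` below are toy replacements invented for the example, and the "content" field name is taken from the query above.

```python
def ontology_clause(term):
    """Match the term itself or any item that has it on its parent path."""
    return 'or(term:"{0}",parents:"{0}")'.format(term)


def build_fql(query, lookup_term):
    """Split a query into ontology terms and free text, then build FQL.

    lookup_term(text) returns the matching ontology term, or None.
    """
    words = query.split()
    if not words:
        raise ValueError("empty query")

    # Step 1: try the whole query as a single ontology term.
    whole = lookup_term(query)
    if whole is not None:
        return ontology_clause(whole)

    # Step 2: look up each word on its own.
    clauses = []
    for word in words:
        term = lookup_term(word)
        if term is not None:
            clauses.append(ontology_clause(term))
        else:
            # Not an ontology term: search the document text instead.
            clauses.append('content:"{}"'.format(word))

    # Step 3: combine the clauses into one query.
    if len(clauses) == 1:
        return clauses[0]
    return "and({})".format(",".join(clauses))


# Toy stand-in for the ontology search in the index.
KNOWN_TERMS = {"car", "vehicle", "sports car"}


def naive_lookup(text):
    """Crude substitute for lemmatization: strip a trailing "s"."""
    text = text.lower()
    if text in KNOWN_TERMS:
        return text
    if text.endswith("s") and text[:-1] in KNOWN_TERMS:
        return text[:-1]
    return None


if __name__ == "__main__":
    print(build_fql("cars sweden", naive_lookup))
```

    Note that splitting on single words means a multi-word term such as "sports car" is only recognized when it is the whole query; handling it inside a longer query would need extra logic.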

    It shouldn't be that hard to develop a custom search page with this logic, or even to modify the existing search web parts and create an ontology navigator web part. But you would end up with multiple searches to pull it off, at least when documents are involved.

    Does this make sense?

    As a last note, there exist topic map engines with SharePoint support which might be better suited, but they might lack linguistic search capabilities.

    -m


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    Friday, February 25, 2011 7:25 PM

All replies

  • This post will be a bit incoherent as I think around the problem :)

    If your example is representative of the ontology you would have a hierarchical structure in a tree. That means that narrowing from the term "car" is not only to "sports car" but also to "electric car".

    I also assume that each document or indexed item has a reference to a particular node in the ontology. Correct me if I'm wrong. So, is the ontology metadata for the indexed items, or are the ontology terms items themselves?

    Will you in your search application search for terms in the ontology or would a more valid case be:

    Search term: "Porsche 911"

    which might reveal that it is a "sports car" and show a link to "car", which would

    a) do a filter search on "car" without the term "Porsche 911"

    b) do a filter search on "car" with the word "Porsche" (maybe not with 911)

    I'm asking these questions as search items in themselves don't have any relationships, just static meta data (which could be conceptually linked).

    It's definitely possible to solve this, but you need to think about what are searchable words, and what are filters. And what should happen with the query words or other filters as you scope up or down the ontology for this to actually work.

    So, what is a typical query a user would make, and what kind of items are listed in the results (documents, people, products)?

    One way of doing this could be to link up the parent ontology term and any child terms as meta fields to the item. This way you can easily create new searches based on them as filters. Should each item be able to navigate just one level up/down at a time, or several? How you hook the ontology hierarchy to the items I'll leave for a later post (but I guess the CSV route and a custom pipeline stage).

    Browsing through an ontology and drilling down via search refiners are in a way two different kinds of exploration, even though they share some of the same characteristics.

    Regards,
    Mikael Svenson 


    Wednesday, February 23, 2011 7:45 PM
  • I'll try to explain this in a bit more detail.... We'll see if it helps :-)

    Your assumption about "sports car" and "electric car" is correct. When narrowing the selection you get more terms to choose from. But to complicate life a bit more: at each level of the tree, every concept has one preferred term and a number of synonyms. So when searching for "automobil" you should also get all hits for "car", and vice versa. "Car" is the preferred term and "automobil" is a synonym. So a search for one term should also search for all of its synonyms.
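
    The synonym requirement could be sketched like this in Python. The sketch assumes the preferred terms and synonyms are available as a simple mapping and that the terms end up in a searchable field called "term"; both the `SYNONYM_GROUPS` data and the function names are invented for illustration.

```python
# Hypothetical mapping: preferred term -> list of synonyms.
SYNONYM_GROUPS = {"car": ["automobil"]}


def expand(term, groups):
    """Return the preferred term followed by its synonyms.

    Works whether the input is the preferred term or one of the synonyms.
    Unknown terms are returned unchanged.
    """
    for preferred, synonyms in groups.items():
        if term == preferred or term in synonyms:
            return [preferred] + list(synonyms)
    return [term]


def synonym_query(term, groups, field="term"):
    """Build a query string that matches the term and all its synonyms."""
    parts = ['{}:"{}"'.format(field, variant) for variant in expand(term, groups)]
    if len(parts) == 1:
        return parts[0]
    return "or({})".format(",".join(parts))


if __name__ == "__main__":
    print(synonym_query("automobil", SYNONYM_GROUPS))
```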

    To answer your second question the ontology is meta data to the indexed items.

    When we have conducted a search for one term (and all its synonyms) we are presented with a hit list, and depending on the result it should be possible to browse through the ontology upwards or downwards. I think one level at a time should be enough. I assume this could somehow be accomplished by linking parents with children and redoing the search, even if I don't know how to do this in practice.

    Does this make sense and do you follow my explanation?

    So going back to my original question, how do we import the ontology so we can perform search in this manner? 

    Thanks // Ulf

    Friday, February 25, 2011 11:51 AM
  • If I understand you correctly, you basically want to search in the ontology, not in data linked to the ontology? And thus create an ontology browser based on FAST search?

    -m


    Friday, February 25, 2011 12:19 PM
  • Hmmm... Yes, that's partially correct. We want to search in the ontology, but then we would of course also like to retrieve the relevant documents based on the ontology search. We would also like to have the possibility to search the actual data, but that could be handled later.

    A search could be something similar to this: "Cars" + "Sweden". This would result in a hit list with all the documents with the metadata "cars" (or any synonyms of cars) AND "Sweden". After that it would be possible to refine the search by going up or down in the hierarchy, for instance "Vehicle" + "Uppland" (a region in Sweden) or "electric car" + "European Union".

    // Ulf

    Friday, February 25, 2011 12:34 PM
  • Thanks a lot, I will give this a try.

    I might come back with more questions later.

     

    Regards // Ulf

    Tuesday, March 1, 2011 8:59 AM
  • I hope this finds you well, Mikael. Is there a likelihood that we will see this as a feature in FAST soon? If not, would you be interested in discussing an architecture for it? I would like to do this on a very large scale. I am assuming I would need something like an ontology manager and perhaps a graph database.

    Of the two schemas, RDF and OWL, I am not sure which one to go for. What are your thoughts on building this at a larger scale rather than using CSV, and on implementing it as a pipeline stage? Do I need any NLP tools to classify documents based on the ontology, and what rules would I need to accomplish this? Or can I do it on extracted keywords, and if so, what are the drawbacks? Would I have to re-process the entire document?

    I would greatly appreciate your thoughts, Mikael.

    Regards


    MAAS


    • Edited by kakaomari Thursday, September 20, 2012 7:21 PM spelling changes
    Thursday, September 20, 2012 7:17 PM
  • Hi,

    You might want to take a look at http://www.datafacet.com/, which has a product for classification and adding metadata.

    Meta data in SharePoint will stay as it is (or so it seems), so you either have to build something yourself or rely on third parties.

    Thanks,
    Mikael Svenson


    Search Enthusiast - SharePoint MVP/MCT/MCPD
    http://techmikael.blogspot.com/
    Author of Working with FAST Search Server 2010 for SharePoint

    Friday, September 21, 2012 8:09 AM