CTS: DocumentRetriever help

  • Question

  • Hi

    I have the following flow

    database reader -> mapper -> document reader -> document parser -> mapper -> Espwrite

    The metadata for the content is stored in the database along with the content path. If, for example, content authors create the metadata but forget to copy the content to that location, the flow exits at the document reader, even though there is other valid content in the database.

    Please suggest a way to handle this situation so that I can control the flow based on the error condition from the document reader.

    Thanks

    Manju

    Manjunatha
    Monday, February 14, 2011 9:01 PM

All replies

  • Hi Manjunatha,

    You could insert a ‘Throw’ operator between the Mapper and the DocumentRetriever, just before the DocumentRetriever operator, so that it can look for an empty location/URI field.  Records with an empty field could then be redirected as errors to another routine in the flow.  That routine could do something as simple as writing the record names to a file (via XmlFilewriter or DelimitedFileWrite) for correction and resubmission, or it could be as ambitious as correcting the missing location/URI information and resubmitting the corrected records to the DocumentRetriever in the primary flow thread.
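
    As a rough illustration of that routing, here is a plain-Python sketch (not CTS operator code). It assumes records are dictionaries; the field names `contentid` and `uri` and both function names are hypothetical, chosen only for the example.

```python
import csv

def split_by_uri(records, uri_field="uri"):
    """Partition records into (routable, rejected) by whether the URI field is non-empty."""
    routable, rejected = [], []
    for record in records:
        value = (record.get(uri_field) or "").strip()
        (routable if value else rejected).append(record)
    return routable, rejected

def write_rejected(rejected, out, fields=("contentid", "uri")):
    """Write rejected records to a delimited file-like object for later correction."""
    writer = csv.DictWriter(out, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rejected)
```

    The routable list stands in for the records that continue to the DocumentRetriever; the rejected list stands in for the error branch.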

     Thanks!

    Rob Vazzana | Microsoft | Enterprise Search Group | Senior Support Engineer | http://www.microsoft.com/enterprisesearch

    Friday, February 25, 2011 10:17 PM
    Moderator
  • Rob

    Thanks for the response. I guess I was not very clear in my previous posting.

    My issue is that content owners in the CMS enter the URI correctly, but they forget to copy the content to the specified URI location.

    The metadata is as follows:

    contentid  contenttype      uri ....
    1          text\html        e:\Content$\published_content\html\promotions.html
    2          application\pdf  e:\Content$\published_content\pdfs\best_practice.pdf

    For contentId 1, the URI exists in the database but the actual file does not, so the flow ends at the DocumentRetriever stage. What we want is for the flow to mark that the file does not exist and move forward.

    Please let me know how I can implement this.

    Thanks

    Manjunatha


    Manjunatha
    Wednesday, March 2, 2011 8:52 PM
  • Hi Manjunatha,

    Even when a valid URI is passed, the document still needs to physically exist at that location so it can be accessed and read.  The output schema for the operator depends on extracting content from the physical document.  The current implementation of the DocumentRetriever has no built-in handling that lets it proceed if the document at the specified URI doesn’t exist.  At this stage it is working as designed, so you would want to catch the missing file before the record gets to the DocumentRetriever.  This could be accomplished either by redirecting the record with a Throw operator as described above, or possibly by recognizing earlier that the file is missing and inserting some sort of default value to act as a placeholder document.

    Best of luck!

    Rob Vazzana | Microsoft | Enterprise Search Group | Senior Support Engineer | http://www.microsoft.com/enterprisesearch

    Thursday, March 3, 2011 7:22 PM
    Moderator
  • Thanks Rob

     

    I have added a RunCode operator to check that the content exists, and I am using a Throw operator to branch out so that the flow does not reach the DocumentRetriever if the document does not exist.

    Thanks for the valuable suggestion.
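
    The existence check itself is simple. Here is a minimal Python sketch of the idea (the actual RunCode operator code is not shown in this thread, so this is only an illustration). The `content_exists` flag and the function names are hypothetical.

```python
from pathlib import Path

def check_content(record, uri_field="uri"):
    """Return a copy of the record with a hypothetical 'content_exists' flag."""
    uri = record.get(uri_field) or ""
    marked = dict(record)
    # An empty URI or a path that is not a regular file counts as missing.
    marked["content_exists"] = bool(uri) and Path(uri).is_file()
    return marked

def branch(records):
    """Split marked records into (to_retriever, to_error_branch)."""
    to_retriever, to_error_branch = [], []
    for record in map(check_content, records):
        (to_retriever if record["content_exists"] else to_error_branch).append(record)
    return to_retriever, to_error_branch
```

    Because each record keeps the flag, the error branch can still write it out or update the database row instead of the flow stopping.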


    Manjunatha
    Saturday, March 5, 2011 12:47 AM
  • Hi,

    Finally found the right link to reply.

    What I've done when encountering problems with document retrieval (or document parsing, etc.) is add an error connection to the operator.  That way you don't need to write your own RunCode to check whether the URI exists.  You can let the DocumentRetriever try to get the document, and if it fails, follow the error connection to a branch of your flow.  A good first destination for the error connection is a Mapper, which will then show the new attribute Exception in addition to the original attributes (such as getpath).  This will prevent your flow from terminating.

    To add the error connection, right-click the operator, choose "Connect to", and choose the type "Error connection".  After the Mapper, you can handle the flow as a normal flow again: insert breakpoints, update the DB entry to show that retrieval failed, etc.
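
    The error-connection pattern amounts to "try the retrieval, and on failure pass the record on with an Exception attribute". Here is a plain-Python sketch of that idea, not CTS code. The `uri` and `data` field names and the function name are hypothetical; only the `Exception` attribute name comes from the description above.

```python
def retrieve_with_error_branch(records, uri_field="uri"):
    """Try to read each record's file; return (retrieved, failed) lists.

    Failed records keep their original attributes and gain an 'Exception'
    attribute describing why retrieval failed, so processing can continue.
    """
    retrieved, failed = [], []
    for record in records:
        try:
            with open(record[uri_field], "rb") as handle:
                retrieved.append({**record, "data": handle.read()})
        except (OSError, KeyError, TypeError) as exc:
            failed.append({**record, "Exception": f"{type(exc).__name__}: {exc}"})
    return retrieved, failed
```

    One missing file no longer stops the run: the failed list can be logged or written back to the database while the retrieved list continues to parsing.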

    Regards,

    Bjorn Andersen | Microsoft | Enterprise Search Group | Principal Consultant | http://www.microsoft.com/enterprisesearch

    Wednesday, March 9, 2011 4:21 AM