Prevent Duplicate Message Submission

  • Friday, November 06, 2009 8:20 PM, Dan Rosanova (MVP)
     
    I've been asked to prevent business partners from submitting the same message (normally a file) twice.  Any good ideas on how to do this?  I am thinking of building a BizTalk solution that receives all files through a pipeline that computes an MD5 hash of each one and logs it to a database.  Then, on the send side (where the file I've just received is passed on to another BizTalk solution), a pipeline would check this database.

    I don't like accessing a DB in a pipeline, but the only other thing I could think of is an orchestration (in which I would receive the message as an XmlDocument and never touch it).  Which sounds better?  Will the orchestration option increase my memory footprint even if I never perform any operation on the received message?

Answers

  • Sunday, November 08, 2009 8:25 PM, Kiryl Kavalenka
     Answer
    "the resource article is showing use of streams that will negatively impact memory footprint"
    Do you mean loading the entire message into memory? Use ReadOnlySeekableStream or VirtualStream as wrappers to avoid this.

    Here is the MSDN article on the subject (http://msdn.microsoft.com/en-us/library/ee377071%28BTS.10%29.aspx):

    The following techniques describe how to minimize the memory footprint of a message when loading the message into a pipeline.

    Use ReadOnlySeekableStream and VirtualStream to process a message from a pipeline component

    It is considered a best practice to avoid loading the entire message into memory inside pipeline components. A preferable approach is to wrap the inbound stream with a custom stream implementation, and then as read requests are made, the custom stream implementation reads the underlying, wrapped stream and processes the data as it is read (in a pure streaming manner). This can be very hard to implement and may not be possible, depending on what needs to be done with the stream. In this case, use the ReadOnlySeekableStream and VirtualStream classes exposed by the Microsoft.BizTalk.Streaming.dll. An implementation of these is also provided in Arbitrary XPath Property Handler (BizTalk Server Sample) (http://go.microsoft.com/fwlink/?LinkId=160069) in the BizTalk SDK.

    ReadOnlySeekableStream ensures that the cursor can be repositioned to the beginning of the stream. The VirtualStream will use a MemoryStream internally, unless the size is over a specified threshold, in which case it will write the stream to the file system. Use of these two streams in combination (using VirtualStream as persistent storage for the ReadOnlySeekableStream) provides both “seekability” and “overflow to file system” capabilities. This accommodates the processing of large messages without loading the entire message into memory. The following code could be used in a pipeline component to implement this functionality.
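
The sample code that the article refers to is not included in this post. As a rough, language-neutral analogy only (this is Python, not the BizTalk API), the "seekable, with overflow to the file system" behaviour described above resembles Python's `tempfile.SpooledTemporaryFile`. The helper name `to_seekable` and the size thresholds are hypothetical. Unlike the BizTalk classes, which buffer data as it is read, this sketch copies the whole source eagerly.

```python
import io
import tempfile

CHUNK_SIZE = 64 * 1024  # hypothetical read size, chosen for illustration


def to_seekable(source, max_in_memory=1024 * 1024):
    """Copy a forward-only stream into a seekable buffer.

    The buffer stays in memory until it exceeds max_in_memory bytes,
    after which it is transparently moved to a temporary file on disk.
    """
    buffer = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        buffer.write(chunk)  # only one chunk is held by this loop at a time
    buffer.seek(0)  # hand the buffer back positioned at the start
    return buffer
```

The returned object can be read, rewound with `seek(0)`, and read again, which is the property a duplicate check needs before the message continues through the pipeline.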




    Kiryl Kavalenka My Blog

All Replies

  • Friday, November 06, 2009 9:03 PM, Tariq Majeed
     
    Hi,

    One other option is to check each incoming message against your database in the receive pipeline.  If it is a duplicate, log it somewhere (for auditing) and do not pass it to the MessageBox.  If it is a new message that is not found in the database, compute its MD5 hash, log the hash to the database, and pass the message to the MessageBox.
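
A minimal sketch of this check-then-log flow, written in Python with SQLite purely for illustration (it is not BizTalk code). The table name `seen_messages` and the functions `open_store` and `register_if_new` are hypothetical. It assumes a single table keyed on the MD5 digest, so that one `INSERT OR IGNORE` statement both checks for and records a message.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store of message digests seen so far."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_messages ("
        "digest TEXT PRIMARY KEY, "
        "first_seen TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def register_if_new(conn, payload):
    """Return True if payload is new (and record it), False if it is a duplicate."""
    digest = hashlib.md5(payload).hexdigest()
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO seen_messages (digest) VALUES (?)",
            (digest,),
        )
    # rowcount is 1 when the row was inserted, 0 when the key already existed
    return cursor.rowcount == 1
```

Doing the check and the insert in one statement avoids a window in which two receive instances both see "not found" and both accept the same file.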


    Regards,

    Tariq Majeed
    Please mark it as answer if it helps
  • Friday, November 06, 2009 9:24 PM, Dan Rosanova (MVP)
     
    That's pretty good!  Sounds like what I want.  I do a lot of debatching and want to get the MD5 before the messages start getting dispatched, but I think the stream in the pipeline is forward-only and read-only, with events, which I think would stop me from getting the MD5 before the debatching.

    -Dan
  • Saturday, November 07, 2009 8:14 AM, Kiryl Kavalenka
     
    Debatching takes place during the Disassemble stage of the pipeline. You can put all of your duplicate-checking logic (i.e. read the whole file, compute the MD5, compare it with the DB (logging it if it is not a duplicate), and throw an exception if it is a duplicate) into a custom Decode component.
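
The steps listed above can be sketched as follows. This is a Python illustration of the logic only, not a BizTalk pipeline component. The names `decode_stage` and `DuplicateMessageError` are hypothetical, and an in-memory set stands in for the database. It assumes the incoming stream is seekable.

```python
import hashlib
import io


class DuplicateMessageError(Exception):
    """Raised when an incoming message matches one already seen."""


def decode_stage(stream, seen_digests, chunk_size=64 * 1024):
    """Hash the whole message in chunks, reject duplicates, return the stream."""
    md5 = hashlib.md5()
    # Read until the stream returns an empty bytes object (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    digest = md5.hexdigest()

    if digest in seen_digests:
        # Analogous to throwing from the component so the message is not passed on.
        raise DuplicateMessageError(digest)

    seen_digests.add(digest)  # log the new message
    stream.seek(0)            # rewind so the next stage sees the full body
    return stream
```

Hashing chunk by chunk keeps memory use flat regardless of file size, since only one chunk is held at a time.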


  • Saturday, November 07, 2009 8:43 AM, Kiryl Kavalenka
     
    "I think the stream in the pipeline is forward-only and read-only, with events, which I think would stop me from getting the MD5 before the debatching."
    No, it's not. You can do whatever you want with the stream during the Decode stage. A stream wrapper may be required in some cases, but not in yours.
    Here is a decent resource on the subject:
  • Sunday, November 08, 2009 6:39 PM, Dan Rosanova (MVP)
     
    So this is actually working (although I would point out that the resource article shows a use of streams that will negatively impact the memory footprint and does some things I thought you weren't supposed to do in a pipeline).

    I got it working, but my messages were always empty, so after debugging I decided to just set the position on the stream to zero, and it works great.  Now my only other issue is: suppose I find a duplicate message, what do I do then?
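
The empty-message symptom is typical of any stream that has been read to the end and not rewound. A small Python illustration (an analogy, not BizTalk code; the helper `md5_of` and the sample payload are made up for the example):

```python
import hashlib
import io


def md5_of(stream):
    """Hash everything from the current position to the end of the stream."""
    return hashlib.md5(stream.read()).hexdigest()


body = io.BytesIO(b"<Order id='1'/>")
digest = md5_of(body)

leftover = body.read()   # b"" because the cursor is now at the end
body.seek(0)             # set the position back to zero
restored = body.read()   # the full message body again
```

Anything downstream of the hashing step sees `leftover` unless the position is reset first.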

    Effectively I want the message to simply not come out of the receive pipeline.  I guess I could throw an exception and just deal with the suspended messages.  Can I set any of the properties in the ErrorReport so that I can catch these from any application?

    Kind Regards,
    -Dan
  • Sunday, November 08, 2009 7:56 PM, Kiryl Kavalenka
     
    You are not using ESB exception handling, are you?
    Kiryl Kavalenka My Blog