BizTalk SQL Patterns: Polling and Batch Retrieve

Interacting with SQL Server is a bread-and-butter operation for BizTalk apps and native SQL Server support has been in the product since the beginning.

While there are many ways to develop our SQL Server interfaces, in this article, I'm presenting two patterns for Polling data and a technique for retrieving multiple records in a single request.

Download the Code Gallery Sample here: BizTalk: SQL Patterns for Polling and Batch Retrieve

Polling a Database We Own

When the database the BizTalk app is targeting is owned internally and we have influence over the database design, implementing a reliable polling pattern is a relatively straight forward proposition.

This pattern uses a specific field on a top level table to track which records have been polled and provide a level of concurrency management so that >1 Receive Location can be active at a time.

The example table is a copy of SalesOrderHeader, as SalesOrderHeaderPolling, from the AdventureWorks database (downloadable from CodePlex).  Here is the modified table:

  

A new field, PollingID of type uniqueidentifier, has been added to this table to manage the polling process from BizTalk.

To poll this table, we use a Stored Procedure that takes the following steps:

  1. Generates a new uniqueidentifier (GUID).
  2. Updates the specified number of rows with the new uniqueidentifier.  This invocation of the Stored Procedure now owns these rows.
  3. Selects out the rows by the uniqueidentifier

Here is the Stored Procedure Code:

CREATE PROCEDURE [dbo].[getSalesOrders]
       @qty int
AS
BEGIN
       SET NOCOUNT ON;
       DECLARE @PollingID UNIQUEIDENTIFIER = NEWID()
       UPDATE TOP(@qty) Sales.SalesOrderHeaderPolling
       SET PollingID = @PollingID
       WHERE PollingID IS NULL
  
       SELECT *
       FROM Sales.SalesOrderHeaderPolling
       WHERE PollingID = @PollingID
END

Here’s the BizTalk application for this.

What we have are the Schemas and Binding File generated by the WCF SQL Adapter Wizard.

Finally, here we see the Receive Location with the WCF SQL Adapter pointing directly to the AdventureWorks database.

When we enable the Receive Location with the @qty parameter set at 5, we see 5 SalesOrderHeader records have been polled from the database.

Because PollingID has been set, these record are now flagged as having been retrieved by our BizTalk app.  In this scenario, since we have control over the database, we can also add other useful fields to identify the time the record was retrieved or the time we finished processing the record.

Polling a Database We Don’t Own

On the other hand, sometime we have to poll records from a database we have no control over, such as a vendor supplied application or internal app deemed too risky to modify.

In this case, we have to do a little extra work to manage our polled records outside the target app's database.

In this pattern, we create a separate SQL database to track Sales Orders created in the main app.  This example uses a database called AdventureWorksInterface with two artifacts to manage the polling.

Here is the definition of the SalesOrderPollingTracking table:

This pattern works essentially the same as when we own the database, as in the previous example, except we have to keep a copy of the SalesOrderID’s on our own.  To accomplish this, our polling Stored Procedure takes the extra step of adding new SalesOrderID’s to the tracking table before returning the results.

This Stored Procedure takes the following steps:

  1. Executes a process to synchronize the SalesOrderID’s
  2. Generates a new uniqueidentifier (GUID).
  3. Updates the specified number of rows with the new uniqueidentifier.  This invocation of the Stored Procedure now owns these rows.
  4. Selects out the rows by the uniqueidentifier
CREATE PROCEDURE [dbo].[getSalesOrdersFromAW]
       @qty int
AS
BEGIN
       SET NOCOUNT ON;
  
       INSERT INTO SalesOrderPollingTracking (SalesOrderId)
       SELECT TOP(@qty + 5) awsso.SalesOrderID--, pso.SalesOrderId
       FROM AdventureWorks2012.Sales.SalesOrderHeader AS awsso
       LEFT OUTER JOIN SalesOrderPollingTracking AS pso
              ON awsso.SalesOrderID = pso.SalesOrderId
              WHERE pso.SalesOrderId IS NULL
       ORDER BY awsso.SalesOrderID ASC;
           
       DECLARE @PollingId uniqueidentifier = NEWID();
  
       WITH PollingOrders AS
       (
                     SELECT TOP(@qty) PollingId
                     FROM SalesOrderPollingTracking
                     WHERE PollingID IS NULL
                     ORDER BY SalesOrderId ASC      
       )
  
       UPDATE PollingOrders
       SET PollingID = @PollingId
           
       SELECT awsso.*
       FROM SalesOrderPollingTracking AS ptso
       JOIN AdventureWorks2012.Sales.SalesOrderHeader AS awsso
       ON ptso.SalesOrderID = awsso.SalesOrderID
       WHERE ptso.PollingID = @PollingID   
END

In this example, the synchronization is done by JOIN’ing the tables and INSERT’ing records from the main SalesOrderHeader table where there is no match in the Tracking table.

One important thing to consider is that there are several ways to keep the SalesOrderID’s in sync and which to use depends on several factors such as relative size of the tables and frequency of update.

Here are some additional methods that can be tested in any environment.

Using WHERE NOT IN:

INSERT INTO SalesOrderPollingTracking (SalesOrderId)
       SELECT TOP(50) awsso.SalesOrderID
       FROM AdventureWorks2012.Sales.SalesOrderHeader AS awsso
       WHERE awsso.SalesOrderID NOT IN
              (
              SELECT SalesOrderID
              FROM SalesOrderPollingTracking AS ptso
              )
       ORDER BY SalesOrderId ASC;

Using WHERE NOT EXISTS:

INSERT INTO SalesOrderPollingTracking (SalesOrderId)
       SELECT TOP(50) awsso.SalesOrderID
       FROM AdventureWorks2012.Sales.SalesOrderHeader AS awsso
       WHERE NOT EXISTS
              (
              SELECT SalesOrderID
              FROM SalesOrderPollingTracking AS ptso
              WHERE ptso.SalesOrderID = awsso.SalesOrderID
              )
       ORDER BY SalesOrderId ASC;

Using MERGE:

MERGE SalesOrderPollingTracking AS ptso
       USING AdventureWorks2012.Sales.SalesOrderHeader AS awsso
       ON ptso.SalesOrderID = awsso.SalesOrderID
       WHEN NOT MATCHED BY TARGET
              THEN INSERT(SalesOrderID) VALUES(awsso.SalesOrderID)
       WHEN NOT MATCHED BY SOURCE
              THEN DELETE;

Important point, the MERGE option is the only one that will synchronize, add and delete the SalesOrderID in one operation.

Here’s the BizTalk application for this.

What we have are the Schemas and Binding File generated by the WCF SQL Adapter Wizard.

Finally, here we see the Receive Location with the WCF SQL Adapter pointing to the AdventureWorksInterface database.

When we Enable this Receive Location with the @qty set to 5, we again see records polled from the main app database and some additional SalesOrderID’s awaiting the next Polling interval.

For demonstration purposes, the polling stored procedure retrieves 5 extra records.

Retrieving Records in Batch

Recently on the BizTalk Forum, a poster asked for advice on a batching operation they felt had a less than ideal performance profile.  The implementation was perfectly serviceable and produced the correct output but since it looped through a series of values and made individual calls to SQL Server, the poster figured there had to be a better way.

This is a sample of my proposed solution.

The WCF SQL Adapter supports User-Defined Table Types as parameters to Stored Procedures.  In the Stored Procedure itself, the table parameter behaves just like a physical table, mostly.

Because of this, we can pass the list of ID's to retrieve as a table type and JOIN to the target table to produce the result set containing all of the requested ID’s in one call.

First, we define the User-Defined Table Type:

CREATE TYPE [dbo].[SalesOrdersToRetrieve] AS TABLE(
       [SalesOrderID] [int] NOT NULL,
       PRIMARY KEY CLUSTERED
(
       [SalesOrderID] ASC
)WITH (IGNORE_DUP_KEY = OFF)
)

This SalesOrdersToRetrieve UDT consists of just one column, SalesOrderID.

Next, we define the Stored Procedure to retrieve the Sales Orders listed in our SalesORdersToRetrieve table parameter:

CREATE PROCEDURE [dbo].[getSalesOrdersByIDs]
       @SalesOrderIDs SalesOrdersToRetrieve READONLY  
AS
BEGIN
       SELECT sh.*
       FROM Sales.SalesOrderHeader AS sh
       JOIN @SalesOrderIDs AS ids
              ON sh.SalesOrderID = ids.SalesOrderID
END

The query itself is very simple, a SELECT against the AdventureWorks SalesOrderHeader table JOIN'ed to our SalesOrdersToRetrieve table parameter.

Now, let's see how this looks in BizTalk.

This project contains the Schemas generated by the WCF SQL Wizard to call the getSalesOrdersByIDs Stored Procedure, a sample source message and a Map to create the SQL Request message.

The SalesOrdersToRetrieve.xsd is a filled list of SalesOrders to retrieve as a batch.

Next, we map from our source message to the SQL Request which has the User Defined Table as its only parameter.

After configuring a simple messaging application where the Map is applied on the Two-Way SQL Send Port, we start the process with the following test message.

<ns0:SalesOrdersToRetrieve xmlns:ns0="http://BizTalkSQLPatterns.SalesOrdersToRetrieve">
  <SalesOrdersID>44018</SalesOrdersID>
  <SalesOrdersID>44019</SalesOrdersID>
  <SalesOrdersID>44020</SalesOrdersID>
  <SalesOrdersID>44021</SalesOrdersID>
  <SalesOrdersID>44022</SalesOrdersID>
</ns0:SalesOrdersToRetrieve>

The resulting response from the Stored Procedure is a batch of the 5 request SalesOrderHeader records.  The result has been edited for brevity.

  <StoredProcedureResultSet0>
      <SalesOrderID>44018</SalesOrderID>
      <RevisionNumber>3</RevisionNumber>
      <OrderDate>2005-08-21T00:00:00Z</OrderDate>
      <DueDate>2005-09-02T00:00:00Z</DueDate>
      <ShipDate>2005-08-28T00:00:00Z</ShipDate>
      <Status>5</Status>
    </StoredProcedureResultSet0>
      <SalesOrderID>44019</SalesOrderID>
      <RevisionNumber>3</RevisionNumber>
      <OrderDate>2005-08-21T00:00:00Z</OrderDate>
      <DueDate>2005-09-02T00:00:00Z</DueDate>
      <ShipDate>2005-08-28T00:00:00Z</ShipDate>
      <Status>5</Status>
    </StoredProcedureResultSet0>
      <SalesOrderID>44020</SalesOrderID>
      <RevisionNumber>3</RevisionNumber>
      <OrderDate>2005-08-21T00:00:00Z</OrderDate>
      <DueDate>2005-09-02T00:00:00Z</DueDate>
      <ShipDate>2005-08-28T00:00:00Z</ShipDate>
      <Status>5</Status>
    </StoredProcedureResultSet0>
      <SalesOrderID>44021</SalesOrderID>
      <RevisionNumber>3</RevisionNumber>
      <OrderDate>2005-08-21T00:00:00Z</OrderDate>
      <DueDate>2005-09-02T00:00:00Z</DueDate>
      <ShipDate>2005-08-28T00:00:00Z</ShipDate>
      <Status>5</Status>
    </StoredProcedureResultSet0>
      <SalesOrderID>44022</SalesOrderID>
      <RevisionNumber>3</RevisionNumber>
      <OrderDate>2005-08-22T00:00:00Z</OrderDate>
      <DueDate>2005-09-03T00:00:00Z</DueDate>
      <ShipDate>2005-08-29T00:00:00Z</ShipDate>
      <Status>5</Status>
    </StoredProcedureResultSet0>
  </StoredProcedureResultSet0>
  <ReturnValue>0</ReturnValue>
</getSalesOrdersByIDsResponse>

In a typical scenario, this result would be debatched and the individual orders processed separately.

See Also

Another important place to find a huge amount of BizTalk related articles is the TechNet Wiki itself. The best entry point is BizTalk Server Resources on the TechNet Wiki.