Introduction

Convoys are frequently used in orchestrations in BizTalk Server. Sometimes convoys behave counter-intuitively: messages and orchestrations get suspended in an unpredictable manner. This issue is well known, and the suspended messages are called "zombies". The name is unofficial, but you could definitely lose your brains over them. Here I describe in detail when and why these zombie situations happen. See the short description of zombies in MSDN.

An orchestration can be enlisted with many subscriptions. In other words, it can have several Receive shapes. Usually the first Receive creates the Activation subscription, and the other Receives create Instance subscriptions. [See “Publish and Subscribe Architecture” in MSDN]
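To make the split concrete, here is a minimal Python sketch of the usual pattern. `subscription_kinds` is a hypothetical helper invented for illustration, not part of any BizTalk API; it only labels the Receive shapes of an orchestration, in execution order, with the kind of subscription each one typically creates. The message names are those of the sample process that follows.

```python
from typing import List, Tuple


def subscription_kinds(receive_types: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper, not a BizTalk API.

    Maps the Receive shapes of an orchestration (in execution order) to the
    kind of subscription each one usually creates: the first, activating
    Receive gets an Activation subscription; every later Receive gets an
    Instance subscription.
    """
    return [
        (msg_type, "Activation" if index == 0 else "Instance")
        for index, msg_type in enumerate(receive_types)
    ]


# One activating Receive for Sample, one later Receive for Sample_2.
print(subscription_kinds(["Sample", "Sample_2"]))
```

The real rule depends on how each Receive shape is configured (for example its Activate property), so this ordering-based helper only mirrors the "usually" in the text above.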

Here is a sample process:

This orchestration has two receives. It is a typical Sequential Convoy. [See "BizTalk Server 2004 Convoy Deep Dive" in MSDN by Stephen W. Thomas].

Let's start the experiment.

There are three possible scenarios, depending on the message sequence.

First scenario: everything is OK

The Activation subscription for the Sample message is created when the SampleProcess orchestration is enlisted.

The Instance subscription is created only when the SampleProcess orchestration instance starts and this subscription is removed when the orchestration instance ends.

So far so good: the Sample_2 message is delivered exactly within this time interval and is consumed.
Note that after the Sample_2 message is delivered, the Instance subscription is still active, and it stays active until the SampleProcess orchestration instance ends.
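The subscription lifetimes described in this scenario can be modelled with a small toy class. `ToyMessageBox` and `Subscription` are hypothetical names for this sketch only; the real MessageBox evaluates far richer subscription predicates. The model tracks just one thing: which subscriptions exist at a given moment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Subscription:
    message_type: str
    instance_id: Optional[int] = None  # None marks an Activation subscription


@dataclass
class ToyMessageBox:
    """Hypothetical model of subscription lifetimes; not the real MessageBox."""
    subscriptions: List[Subscription] = field(default_factory=list)

    def enlist(self, activation_type: str) -> None:
        # Enlisting the orchestration creates the Activation subscription.
        self.subscriptions.append(Subscription(activation_type))

    def start_instance(self, instance_id: int, instance_type: str) -> None:
        # The Instance subscription exists only once the instance has started.
        self.subscriptions.append(Subscription(instance_type, instance_id))

    def end_instance(self, instance_id: int) -> None:
        # It is removed when the instance ends, not when its Receive completes.
        self.subscriptions = [
            s for s in self.subscriptions if s.instance_id != instance_id
        ]

    def subscribers(self, message_type: str) -> List[Subscription]:
        # An empty list means no one would receive this message type right now.
        return [s for s in self.subscriptions if s.message_type == message_type]


box = ToyMessageBox()
box.enlist("Sample")
assert box.subscribers("Sample_2") == []       # before the instance starts
box.start_instance(1, "Sample_2")
assert len(box.subscribers("Sample_2")) == 1   # while the instance runs
# ...Sample_2 is consumed here, but the subscription is still there...
assert len(box.subscribers("Sample_2")) == 1
box.end_instance(1)
assert box.subscribers("Sample_2") == []       # after the instance ends
```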

Second scenario: no consumers

Three Sample_2 messages are delivered. The first one is delivered before the SampleProcess starts, that is, before the instance subscription is created. The second message is delivered in the correct time interval. The third one is delivered after the SampleProcess orchestration has ended and the instance subscription has been removed.

Note: the Sample_2 message that is consumed is not the first one. The first one was first in the queue, but it was not waiting: it was suspended as soon as it was delivered to the Message Box, because it had no subscribers at that moment.

We expect the first and the last Sample_2 messages to fail delivery, because at the time they are published, no subscriptions for them are enlisted.

Let's see. The first and the last Sample_2 messages are Suspended (Nonresumable) in the Message Box with routing errors. For each of them, two (!) service instances were created. One service instance has the ServiceClass of Messaging, and its Error Description is:

The second service instance has the ServiceClass of RoutingFailureReport, and its Error Description is:
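The observations of this scenario can be summarized as a function of when a Sample_2 message is published relative to the lifetime of the instance subscription. This is a hypothetical sketch: `publish_sample2`, `start` and `end` are illustrative names, and the returned list of service instances simply restates what was observed above rather than calling into BizTalk.

```python
from typing import Dict


def publish_sample2(t: float, start: float, end: float) -> Dict[str, object]:
    """Hypothetical model of what happens to a Sample_2 published at time t.

    start/end: when the SampleProcess instance starts and ends, i.e. the
    lifetime of its Instance subscription. Outside that window no
    subscription matches, and (as observed above) the message is suspended
    with a routing failure plus two service instances.
    """
    if start <= t <= end:
        return {"status": "routed to SampleProcess", "service_instances": []}
    return {
        "status": "Suspended (Nonresumable)",
        "service_instances": ["Messaging", "RoutingFailureReport"],
    }


# Second scenario: before the start, inside the window, after the end.
for t in (0.5, 2.0, 9.0):
    print(t, publish_sample2(t, start=1.0, end=5.0))
```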

Third scenario: something goes wrong

Two Sample_2 messages are delivered. Both are delivered in the same interval, while the SampleProcess orchestration is running and the instance subscription exists.

The first Sample_2 is consumed. The second Sample_2 matches the subscription, but the subscriber, the SampleProcess orchestration, will not consume it: the orchestration expects and needs only one such message. After the SampleProcess orchestration ends (and only then! I will discuss this in the next article), the orchestration instance is suspended (Nonresumable). Only one service instance is suspended now. It has the ServiceClass of Orchestration, and its Error Description is:

In the Message tab, the second Sample_2 message has the Suspended (Resumable) status.
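The same timing idea can be extended to this scenario. In the hypothetical sketch below (the function name and the `expected` parameter are illustrative assumptions), every Sample_2 published while the instance subscription exists is routed to the instance, but only the first `expected` of them are actually received; the rest stay attached to the instance.

```python
from typing import List, Tuple


def classify_arrivals(
    arrivals: List[float], start: float, end: float, expected: int = 1
) -> List[Tuple[float, str]]:
    """Hypothetical model of the third scenario.

    Every Sample_2 published while the Instance subscription exists
    (start <= t <= end) is routed to the running instance, but the
    orchestration only receives `expected` of them. The rest stay attached
    to the instance and surface as zombies when it ends.
    """
    outcomes = []
    routed = 0
    for t in sorted(arrivals):
        if start <= t <= end:
            routed += 1
            outcomes.append((t, "consumed" if routed <= expected else "zombie"))
        else:
            outcomes.append((t, "routing failure"))
    return outcomes


# Third scenario: two messages, both inside the subscription lifetime.
print(classify_arrivals([2.0, 3.0], start=1.0, end=5.0))
```

The same function also reproduces the second scenario when some arrivals fall outside the window.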

Notes:

  • The orchestration consumes the extra message(s) and gets suspended together with them. These messages are consumed not in the sense of “processed by the orchestration” but in the sense of “delivered to the subscriber”: the Receive shape in the orchestration never receives them, yet they are routed to the orchestration. That is why the error information looks ambiguous.
  • The period between the last Receive shape and the end of the orchestration is a "dangerous zone". The orchestration should be designed to minimize it.

Unified Sequential convoy

Now let's look at one more scenario.

It is a unified sequential convoy: the activation subscription is for the same message type as the instance subscription. The Sample_2 message is now the Sample message. For simplicity, the SampleProcess orchestration consumes only two Sample messages. In this scenario the orchestration usually consumes many messages inside a loop, but here there are only two of them.

The first message starts the orchestration; the second message is received by this orchestration instance. Then the next pair of messages follows, and so on.

But if the input messages arrive at shorter intervals, we get a problem.

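We lose messages in an unpredictable manner, as discussed in the third scenario: a message that arrives after the instance's second Receive but before the instance ends is routed to that instance and becomes a zombie.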
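The effect of shorter intervals can be reproduced with a small single-stream simulation. It is a sketch under stated assumptions, not a description of BizTalk internals: `simulate_unified_convoy` and `tail` (how long an instance keeps running after its last Receive) are hypothetical, and the model assumes that a Sample arriving while an instance is still alive is routed to that instance instead of activating a new one, as in the third scenario.

```python
from typing import List, Optional, Tuple


def simulate_unified_convoy(
    arrivals: List[float], tail: float
) -> List[Tuple[float, str]]:
    """Hypothetical single-stream model of the unified sequential convoy.

    Assumptions (not taken from BizTalk internals): each instance is activated
    by one Sample, receives the next Sample, then keeps running for `tail`
    seconds before it ends. While it runs, its Instance subscription routes
    every further Sample to it instead of activating a new instance.
    """
    outcomes = []
    waiting_for_second = False       # an instance exists, its Receive is pending
    ends_at: Optional[float] = None  # end time of an instance past its last Receive
    for t in sorted(arrivals):
        if waiting_for_second:
            outcomes.append((t, "received"))
            waiting_for_second = False
            ends_at = t + tail       # the "dangerous zone" is (t, t + tail]
        elif ends_at is not None and t <= ends_at:
            outcomes.append((t, "zombie"))
        else:
            outcomes.append((t, "activates new instance"))
            waiting_for_second = True
            ends_at = None
    return outcomes


# Pairs far apart: every message is handled.
print(simulate_unified_convoy([0, 1, 10, 11], tail=2))
# The same four messages at shorter intervals.
print(simulate_unified_convoy([0, 1, 2, 3], tail=2))
```

Nothing changes in the orchestration between the two runs; only the spacing of the messages does, which is why the losses look unpredictable.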

Conclusion

  • Perhaps BizTalk would behave better if the orchestration removed the instance subscription right after the message is consumed, not at the end of the orchestration. The current behavior looks like a bug, but for now it is a “feature” of the BizTalk subscription mechanism.
  • The time period between the last receive shape and the end of the orchestration is a "dangerous zone". The orchestration should be designed to decrease this zone as much as possible.

Note:

Several times I have seen zombies explained as messages that arrive between the moment the orchestration instance is scheduled for disposal and the moment it is actually disposed. By that explanation, the average dangerous zone is about half of the MessageBox polling interval (1 second by default), that is, about 0.5 seconds. This is not correct. The dangerous zone lies between the last receive (for the message with the instance subscription) and the end of the orchestration, and it can be much longer than half a second. For example, if you include some processing after the last receive, perhaps sending and receiving other messages, the dangerous zone can last minutes or hours.
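A rough estimate shows why the length of the dangerous zone matters. If messages matching the instance subscription arrived as a Poisson process with rate λ (an assumption made only for this estimate), the expected number arriving inside a zone of length w is λw, and the probability of at least one zombie is 1 - exp(-λw). `zombie_risk` is a hypothetical helper for this arithmetic.

```python
import math
from typing import Tuple


def zombie_risk(rate_per_sec: float, zone_sec: float) -> Tuple[float, float]:
    """Rough estimate, assuming Poisson arrivals (an assumption, not a BizTalk fact).

    Returns the expected number of messages arriving inside the dangerous zone
    and the probability that at least one does.
    """
    expected = rate_per_sec * zone_sec
    return expected, 1.0 - math.exp(-expected)


# One message every 5 seconds on average (rate 0.2 per second).
print(zombie_risk(0.2, 0.5))   # zone = half of a 1 s polling interval
print(zombie_risk(0.2, 60.0))  # zone = one minute of work after the last receive
```

With the same traffic, stretching the zone from half a second to a minute turns an occasional zombie into an almost certain one.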

See Also

Another suggested read is the following wiki article:

  • See the deep dive about zombies from the BizTalk creators on the BizTalk Core Engine Blog. Don't be scared by the term "schedules"; that was just the name for orchestrations in 2004.
Another important place to find a huge number of BizTalk-related articles is the TechNet Wiki itself. The best entry point is BizTalk Server Resources on the TechNet Wiki.