Ron,
Our implementation can be configured either to 1)
resume and throw an exception on the reply, or 2) do nothing (useful in
cases where all operations are idempotent and can be retried without side effects). The
behavior is configured through the deployment descriptor.
Edwin
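A minimal Python sketch of the two recovery options described above follows. Everything in it is hypothetical: the `RecoveryPolicy` values, the `<recovery-policy>` element assumed to live in the deployment descriptor, and the `recover()` function are illustrations, not the actual implementation, which the message describes only by its two behaviors.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class RecoveryPolicy(Enum):
    # Option 1: resume the instance; the <reply> faults because the client connection is gone.
    RESUME_AND_FAULT_ON_REPLY = "resume-fault-on-reply"
    # Option 2: leave the instance alone; the client is expected to resend an idempotent request.
    DO_NOTHING = "do-nothing"


class ConnectionLostFault(Exception):
    """Raised when a <reply> targets a synchronous connection that did not survive the crash."""


@dataclass
class Instance:
    instance_id: str
    pending: List[Tuple[str, str]]  # (activity name, activity kind) still to run, in order
    completed: List[str] = field(default_factory=list)
    connection_open: bool = False  # a synchronous SOAP/HTTP connection is lost in a crash


def load_policy(descriptor_xml: str) -> RecoveryPolicy:
    """Read the hypothetical <recovery-policy> element; option 1 is an arbitrary default."""
    root = ET.fromstring(descriptor_xml)
    value = root.findtext(
        "recovery-policy", default=RecoveryPolicy.RESUME_AND_FAULT_ON_REPLY.value
    )
    return RecoveryPolicy(value.strip())


def recover(instance: Instance, policy: RecoveryPolicy) -> None:
    """Apply the configured policy to an instance reloaded from persistent state."""
    if policy is RecoveryPolicy.DO_NOTHING:
        return  # do not resume; the remaining activities are never executed
    while instance.pending:
        name, kind = instance.pending[0]
        if kind == "reply" and not instance.connection_open:
            raise ConnectionLostFault(
                f"instance {instance.instance_id}: no connection for <reply name='{name}'>"
            )
        instance.completed.append(name)  # stand-in for actually executing the activity
        instance.pending.pop(0)
```

For example, an instance recovered just after `as1`, with `pending=[("inv", "invoke"), ("as2", "assign"), ("rep", "reply")]`, would run `inv` and `as2` under option 1 and then raise `ConnectionLostFault` at `rep`; under option 2 nothing runs.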
Folks,
I was recently asked an
interesting question by one of my development teams, and I thought it would
be of interest to this group, since it touches on universal implementation
issues.
The question is based on the following
scenario. Consider a process something like this:
<sequence>
<receive name="rcv" ... />
<assign name="as1" ... />
<invoke name="inv" ... />
<assign name="as2" ... />
<reply name="rep" ... />
</sequence>
The <receive> and <reply> activities are part
of a request-response MEP, bound to SOAP, so that the request-response is
synchronous (uses the same connection for request and
response).
Simple enough. But suppose that during
execution of an instance of the above process, somewhere after the
<receive> activity is completed but before the <reply>
activity is done, the BPEL engine suffers a crash. Since we have
full state persistence, recovery is simple enough. We can therefore finish
creating the reply, but this is rather useless, since the client connection is
lost.
So what is the right thing to do under these
circumstances? Should the engine, upon recovery in this situation, fault the
running activity? Should it continue to the reply activity, and presumably
fault because the connection is closed?
What of the
client program? It sees that the HTTP connection closed while awaiting a
response to the request. It might reasonably resend the request (HTTP being
what it is). If this is the expected behaviour, might it not be appropriate
for the BPEL engine offering the service our client is using, upon recovery,
to "roll back" or otherwise compensate (not shown in the process above) the
completed activities in the sequence, back to the <receive>
activity, and then restart the receive?
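The roll-back-and-restart idea can be modeled in a few lines of Python. This is only a sketch loosely inspired by BPEL compensation (undo completed work in reverse order of completion), not the specification's compensation-handler mechanism; `CompletedActivity`, `RecoveredInstance`, and `roll_back_to_receive()` are hypothetical names, and any compensating action for `inv` would have to be something the partner service actually supports.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CompletedActivity:
    name: str
    # Undo action for external effects; None when there is nothing to undo (e.g. an <assign>).
    compensate: Optional[Callable[[], None]] = None


@dataclass
class RecoveredInstance:
    completed: List[CompletedActivity] = field(default_factory=list)  # execution order after <receive>
    waiting_at: Optional[str] = None


def roll_back_to_receive(instance: RecoveredInstance, receive_name: str) -> List[str]:
    """Compensate completed activities newest-first, then park the instance at the receive."""
    undone = []
    while instance.completed:
        activity = instance.completed.pop()  # most recently completed activity first
        if activity.compensate is not None:
            activity.compensate()
        undone.append(activity.name)
    instance.waiting_at = receive_name  # a resent request could now be matched here
    return undone
```

For the process above, an instance that crashed after `as2` would undo `as2`, `inv`, and `as1` in that order, with only `inv` needing a real compensating call.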
I know that some
of these complexities are the result of using unreliable messaging, and you
get what you pay for, right? On the other hand, this illustrates some
interesting states that a BPEL implementation might have to deal with, which
aren't discussed in the specification. At the very least, we have some
unspecified faults to deal with -- presumably implementation-specific.
So what are other implementers doing in this case?
Generating a fault of one sort or another, or performing more heroic efforts
to recover from the crash? I'm just interested in general approaches, since we
don't want to require NDAs here! My development team is busy trying to create
some recovery mechanisms for the scenario above, based on some sort of
client/server interaction (client retries being the most likely sort). These
guys are pretty clever, so I wouldn't doubt that they could invent something
that, in many cases, actually recovers from the crash scenario above.
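One common shape for a client-retry mechanism is server-side de-duplication: the engine remembers each request by an identifier the client reuses on every resend, and replays a stored reply instead of re-running the process. The Python sketch below assumes such a message ID exists (plain HTTP does not provide one); `RetryAwareEngine` and its methods are hypothetical names, not any particular engine's API.

```python
from typing import Callable, Dict


class RetryAwareEngine:
    """Hypothetical engine front end that recognizes client resends by a message ID."""

    def __init__(self, process: Callable[[str], str]) -> None:
        self.process = process               # body between <receive> and <reply>; returns the reply
        self.replies: Dict[str, str] = {}    # message ID -> finished reply (persisted in a real engine)
        self.in_flight: Dict[str, str] = {}  # message ID -> request whose instance has not finished

    def accept(self, message_id: str, request: str) -> None:
        """<receive>: record the request before executing anything (a resend is not re-recorded)."""
        if message_id not in self.replies:
            self.in_flight.setdefault(message_id, request)

    def run(self, message_id: str) -> None:
        """Execute the body (as1, inv, as2) as one step and store the reply for later delivery."""
        request = self.in_flight.pop(message_id)
        self.replies[message_id] = self.process(request)

    def handle(self, message_id: str, request: str) -> str:
        """Synchronous entry point; a resend of a finished request gets the stored reply."""
        self.accept(message_id, request)
        if message_id in self.in_flight:
            self.run(message_id)
        return self.replies[message_id]  # the <reply>

    def recover(self) -> None:
        """After a crash, finish interrupted instances; their replies wait for a resend."""
        for message_id in list(self.in_flight):
            self.run(message_id)
```

The crash scenario then plays out as: `accept("m1", req)` records the request, the engine crashes, `recover()` runs the body once and stores the reply, and when the client resends with the same ID, `handle("m1", req)` returns that reply without invoking the partner again. The sketch is single-threaded and treats the body as one atomic step; if a crash can fall between `inv` and `as2`, a real engine would resume from its persisted activity state rather than rerun `inv`.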
Thoughts? Is anyone else concerned about crash
recovery, perhaps with different
scenarios?
-Ron