Well, I have the same problem, and you do not need a crash to trigger it. The
problem is that BPEL prescribes receive-reply as the implementation of a
synchronous WSDL operation, when in practice you cannot enforce it. You just
need to add a wait of a week between the receive and the reply, and I'm sure
you do not want to keep the connection open for that long.
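To make that concrete, here is a minimal sketch that reuses the activity names
from Ron's process quoted below. The <wait> name and the seven-day duration
(P7D, written as a BPEL 1.1 duration expression) are illustrative assumptions,
and the elided attributes stay elided:

```xml
<sequence>
  <receive name="rcv" ... />
  <!-- Illustrative only: suspend the instance for seven days (xsd:duration P7D) -->
  <wait name="pause" for="'P7D'" />
  <!-- A synchronous binding would have to hold the connection open until here -->
  <reply name="rep" ... />
</sequence>
```

Nothing in the process definition itself stops you from writing this, which is
why the synchronous binding cannot be enforced at the BPEL level.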
I opened issue 17 (Asynchronous operations) a while back, but have not had
time to pursue it. IMHO the receive / reply pair requires an asynchronous WSDL
binding (one that does not require the connection to remain open). In theory,
you could define such a binding, but nobody would be able to use it because,
first, it is not WS-I compliant and, second, it does not fit most WSDL
implementation frameworks.
It may be that WS-Routing provides a solution to this issue by allowing a
reverse message path for the reply. But I have not had time to study this
alternative.
In any case, I'm also interested in seeing how others are tackling this
implementation issue.
--
Regards,
Mike Marin
-----Original Message-----
From: Ron Ten-Hove [mailto:Ronald.Ten-Hove@Sun.COM]
Sent: Tuesday, October 14, 2003 4:25 PM
To: bpel implementation
Subject: [wsbpel-implement] Fault tolerance considerations
Folks,
I was recently asked an interesting question by one of my development teams,
and I thought it would be of interest to this group, since it touches on
universal implementation issues.
The question is based on the following scenario: given a
process something like this:
<sequence>
  <receive name="rcv" ... />
  <assign name="as1" ... />
  <invoke name="inv" ... />
  <assign name="as2" ... />
  <reply name="rep" ... />
</sequence>
The <receive> and <reply> activities are part of a request-response MEP,
bound to SOAP, so that the request-response is synchronous (uses the same
connection for request and response).
Simple enough. But suppose that during execution of an instance of the above
process, somewhere after the <receive> activity is completed but before the
<reply> activity is done, the BPEL engine suffers a crash. Since we have full
state persistence, recovery is simple enough. We can therefore finish creating
the reply, but this is rather useless, since the client connection is lost.
So what is the right thing to do under these circumstances?
Should the engine, upon recovery in this situation, fault the running activity?
Should it continue to the reply activity, and presumably fault because the
connection is closed?
What of the client program? It sees that the HTTP connection closed while
awaiting a response to the request. It might reasonably resend the request
(HTTP being what it is). If this is the expected behaviour, might it not be
appropriate for the BPEL engine offering the service our client is using to,
upon recovery, "roll back" or otherwise compensate the completed activities in
the sequence (not shown in the process above), to the point of the <receive>
activity, and restart the receive?
I know that some of these complexities are the result of
using unreliable messaging, and you get what you pay for, right? On the other
hand, this illustrates some interesting states that a BPEL implementation might
have to deal with, which aren't discussed in the specification. At the very
least, we have some unspecified faults to deal with -- presumably
implementation specific.
So what are other implementers doing in this case? Generating a fault of one
sort or another, or performing more heroic efforts to recover from the crash?
I'm just interested in general approaches, since we don't want to require NDAs
here! My development team is busy trying to create some recovery mechanisms
for the scenario above, based on some sort of client/server interaction
(client retries being the most likely sort). These guys are pretty clever, so
I wouldn't doubt that they could invent something that, in many cases,
actually recovers from the crash scenario above.
Thoughts? Is anyone else concerned about crash recovery, perhaps
with different scenarios?
-Ron