OASIS Mailing List Archives



wsbpel-implement message


Subject: Re: [wsbpel-implement] Fault tolerance considerations


    Interesting scenario. I have run into similar problems using HTTP in a product I worked on a while back. Our solution was to periodically send a "100 Continue" response to the client, thus keeping the connection alive and the client happily waiting. It was an okay solution for that particular product, but in general it encourages a lot of idle network resources to be tied up in open connections. It also puts a crimp in the scaling story.

    Doesn't SOAP 1.1 talk about "natural" bindings for the request/response MEP, while not mandating that request/response be truly synchronous? (Just a vague recollection; I can't seem to reach the W3C site right now...)

    I believe WS-Routing allows specification of a return path, and some extra context information, so that one could easily correlate an asynchronous response to the originator of the request. WS-Reliability and ebXML MS have mechanisms for such message correlation as well. As long as BPEL is built on the abstract WSDL message model, it can largely ignore binding-specific issues. Of course, interoperability demands that we at least consider them! If WS-I BP 1.0 is considered the best bet for interoperability for BPEL implementations, then we should give such HTTP-related issues extra attention.
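    To make the correlation idea concrete, here is a minimal sketch, assuming a requester that records each outgoing request under a unique message id, sends a return address with it, and later matches the asynchronous reply by the id the responder echoes back. WS-Routing, WS-Reliability and ebXML MS each define their own headers for this; the field and class names below (reply_to, relates_to, Requester, respond) are hypothetical stand-ins, not taken from any of those specifications.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Message:
    message_id: str
    body: str
    reply_to: Optional[str] = None    # hypothetical return-path field
    relates_to: Optional[str] = None  # hypothetical correlation field


class Requester:
    """Sends requests without holding a connection open and matches replies later."""

    def __init__(self, return_address: str) -> None:
        self.return_address = return_address
        self.pending: Dict[str, str] = {}  # message_id -> caller awaiting the reply

    def send(self, body: str, caller: str) -> Message:
        msg = Message(str(uuid.uuid4()), body, reply_to=self.return_address)
        self.pending[msg.message_id] = caller
        # A real requester would hand msg to a transport here and return at once.
        return msg

    def on_reply(self, reply: Message) -> str:
        # Route an asynchronous reply to whoever issued the original request.
        if reply.relates_to not in self.pending:
            raise KeyError(f"no pending request for {reply.relates_to!r}")
        return self.pending.pop(reply.relates_to)


def respond(request: Message, result: str) -> Message:
    # The responder copies the request id into relates_to; the reply would be
    # sent to request.reply_to whenever the result is ready, however late.
    return Message(str(uuid.uuid4()), result, relates_to=request.message_id)


requester = Requester("http://client.example/replies")  # hypothetical address
request = requester.send("get quote", caller="order-42")
reply = respond(request, "quote: 17")
print(requester.on_reply(reply))  # order-42
```

    Because nothing in this exchange depends on a live connection, the delay between request and reply could just as well be days.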


Marin, Mike wrote:


Well, I have the same problem, and you do not need a crash to trigger it. The problem is that BPEL prescribes receive-reply as implementing a synchronous WSDL operation, when in practice you cannot enforce it. You just need to add a wait of a week between the receive and the reply, and I’m sure you do not want to keep the connection open for that long.


I opened issue 17 (Asynchronous operations) a while back, but have not had time to pursue it. IMHO the receive / reply pair requires an asynchronous WSDL binding (one that does not require the connection to remain open). In theory, you could define such a binding, but nobody would be able to use it because, first, it is not WS-I compliant, and second, it does not fit most WSDL implementation frameworks.


It may be that WS-Routing provides a solution to this issue by allowing a reverse message path for the reply. But, I have not had time to study this alternative.


In any case, I’m also interested in seeing (reading) how others are tackling this implementation issue…




Mike Marin


-----Original Message-----
From: Ron Ten-Hove [mailto:Ronald.Ten-Hove@Sun.COM]
Sent: Tuesday, October 14, 2003 4:25 PM
To: bpel implementation
Subject: [wsbpel-implement] Fault tolerance considerations



    I was recently given an interesting question from one of my development teams, and I thought it would be of interest to this group, since it touches on universal implementation issues.

    The question is based on the following scenario: given a process something like this:

  <receive name="rcv" ... />
  <assign  name="as1" ... />
  <invoke  name="inv" ... />
  <assign  name="as2" ... />
  <reply   name="rep" ... />

The <receive> and <reply> activities are part of a request-response MEP, bound to SOAP, so that the request-response is synchronous (uses the same connection for request and response).

    Simple enough. But suppose that during execution of an instance of the above process, somewhere after the <receive> activity has completed but before the <reply> activity is done, the BPEL engine crashes. Since we have full state persistence, recovery is simple enough, and we can finish creating the reply; but this is rather useless, since the client connection is lost.

    So what is the right thing to do under these circumstances? Should the engine, upon recovery in this situation, fault the running activity? Should it continue to the reply activity, and presumably fault because the connection is closed?
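    The two options can be compared with a small sketch, assuming the engine persists, for each instance, the index of the first activity that had not completed when it crashed, and that no client connection survives a restart. The function and enum names are hypothetical; the activity names are the ones from the example process.

```python
from enum import Enum
from typing import List, Tuple

# The example process; "rep" owes a synchronous reply to the request taken by "rcv".
ACTIVITIES = ["rcv", "as1", "inv", "as2", "rep"]


class Policy(Enum):
    FAULT_NOW = "fault-now"           # fault the activity that was running at the crash
    CONTINUE_THEN_FAULT = "continue"  # run on to <reply>, which faults on the dead connection


def recover(next_activity: int, policy: Policy) -> Tuple[str, List[str]]:
    """Decide what a restarted engine does with one persisted instance.

    next_activity is the index in ACTIVITIES of the first activity that had not
    completed at the crash. Connections are assumed never to survive a restart.
    """
    log: List[str] = []
    reply_owed = 0 < next_activity <= ACTIVITIES.index("rep")
    if not reply_owed:
        return "resume", log  # no synchronous reply pending: ordinary recovery
    if policy is Policy.FAULT_NOW:
        log.append(f"fault at {ACTIVITIES[next_activity]}")
        return "faulted", log
    for name in ACTIVITIES[next_activity:]:
        if name == "rep":
            log.append("fault at rep: client connection closed")
            return "faulted", log
        log.append(f"ran {name}")
    return "completed", log


print(recover(2, Policy.FAULT_NOW))  # ('faulted', ['fault at inv'])
```

    Both policies end in a fault. The second one also re-runs the interrupted <invoke> and the later <assign>, which may call the partner a second time unless that operation is idempotent, only to discard the result at <reply>.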

    What of the client program? It sees that the HTTP connection closed while it was awaiting a response to the request. It might reasonably resend the request (HTTP being what it is). If this is the expected behaviour, might it not be appropriate for the BPEL engine offering the service our client is using, upon recovery, to "roll back" or otherwise compensate the completed activities in the sequence (not shown in the process above) back to the point of the <receive> activity, and then restart the receive?
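    A minimal sketch of that roll-back, assuming the engine records an undo action for every activity completed after <receive> and, on recovery, runs them newest-first before waiting at <receive> again. The InstanceLog class and the undo actions are hypothetical and not part of the process definition. For the <invoke> the undo is an assumed "cancel" call to the partner, and for an <assign> it restores the prior value. In-memory closures do not survive a crash, so a real engine would have to persist the undo data (for example, prior variable values) along with the instance state.

```python
from typing import Callable, List, Tuple

Undo = Callable[[], None]


class InstanceLog:
    """Hypothetical engine-side record of the activities completed after <receive>."""

    def __init__(self) -> None:
        self.completed: List[Tuple[str, Undo]] = []
        self.position = "rcv"  # activity the instance will execute next

    def complete(self, name: str, undo: Undo, next_activity: str) -> None:
        self.completed.append((name, undo))
        self.position = next_activity

    def roll_back_to_receive(self) -> List[str]:
        """Undo completed activities newest-first, then wait at <receive> again."""
        undone: List[str] = []
        while self.completed:
            name, undo = self.completed.pop()
            undo()
            undone.append(name)
        self.position = "rcv"
        return undone


state = {"x": None, "partner_calls": []}
log = InstanceLog()
state["x"] = 1                                  # as1
log.complete("as1", lambda: state.update(x=None), next_activity="inv")
state["partner_calls"].append("book")           # inv (hypothetical partner call)
log.complete("inv", lambda: state["partner_calls"].append("cancel"), next_activity="as2")
# The engine crashes here; on recovery:
print(log.roll_back_to_receive())  # ['inv', 'as1']
```

    After the roll-back the instance is back at <receive>, ready to accept the request the client is expected to resend.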

    I know that some of these complexities are the result of using unreliable messaging, and you get what you pay for, right? On the other hand, this illustrates some interesting states that a BPEL implementation might have to deal with, which aren't discussed in the specification. At the very least, we have some unspecified faults to deal with -- presumably implementation specific.

    So what are other implementers doing in this case? Generating a fault of one sort or another, or making more heroic efforts to recover from the crash? I'm just interested in general approaches, since we don't want to require NDAs here! My development team is busy trying to create some recovery mechanisms for the scenario above, based on some sort of client/server interaction (client retries being the most likely sort). These guys are pretty clever, so I wouldn't doubt that they could invent something that, in many cases, actually recovers from the crash scenario above.
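    One common form of client/server retry interaction is request-id deduplication. This technique is not described in this thread. The client resends on a dropped connection with the same request id, and the server replays a stored reply instead of running the process again. The sketch below assumes a client-supplied id and uses hypothetical names throughout. It covers the case where the instance completed but the reply was lost in transit. A retry arriving while the crashed instance is still mid-sequence still needs one of the recovery choices raised earlier in the original message.

```python
from typing import Callable, Dict, List, Optional


class ConnectionClosed(Exception):
    """Raised when the transport loses the connection before the reply arrives."""


class DedupingService:
    """Hypothetical server side that remembers replies by client-supplied request id."""

    def __init__(self, run_process: Callable[[str], str]) -> None:
        self.run_process = run_process
        self.replies: Dict[str, str] = {}  # a real engine would persist this with instance state

    def handle(self, request_id: str, body: str) -> str:
        if request_id in self.replies:
            return self.replies[request_id]  # a retry: replay the stored reply
        reply = self.run_process(body)
        self.replies[request_id] = reply     # record before attempting delivery
        return reply


def call_with_retry(send: Callable[[str, str], str], request_id: str, body: str,
                    attempts: int = 3) -> str:
    """Client side: resend on a dropped connection, reusing the same request id."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[ConnectionClosed] = None
    for _ in range(attempts):
        try:
            return send(request_id, body)
        except ConnectionClosed as exc:
            last_error = exc
    assert last_error is not None
    raise last_error


# Simulated run: the first reply is lost on the wire; the retry gets the stored one.
runs: List[str] = []

def process(body: str) -> str:
    runs.append(body)  # count how often the process actually executes
    return f"done:{body}"

service = DedupingService(process)
dropped: List[str] = []

def flaky_send(request_id: str, body: str) -> str:
    reply = service.handle(request_id, body)
    if not dropped:
        dropped.append(request_id)
        raise ConnectionClosed("connection lost before the reply was delivered")
    return reply

print(call_with_retry(flaky_send, "req-1", "order"), runs)  # done:order ['order']
```

    The process body runs once even though the client sent the request twice, which is the property a resend-on-failure client relies on.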

    Thoughts? Is anyone else concerned about crash recovery, perhaps with different scenarios?

