RE: [sca-bpel] Issue 3 - An amended proposal (take 2)

Hi Alex,

>> Right now, I can see there are 2 options:
>> (1) close the issue with no change
>> (2) close the issue with the supplied proposal (maybe with some minor friendly amendment, including dropping the protocol timeout bullets)

I think we must address this issue; I don't think it should be closed with no action. I see only one nondeterministic case:

- A correlation message arrives before the start message. I thought about it, and in my opinion a correlation violation exception must be thrown, as it is illogical to join a conversation that has not been started or that will be started in the future. As the client already knows the correlation id, the message can be sent later, once the process instance has been created.

In my view, a client not knowing that a correlation mismatch has occurred is poor design.

Thanks,

Najeeb

From: Alex Yiu [mailto:alex.yiu@oracle.com]
Sent: Wednesday, December 05, 2007 5:22 PM
To: Danny van der Rijn
Cc: Najeeb Andrabi; ALEX.YIU@oracle.com; OASIS BPEL
Subject: Re: [sca-bpel] Issue 3 - An amended proposal (take 2) - Correlation Disagreement between SCA and BPEL proposal

Hi Danny and Najeeb,

[I]

I partially disagree. BPEL 2.0 Section 9.2 says:

Good catch. Invoke will be an exceptional case. I agree with you.

I do agree, though, that those faults are only thrown internal to the BPEL process, and are not externally visible.

It's good that we are on the same page.

[II]

... or the instance matching the correlation *has yet to be started.*

OK with adding your additional statement.
"... or the process instance that potentially matches the correlation has yet to be started."
I will consider this as a friendly amendment.

[III]

This is the statement that I just don't buy as defining this issue worthy of standardizing. I'd like to have some more discussion about when/how the BPEL infra can make such a determination. I posit that it's enough of an edge case that it would be unwarranted to standardize what to do in this case. To restate myself, this standard is about the combination of SCA and BPEL. If we're positing behavior that is outside the scope of the BPEL standard (IMO *way* outside), there is no place in the BPEL C&I spec for how such a situation should be handled.

To say it yet another way, you're attempting to create a standard way of dealing with an event that many compliant (BPEL and SCA BPEL C&I) implementations will never see. You're trying to create a standard fault that we can't create a compliance test for.

Actually, you read my mind here. :-)
At the very beginning, when Najeeb wanted to open this issue, I had quite a bit of hesitation. I'd rather not open this can of worms. But it seems that I failed to convince Najeeb. The bar for opening a new issue is intentionally low, so I did not want to oppose opening this issue that much.

I agree with you partially here. Without referring to a concrete protocol (e.g. some version of WS-BA), the scope of the condition for the infrastructure to raise this fault is:
(a) very small: I am referring to the "will never be matched" condition and the "a request-response operation" bullet
(b) very abstract: I am referring to the "one-way operation" and "this error situation MAY be notified" bullet.

That's why I tried to convince Najeeb not to open this issue: what the SCA-BPEL spec can do here is very limited, and the proposal's value is in giving people a sense of direction.

At the same time, once the issue is open, I want to make sure the spec describes some correct and desirable behavior (allowing an implementation to pick the right protocol), even if the behavior description is abstract.

How are you proposing this would occur? Timeouts are specified on a client. Providers don't send timeout faults. This fault is, in essence, a statement of "I bet you're tired of waiting. Instead of waiting for YOU to give up, I'm giving up on your behalf." While possibly more informative, it's bizarre, and, I would say, incorrect.

Copied from the proposal:

If the message endpoint address of a business process leverages or participates in a (transport or above transport level) protocol that has a timeout or expiration duration value specified,

That is yet another abstract part of the proposal. First of all, there is an "IF" at the beginning, i.e. IF a protocol is used and IF that protocol has a timeout/expiration feature. If such a protocol is used, the initiator of the protocol (transport or above transport level) will typically specify the timeout. (An example is the Expiration feature in WS-Coordination.)

[IV]

Well, OK, I can buy what you said there. That an active, waiting, IMA can be uniquely identified by a correlation set. But I'm not sure what that buys you in this conversation. All you may know is that no IMA is *currently active* for a particular incoming message. I know you're not suggesting that that's enough to trigger a dead-letter situation. What more information would you need? Is that information anywhere in either standard?

Actually, I still fail to understand the point you are trying to express about whether BPEL CS identifies a process instance. IMHO, that question has no direct impact on this issue and its proposal. :-)

To me, the core of a dead-letter message is about whether an incoming message can be matched and dispatched to a process instance. BPEL CS is just a part of the formula to match a message to a process instance. :-)

[V]
Right now, I can see there are 2 options:
(1) close the issue with no change
(2) close the issue with the supplied proposal (maybe with some minor friendly amendment, including dropping the protocol timeout bullets)

And I am happy with either option, even though I spent quite some time drafting the proposal.

Thanks!

Regards,
Alex Yiu

Danny van der Rijn wrote:

I blame my email client ;-)

Danny van der Rijn wrote:

Najeeb - here's my argument. I'd like to talk with you about it before I send it out. Perhaps my having put it in writing will help the conversation between you and me? If we don't talk about it before I leave today, I'm going to send it out, so it's on record for tomorrow's call.

Danny

I still think that this proposal is badly flawed. Comments below.

Alex Yiu wrote:

Hi all,

Here is an amended proposal (take 2) for this Issue 3.

Let me repeat a few points here:

"bpel:correlationViolation" is for BPEL's CorrelationSet lifecycle violation, not CS mismatch. I am not sure we want to overload it. All existing "bpel:*" faults are thrown for internal consumption only; they are not directly visible through a BPEL partnerLink.

I partially disagree. BPEL 2.0 Section 9.2 says:

When a bpel:correlationViolation is thrown by an <invoke> activity because of a violation on the response of a request/response operation, the response MUST be received before the bpel:correlationViolation is thrown. In all other cases of bpel:correlationViolation, the message that causes the fault MUST NOT be sent or received.

I do agree, though, that those faults are only thrown internal to the BPEL process, and are not externally visible.

We need to address the difference between the one-way and request-response MEPs.

The proposal has two parts: background (non-normative) text and normative text.

My proposal is to add both parts into the spec text:

The background text will be added as a non-normative appendix titled: "Background about Dead Letter Messages in BPEL: (Non-normative)"
------------------------------------
[background text begins here ...]
When an inbound message comes into the SCA and BPEL infrastructure, such a message is normally consumed by a matching inbound message activity (IMA) (e.g. a <receive> activity). However, due to a process model error or a runtime message data error, there may be no matching IMA at all, or a matching IMA may not be enabled within the expected time limit of the (system/business level) protocol between the message sender and receiver. Messages of this kind, which do not have a matching IMA, are termed "dead letter messages".

Examples of process model errors are:

matching IMAs are skipped by faults
matching IMAs are blocked by other activities within a sequence or by an impossible-to-fulfill control link transition condition
IMAs cannot receive a message due to incorrect usage of a message correlation mechanism, including BPEL correlation sets and SCA conversational interfaces

Examples of runtime message data errors are similar to the above, except that the errors are not inside the process definition itself but are caused by incorrect data values.

There might not be a universal way to determine that a message is truly a "dead letter message" without an additional protocol between message senders and receivers. Consider the following example: a message is dispatched to a BPEL process instance by the SCA conversational mechanism. At the moment when the message is matched with the BPEL process instance, there might be no <receive> activity enabled for the matching partnerLink and operation at all, or there might be a <receive> activity enabled for the matching partnerLink and operation but with a mismatched correlation set. Some users might think this is certainly a dead letter message situation. However, a matching IMA may be enabled minutes, hours, or days later, as the matching IMA might be blocked in the process model.

... or the instance matching the correlation *has yet to be started.*

On the other hand, there might be some cases in which the BPEL infrastructure can determine that there will never be a matching IMA enabled in the future. Also, some advanced features in BPEL infrastructure (e.g. process instance repair or process definition repair) might make the detection of "dead letter message" cases more difficult. However, some additional system-level protocol coordination between the message sender and receiver might make detection easier.

------------------------------------

------------------------------------
[normative text begins here ...]
If the SCA or BPEL infrastructure is able to determine that a message that has been sent to an endpoint address of a business process will never be matched with a corresponding inbound message activity (IMA) (i.e. receive, onMessage or onEvent), then:

If the message is sent through a request-response operation, an "sca:DeadLetterMessageError" fault SHOULD be sent as the reply to the message sender
If the message is sent through a one-way operation and an additional system-level protocol is used between the message sender and receiver, the message sender MAY be notified of this error situation, according to the protocol used.

If the message endpoint address of a business process leverages or participates in a (transport or above transport level) protocol that has a timeout or expiration duration value specified, and if no IMA can be matched with the inbound message within the timeout / expiration duration, then:

If the message is sent through a request-response operation, an "sca:DeadLetterMessageError" fault MAY be sent as the reply to the message sender
If the message is sent through a one-way operation and an additional system-level protocol is used between the message sender and receiver, the message sender MAY be notified of this error situation, according to the protocol used.

Again, why are we telling people how to deal with such a situation that's so far outside of the standard realm?

For example, if a message is sent through a request-response operation and HTTP is used as a transport protocol with the timeout duration set to 60 seconds, and if the SCA/BPEL infrastructure can determine that no IMA can be matched within 60 seconds, an SCA+BPEL infrastructure might reply with an "sca:DeadLetterMessageError" fault as the response to pre-empt the transport-level timeout error.

Again, the 60 seconds is a value known to the client implementation. How does the SCA/BPEL infrastructure get notified of what the timeout is? Even if it *could* know, it's going to have to determine some amount of time *before* the 60 seconds to send its fault, so it can be received before the client times out. What if the IMA can be matched in that window?

------------------------------------

Besides using the "never" wording suggested by Michael Rowley, there is some minor fine-tuning of the wording in the first half of the normative text.

The second half of the normative text is newly added. The logic and style are very similar to the first half. It explicitly targets the "protocol time out" situation that Danny mentioned in the last email. If people do NOT want the spec to deal with the "protocol time out" situation explicitly, I am OK with removing it.
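
To make the structure of the two halves easier to compare, here is a minimal sketch of the decision logic in the normative text above. It is only an illustration, not proposed spec text: the names (Mep, Action, dead_letter_action) and the boolean inputs are hypothetical, and how an infrastructure would actually arrive at those booleans is exactly the part the proposal leaves abstract.

```python
from enum import Enum
from typing import Optional, Tuple


class Mep(Enum):
    ONE_WAY = "one-way"
    REQUEST_RESPONSE = "request-response"


class Action(Enum):
    KEEP_WAITING = "keep waiting for a matching IMA"
    REPLY_FAULT = "reply with sca:DeadLetterMessageError"
    NOTIFY_SENDER = "notify the sender via the system-level protocol"
    NONE_DEFINED = "no behavior described by the proposal"


def dead_letter_action(
    mep: Mep,
    never_matchable: bool,
    no_match_within_timeout: bool = False,
    has_system_protocol: bool = False,
) -> Tuple[Action, Optional[str]]:
    """Return (action, requirement level) for an inbound message.

    never_matchable: assumption that the infrastructure has determined
        that no IMA will ever match (first half of the normative text).
    no_match_within_timeout: assumption that a protocol timeout is
        specified and no IMA can match within it (second half).
    has_system_protocol: an additional system-level protocol exists
        between sender and receiver (relevant to one-way operations).
    """
    if never_matchable:
        level = "SHOULD"   # first half: SHOULD for request-response
    elif no_match_within_timeout:
        level = "MAY"      # second half: MAY for request-response
    else:
        return (Action.KEEP_WAITING, None)

    if mep is Mep.REQUEST_RESPONSE:
        return (Action.REPLY_FAULT, level)
    if has_system_protocol:
        # Both halves use MAY for the one-way bullet.
        return (Action.NOTIFY_SENDER, "MAY")
    return (Action.NONE_DEFINED, None)


if __name__ == "__main__":
    print(dead_letter_action(Mep.REQUEST_RESPONSE, never_matchable=True))
    print(dead_letter_action(Mep.ONE_WAY, never_matchable=False,
                             no_match_within_timeout=True,
                             has_system_protocol=True))
```

Dropping the "protocol time out" bullets would amount to removing the no_match_within_timeout branch and leaving the rest unchanged.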

Thanks!

Regards,
Alex Yiu
