ws-tx message

Subject: Re: [ws-tx] Proposed AT CP state table

From: Alastair Green <alastair.green@choreology.com>
To: Max Feingold <Max.Feingold@microsoft.com>
Date: Wed, 19 Jul 2006 10:59:47 +0100

Max:

I'm going to pursue this one last time (before going away for a short holiday), primarily because of your response on points 2 and 3.

If the state table says

a) that the event that leads to Send Aborted or Send Committed stimulates a state transition to None, and
b) that the response to receipt of Commit or Rollback in None is to send Unknown Transaction,

then it follows that it is forbidden for a conformant implementation to "knowingly" send Aborted or Committed twice. It is also forbidden to send Invalid State in response to a duplicate instruction.

Of course, a transport-level duplication may cause Aborted or Committed to be duplicated. That can never happen with Invalid State, as it can never be sent. According to the state table, the only messages that can be sent in are Commit and Rollback, and the only ones allowed out are Committed, Aborted and Unknown Transaction. There is no possible reason to send Invalid State.

This allows an implementation to sneak an Aborted or Committed out if it chooses, in response to duplicate instructions. We cannot tell the difference between a witting and unwitting violation in this case. But it cannot properly send an IS: if it does so, it has to assume that the CP Participant may barf.

Put another way: a conformant CP participant must be prepared to absorb and discard duplicate outcome messages, but should not receive IS messages. In general, it should not receive any message other than those mentioned as coordinator-initiated in Section 4.2, or those WS-C faults that would result from sending invalid protocol (e.g. responding with WS-C Invalid Protocol to an attempt to send Prepared to a CP coordinator).

If a CP P receives IS then it should assume that the CP C is untrustworthy (has sent a response that should never arise, is bugged). All bets are off.
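To make the participant-side rule concrete, here is a minimal Python sketch of how a CP participant might classify inbound messages. The names `CompletionParticipant`, `UntrustworthyCoordinator` and `receive` are hypothetical and appear nowhere in the spec. Treating contradictory outcomes as a coordinator bug is an added assumption that this thread does not discuss.

```python
class UntrustworthyCoordinator(Exception):
    """Raised when the coordinator sends a message that should never arise."""


class CompletionParticipant:
    OUTCOMES = {"Committed", "Aborted"}

    def __init__(self):
        self.outcome = None  # first outcome message seen, if any

    def receive(self, message):
        """Classify one inbound message from the coordinator."""
        if message in self.OUTCOMES:
            if self.outcome is None:
                self.outcome = message
                return "recorded"
            if message == self.outcome:
                # Transport-level duplicate or deliberate replay: we cannot
                # tell which, so absorb and discard it.
                return "duplicate-discarded"
            # Assumption (not from the thread): contradictory outcomes
            # are treated as a coordinator bug.
            raise UntrustworthyCoordinator(f"{self.outcome} then {message}")
        if message == "UnknownTransaction":
            return "unknown"
        # Invalid State (or anything else) should never reach a CP
        # participant under the reading above: all bets are off.
        raise UntrustworthyCoordinator(message)
```

A second `receive("Committed")` is silently discarded, while `receive("InvalidState")` raises.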

(I leave aside the need to report the "screwed" state that can arise from 2PC protocol errors, see discussion on issue 041.)

Do you agree?

* * *

I have no problem with faults. I have no problem with the extended table which carefully distinguishes outcome replays and protocol errors by separating three Aborting states. However, if the overweening desire for terseness and simplicity holds sway, it is legitimate to replay the outcome message, and not send a fault, because identical, accurate information is conveyed. That's all.

Alastair

Max Feingold wrote:

Alastair:

2. I don't think commit retries and their rational handling are
precluded by the spec.  That said, I think it's logically and rationally
precluded to rely on commit retries working reliably across
implementations, due to obvious race conditions.

3. Again, I don't think the spec is preventing you from implementing the
behavior that you desire.

4. I agree, but in general that is the price of receiving reliable
outcome.  If the guarantees of volatile 2PC are sufficient for you, then
you can register a volatile participant instead.  And if they aren't
sufficient, then completion was never going to make you happy anyway.

5. We've defined faults for protocol violations of this nature.  I'm not
too worried about scenarios where a Byzantine volatile participant isn't
re-sent an outcome.

-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com] 
Sent: Tuesday, July 18, 2006 4:37 PM
To: Max Feingold
Cc: Peter Furniss; Ram Jeyaraman; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Proposed AT CP state table

Max,

1. It has never been in doubt that a client has to be able to receive 
Unknown Transaction. (The input document said it would always receive 
Aborted when in None state, which was truly bizarre, but that's been 
fixed, thankfully). That's never been at issue in this discussion.

2. The motivation from Ram for the proposed state table was that it did 
not preclude replay of the outcome. It appears that you agree that this 
is precluded. It's important for us all to be clear on the permitted 
behaviour, which is the operational point of this phase of the 
discussion. Where did the resolution leave us?

3. Any protocol based on one-way messages without guarantees of delivery
(like this one) has to cope with message duplication. As it stands, the
protocol now prevents any smarts in reaction to such duplication. That
is a limitation on implementation which was not necessary. The "long
table" placed no limitations on your implementation strategy/design.
From my experience of usage of transaction systems, it is useful to
maximize the likelihood of delivering the actual outcome -- and indeed
it is painful not to be able to find out what happened to a transaction
when connectivity is lost.

4. Registering a durable 2PC participant to improve the chances of 
finding out the result is heavyweight. It's bad enough that one has to 
go through the hoop of registering a CP participant (the coordinator 
could, after all, handle a Commit or Rollback as its first message from 
the CP participant, in principle).

5. I am still mystified as to the nature of the crime that is committed,
if a Rollback engenders a Committed. Of course the application that
engenders this condition is bugged, or ill-constructed, but the issue
is: does the protocol convey the stupidity semantic adequately? How I
map the "bozo bug" in an API, for example, is a free choice: does the
exception say: "Protocol error" or "Tut, tut" or "It's already been
committed"? That's a question of how much information you want to
convey. The real point is that Committed = Protocol Error, and Protocol
Error = Committed, in this case -- and that's why it doesn't matter what
name you give to the message.

I would have been perfectly happy to see your three Aborted states (the 
"extra long table") because I don't care about using another two square 
inches of spec space. But if you want to compress the table, then 
eliding the difference between expected Committed and unexpected 
Committed is perfectly workable: after all, the primary semantic is not 
"stupidity", but "it was committed".

Alastair

Max Feingold wrote:

I think the completion participant has two responsibilities:

   1. To vote on the outcome of the transaction.
   2. To ensure that (in the case of a 'commit' vote) transaction
      completion does not race with active work.

A completion participant that changes its mind on the outcome it 
wishes to see is no different than a volatile participant doing the 
same. This is a protocol violation. That is why I believe that 
replying to Rollback with Committed is wrong - it implies that the 
completion participant already voted to commit, but has now changed 
its mind.

...

In designing an API for completion that works against generic WS-AT 
implementations, a completion participant that resends Commit cannot 
rely on receiving accurate responses to Commit resends, given the 
volatility of the protocol and the short window in which the 
transaction outcome is generally known. Your own implementation is 
free to implement your desired lingering behavior, but your 
participant-side implementation has to be prepared to deal with 
UnknownTransaction faults if you're going to resend Commit against 
generic coordinators.
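As an illustration of what "prepared to deal with UnknownTransaction" could mean on the participant side, here is a small Python sketch of a Commit resend loop. The function name `commit_with_retries`, the `send_commit` callable and the `"outcome-unknown"` result are all hypothetical, and `None` stands in for "no reply arrived before the resend".

```python
def commit_with_retries(send_commit, max_attempts=3):
    """Resend Commit until an outcome arrives or we give up.

    send_commit() is assumed to return "Committed", "Aborted",
    "UnknownTransaction", or None when no reply was received.
    """
    for _ in range(max_attempts):
        reply = send_commit()
        if reply in ("Committed", "Aborted"):
            return reply
        if reply == "UnknownTransaction":
            # A generic coordinator may already have moved to None,
            # so the real outcome is no longer obtainable this way.
            return "outcome-unknown"
        # reply is None: the request or its response was lost; resend.
    return "outcome-unknown"
```

Against a coordinator that lingers, the second attempt may still yield the real outcome; against a generic one, the API can only report that the outcome is unknown.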

If your client-side API needs to _/reliably/_ know the outcome, it 
should implement a durable 2PC participant as well as a completion 
participant. But in general, completion APIs tend to be best-effort, 
since the outcome reported to the application is generally best-effort
and does not affect consistency.

------------------------------------------------------------------------

*From:* Alastair Green [mailto:alastair.green@choreology.com]
*Sent:* Monday, July 17, 2006 4:34 AM
*To:* Peter Furniss
*Cc:* Ram Jeyaraman; ws-tx@lists.oasis-open.org
*Subject:* Re: [ws-tx] Proposed AT CP state table

Peter,

These are important points you have raised.

On a) I believe that sending Aborted spontaneously is an unhelpful
innovation: I did raise this point in writing and in the discussion,
but it got short shrift in the discussion (I'm not sure anyone else
addressed it at all). This new behaviour prevents (or complicates) the
writing of thin demarcating clients. It changes existing
implementations.

On b): "objection" is perhaps a mild term. Max described the notion of
replying to a Rollback with a Committed as "frivolous" and "deeply
concerning" in the chat room.

I find this odd. Our chief concern is correctness. Within that 
framework, being informative is helpful and good. If I send Rollback
and receive Committed then I know one of two things: either that I have
previously sent Commit (i.e. that my application client has committed 
a logic error), or that some other CP participant has done so. If I 
receive Invalid State then I know either that my interlocutor is 
bugged in some odd way, or that I or another participant has 
previously sent Commit. Where is the information loss? Why is this 
"frivolous"?

If as Peter says, it is deemed (in line with Microsoft's motivating
text and argumentation) that the new state table does not prevent
communication to the participant of knowledge of the fact of having
made a commit decision, or the fact of having been instructed to abort
or having made an abort decision -- despite having nominally moved to
state None -- then the implementation presumably can send Committed in
response to Rollback. There is nothing in the spec to stop that
happening. Nor is there anything in the spec to permit or prevent the
sending of Invalid State.

On an alternative interpretation (the only sensible one of a reader 
coming cold to the spec), it is forbidden to send the actual outcome 
because the state None must be responded to with Unknown Transaction, 
even if the transaction is in fact still known.

This is not frivolous, but it is a mistake. If a participant sends
Commit, and receives no response, then it may retry Commit. If the
first message did get through, and the response was lost, then he will
now receive Unknown Transaction.

In designing an API for completion, there are some obvious things you 
can do to maximize the chance of the client knowing the outcome, /as 
long as the protocol is able to support replay of the outcome 
message/. To take one obvious example: remember the outcome at least 
as long as the transaction timeout. This is now precluded, in my view.
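The "remember the outcome at least as long as the transaction timeout" idea could look roughly like the following Python sketch of a coordinator-side cache. `OutcomeCache`, `remember` and `reply_to_duplicate` are hypothetical names, and whether the adopted state table actually permits this replay is exactly the point in dispute here.

```python
import time


class OutcomeCache:
    """Keeps each transaction's outcome for a limited time after it is decided."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so the behaviour can be tested
        self._entries = {}     # txid -> (outcome, expiry time)

    def remember(self, txid, outcome, transaction_timeout):
        """Record "Committed" or "Aborted" for transaction_timeout seconds."""
        self._entries[txid] = (outcome, self._clock() + transaction_timeout)

    def reply_to_duplicate(self, txid):
        """Reply to a Commit or Rollback that arrives after the outcome."""
        entry = self._entries.get(txid)
        if entry is None:
            return "UnknownTransaction"
        outcome, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[txid]
            return "UnknownTransaction"
        return outcome  # replay the outcome message
```

A retried Commit whose first response was lost would then receive Committed again while the entry is live, and Unknown Transaction only after it has expired.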

The outcome of this decision of the TC is to render the Completion
Protocol fuzzy and different, which is not good news for
interoperation.

Alastair

Peter Furniss wrote:

Noting (and accepting) that the committee decided to use the table Ram
proposed,

a) we now have a mis-alignment between this table and the diagram for 
the completion protocol. The diagram does not show a 
coordinator-generated Aborted from Active state to Ended. Someone may 
want to raise an issue on this.

b) given the general interpretation rule for the tables that means, 
even with this table, a coordinator implementation can choose to 
return a repeat Committed or Aborted if it knows the state, it would 
seem the behaviour Alastair proposed, of replying to a Rollback with 
Committed (following an earlier Commit and CommitDecision) is also 
permitted. There was objection to that idea in the discussion, but 
since we have chosen silence and allowed implementations to send 
additional messages, there is nothing to prevent it.

Peter

------------------------------------------------------------------------

*From:* Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
*Sent:* 13 July 2006 16:09
*To:* Alastair Green; ws-tx@lists.oasis-open.org
*Subject:* RE: [ws-tx] Proposed AT CP state table

Thanks Alastair.

Here is the normalized state table that represents the case where 
duplicate Commit or Rollback message would receive Unknown 
Transaction. I believe that this does not prevent a coordinator 
implementation from returning Committed or Aborted, if it chooses to, 
during the transitory 2PC time window.

	

	

	

*Completion protocol (Coordinator View) [New]*

                    *States*
*Inbound Events*    *None*                 *Active*                                 *Completing*

*Commit*            /UnknownTransaction/   /Initiate User Commit/                   /Ignore/
                    None                   Completing                               Completing

*Rollback*          /UnknownTransaction/   /Initiate User Rollback, Send Aborted/   /InvalidState/
                    None                   None                                     Completing

*Internal Events*

*Commit Decision*   N/A                    N/A                                      /Send Committed/
                                                                                    None

*Abort Decision*    N/A                    /Send Aborted/                           /Send Aborted/
                                           None                                     None
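For readers who prefer code to tables, the table above can be transcribed into a small Python lookup. This is only an illustrative sketch: `State`, `TABLE`, `step` and `run` are hypothetical names, and raising an error for N/A cells is an assumption about how to treat them.

```python
from enum import Enum


class State(Enum):
    NONE = "None"
    ACTIVE = "Active"
    COMPLETING = "Completing"


# (state, event) -> (action, next state), transcribed from the table above.
# Cells marked N/A in the table are simply absent.
TABLE = {
    (State.NONE, "Commit"): ("UnknownTransaction", State.NONE),
    (State.ACTIVE, "Commit"): ("Initiate User Commit", State.COMPLETING),
    (State.COMPLETING, "Commit"): ("Ignore", State.COMPLETING),
    (State.NONE, "Rollback"): ("UnknownTransaction", State.NONE),
    (State.ACTIVE, "Rollback"): ("Initiate User Rollback, Send Aborted", State.NONE),
    (State.COMPLETING, "Rollback"): ("InvalidState", State.COMPLETING),
    (State.COMPLETING, "Commit Decision"): ("Send Committed", State.NONE),
    (State.ACTIVE, "Abort Decision"): ("Send Aborted", State.NONE),
    (State.COMPLETING, "Abort Decision"): ("Send Aborted", State.NONE),
}


def step(state, event):
    """Return (action, next_state); raise ValueError for N/A cells."""
    try:
        return TABLE[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is N/A in state {state.value}") from None


def run(events, state=State.ACTIVE):
    """Feed events in order and collect the actions taken."""
    actions = []
    for event in events:
        action, state = step(state, event)
        actions.append(action)
    return actions, state
```

Running `run(["Commit", "Commit Decision", "Commit"])` yields the actions Initiate User Commit, Send Committed, UnknownTransaction: read literally, a duplicate Commit arriving after the outcome gets Unknown Transaction.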

-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com]
Sent: Thursday, July 13, 2006 7:31 AM
To: ws-tx@lists.oasis-open.org
Subject: [ws-tx] Proposed AT CP state table

Dear all,

Ram and I have been working on the AT CP state table issue, and I 
think we've converged to a considerable degree, but not fully.

I'm attaching my proposal.

There are two points outstanding between Choreo and Microsoft:

1. Should a participant sending Aborted, leading to an active state 
rollback (cell Rollback Decision/Active) induce a "Send Aborted" 
action (not shown in my proposal)?

This would mean that a CP participant would receive a spontaneous 
Aborted outcome message before it had sent Commit or Rollback to the 
coordinator. I do not object to this per se, but am worried that this 
turns a request-response model into a full one-way model, which would 
preclude a thin client implementation of a CP participant (on which 
point I have raised a separate issue).

2. Should the state table show Committing and Aborting states, 
allowing a precise response (Committed or Aborted) to be returned 
during the processing of the 2PC protocol across the underlying 
participants? This is the approach in my proposal.

Ram, I believe, accepts that it is legitimate for an implementation to
do this (our product does so), but thinks that the transition to state
None should occur immediately that Aborted is received, or internal
event Commit Decision arises, thereby removing the Committing and
Aborting states in the proposed table.

This would mean that any duplicate Commit or Rollback message would 
receive Unknown Transaction. If I can be persuaded that this approach 
does not prevent returning Committed or Aborted after the None 
transition (i.e. that the implementation was free to communicate 
outcome knowledge if it happened to still have it) then I would be 
happy with that, but I believe that this approach would in fact make 
that illegal (because contrary to the state table).

Yours,

Alastair
