ws-tx message

Subject: Re: [ws-tx] Issue 041 - WS-AT: Invalid events should not cause defined transitions

From: Alastair Green <alastair.green@choreology.com>
To: Max Feingold <Max.Feingold@microsoft.com>
Date: Fri, 28 Jul 2006 15:52:56 +0100

Max,

This is another good example of the problem.

First, I assume that "it does the right thing" means both: "the participant does the right thing", and that "the coordinator does the right thing".

Here is what the participant should do according to the spec: it receives Invalid State, and there is no definition of what to do when IS is received (it is not mentioned as an input message or event in the tables). So the spec is silent.

This is reasonable: Invalid State is described as an unrecoverable error, which I take to mean: "we can't get out of this hole".

So the participant is now aware that it made a mistake (or that the C is bugged: it probably won't admit to itself that it made a mistake), and that there is nothing more it can do than alert an admin somehow. It cannot abort, as it has already committed its resources, we assume -- unless it is doubly bugged. It could try to abort its resources in a last desperate attempt to recover from its error. That is a fairly pointless stratagem: it is most likely involved in a deeper-rooted logic error if it has got to this state. In any event, these actions are all private and implementation-dependent. No more messages are emitted by P (we hope).

The coordinator now knows that one participant claims to have committed, despite receiving no instruction to do so.

Its actions subsequently depend on the view of the implementer.

Whatever C does now, it is facing a situation of uncertainty about the transaction outcome.

There are four possible scenarios, being the cross-product of {P did commit, P lied and did not really commit} and {C chooses to abort other Ps, C chooses to commit remaining Ps}.

1) A uniform commit is achieved, because the P did in fact commit, and told the truth when it sent Committed, and C chooses to proceed to commit all other Ps when they have prepared in the normal fashion. The premature Commit is deemed to equal Prepared + Commit.

2) A mixed outcome is achieved. C trusts P, so commits all other Ps. P was lying, receives Invalid State and chooses to abort.

3) A mixed outcome is achieved: P did in fact commit, but C decides to abort everyone else. Again, C cannot be certain this has happened.

4) A uniform abort is achieved: P lied when it sent commit, will now abort on receipt of Invalid State, and all other Ps are aborted by C choosing to instruct them to do so.

In any of these circumstances C cannot know which of these outcomes is in fact achieved. It does not know what P actually did (was it single-bugged or double-bugged? If double-bugged, will P react to IS by attempting abort?).
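
The four scenarios above can be enumerated mechanically. What follows is only a minimal sketch: the labels and function names are hypothetical, and it assumes (as scenarios 2 and 4 do) that a P which did not really commit ends up aborted.

```python
from itertools import product

# Hypothetical labels for illustration; none of these names come from the spec.
P_FACTS = ("committed", "did_not_commit")      # what P really did (hidden from C)
C_CHOICES = ("commit_others", "abort_others")  # what C decides for the other Ps


def outcome(p_fact, c_choice):
    """Classify the global result for one combination.

    Assumption: a P that did not really commit ends up aborted.
    """
    p_committed = p_fact == "committed"
    others_committed = c_choice == "commit_others"
    if p_committed == others_committed:
        return "uniform commit" if p_committed else "uniform abort"
    return "mixed"


# The cross-product of the two sets gives the four scenarios.
scenarios = {(p, c): outcome(p, c) for p, c in product(P_FACTS, C_CHOICES)}


def possible_outcomes(c_choice):
    """Outcomes C cannot rule out after choosing, because p_fact is hidden."""
    return {scenarios[(p, c_choice)] for p in P_FACTS}
```

Whichever choice C makes, `possible_outcomes` contains "mixed" alongside one uniform outcome, which is the uncertainty described above.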

Whatever its decision, it should let the controlling app (the transaction terminator, CP participant) know that the outcome is unknown, and that atomicity is in doubt (the state of the overall coordinator is what I have called "Inconsistency Hazard").

The sensible thing for a TM to do in this circumstance is to let someone know, so the actual result can be investigated. Sending Aborted as the CP outcome message, or sending Committed as the CP outcome message would be to send an outcome that might be true, but might also be false.

The C could of course decide to do nothing: to wedge the other resources in the prepared state.

[A secondary point: Invalid State is the wrong fault to send when late registrations occur: the correct fault would be: CannotRegisterParticipant. This is a legitimate race condition, which can be survived. This is an editorial matter: we have already decided to do this by virtue of resolution of 037 (the state tables lag the descriptive text). I will raise this again in a separate thread.]

Alastair

Max Feingold wrote:

Let me pose a scenario question here.

A participant sends Committed to a coordinator during the active state,
leading to a protocol violation.  The coordinator sends a fault to the
participant.  It then "does the right thing" (precisely what being the
topic of this discussion, of course).

The question is this:  from the perspective of the other participants
who have not committed or noticed a protocol violation, what do we think
will happen to the transaction?  Does it depend on what the coordinator
does?  If so, how?

-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com] 
Sent: Wednesday, July 19, 2006 2:22 AM
To: Mark Little
Cc: Peter Furniss; Ram Jeyaraman; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 041 - WS-AT: Invalid events should not cause
defined transitions

Mark, Ram, Peter --

The original issue centres on the view that it is incorrect, 
meaningless, pointless, misleading to mandate state transitions in the 
face of a protocol error (non-conformant counterpart).

I agree with Peter and Mark on this point, and disagree with Ram. I 
think it is obvious that a protocol error means that all bets are off: 
the only meaningful state that could be transited to is "Screwed" or 
some more polite formulation thereof.

I would note that useful discussion of the same ground is in the archive 
back in mid-May.

This is the primary debate. It affects the state tables. The retention 
or otherwise of IIS is orthogonal. If, however, you accept that IIS and 
IS are synonyms, then we face the same kind of redundancy that we saw 
with Replay. Changing the fault message to one that an implementation is 
already tooled to send is hardly onerous.

I do want to add an additional point to the discussion. Much emphasis 
has been laid on avoiding implementation assumptions or restrictions, 
and on avoiding prescribing internal behaviours, when looking at the 
state tables.

The decision as to what to do in the face of a protocol error is 
/pre-eminently/ a "local" (unilateral) one. No deductions about global 
state or outcome can be drawn from the receipt of Invalid State. A 
coordinator cannot assume anything once it receives notification of a 
protocol error. Either it is being lied to, or more likely, it is being 
informed of its own bug. There is no point going on, and anything other 
than the moral equivalent of a core dump is pointless.

The specification has no business telling implementers what to do in 
this case. And the instruction currently given cannot be enforced or 
observed interoperably, and is therefore inappropriate. This has the 
smack of mechanical product-to-spec translation, carrying with it 
excessive implementation detail.

It is also interesting that the Completion Protocol has no way of 
informing a client that the Coordinator has got itself into the Screwed 
state. It certainly must not send Aborted, because that may be highly 
misleading. Example: bugged Coordinator sends Commit to all registered 
Participants (that's the bug), all of whom but one have already sent 
Prepared. All but one commit as instructed, one (currently) sends 
Invalid State and aborts. What is the outcome of the transaction?

We need to be able to report a transaction which has gone haywire to a 
CP participant.

Proposed resolution:

1. Remove AT-specific fault IIS.

2. Wherever we send IS (or IIS) make the state transit to new state 
"Inconsistency Hazard".

3. Allow CP coordinator to send Invalid State or another fault to 
indicate that the Coordinator has gone haywire.
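
One way to picture items 2 and 3 is the sketch below. It is illustrative only: the state names, the toy transition entries, and both function names are assumptions made for the example and are not transcribed from the WS-AT state tables. Remaining in the hazard state without sending further faults is likewise a choice made for the sketch, not something proposed above.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PREPARING = auto()
    PREPARED = auto()
    COMMITTING = auto()
    ENDED = auto()
    INCONSISTENCY_HAZARD = auto()  # the new state proposed in item 2


# Toy subset of a coordinator's view of one participant. These entries are
# illustrative only and are not transcribed from the WS-AT state tables.
TOY_TRANSITIONS = {
    (State.PREPARING, "Prepared"): State.PREPARED,
    (State.COMMITTING, "Committed"): State.ENDED,
}


def on_inbound(state, message):
    """Return (new_state, fault_to_send) for one inbound message."""
    if state is State.INCONSISTENCY_HAZARD:
        # Sketch choice: stay put; anything further is a local decision.
        return state, None
    next_state = TOY_TRANSITIONS.get((state, message))
    if next_state is None:
        # Protocol violation: fault, and transit to the hazard state.
        return State.INCONSISTENCY_HAZARD, "InvalidState"
    return next_state, None


def completion_reply(participant_states, decided_outcome):
    """Item 3: report a fault to the CP participant instead of an outcome
    that might be false."""
    if State.INCONSISTENCY_HAZARD in participant_states:
        return "InvalidState"
    return decided_outcome
```

For instance, `on_inbound(State.ACTIVE, "Committed")` lands in `INCONSISTENCY_HAZARD`, and `completion_reply` then declines to report either Committed or Aborted.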

Alastair

Mark Little wrote:

On 17 Jul 2006, at 15:42, Peter Furniss wrote:

Ram and others,
 
The InconsistentInternalState issue is a secondary aspect of 041, 
whose main point is not about the fault sent but about what state the 
table is in after the event occurs.
 
First, I better make sure we agree that a protocol violation means 
that one implementation has stepped off the path and either sent a 
message it is not allowed to in the current state, or has performed a 
state transition it is not allowed to. Assuming that is what is meant 
by protocol violation, it means the entire contract has broken down - 
the rules have been broken and the implementations are no longer 
following our specification. There is no shared semantic any more.

That would be consistent (!) with my understanding. Something like a 
heuristic ;-) OK, forget I said that!

 
Accordingly, there is no way the specification can ever expect to 
achieve global consistency, wherever the parties are in the 
lifecycle - at least one of them isn't obeying the rules.
 
In consequence, the specification should not *require* a transition 
to another normal state for ANY protocol violation.

I don't see how it can. What state makes sense here when there is 
inconsistency?

Implementations should be left completely free to devise their own 
strategy for minimising the local damage and reporting to management. 
They might choose to trigger local abort (or do so only for certain 
states) but there isn't any general real guarantee that this will 
increase the chance of a consistent outcome across the transaction. 
(e.g. if Committed is received in Active state, it would seem 
possible that the participant has already committed - so aborting 
would make matters worse!)

Inconsistent internal state implicitly means that there can be no 
attempt at global consistency within the scope of the protocol (WS-AT 
in this case). As you say, implementations may try some implementation 
specific (aka outside of the specification) ways to achieve 
consistency, but there can be no guarantees. Unless we want to add 
some new protocol messages to WS-AT to allow for interoperable attempts 
at resolution in this case, which I don't believe we do and even then 
there are no guarantees.

 
However, it's possible that we see this differently because of our 
different understandings of the state table. If the understanding is 
that implementations are free to send protocol messages and make 
(abstract) state transitions as they feel fit, then my definition of 
protocol violation is unsound. An implementation sending surprising 
messages or changing state hasn't violated the protocol, because 
there is no rule to say it can't do that - it's just exercising its 
right and the "protocol violation" would be just that the receiver 
didn't expect the message. But then it is even more certain that the 
parties don't know what the other is up to when the surprise message 
arrives.

I don't like the sound of this definition because it seems easy to go 
from here into Byzantine failure mode!

 
 
Returning to the secondary point of the definition of IIS, on either 
understanding, the distinction "no longer possible to change the 
outcome" (from an earlier condition, "is possible to change the 
outcome") would seem to be spurious.

I agree.

Once the protocol has been violated or the receiver is surprised, 
there is no way of knowing what the other side is up to or what they 
perceive the outcome to be.
 
I'm not sure whether your definition of IIS assumes that there are 
some additional semi-permitted state transitions that correspond to 
anonymous, but actually well-defined, internal events. For example, 
do you believe that the arrival of Commit at a Participant in 
Aborting state will occur in semi-normal (i.e. unusual but bug-free) 
circumstances when the Participant in PreparedSuccess has made an 
internal decision to locally initiate rollback, and by doing so 
transitions itself to Aborting? Such an action isn't defined in the 
state table but, if the "anything-is-permitted" understanding is 
followed, might be an implementation-defined action in the case of a 
heuristic decision. IIS would then be a (non-reliable) heuristic 
report. But to explain the why of that, there ought to be an internal 
event that reflects the transition to Aborting. Otherwise there is no 
reason to believe that an implementation that had locally initiated 
rollback would be in Aborting state - it would be equally rational to 
say that such an implementation was still in PreparedSuccess, but had 
released resources.

However, saying all of that, if we return to the actual issue (41, in 
case anyone has forgotten it), what is the problem with retaining IIS?

As you said originally, IS from WS-C could perhaps be used, but we 
have IIS already and there are existing implementations that use it. 
As long as we are clear on the reasons behind it, even if IIS 
duplicates WS-C's IS, it's not a bug in the protocol as such and I'd 
prefer to leave it in.

Mark.

 
Peter

------------------------------------------------------------------------

*From:* Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
*Sent:* 12 July 2006 07:35
*To:* ws-tx@lists.oasis-open.org <mailto:ws-tx@lists.oasis-open.org>
*Subject:* RE: [ws-tx] Issue 041 - WS-AT: Invalid events should not 
cause defined transitions

The AT specification's specific use of the InconsistentInternalState 
(IIS) fault is to indicate protocol violations that occur after it is 
no longer possible to change the outcome of the transaction; IIS is 
used in the PV table in the cells { Commit; Aborting } and { 
Rollback; Committing }.

 

The current definition of IIS does not correctly reflect its intended 
use. Hence, rewording its definition consistent with the 
aforementioned use:

 

"This fault is sent by a participant or coordinator to indicate that 
a protocol violation has been detected after it is no longer possible 
to change the outcome of the transaction. This is indicative of a 
global consistency failure and is an unrecoverable condition."

 

Further, the following cells in the CV table should throw the IIS 
fault (instead of InvalidState):

 

Format: {Row; Column1, Column2}

 

1. {ReadOnly; PreparedSuccess, Committing}
2. {Aborted; PreparedSuccess, Committing}
3. {Committed; PreparedSuccess, Aborting}

 

 

*From:* Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
*Sent:* Tuesday, March 28, 2006 10:25 AM
*To:* ws-tx@lists.oasis-open.org <mailto:ws-tx@lists.oasis-open.org>
*Subject:* [ws-tx] Issue 041 - WS-AT: Invalid events should not cause 
defined transitions

 

This is identified as WS-TX issue 041.

 

Please ensure follow-ups have a subject line starting "Issue 041 - 
WS-AT: Invalid events should not cause defined transitions".

------------------------------------------------------------------------

*From:* Peter Furniss [mailto:peter.furniss@erebor.co.uk]
*Sent:* Monday, March 27, 2006 1:33 PM
*To:* ws-tx@lists.oasis-open.org <mailto:ws-tx@lists.oasis-open.org>
*Subject:* [ws-tx] New issue: WS-AT: Invalid events should not cause 
defined transitions

 

Issue name -- WS-AT: Invalid events should not cause defined 
transitions

 

PLEASE DO NOT REPLY TO THIS EMAIL OR START A DISCUSSION THREAD UNTIL 
THE ISSUE IS ASSIGNED A NUMBER.

 

The issues coordinators will notify the list when that has occurred.

 

Target document and draft:

 

Protocol:  WS-AT

 

Artifact:  spec

 

Draft:

 

AT spec cd 1

 

Link to the document referenced:

http://www.oasis-open.org/committees/download.php/17311/wstx-wscoor-1.1-spec-cd-01.pdf

http://www.oasis-open.org/committees/download.php/17325/wstx-wsat-1.1-spec-cd-01.pdf

 

Section and PDF line number:

 

ws-at section 10, lines 503/505
 coordinator table: Committed/Active, Committed/Preparing
 participant table: Commit/Active, Commit/Preparing
ws-at: section 6.1, line 371

 

Issue type:

 

Design/Editorial

 


Related issues:

 


Issue Description:

 

The receipt of a message when the receiver is in a state such that 
the event cannot occur between correct implementations should not 
cause a state transition and allow the transaction to complete 
"successfully".

 

There is no need to distinguish "InvalidState" and 
"InconsistentInternalState".

 

Issue Details

 

Background

 

InvalidState is defined in WS-Coordinator as being an unrecoverable 
condition, and in all the cases where it is a defined response in 
the WS-AT tables can only occur if one of the implementations is 
broken/bugged (apart from the volatile Prepared/None case, see 
separate issue).  Providing a defined state transition, as if the 
circumstance were expected and could be recovered from, is 
inappropriate.  There can be no graceful completion of the protocol - 
it has gone fundamentally wrong. This does not preclude an 
implementation from attempting to tidy up and protecting its own 
resources, but there should be no required state transition for the 
implementation. The protocol exchange has gone off the map.

 

The use of InconsistentInternalState to distinguish two cases where 
an invalid event occurs is unnecessary (and the definition in line 
371 does not align with the use in the table - it is probably the 
coordinator that has been sending wrong messages). 

 

The use of InvalidState is appropriate in all cases.

 

Proposed resolution

 

The clearest solution would be to make invalid cells in the state 
tables empty, for the cells currently shown as InvalidState or 
InconsistentInternalState, and also for the N/A cells and explain 
this with text:

 

 "Where a cell is shown as empty
  
  - if the row is for an Inbound Event, a WS-C Invalid State fault 
should be returned. The subsequent behaviour of the implementation is 
undefined.
  
  - if the row is for an Internal Event, the event cannot occur in this 
state. A TM should view these occurrences as serious internal 
consistency issues."

 

Having invalid cells empty makes it significantly easier to read and 
check the state tables. It becomes much clearer that they are 
essentially "sparse" and the path through the table can be followed 
more easily.
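
As a rough illustration of the proposed reading of empty cells, the sketch below stores only the populated cells and applies the two rules from the text to everything else. The event names, state names, table entries, and function name are toy assumptions for the example and are not transcribed from the WS-AT tables.

```python
class InternalConsistencyError(Exception):
    """Raised when an internal event hits an empty cell."""


# Toy classification of events; illustrative assumptions only.
INBOUND_EVENTS = {"Prepared", "Committed"}
INTERNAL_EVENTS = {"Commit Decision"}

# Sparse table: only populated cells are stored, keyed by (event, state).
# Each value is (action, next_state). Toy entries, not the real tables.
SPARSE_TABLE = {
    ("Prepared", "Preparing"): ("record vote", "Prepared"),
    ("Committed", "Committing"): ("forget", "Ended"),
    ("Commit Decision", "Prepared"): ("send Commit", "Committing"),
}


def dispatch(event, state):
    """Look up one cell and apply the empty-cell rules from the text."""
    cell = SPARSE_TABLE.get((event, state))
    if cell is not None:
        action, next_state = cell
        return {"action": action, "next_state": next_state, "fault": None}
    if event in INBOUND_EVENTS:
        # Empty cell, inbound event: return the WS-C Invalid State fault.
        # next_state is None because subsequent behaviour is undefined.
        return {"action": None, "next_state": None, "fault": "InvalidState"}
    if event in INTERNAL_EVENTS:
        # Empty cell, internal event: cannot occur between correct
        # implementations, so treat it as a serious internal issue.
        raise InternalConsistencyError(
            f"{event!r} cannot occur in state {state!r}"
        )
    raise ValueError(f"unknown event {event!r}")
```

With this shape, `dispatch("Committed", "Active")` yields a fault and no required next state, while `dispatch("Commit Decision", "Active")` raises, since that cell is empty for an internal event.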
