ws-tx message

Subject: RE: [ws-tx] Issue 041 - WS-AT: Invalid events should not cause defined transitions
From: "Max Feingold" <Max.Feingold@microsoft.com>
To: "Alastair Green" <alastair.green@choreology.com>,"Mark Little" <mark.little@jboss.com>
Date: Thu, 27 Jul 2006 10:32:13 -0700
Let me pose a scenario question here.

A participant sends Committed to a coordinator during the active state,
leading to a protocol violation.  The coordinator sends a fault to the
participant.  It then "does the right thing" (precisely what that means
being the topic of this discussion, of course).

The question is this:  from the perspective of the other participants
who have not committed or noticed a protocol violation, what do we think
will happen to the transaction?  Does it depend on what the coordinator
does?  If so, how?

-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com] 
Sent: Wednesday, July 19, 2006 2:22 AM
To: Mark Little
Cc: Peter Furniss; Ram Jeyaraman; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 041 - WS-AT: Invalid events should not cause
defined transitions

Mark, Ram, Peter --

The original issue centres on the view that it is incorrect, 
meaningless, pointless, misleading to mandate state transitions in the 
face of a protocol error (non-conformant counterpart).

I agree with Peter and Mark on this point, and disagree with Ram. I 
think it is obvious that a protocol error means that all bets are off: 
the only meaningful state that could be transited to is "Screwed" or 
some more polite formulation thereof.

I would note that useful discussion of the same ground is in the archive
back in mid-May.

This is the primary debate. It affects the state tables. The retention 
or otherwise of IIS is orthogonal. If, however, you accept that IIS and 
IS are synonyms, then we face the same kind of redundancy that we saw 
with Replay. Changing the fault message to one that an implementation is
already tooled to send is hardly onerous.

I do want to add an additional point to the discussion. Much emphasis 
has been laid on avoiding implementation assumptions or restrictions, 
and on avoiding prescribing internal behaviours, when looking at the 
state tables.

The decision as to what to do in the face of a protocol error is 
/pre-eminently/ a "local" (unilateral) one. No deductions about global 
state or outcome can be drawn from the receipt of Invalid State. A 
coordinator cannot assume anything once it receives notification of a 
protocol error. Either it is being lied to, or more likely, it is being 
informed of its own bug. There is no point going on, and anything other 
than the moral equivalent of a core dump is pointless.

The specification has no business telling implementers what to do in 
this case. And the instruction currently  given cannot be enforced or 
observed interoperably, and is therefore inappropriate. This has the 
smack of mechanical product-to-spec translation, carrying with it 
excessive implementation detail.

It is also interesting that the Completion Protocol has no way of 
informing a client that the Coordinator has got itself into the Screwed 
state. It certainly must not send Aborted, because that may be highly 
misleading. Example: bugged Coordinator sends Commit to all registered 
Participants (that's the bug), all of whom but one have already sent 
Prepared. All but one commit as instructed, one (currently) sends 
Invalid State and aborts. What is the outcome of the transaction?

We need to be able to report a transaction which has gone haywire to a 
CP participant.

Proposed resolution:

1. Remove AT-specific fault IIS.

2. Wherever we send IS (or IIS) make the state transit to new state 
"Inconsistency Hazard".

3. Allow CP coordinator to send Invalid State or another fault to 
indicate that the Coordinator has gone haywire.
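
As a rough illustration of points 2 and 3, here is a minimal sketch in
Python. Everything in it is hypothetical: the state names, the handful of
filled cells, and the helpers on_inbound and completion_report are
invented for illustration and are not taken from the WS-AT state tables.
It only shows the shape of the idea: a sparse table, a single sink state
for any invalid event, and a completion report that never claims Aborted
once the hazard has been reached.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "Active"
    PREPARING = "Preparing"
    PREPARED_SUCCESS = "PreparedSuccess"
    COMMITTING = "Committing"
    ENDED = "Ended"
    INCONSISTENCY_HAZARD = "InconsistencyHazard"


# Sparse toy table: (current state, inbound event) -> next state.
# Only a few illustrative cells are filled in; every missing cell is
# treated as a protocol violation. These are NOT the real WS-AT cells.
TABLE = {
    (State.PREPARING, "Prepared"): State.PREPARED_SUCCESS,
    (State.COMMITTING, "Committed"): State.ENDED,
}


def on_inbound(state, event):
    """Return (next_state, fault_to_send). The fault is None for valid events."""
    next_state = TABLE.get((state, event))
    if next_state is None:
        # Invalid event: send Invalid State and make no claim about the outcome.
        return State.INCONSISTENCY_HAZARD, "InvalidState"
    return next_state, None


def completion_report(state):
    """What a completion-protocol client could be told (hypothetical)."""
    if state is State.INCONSISTENCY_HAZARD:
        # Deliberately not "Aborted": the real outcome is unknown.
        return "InvalidState"
    if state is State.ENDED:
        return "Committed"
    return None  # no outcome to report yet


# Committed arriving in Active state lands in the hazard state.
state, fault = on_inbound(State.ACTIVE, "Committed")
print(state.value, fault, completion_report(state))
```

Because the hazard state has no cells of its own in the table, every
later event leaves the implementation there, which matches the view that
nothing further can be deduced once a violation has been seen.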

Alastair

Mark Little wrote:
>
> On 17 Jul 2006, at 15:42, Peter Furniss wrote:
>
>> Ram and others,
>>  
>> The InconsistentInternalState issue is a secondary aspect of 041, 
>> whose main point is not about the fault sent but about what state the
>> table is in after the event occurs.
>>  
>> First, I better make sure we agree that a protocol violation means 
>> that one implementation has stepped off the path and either sent a 
>> message it is not allowed to in the current state, or has performed a
>> state transition it is not allowed to. Assuming that is what is meant
>> by protocol violation, it means the entire contract has broken down -
>> the rules have been broken and the implementations are no longer
>> following our specification. There is no shared semantic any more.
>
> That would be consistent (!) with my understanding. Something like a 
> heuristic ;-) OK, forget I said that!
>
>>  
>> Accordingly, there is no way the specification can ever expect to 
>> achieve global consistency, wherever the parties are in the
>> lifecycle - at least one of them isn't obeying the rules.
>>  
>> In consequence, the specification should not *require* a transition 
>> to another normal state for ANY protocol violation.
>
> I don't see how it can. What state makes sense here when there is 
> inconsistency?
>
>
>> Implementations should be left completely free to devise their own 
>> strategy for minimising the local damage and reporting to management.
>> They might choose to trigger local abort (or do so only for certain 
>> states) but there isn't any general real guarantee that this will 
>> increase the chance of a consistent outcome across the transaction. 
>> (e.g. if Committed is received in Active state, it would seem 
>> possible that the participant has already committed - so aborting 
>> would make matters worse!)
>
>
> Inconsistent internal state implicitly means that there can be no
> attempt at global consistency within the scope of the protocol (WS-AT
> in this case). As you say, implementations may try some
> implementation-specific (aka outside of the specification) ways to
> achieve consistency, but there can be no guarantees. Unless we want to
> add some new protocol messages to WS-AT to allow for interoperable
> attempts at resolution in this case, which I don't believe we do, and
> even then there are no guarantees.
>
>>  
>> However, it's possible that we see this differently because of our
>> different understandings of the state table. If the understanding is
>> that implementations are free to send protocol messages and make
>> (abstract) state transitions as they see fit, then my definition of
>> protocol violation is unsound. An implementation sending surprising
>> messages or changing state hasn't violated the protocol, because
>> there is no rule to say it can't do that - it's just exercising its
>> right and the "protocol violation" would be just that the receiver
>> didn't expect the message. But then it is even more certain that the
>> parties don't know what the other is up to when the surprise message
>> arrives.
>
> I don't like the sound of this definition because it seems easy to go 
> from here into Byzantine failure mode!
>
>>  
>>  
>> Returning to the secondary point of the definition of IIS, on either 
>> understanding, the distinction "no longer possible to change the 
>> outcome" (from an earlier condition, "is possible to change the 
>> outcome") would seem to be spurious.
>
> I agree.
>
>> Once the protocol has been violated or the receiver is surprised, 
>> there is no way of knowing what the other side is up to or what they 
>> perceive the outcome to be.
>>  
>> I'm not sure whether your definition of IIS assumes that there are 
>> some additional semi-permitted state transitions that correspond to 
>> anonymous, but actually well-defined, internal events. For example, 
>> do you believe that the arrival of Commit at a Participant in 
>> Aborting state (for example) will occur in semi-normal (i.e. unusual 
>> but bug-free) circumstances when the Participant in PreparedSuccess 
>> has made an internal decision to locally initiate rollback - and by
>> doing so transitions itself to Aborting? Such an action isn't defined
>> in the state table, but, if the "anything-is-permitted" understanding
>> is followed, might be an implementation-defined action in the case of
>> a heuristic decision. IIS would then be a (non-reliable) heuristic
>> report. But to explain the why of that, there ought to be an internal
>> event that reflects the transition to Aborting. Otherwise there is no
>> reason to believe that an implementation that had locally initiated
>> rollback would be in Aborting state - it would be equally rational to
>> say that such an implementation was still in PreparedSuccess, but had
>> released resources.
>
> However, saying all of that, if we return to the actual issue (41, in 
> case anyone has forgotten it), what is the problem with retaining IIS?
> As you said originally, IS from WS-C could perhaps be used, but we
> have IIS already and there are existing implementations that use it. 
> As long as we are clear on the reasons behind it, even if IIS 
> duplicates WS-C's IS, it's not a bug in the protocol as such and I'd 
> prefer to leave it in.
>
> Mark.
>
>
>>  
>> Peter
>>  
>>  
>>  
>>  
>>
>>
------------------------------------------------------------------------
>> *From:* Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
>> *Sent:* 12 July 2006 07:35
>> *To:* ws-tx@lists.oasis-open.org <mailto:ws-tx@lists.oasis-open.org>
>> *Subject:* RE: [ws-tx] Issue 041 - WS-AT: Invalid events should not 
>> cause defined transitions
>>
>> The AT specification's specific use of InconsistentInternalState 
>> (IIS) fault is to indicate protocol violations that occur after it is
>> no longer possible to change the outcome of the transaction; IIS is
>> used in the PV table in the cells { Commit; Aborting } and
>> { Rollback; Committing }.
>>
>>  
>>
>> The current definition of IIS does not correctly reflect its intended
>> use. Hence, I propose rewording its definition to be consistent with
>> the aforementioned use:
>>
>>  
>>
>> "This fault is sent by a participant or coordinator to indicate that 
>> a protocol violation has been detected after it is no longer possible
>> to change the outcome of the transaction. This is indicative of a
>> global consistency failure and is an unrecoverable condition."
>>
>>  
>>
>> Further, the following cells in the CV table should throw an IIS
>> fault (instead of InvalidState):
>>
>>  
>>
>> Format: {Row; Column1, Column2}
>>
>>  
>>
>> 1. {ReadOnly; PreparedSuccess, Committing}
>> 2. {Aborted; PreparedSuccess, Committing}
>>
>> 3. {Committed; PreparedSuccess, Aborting}
>>
>>  
>>
>>  
>>
>> *From:* Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
>> *Sent:* Tuesday, March 28, 2006 10:25 AM
>> *To:* ws-tx@lists.oasis-open.org <mailto:ws-tx@lists.oasis-open.org>
>> *Subject:* [ws-tx] Issue 041 - WS-AT: Invalid events should not cause
>> defined transitions
>>
>>  
>>
>> This is identified as WS-TX issue 041.
>>
>>  
>>
>> Please ensure follow-ups have a subject line starting "Issue 041 - 
>> WS-AT: Invalid events should not cause defined transitions".
>>
>>  
>>
>>
------------------------------------------------------------------------
>>
>> *From:* Peter Furniss [mailto:peter.furniss@erebor.co.uk]
>> *Sent:* Monday, March 27, 2006 1:33 PM
>> *To:* ws-tx@lists.oasis-open.org <mailto:ws-tx@lists.oasis-open.org>
>> *Subject:* [ws-tx] New issue: WS-AT: Invalid events should not cause 
>> defined transitions
>>
>>  
>>
>> Issue name -- WS-AT: Invalid events should not cause defined
>> transitions
>>
>>  
>>
>> PLEASE DO NOT REPLY TO THIS EMAIL OR START A DISCUSSION THREAD UNTIL
>> THE ISSUE IS ASSIGNED A NUMBER.
>>
>>  
>>
>> The issues coordinators will notify the list when that has occurred.
>>
>>  
>>
>> Target document and draft:
>>
>>  
>>
>> Protocol:  WS-AT
>>
>>  
>>
>> Artifact:  spec
>>
>>  
>>
>> Draft:
>>
>>  
>>
>> AT spec cd 1
>>
>>  
>>
>> Link to the document referenced:
>>
>>  
>>
>>
>> http://www.oasis-open.org/committees/download.php/17311/wstx-wscoor-1.1-spec-cd-01.pdf
>>
>> http://www.oasis-open.org/committees/download.php/17325/wstx-wsat-1.1-spec-cd-01.pdf
>>
>>  
>>
>> Section and PDF line number:
>>
>>  
>>
>> ws-at section 10, lines 503/505
>>  coordinator table: Committed/Active, Committed/Preparing
>>  participant table: Commit/Active, Commit/Preparing
>> ws-at: section 6.1, line 371
>>
>>  
>>
>> Issue type:
>>
>>  
>>
>> Design/Editorial
>>
>>  
>>
>>
>> Related issues:
>>
>>  
>>
>>
>> Issue Description:
>>
>>  
>>
>> The receipt of a message when the receiver is in a state such that 
>> the event cannot occur between correct implementations should not 
>> cause a state transition and allow the transaction to complete 
>> "successfully".
>>
>>  
>>
>> There is no need to distinguish "InvalidState" and 
>> "InconsistentInternalState".
>>
>>  
>>
>> Issue Details
>>
>>  
>>
>> Background
>>
>>  
>>
>> InvalidState is defined in WS-Coordination as being an unrecoverable
>> condition, and in all the cases where it is a defined response in
>> the WS-AT tables it can only occur if one of the implementations is
>> broken/bugged (apart from the volatile Prepared/None case, see
>> separate issue).  Providing a defined state transition, as if the
>> circumstance were expected and could be recovered from, is
>> inappropriate.  There can be no graceful completion of the protocol -
>> it has gone fundamentally wrong. This does not preclude an
>> implementation from attempting to tidy up and protecting its own
>> resources, but there should be no required state transition for the
>> implementation. The protocol exchange has gone off the map.
>>
>>  
>>
>> The use of InconsistentInternalState to distinguish two cases where 
>> an invalid event occurs is unnecessary (and the definition in line 
>> 371 does not align with the use in the table - it is probably the 
>> coordinator that has been sending wrong messages). 
>>
>>  
>>
>> The use of InvalidState is appropriate in all cases.
>>
>>  
>>
>> Proposed resolution
>>
>>  
>>
>> The clearest solution would be to make invalid cells in the state 
>> tables empty, for the cells currently shown as InvalidState or 
>> InconsistentInternalState, and also for the N/A cells and explain 
>> this with text:
>>
>>  
>>
>>  "Where a cell is shown as empty
>>   
>>   - if the row is for an Inbound Event, a WS-C Invalid State fault
>> should be returned. The subsequent behaviour of the implementation is
>> undefined.
>>   
>>   - if the row is for an Internal Event, the event cannot occur in this
>> state. A TM should view these occurrences as serious internal
>> consistency issues."
>>
>>  
>>
>> Having invalid cells empty makes it significantly easier to read and 
>> check the state tables. It becomes much clearer that they are 
>> essentially "sparse" and the path through the table can be followed 
>> more easily.
>>
>>  
>>
>>
>