Subject: Re: [ws-tx] State table intent (was RE: [ws-tx] Issue 036 - WS-AT:Coordinator state machine incomplete)

From: Alastair Green <alastair.green@choreology.com>

To: Peter Furniss <peter.furniss@erebor.co.uk>

Date: Mon, 12 Jun 2006 16:03:51 +0100

Peter,

It's worth us all considering what would happen if there were no state tables. Possibly, "given the literature", a (needed) statement that this is presumed abort, and a defined set of XML messages, we would "all know what to do".

Or would we? For example: I do not believe that the normal, well-understood thing to do on receiving an invalid message (one that is a protocol error, i.e. one that arrives in an invalid state) is to move to Aborting. The possibility of misordered one-way messages is something that needs to be taken into account with a WS protocol.

This is the subject of one of your issues, and is a good example of "common knowledge" turning out to be treacherous ground.

If the state tables are going to be there then I agree that they need to follow a consistent model, and all the logical entities, semantics, constraints etc need to be properly described.

The half-way house of "we all know what they mean", and "this bit is meant to be illustrative, but let's not bother to write that fact down" is the worst way of proceeding, in my view.

I do believe you can manage this with two tables (CV and PV), as long as you explicitly recognize other (simpler) entities that communicate with those tables, and describe and define the meaning of those communications.

The state tables express the legitimate type and viable sequences of signals emitted and received by an abstract (model) state machine.

The signals certainly do travel (in the form of specified XML infoset messages) between a machine called the Participant View, and a machine called the Coordinator View. That is part of the externally visible behaviour of the state machines.

But signals, which are also visible outside the state machine, also travel (in an undefined, abstract way) between:

* the CV and a B-Coordinator (a coordinator of the whole of a transaction sub-tree, which may be the root) -- another view of that B-Coordinator is that it is a recording entity (RE), although that raises interesting questions for sub-coordinators
* the PV and a different RE
* the PV and an application entity (AE), an example of which is a two-phase aware resource manager.

All of these signals are "externally visible", in the sense that any entity (CV, CV-RE, PV, PV-RE, PV-AE) may only send a defined set of signals, some of which must only be sent in defined orders, and may only usefully receive a defined set of signals in defined orders (though it may receive duplicates and misordered signals).

Some send behaviours must be ordered (and must be defined by the tables). Sending a Committed ahead of Prepared, and ahead of receiving Commit, is fatal (it will either wedge the two ends, or cause early abortion, depending on your view). That ordering results from CV-PV messages. Sending Prepared before preparing the AE is also fatal: this ordering (which involves internal events, and has nothing to do with what is visible to the CV) is also critical to the correctness of the protocol, if running durable.

So, "external visibility" cannot be a term which is deemed to be restricted to the CV and PV signals. That just won't work.

To elaborate on the latter example: a PV must not send a Prepared signal to the CV until it has a) received Prepare from the CV, b) decided that the reaction should be to attempt to go prepared, c) sent a "complete work and prepare" signal to the PV-AE, d) received a "work completed and prepared" signal from the PV-AE, e) sent a "write prepared log" signal to the PV-RE, and f) received a "write succeeded" signal in response.

For a participant which means by sending Prepared: "Semantic A: I will, absent catastrophic failures of the AE or RE, always be able to process the coordinator's outcome message, irrespective of PV/AE crashes", this sequence is necessary (cannot be abridged).

If the participant means by sending Prepared: "Semantic B: I hope to be able to process the coordinator's outcome, and am currently able to do so", then there would be no need to specify actions e) and f). They simply shouldn't be modelled. But actions c) and d) are necessary, because we have to show a fork, between going prepared, rolling back and going read-only.

My first point, therefore, is that the state tables either mean something by the internal events (signals to the RE and AE entities) or they do not. If they are optional then they should not be there. At minimum it should be stated which ones are optional and, if they are, that their presence is non-normative, illustrative of a particular implementation strategy, or whatever else they are meant to be. If they are to mean something then we need to surface these unnamed entities, and explain what they do and their role -- because their actions have externally visible effects.

For example, a log write failure turns "intent to prepare" into "abort locally" (signal to AE) and send Aborted to CV. It then turns out, of course, that you do need e) and f) because you have to cater for an implementation that is running persistent (is trying to deliver the D of ACID), i.e. is trying to implement Semantic A above.

A PV implementation that is trying to send Semantic B can simply omit e) and f): if it does so it will never send an invalid message to CV -- but that does not mean we can do without signals e) and f) in the tables.
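
To make steps a) to f) concrete, here is a minimal Python sketch of the PV's reaction to Prepare. It is only an illustration under stated assumptions: the names `StubAE`, `StubRE`, `Vote` and `on_prepare` are invented for this example and appear nowhere in the specification, and the `durable` flag merely stands for the choice between Semantic A and Semantic B.

```python
from enum import Enum


class Vote(Enum):
    """Possible outcomes of asking the AE to complete its work and prepare."""
    PREPARED = "prepared"
    READ_ONLY = "read-only"
    ROLLBACK = "rollback"


class StubAE:
    """Hypothetical application entity, e.g. a two-phase aware resource manager."""

    def __init__(self, vote):
        self.vote = vote
        self.aborted = False

    def complete_work_and_prepare(self):
        # Steps c) and d): signal to the AE and its reply, collapsed into one call.
        return self.vote

    def abort_locally(self):
        self.aborted = True


class StubRE:
    """Hypothetical recording entity; write_ok=False simulates a log write failure."""

    def __init__(self, write_ok=True):
        self.write_ok = write_ok
        self.log = []

    def write_prepared_log(self):
        # Steps e) and f): signal to the RE and its reply, collapsed into one call.
        if self.write_ok:
            self.log.append("prepared")
        return self.write_ok


def on_prepare(ae, rec, durable=True):
    """Return the signal the PV sends to the CV after receiving Prepare (step a)."""
    # Step b) is implicit: this PV always attempts to go prepared.
    vote = ae.complete_work_and_prepare()
    if vote is Vote.READ_ONLY:
        return "ReadOnly"
    if vote is Vote.ROLLBACK:
        return "Aborted"
    # Semantic A (durable=True): the prepared record must be logged first.
    # Semantic B (durable=False): steps e) and f) are skipped.
    if durable and not rec.write_prepared_log():
        ae.abort_locally()  # log write failure turns intent to prepare into abort
        return "Aborted"
    return "Prepared"
```

With a failing log, `on_prepare(StubAE(Vote.PREPARED), StubRE(write_ok=False))` yields Aborted under Semantic A, while the same call with `durable=False` yields Prepared. Either way the CV only ever sees a valid message, which is why the CV-PV exchange alone cannot tell you whether e) and f) took place.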

My second point would then be: it is not possible to write state tables that show legitimate sequences of sending by deduction simply from invalid reception states in the other state machine. Two things follow: a) you must show send behaviours to achieve correct histories, and b) those send (and receive) behaviours cannot be fully described without reference to the "internal" modelling entities AE and RE.

Alastair

Peter Furniss wrote:

Following the discussion at the end of the f-t-f, I believe the first thing we need to get quite clear is what the state tables are intended to do - the second being to make sure the text makes it clear what that is (if it doesn't currently) and the third to make sure the tables correctly deliver their intent (if they don't currently).

I believe it was the consensus that:

* the tables are to define the externally visible behaviour on a single coordinator:participant relationship
* there is a separate Coordinator view state machine for each participant (with independent states)

It was also said that the tables "did not cover all the internal behaviour" of an implementation

Although that last statement is undoubtedly true, I think it needs a bit of clarification. In particular, we cannot state that the act of sending a message is "internal" - it is obviously externally visible. Thus any sending of a message needs to be justified by an entry in the state table that permits the sending. In consequence, although states are in one sense internal, if the state table is to mean anything, the transitions between states have to be regarded as externally visible. The actual stimulus both to change state, and to send the message may be internal. And of course the state itself is a modelling abstraction - it may not really correspond to a value held in the machine (it might be implicit in a "program counter" for example).

If we don't regard the state transitions as external - i.e. we allow that an implementation may make arbitrary transitions that are not reflected in the state table - then the table becomes a nonsense. For example, an implementation could do the sequence send Prepare, send Commit, receive Prepared, send Rollback, send Commit by asserting that it had made legitimate (but hidden) internal transitions that permitted each send.
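
As a rough illustration of what "permitted by the table" would mean in practice, the following Python sketch checks an observed sequence of sends and receives against a transition table. The table here is a hypothetical, heavily simplified coordinator view invented for this example; it is not the WS-AT table, and the state and event names are assumptions.

```python
# Hypothetical, simplified coordinator-view table (NOT the WS-AT state table).
# Key: (current state, event) -> next state. Any pair that is absent is not permitted.
TABLE = {
    ("Active", "send Prepare"): "Preparing",
    ("Active", "send Rollback"): "Aborting",
    ("Preparing", "recv Prepared"): "Prepared",
    ("Preparing", "send Rollback"): "Aborting",
    ("Prepared", "send Commit"): "Committing",
    ("Prepared", "send Rollback"): "Aborting",
    ("Committing", "recv Committed"): "Ended",
    ("Aborting", "recv Aborted"): "Ended",
}


def first_violation(events, state="Active"):
    """Return the index of the first event the table does not permit, or None."""
    for index, event in enumerate(events):
        next_state = TABLE.get((state, event))
        if next_state is None:
            return index
        state = next_state
    return None


bad = ["send Prepare", "send Commit", "recv Prepared", "send Rollback", "send Commit"]
good = ["send Prepare", "recv Prepared", "send Commit", "recv Committed"]
print(first_violation(bad))   # 1: send Commit is not permitted in Preparing
print(first_violation(good))  # None: every step is backed by a table entry
```

If hidden transitions were allowed, no such check could ever reject a sequence, since any send could be excused by an unobservable change of state.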

Is this the consensus? If it is, then I think it would be helpful to expand the introduction text to the state tables to say so (if it isn't, then it is absolutely essential to expand that introduction to say what we eventually agree the table scope is).

I'd suggest adding text at the end of the first paragraph of clause 10:

The following state tables specify the behavior of coordinators and participants when presented with protocol messages or internal events. These tables present the view of a coordinator or participant with respect to a single partner. A coordinator with multiple participants can be understood as a collection of independent coordinator state machines <addition>, each with its own state. Although the states and internal events are modelling abstractions, an implementation must not make state transitions or send protocol messages other than as permitted by these tables</addition>.

We may want to re-arrange or expand that further, especially to explain more fully what sort of real events are abstracted by the internal events.

If we are agreed that all sending and transitions have to be supported by the state table, we then need to make sure all legitimate events are shown. If we don't agree, I will propose deleting the state tables as they are only confusing.

Peter

From: Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
Sent: 28 March 2006 19:17
To: ws-tx@lists.oasis-open.org
Subject: [ws-tx] Issue 036 - WS-AT: Coordinator state machine incomplete

This is identified as WS-TX issue 036.

Please ensure follow-ups have a subject line starting "Issue 036 - WS-AT: Coordinator state machine incomplete".

From: Peter Furniss [mailto:peter.furniss@erebor.co.uk]
Sent: Monday, March 27, 2006 1:06 PM
To: ws-tx@lists.oasis-open.org
Subject: [ws-tx] New issue: WS-AT: Coordinator state machine incomplete

Issue name -- WS-AT: Coordinator state machine incomplete

PLEASE DO NOT REPLY TO THIS EMAIL OR START A DISCUSSION THREAD UNTIL THE ISSUE IS ASSIGNED A NUMBER.

The issues coordinators will notify the list when that has occurred.

Target document and draft:

Protocol: WS-AT

Artifact: spec

Draft:

AT spec cd 1

Link to the document referenced:

http://www.oasis-open.org/committees/download.php/17325/wstx-wsat-1.1-spec-cd-01.pdf

Section and PDF line number:

section 10, lines 493 - 510

Issue type:

Design / Editorial

Related issues:

New issue: WS-C, WS-AT, WS-BA: Term "Coordinator" overloaded
New issue: WS-AT: Register/Preparing in coordinator state table problematic

Issue Description:

The WS-AT coordinator state tables do not provide explanation of what the states or internal events are, and do not provide specification of the interactions between the completion, volatile and durable protocols.

Issue Details:

The WS-AT state tables are described as showing "the view of a coordinator or participant with respect to a single partner. A coordinator with multiple participants can be understood as a collection of independent coordinator state machines". In fact they appear to use the states and internal events of the multi-lateral coordinator but inbound events for a single partner. In the absence of any explanation, especially of the internal events, this is confusing (and there seem to be some inconsistencies).

There is also little or no specification in the state tables of the interactions between the completion, volatile and durable protocols (which essentially determine that the transaction is atomic).

Evidence of multilateralism:
A bilateral state engine will move from None to Active on receiving Register. "Active" thus has to be interpreted as the state of the multilateral coordinator.
Receipt of ReadOnly would cause a bilateral state engine to go from Active to None - the tables show it as having no effect on the state.

Lack of protocol interaction:
The tables provide no specification of the volatile-first behaviour - except perhaps in the Register/Preparing cell, which cannot be harmonised with the rest of the table (see separate issue).

If "User Commit" is to be interpreted as Commit on the Completion protocol (or its non-standardised equivalent), then a Prepare shouldn't be sent to a Durable Participant until all the Volatiles have replied.

The Commit Decision event is presumably meant to occur when all participants have replied Prepared or ReadOnly, but according to the state table it could occur any time after Prepare has been sent.
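
The distinction can be sketched in Python as a multilateral coordinator that owns one bilateral state per participant. This is only an illustration of the points above, under assumptions: the class `Coordinator`, its method names and the state names are invented here, and Rollback and Aborted handling is deliberately left out.

```python
class Coordinator:
    """Hypothetical multilateral coordinator holding one bilateral state per participant."""

    def __init__(self):
        self.state = {}      # participant name -> bilateral state
        self.durable = {}    # participant name -> True if registered as durable
        self.sent = []       # (message, participant) pairs in send order
        self.completing = False
        self.decided = False

    def register(self, name, durable):
        self.state[name] = "Active"  # bilateral engine: None -> Active on Register
        self.durable[name] = durable

    def user_commit(self):
        self.completing = True
        self._send_prepare(durable=False)  # volatile participants go first
        self._advance()

    def receive(self, name, reply):
        if reply not in ("Prepared", "ReadOnly"):
            raise ValueError("this sketch only models Prepared and ReadOnly replies")
        # ReadOnly ends the bilateral relationship: the engine returns to None.
        self.state[name] = "Prepared" if reply == "Prepared" else "None"
        self._advance()

    def _send_prepare(self, durable):
        for name, st in self.state.items():
            if st == "Active" and self.durable[name] == durable:
                self.state[name] = "Preparing"
                self.sent.append(("Prepare", name))

    def _advance(self):
        if not self.completing or self.decided:
            return
        volatile_pending = any(
            st in ("Active", "Preparing") and not self.durable[name]
            for name, st in self.state.items()
        )
        if volatile_pending:
            return  # no Prepare to durables until every volatile has replied
        self._send_prepare(durable=True)
        # Commit decision: only once every participant is Prepared or gone (ReadOnly).
        if all(st in ("Prepared", "None") for st in self.state.values()):
            self.decided = True
            for name, st in self.state.items():
                if st == "Prepared":
                    self.state[name] = "Committing"
                    self.sent.append(("Commit", name))
```

In this sketch the multilateral rules (volatile-first ordering, the guard on the commit decision) live in `_advance`, while each entry of `state` behaves as a purely bilateral engine, which is the separation the proposed resolution asks the tables to make explicit.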

Proposed Resolution:

Create separate tables for the multilateral and bilateral relationships, and define what all the states and events mean.