ws-tx message

Subject: Incomplete resolution of 007

From: Alastair Green <alastair.green@choreology.com>
To: ws-tx@lists.oasis-open.org
Date: Wed, 01 Mar 2006 20:53:07 +0000

Dear colleagues,

I excerpt below a mail I sent in discussion with the editors, arising from the attempt to fulfil the AI on resolution for 009.

Andrew Wilkinson pointed out that this feels more like the property of the whole TC than a working group discussion, and I agree.

Originally in reply to a mail from Mark Little:

"... I don't think we can ignore the the duplicate RegisterResponse issue or hope it will be dealt with at the infrastructure level without a bit of extra specification, in WS-AT and WS-BA.

To recap: duplicate Registers are deemed to be OK by resolution of 007: the Coordinator generates a new EPR for the deemed "new participant".

Duplicate registers can arise either by impatient retry, or by transport redelivery. The ensuing RegisterResponses will both be delivered to the same EPR, so the receiving end can work out that it's received one twice (ignore the second one).

The rule is: if an RR message is received twice targeted on the same EPR then it has to be thrown away. This is the same kind of rule that is expressed in the WS-AT state tables for e.g. duplicate Prepares. Not quite the same -- the action is not to resend a response, but the fact that this may happen has to be captured somewhere.

As Max points out, the current PV state table assumes that RegisterResponse will arrive once. It doesn't cope with duplicate RegisterResponses.

This is only OK if the "throw away" (no-op) rule is stated elsewhere.

Here are two implementation strategies that might be adopted:

A. Set a participant state machine to a state of "initial" or "registering", and send Register to C. Keep a vector of all message ids for all Registers sent for the current P EPR, with a vector status of "live". If a RegisterResponse arrives whose relates-to value is equal to one of the stored message ids, and the vector is "live", then set the participant state machine to "active" and mark the vector as "dead". If the RR arrives when the vector is "dead" then ignore the inbound message (no-op). [This is very artificial: I am trying to imagine why and how you would actually use the values of message id and relates-to.]
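Strategy A can be sketched roughly as follows. This is an illustration only, not text from any of the specifications: the class and method names are hypothetical, the sketch assumes that each RegisterResponse carries the id of the Register it answers (in WS-Addressing terms, the RelatesTo header), and the handling of an unrecognised id is an assumption that the paragraph above does not cover.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StrategyAParticipant:
    """Hypothetical participant that correlates responses by message id."""

    state: str = "registering"
    sent_ids: set = field(default_factory=set)  # the "vector" of message ids
    live: bool = True                           # vector status: live or dead

    def send_register(self) -> str:
        # Every Register, including an impatient retry, gets a fresh id.
        message_id = f"urn:uuid:{uuid.uuid4()}"
        self.sent_ids.add(message_id)
        return message_id

    def on_register_response(self, relates_to: str) -> str:
        if relates_to not in self.sent_ids:
            # Assumption: the source does not say what to do here.
            return "unknown"
        if not self.live:
            # Vector is dead: a response was already accepted, so no-op.
            return "ignored"
        self.state = "active"
        self.live = False
        return "accepted"
```

Whichever response arrives first moves the participant to "active"; a later response to any other Register sent for the same EPR is dropped.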

B. Set a participant state machine to a state of "initial" or "registering" and send Register to C. If a RegisterResponse arrives at the current P EPR, and the state machine is in state "registering" set the state machine state to "active", and proceed. If an RR arrives when the state machine is "active" then ignore the inbound message (no-op).
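A minimal sketch of strategy B, again purely illustrative. The names are hypothetical, and only the two states mentioned in the paragraph above are modelled; no message ids are inspected at all.

```python
from enum import Enum


class State(Enum):
    REGISTERING = "registering"
    ACTIVE = "active"


class StrategyBParticipant:
    """Hypothetical participant driven only by its own protocol state."""

    def __init__(self) -> None:
        self.state = State.REGISTERING
        self.discarded = 0  # count of duplicate RegisterResponses dropped

    def on_register_response(self) -> bool:
        """Return True if the response was acted on, False if discarded."""
        if self.state is State.REGISTERING:
            self.state = State.ACTIVE
            return True
        # Already active: a duplicate RegisterResponse is a no-op.
        self.discarded += 1
        return False
```

Here the duplicate is recognised from the state of the machine alone, which is why no record of the Register message ids is needed.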

Logically, these are the same state machine. In the first case we have created an ancillary mini-machine that uses the Request-Reply MEP features. In the second case the implementation state machine is a direct reflection of the logical state machine (that does not use the RR MEP features).

In my view the specification should describe the logical state machine, and should leave the implementation strategy to the implementer (especially as implementation strategy A is so unnatural).

Note that this problem is created by the fact that we are potentially processing a sequence of messages, each with its own message id. There is no concept in WS-A of such a sequence. Therefore, we need to say -- here, in these specs -- that such a sequence can exist, and how to deal with it. Otherwise it becomes one of those cases where "we all know what we meant to say", which is not a good practice. Right now, if you look at the row RegisterResponse, column Active, in the 2PC PV of WS-AT you will read the following: Invalid State/Active. And according to the text immediately above, Invalid State means: "send an Invalid State fault" -- which is not what we want.

Either we change the state table, or we write text enforcing an approach similar to strategy A. On grounds of consistency, minimalism, and freedom of implementation choice I would prefer to change the WS-AT state table.
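The effect of changing the state table can be sketched for the single cell discussed above (row RegisterResponse, column Active). Only that cell is taken from this message; the dictionary layout, the names, and the dispatch function are hypothetical and do not reproduce the WS-AT tables.

```python
# Each cell maps (inbound message, current state) to (action, next state).
# Current cell as quoted in this message: Invalid State / Active.
CURRENT_TABLE = {
    ("RegisterResponse", "Active"): ("InvalidState", "Active"),
}

# Proposed cell: take no action and stay in Active (discard the duplicate).
PROPOSED_TABLE = {
    ("RegisterResponse", "Active"): (None, "Active"),
}


def dispatch(table, message, state):
    """Return (next_state, fault) for one inbound message."""
    action, next_state = table[(message, state)]
    fault = "InvalidState fault" if action == "InvalidState" else None
    return next_state, fault
```

With the current cell, a faithful implementation answers a duplicate RegisterResponse with a fault; with the proposed cell it silently drops it and remains Active.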

It's an unfortunate fact that the RR MEP is not doing anything fundamental here except forcing implementers into a particular (unspecified) behaviour. As I am tired of fighting City Hall, I don't mind acceding to the (pointless, harmless) presence of RR MEP, but it isn't a finished job, unless we address this possibility in one of the two ways I have raised. There is nothing in the current spec to stop a faithful implementation receiving a duplicate RR and directing it at a state machine that will fault it.

Furthermore, and taking off from another of your comments: we could introduce a statement into WS-C (there is none there now) stating that duplicate RegisterResponses are discarded. This would be contrary to the resolution of 007, which contains the statement:

The manner in which the participant handles duplicate protocol messages depends
on the specific coordination type and coordination protocol.

Even if we introduced a textual statement on discards in the AT and BA specs, we are not finished with the problem. The whole RegisterResponse row of the AT state table has to cope with the arrival of a duplicate RegisterResponse (late, out of order). At present that row incorrectly faults a late duplicate, when in fact the duplicate RR should always be thrown away. This strongly indicates that the AT state table is the place to define all duplicate RR behaviours.

I assume that the same will apply to BA."

I have also raised the possibility, in the light of the above, that use of RR MEP versus one-way, for WS-C exchanges, might be made optional (an implementation choice).

Alastair