Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable

From: Alastair Green <alastair.green@choreology.com>
To: Max Feingold <Max.Feingold@microsoft.com>
Date: Sat, 07 Jan 2006 21:30:35 +0000

Max,

I want to focus on two points you raise:

a) the "triviality" of avoiding duplicate terminator registration in the WS-AT Completion protocol

b) the issue of generality (WS-C versus WS-X which references WS-C)

Triviality

I think you are masking a substantive problem with the label of "triviality" applied to the WS-AT Completion protocol. The current scheme (message ids combined with an AlreadyRegistered [or CannotRegisterParticipant] fault) is not the best solution. I am also worried about reliance on SOAP stack implementation characteristics (which are not standardized), and on estimates of retry or failure likelihood, based on "topological closeness" (which seems like a contradiction in terms to me).

A coordinator receives a registration from a participant. Its view is: "I will not allow more than one registration: there must only be one participant." Restated: allow only one agent to adopt the role of transaction terminator [Initiator in WS-AT terminology] in the Completion protocol. I assume here that we need to avoid the situation where two programs register themselves to play the terminating role.

This rule can only be fulfilled properly if the coordinator can a) handle repeat registrations of the same participant, and b) distinguish between repeat registrations of the same thing, on the one hand, and registrations of different things, on the other. It seems very easy to handle both of these requirements using participant ids.
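
As an illustration only, the two requirements above could be met along the following lines. This is a minimal sketch, not text from any of the specifications: the class name, the `register` signature, the dictionary shape of the response and the example URIs are all hypothetical.

```python
class AlreadyRegistered(Exception):
    """A different participant already holds the single terminator role."""


class CompletionCoordinator:
    """Hypothetical coordinator allowing one Completion registrant per transaction."""

    def __init__(self, coordinator_epr):
        self.coordinator_epr = coordinator_epr
        self.registered_id = None  # participant id of the one terminator, if any

    def register(self, participant_id, participant_epr):
        if self.registered_id is None:
            # First registration: record who plays the terminator role.
            self.registered_id = participant_id
            self.participant_epr = participant_epr
        elif participant_id != self.registered_id:
            # A different thing is trying to register: refuse it.
            raise AlreadyRegistered(participant_id)
        # First registration or a replay by the same participant:
        # (re)send the normal response carrying the coordinator EPR.
        return {"coordinator_epr": self.coordinator_epr}


coordinator = CompletionCoordinator("http://coordinator.example/completion")
first = coordinator.register("urn:example:participant-1", "http://p1.example/")
replay = coordinator.register("urn:example:participant-1", "http://p1.example/")
```

A replay from the same participant receives the same response as the original, while a registration carrying a different participant id is rejected.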

Repeat registrations of the same participant must be responded to, as there is no way of knowing if the repeat occurs because the original response was lost. Making a rule that retries will not be made, to avoid this problem, seems unnecessary, and inconsistent. Both WS-AT and WS-BA, in their coordination protocols proper, use replays to achieve failure tolerance. It is true that a "one shot" approach would "work" in WS-AT, in the sense that a transaction timeout could be used to garbage collect the transaction, but I see no good reason to have such a fragile approach.

Either we must make an unambiguous statement that retry will not occur, or we must state how retries will be handled. A SOAP stack that had a retry strategy based on replaying the same request with the same message id until a related reply was received would violate a rule prohibiting retries. What is required is a rule in this specification that does permit retries, with defined means of identifying replays: this rule can then be implemented at whatever level makes sense for a particular product. Any other approach will not define interoperable behaviour correctly.

(If we do repeat, I think we are all agreed that it is not appropriate to use a fault as the response replay: a fault is the wrong way to carry the required EPR, whose transmission is required to terminate the exchange.)

What is the best way of identifying replays on behalf of a given Participant? Idempotence via reuse of message ids for replays is contrary to the spirit of WS-Addressing, as I have pointed out in a prior posting. A separated WS-Addressing implementation is quite likely to generate a unique message-id for each RR MEP exchange. To demand that it allow "chaining" (repetition of the same message id as a prior exchange) is to introduce a non-WS-Addressing concept. Which is fine, but then we are not relying upon another specification's approach or stipulations: we have a free hand to achieve our requirement optimally, and we must write the rules.

Your comment that it is easy to eliminate duplicates at the transport layer (ignore the second delivery of the same message id) dovetails with your view that it is unlikely that deliberate retries will be attempted. But deliberate retrying is perfectly likely -- I think you anticipate it happening in a SOAP/WS-A stack.
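
To make the distinction concrete, here is a minimal sketch of a transport-level filter on message ids. The `MessageIdFilter` class and the ids are invented for illustration, and the sketch assumes a stack that generates a fresh message id for every exchange.

```python
class MessageIdFilter:
    """Hypothetical transport-level filter: drops a second delivery of the same message id."""

    def __init__(self):
        self.seen = set()

    def accept(self, message_id):
        if message_id in self.seen:
            return False  # same id seen before: treated as a transport duplicate
        self.seen.add(message_id)
        return True


message_filter = MessageIdFilter()
first_delivery = message_filter.accept("urn:uuid:0001")       # original Register
transport_duplicate = message_filter.accept("urn:uuid:0001")  # redelivered by the transport
deliberate_retry = message_filter.accept("urn:uuid:0002")     # retry sent with a fresh id
```

The filter removes the transport duplicate, but the deliberate retry passes through as if it were a new registration, so the layer above still has to decide whether it comes from the same participant.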

An implementation may have some method such as Transaction.commit(). The implementation of this API call will logically cause a) registration for AT CP, and b) transmission of AT Commit. If the registration fails to receive a response (we assume that communications can fail) then I would want to retry (for some configurable number of times) before blowing out the client. Assumptions of deployment "closeness" ("topological closeness") have no place in a distributed interoperation protocol: we cannot rely on high hopes of reliability relating to "closeness" of two agents. If they are connected by an unreliable transport of unknown quality (which the specs otherwise assume) then any message send can fail, and we must take account of that. There is no "connection break" to inform us that the attempted exchange is out of the water: we must do that job at our level in the stack.
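
A rough sketch of the retry behaviour described above, under stated assumptions: `send_register` is a hypothetical callable that returns the RegisterResponse, or `None` if no response arrived in time, and `max_attempts` and `delay_seconds` stand in for whatever configuration a product might offer.

```python
import time


class RegistrationTimeout(Exception):
    """No RegisterResponse was received after all configured attempts."""


def register_with_retry(send_register, participant_id, max_attempts=3, delay_seconds=0.0):
    """Send Register, retrying with the same participant id until a response arrives."""
    for attempt in range(1, max_attempts + 1):
        response = send_register(participant_id)
        if response is not None:
            return response
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # wait before the next attempt
    raise RegistrationTimeout(participant_id)


# Simulated unreliable transport: the first two responses are lost.
attempts = []

def lossy_send(participant_id):
    attempts.append(participant_id)
    if len(attempts) < 3:
        return None
    return {"coordinator_epr": "http://coordinator.example/completion"}

response = register_with_retry(lossy_send, "urn:example:participant-1")
```

Every attempt carries the same participant id, which is what would let the receiving coordinator recognise the later attempts as replays rather than as new registrants.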

Even if the questionable technique of reusing WS-A message ids to identify a sequence is used, it is unclear why it should be deemed to be the best solution. Participant ids are lighter weight, more obvious in their intent and purpose, and more generally useful.

The elimination of multiple terminator registrations also requires identifying the logical entity on whose behalf a Register/RegisterResponse exchange operates. Again, message ids as a means of discrimination could be used. But what is being identified here is not the message exchange, but the sequence of message exchanges required to achieve, in a reliable way, the registration of the Initiator. And (in this context) the identity of the sequence R1/RR1, R2/RR2 .. Rn/RRn is tantamount to the identity of the registrant (i.e. the Participant in WS-C terms).

Method A: We can bend the meaning of message id (stating that it must be reused for retries), and add a message id (URL) to the request, and reference it in the reply by use of the request-reply MEP.

Method B: We can add a U/IRI participant id to the request, and only use one MEP (one-way with full addressing). The reply is not affected.

One might say: on a scale of triviality, B is more trivial than A. Stone B also kills several other birds in passing.

Generality

I am much more sympathetic to your points on avoiding false generality in the "base class" of WS-C. This is a classic design choice: how many reuses justify depression to the base of a given piece of functionality? To which there is no "right" answer.

I believe that both WS-AT and WS-BA require the same feature (to be precise, all known issues relating to identification, duplicate/multiple registration of participants for both protocols can best be resolved by one solution: participant ids). I think one could put this feature in WS-C, or restate the feature in each referencing specification. Personally, I would prefer to do it in WS-C.

Alastair

Max Feingold wrote:

Merry Christmas and happy holidays to all!

 

There are a few observations I would like to make on this topic before I head out on vacation.

 

First, it is perfectly possible to implement WS-AT without participant identifiers in a manner that neither prohibits deliberate resends nor generates undesired transaction aborts.  There are two generally interesting cases:  one in which the participant has forgotten and is not aware that it is sending a duplicate Register, and another in which the participant has deliberately decided to resend Register.  Both can be made to work in an interoperable fashion.  I'll send a separate message containing that discussion.

 

Second, I do not believe that anyone in this TC wishes to prohibit the possibility of creating a coordination protocol that relies on participant identifiers or any other mechanism in order to ensure correctness.  On the other hand, it seems unwise to me to attempt to enforce a single model for registration for every coordination protocol, regardless of their specific requirements.

 

The design spirit of WS-Coordination, which we applied quite successfully in the last telephone discussion (concerning the appropriate definition and placement of faults), is to include two general sets of mechanisms in WS-C:

 

1) Those that are used by virtually all protocols

2) Broad extensibility that allows derived protocols to cover their other specific needs.

That philosophy, applied to this discussion, would suggest that if a given mechanism is not needed by our current coordination protocols, it is not a good candidate for inclusion in the base specification.

 

Consequently, the participant identifier mechanism is a perfect example of a mechanism that should make use of WS-Coordination extensibility.  Any protocol that requires the ability to detect duplicate registrations and uniquely identify participants can simply leverage the open content that is provided in the Register message.

 

I think that the interesting discussion is not whether such a mechanism belongs in WS-Coordination (I think it is pretty clear that it does not), but whether specific coordination protocols need such a mechanism.  The ones that do should not be prohibited by WS-C;  the ones that don't should not suffer any additional complexity.  I believe that is the case with the current language in the specifications, although some editorial language clarifying this freedom may be appropriate (e.g. Ian's suggested text).

 

Third, some odds and ends in response to several previous messages: 

 

- AlreadyRegistered was intended mostly for protocols such as WS-AT Completion where duplicate detection is trivial.  Given that we have already determined that a RegistrationFailed fault is desirable, we can probably just delete the AlreadyRegistered fault.  For protocols that can detect duplicates, the appropriate response for a duplicate registration is likely either a RegistrationFailed fault with a specific reason, a standard RegisterResponse or some protocol-specific message.

 

- Register messages that are duplicated by the transport are not likely to be of concern to a coordination protocol;  duplicate detection can be trivially performed at the SOAP layer by filtering on message ids.  It's just as easy as filtering on some other identifier, and it's likely that many stacks will already do this.

 

- WS-AT Completion registrations are restricted to a single participant.  It is true that adding participant identifiers would allow a completion participant to re-send register.  However, Completion participants are (a) unlikely to be recoverable or tolerant of failures and (b) unlikely to be topologically distant from their coordinator.  Consequently, I do not sense a strong need to allow the Completion registrations to be re-sent.


________________________________

From: Mark Little [mailto:mark.little@jboss.com]
Sent: Sun 12/18/2005 7:24 AM
To: Peter Furniss
Cc: Max Feingold; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable





Peter Furniss wrote:

A failure to receive a register response could trigger a completely new
register message with a new EPR (on the assumption a retry of the first
attempt caused the already-registered fault to be returned). The only
problem I can see at present with this mechanism is that manufacturing a
new EPR for the "same" participant may not be feasible in some
environments. However, that could be seen as an implementation problem.
The advantage would be that no changes to the specification are
required - other than a clarification of the text to call out this
possibility.

With no change to the current texts, I don't see how you can get
already-registered unless the coordinator does an illegal EPR
comparison. (that is really part of 014 - whatever we specify as the
reaction, there needs to be a sound way of detecting duplicates - no
change is not an option).

I'm trying to consider the issues in isolation, but I'll admit that's
difficult ;-)

But apart from that (i.e. assume we have a duplicate detection means),
and back to the conceptual point of this issue, why specify that a
coordinator detecting that Register is for the same Participant as one
already registered must fault with AlreadyRegistered? Just assume that
the transport, or the sending implementation, has caused the duplicate
to turn up, and reply with a RegisterResponse reflecting the
Coordinator's endpoint.

My intention was to point out that a solution is possible within the
scope of the current specification. Whether or not that solution is one
we wish to adopt, is the subject of this and other discussions, just as
the other proposed solutions have been.

In 95% of cases the EPRs will be unchanged.
If they have changed (which would only be because the endpoint owner
"wanted" to change it), the most recent SHOULD be used for sending by
the peer (not MUST because that would impose complications for some
persistence strategies).

The alternative of trying to make multiple registrations for what is in
fact the same participant work would seem to cause considerable
complications. For atomic cases, the coordinator may not mind - it just
sees two (or more) registrations and they must both be committed (or
rolled back). But Max's "The participant simply needs to behave
correctly[1] by distinguishing its multiple enlistments." is very
questionable, because it will receive two Prepares (say), both delivered
to the same EPR, but must reply to different coordinator endpoints, one
given on the successful RegisterResponse, one on the lost one. As in
Alastair's diagrams sent earlier today, it would have to use the
Reply-To EPR (in which case, why not use that anyway and get rid of the
RegisterResponse altogether) [this is completely impossible for
coordination protocols where the first message is participant to
coordinator - see Alastair's diagram 3]

I agree all of this is possible and may be sub-optimal in certain
degenerate situations. However, when weighed against the timeline
imposed for getting WS-C through to standardisation, it may be that the
"do nothing" approach I mentioned above is the best option.

Gosh, this has ended up rather long (and will probably now cross with
other messages saying the same thing or rendering it out of date)

To be honest I don't have a hard stance on any solutions to this issue
at the moment. My only concern is time spent so far and the fact that
there are other issues to work through that may be equally, or more,
contentious. I hope we can bring this to a conclusion (a vote) soon.

Well, we closed a quarter of the issues list yesterday, and this one is
related to at least two of the others, and the discussion has made good
progress. I think it's a little early to be worrying about timescales.

I disagree that it is too early. Several of the companies on this list
have implementations that are already interoperable and, speaking as the
representative of one of them, we'd like to reduce the amount of
time this TC takes to standardise.

Mark.
