ws-tx message

Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable

From: Alastair Green <alastair.green@choreology.com>
To: Mark Little <mark.little@jboss.com>
Date: Tue, 13 Dec 2005 16:10:42 +0000

Mark,

I feel we should be careful about this. I'm sorry, but I'm not smart enough to write you a short letter on this subject. Hopefully someone else will come up with a more succinct way of expressing what follows.

The only message exchanges defined to use request-reply, and therefore required to use message id/relates-to, in the current specs (WS-AT specifically) are CCC/CCCR and R/RR. These exchanges are not currently designed to act in an idempotent manner. Indeed, a conformant January 2005 implementation must have responded to a duplicate Register with a Fault/AlreadyRegistered, i.e. behaved non-idempotently, and the issue of retry cannot therefore have arisen.

Any retry behaviour in a Jan 2005 implementation that relied upon id/relates-to correlation for any other exchange (e.g. the 2PC exchanges) cannot have been interoperable other than by accident or by excessive latitude on the part of its interlocutors.

The initiator of an exchange cannot be relied upon to supply message id, and the responder cannot be relied upon to specify the correlation via relates-to, because nothing in the WS-A or WS-TX specs mandates such elements to be present.

Indeed, and this goes to the root of the whole discussion, the only way that retriable exchanges were and are possible for e.g. 2PC exchanges, is that the retry was identified as new, or as duplicate (i.e. state-changing or non-state-changing) by virtue of the deemed identity of the message sender. And in WS-TX protocols, that identity is encoded in the EPRs exchanged during registration.

This is not a problem, because (given appropriate Register/RegisterResponse exchanges) identity comparison is not required for subsequent coordination protocol exchanges. Identification of state can be handled by mapping from an EPR to its associated state. The state machine governs duplicate processing, for any number of retries.

Example. Coordinator C for transaction T, Participant P.

C emits RS EPR in CoordinationContext. P stores RS EPR, keyed by context /Identifier (a unique id for T).

P generates C-to-P EPR (CP EPR, the EPR for inbound-to-Participant coord protocol messages). P stores CP EPR, which is a unique identifier for the tuple {P, T}, i.e. this Participant for this Transaction, and ensures that the state of {P, T} is accessible, given the value of CP EPR.

P sends Register to RS EPR, embedding CP EPR.

C generates P-to-C EPR (for inbound-to-Coordinator coord protocol messages), and stores CP EPR, keyed by new PC EPR.

C sends RegisterResponse to P, embedding PC EPR. P stores PC EPR, keyed by CP EPR.

Coord protocol messages flow. Assume Prepare/Prepared (retriable exchange).

Prepare sent first time to CP EPR. P checks state for that EPR, i.e. for {P, T}, logs, changes state, looks up PC EPR and sends Prepared to PC EPR.

Prepared drops (is not received by C).

Retry: Prepare sent second time to CP EPR. P uses CP EPR to find state of {P, T}, finds it is Prepared, looks up PC EPR and resends Prepared to PC EPR.

You can flip this around with BA Exit/Exited, if you want to see initiator and responder roles reversed between C and P.
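
A minimal sketch of the Participant side of the Prepare/Prepared example above, in Python. It assumes EPRs can be modelled as plain strings used as dictionary keys; the class and method names (Participant, enlist, on_prepare) are hypothetical and come from no spec or implementation. The point is only that the retry is recognized from the state reached via CP EPR, with no message id involved.

```python
from dataclasses import dataclass, field


@dataclass
class ParticipantState:
    """State of {P, T}, reachable from the CP EPR (assumed to be a string)."""
    pc_epr: str                      # where messages to the Coordinator go
    state: str = "Active"
    log: list = field(default_factory=list)


class Participant:
    def __init__(self):
        self.by_cp_epr = {}          # CP EPR -> ParticipantState
        self.sent = []               # (destination EPR, message); stand-in for the wire

    def enlist(self, cp_epr, pc_epr):
        # Called once RegisterResponse has delivered the PC EPR.
        self.by_cp_epr[cp_epr] = ParticipantState(pc_epr=pc_epr)

    def on_prepare(self, cp_epr):
        st = self.by_cp_epr[cp_epr]
        if st.state == "Active":
            # First delivery: log and change state.
            st.log.append("prepared")
            st.state = "Prepared"
        # First delivery or retry: the reply is the same.
        self.sent.append((st.pc_epr, "Prepared"))
```

Calling on_prepare twice for the same CP EPR logs once and sends Prepared twice to the same PC EPR, which is the retry behaviour described in the example.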

Generalizing, for Initiator I and Responder R (which must recognize duplicates).

I generates RI EPR, which identifies state of I for R, and stores IR EPR, keyed by its generated RI EPR.

R generates IR EPR, which identifies state of R for I, and stores RI EPR, keyed by its generated IR EPR.

I sends one or more messages to IR EPR (address of R for this exchange).

R figures out if message n is a duplicate by checking state of R for I, identified by IR EPR, and then looks up RI EPR using IR EPR as key, to send reply to RI EPR.

I figures out if message m is already processed by checking state of I for R, identified by RI EPR, etc.

Note that there is not a message id in sight.

[The whole of the scheme above, which I believe to be the spec authors' design intent, depends on preventing duplicate registrations of P, i.e. C must never have two PC EPRs which it treats as distinct for one P. Back to how to prevent duplicate registrations, on which a separate post.]
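
To make that constraint concrete, here is a rough Python sketch of a Registration Service that never hands out two PC EPRs for one P. How P is recognized as "the same" is exactly the open question, so the sketch simply assumes a hypothetical participant_id string accompanies each Register; that parameter, the class name and the urn:uuid form of the generated EPR are all illustrative assumptions, not anything the specs define.

```python
import uuid


class RegistrationService:
    def __init__(self):
        self.pc_epr_by_participant = {}   # participant_id (assumed) -> PC EPR
        self.cp_epr_by_pc_epr = {}        # PC EPR -> CP EPR, as in the example above

    def register(self, participant_id, cp_epr):
        """Handle Register; returns the PC EPR to embed in RegisterResponse."""
        pc_epr = self.pc_epr_by_participant.get(participant_id)
        if pc_epr is None:
            # First registration of this P: generate a new PC EPR.
            pc_epr = f"urn:uuid:{uuid.uuid4()}"
            self.pc_epr_by_participant[participant_id] = pc_epr
        # Duplicate or not, overlay the stored CP EPR and answer identically.
        self.cp_epr_by_pc_epr[pc_epr] = cp_epr
        return pc_epr
```

A repeated Register for the same participant_id gets the same PC EPR back, so C holds exactly one PC EPR per P however many times the exchange is retried.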

If the spec authors' intent was to allow two separate parallel mechanisms for identifying actors in such retriable exchanges then they should have explicitly stated that every message has an id, and every reply has a relates-to=<that id>. This is not stated in WS-AT Section 9, and I therefore believe any implementation that relied upon that scheme was interoperable by accident or by private contract, but not by spec conformance.

Let us imagine that we do adopt the id/relates-to approach as a result of this discussion. We now have a situation where every exchange has an id. This id must be retained for all retries. This is at least contrary to the spirit of WS-A, if not provably illegal. The message is supposed to be uniquely identified by the id, as I read the WS-A spec:

[message id] : IRI (0..1): An absolute IRI that uniquely identifies the message. When present, it is the responsibility of the sender to ensure that each message is uniquely identified. The behavior of a receiver when receiving a message that contains the same [message id] as a previously received message is unconstrained by this specification.

This does not seem a good route to follow. Unintentional duplication (more than once delivery by the transport) is one thing; deliberately sending duplicates with the same message ids seems wrong-headed.

Finally, why have two ways of doing the same thing? No-one is suggesting we remove EPRs, and they provide a sufficient mechanism in all cases of retriable idempotent messages, except the Register/RegisterResponse exchange, which requires additional identity to correctly form the bridge or channel. To interoperate in two completely different ways cannot be good practice, and certainly complicates implementation and interoperability testing. This is the danger of defining conformance as equalling a 5 x 5 matrix "worked" for 5 implementations. If they all evolve to handle all variants then they can evolve away from the spec and the design intent. A non-conformant feature can spread by excessive tolerance.

Alastair

Mark Little wrote:

BTW, to your point of ease: the interop scenarios we had to do in January/February this year had many situations requiring timeouts and retries. I certainly can't say I canvassed everyone present, but I didn't get the impression that that aspect was considered too much of an implementation headache.

Mark.

Christopher B Ferris wrote:

I'm not sure that using ws-a messageId is the easiest... it means that impls need to remember messageId
which can get onerous.

The WS-A WG avoided the issue of EPR equivalence mostly because of issues related to use of
EPRs to identify something. IMO, in that spirit, EPR comparison becomes one of comparing the
<Address> element which comes down to URI equivalence issues which can go in a number of
directions... the namespace URI approach (straight string comparison) or the approach which normalizes the URI
first before comparing.
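
A small Python sketch of the two directions mentioned above for comparing <Address> values: straight string comparison versus normalizing first. The normalization rules chosen here (lowercase scheme and host, drop a default port, treat an empty path as "/") are only an illustrative subset picked for this sketch, and the function names are hypothetical; userinfo is dropped and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(uri):
    """Apply a small, assumed subset of URI normalization rules."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is present and not the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def same_by_string(a, b):
    # Namespace-URI style: character-for-character comparison.
    return a == b


def same_by_normalization(a, b):
    return normalize(a) == normalize(b)
```

With these definitions, "HTTP://Example.COM:80/registration" and "http://example.com/registration" differ under string comparison but match after normalization, so two implementations that pick different rules can disagree about whether a Register is a duplicate.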

Cheers,

Christopher Ferris
STSM, Emerging e-business Industry Architecture
email: chrisfer@us.ibm.com
blog: http://webpages.charter.net/chrisfer/blog.html
phone: +1 508 377 9295

Mark Little <mark.little@arjuna.com> wrote on 12/12/2005 03:20:16 PM:



> There are multiple ways of making the operation idempotent. Using WS-A
> semantics is one and IMO is probably the easiest way of doing it: it
> goes back to traditional Retained Results RPC mechanisms of the late
> 1980's, where idempotency was imposed at the comms level. If we try to
> do it higher up the stack, within the actual implementation, then we're
> going to have to address the issue of EPR comparisons: how can I ensure
> this is the same operation if I can't determine that the parameters are
> identical?
>
> So, I think we're agreed that it needs to be idempotent. But
> until/unless we address EPR comparisons, I think the WS-A retry route
> gets my vote.
>
> Mark.
>
>
> Christopher B Ferris wrote:
>
> >
> > That is one way, the other is to make the Register message idempotent.
> >
> > Seems to me that Register SHOULD be idempotent. It is much simpler to
> > simply process
> > the Register as if it had never been received... makes the
> > implementation of the client
> > a bit simpler.
> >
> > I also think that the "AlreadyRegistered" fault is problematic. It
> > doesn't reflect
> > back the CoordinationProtocolService EPR that the RegisterResponse
> > message does.
> > So, from the perspective of the registrant, it ISN'T registered if it
> > doesn't receive the
> > RegisterResponse message since it doesn't know the
> > CoordinationProtocolService
> > EPR.
> >
> > From the perspective of the registration service, overlaying the
> > previous registered
> > EPR is effectively an idempotent operation, and the response can be
> > the same as if
> > it didn't have the registration beforehand.
> >
> > IMO, making the operation idempotent makes the implementation much
> > simpler and
> > more robust in the long run.
> >
> > Cheers,
> >
> > Christopher Ferris
> > STSM, Emerging e-business Industry Architecture
> > email: chrisfer@us.ibm.com
> > blog: http://webpages.charter.net/chrisfer/blog.html
> > phone: +1 508 377 9295
> >
> >
> > Mark Little <mark.little@jboss.com>
> > 12/12/2005 11:36 AM
> >
> > To: ws-tx@lists.oasis-open.org
> > cc:
> > Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
> >
> > Actually I'll retract this. As Kevin just reminded me, we're using
> > WS-Addressing anyway, so surely lost messages and retries can be coped
> > with at that level: using the same wsa:MessageID for example, should
> > sort this.
> >
> > Mark.
> >
> >
> >
> > Mark Little wrote:
> >
> > > I think this proposal makes sense.
> > >
> > > Mark.
> > >
> > >
> > > Peter Furniss wrote:
> > >
> > >> This is hereby declared to be ws-tx Issue 007.
> > >>
> > >> Please follow-up to this message or ensure the subject line starts
> > >> Issue 007 - (ignoring Re:, [ws-tx] etc)
> > >>
> > >> The Related Issues list has been updated to show the issue numbers.
> > >>
> > >> Issue name -- WS-C: Make Register/RegisterResponse retriable
> > >>
> > >> Owner: Alastair Green [mailto:alastair.green@choreology.com]
> > >>
> > >> Target document and draft:
> > >>
> > >> Protocol: Coord
> > >>
> > >> Artifact: spec
> > >>
> > >> Draft: Coord spec working draft uploaded 2005-12-02
> > >>
> > >> Link to the document referenced:
> > >>
> > >> http://www.oasis-open.org/committees/download.php/15738/WS-Coordination-2005-11-22.pdf
> > >>
> > >> Section and PDF line number:
> > >>
> > >> WS-Coordination spec, Section 3.2 "Registration Service" l. 294
> > >>
> > >>
> > >> Issue type: Design
> > >>
> > >>
> > >> Related issues:
> > >>
> > >> Issue 008 - WS-C: Remove fault 4.6 AlreadyRegistered
> > >> Issue 014 - WS-C: EPR equality comparison is problematic
> > >> Issue 009 - WS-C/WS-AT: Is request-reply MEP useful?
> > >>
> > >>
> > >> Issue Description:
> > >>
> > >> Register/RegisterResponse should be retriable exchange
> > >>
> > >>
> > >> Issue Details:
> > >>
> > >> [This issue stems from Choreology Contribution issue TX-20.]
> > >>
> > >> Section 9 of WS-AT defines the WS-Coordination exchanges
> > >>     CreateCoordinationContext/CreateCoordinationContextResponse
> > >>     Register/RegisterResponse
> > >>
> > >> as request-reply exchanges.
> > >>
> > >> (Whether this request reply MEP should be used at all in the WS-TX
> > >> specs is addressed in a separate issue: see "Issue 009 - WS-C/WS-AT:
> > >> Is request-reply MEP useful?".)
> > >>
> > >> Substantively, it may be particularly misleading to think of the
> > >> Register/RegisterResponse
> > >> exchange as a request-reply pattern. The implication of using this
> > >> pattern is that there is a simple one message in, one message out
> > >> exchange. The presence of a fault
> > >> (AlreadyRegistered) as a potential response to Register hardens
> > >> that implication.
> > >>
> > >> Current behaviour would lead to the service being informed it has
> > >> already registered a Participant, when it has in fact simply
> > >> succeeded in registering a Participant. Superficially, the
> > >> AlreadyRegistered fault could simply be viewed as being
> > >> unnecessarily verbose: the reaction of the service to the fault at
> > >> run-time must be to treat it as uninteresting, i.e. as equal in
> > >> effect to a successful registration.
> > >>
> > >> In fact there is a deeper problem. Consider the following scenario:
> > >>
> > >> A Coordination Service (CS) creates a Coordinator (C) for a new
> > >> atomic transaction (AT), and emits a CoordinationContext (CC).
> > >>
> > >> The CC is transmitted to an application service (AS). AS (logically)
> > >> creates a P which sends Register (R) to the Registration Service (RS)
> > >> EPR for AT, embedding the EPR for receipt of protocol messages
> > >> outbound from C to P (CP EPR).
> > >>
> > >> The RS, on receiving Register, creates an EPR for inbound protocol
> > >> messages from P to C (PC EPR), and embeds this in the
> > >> RegisterResponse (RR), which it sends to P.
> > >>
> > >> AS and P crash before the RR message is received by P, or the RR
> > >> message drops and is never received by P. Either way, AS (on
> > >> recovery, or after waiting) causes P to resend R to RS. RS examines
> > >> the inbound Register, and determines that it has come from a known P
> > >> (see "Related Issues", "WS-C: EPR equality comparison should not be
> > >> relied upon"), i.e. that it is a duplicate registration.
> > >>
> > >> Currently, RS replies with an AlreadyRegistered fault, sent to P. P
> > >> now knows that he is registered with C, but has never received the
> > >> PC EPR (/RegisterResponse/CoordinationProtocolService element). Any
> > >> further attempts by P to send R to C will result in the same
> > >> situation.
> > >>
> > >> C will never be able to receive messages from P. P will never become
> > >> Prepared. The transaction will eventually collapse through timeout.
> > >>
> > >> Therefore, the Register/RegisterResponse exchange must tolerate
> > >> duplicates. If a Register message is delivered more than once (either
> > >> by the transport, or through comms-failure- or recovery-induced
> > >> retry) then the Registration Service should respond on each occasion
> > >> with a RegisterResponse containing the same PC EPR, to ensure
> > >> reliable completion of the EPR exchange that permits the subsequent
> > >> coordination protocol to operate correctly.
> > >>
> > >> NOTE.
> > >>
> > >> This change brings the R/RR exchange in line with the behaviour of
> > >> the CreateCoordinationContext/...Response exchange. There is a
> > >> difference. R/RR is likely to be implemented as a true idempotent
> > >> operation. CCC/CCCR is not: each CCCR embeds a new RS EPR, and a new
> > >> /Context/Identifier. But each exchange can be harmlessly replayed
> > >> indefinitely, in the event of failure to receive the response
> > >> message.
> > >>
> > >>
> > >> Proposed Resolution:
> > >>
> > >> Insert the following text in WS-Coordination spec, Section 3.2
> > >> "Registration Service" immediately following current l. 294
> > >>
> > >> "[New paragraph]The requester MAY send a Register message for a given
> > >> Participant more than once, and the underlying transport could
> > >> deliver the Register message more than once. On receipt of a Register
> > >> message for a given Participant, which has already been processed
> > >> successfully, the Registration Service MUST send to the requester a
> > >> RegisterResponse containing the same CoordinationProtocolService
> > >> element (Endpoint Reference for Participant to Coordinator protocol
> > >> messages) as that contained in all previous RegisterResponses
> > >> generated by the Registration Service which relate to the
> > >> Participant's request to register for this activity.
> > >> [New paragraph]"
> > >>
> > >>

Follow-Ups:
- Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
  - From: Mark Little <mark.little@jboss.com>

References:
- Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
  - From: Christopher B Ferris <chrisfer@us.ibm.com>
- Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
  - From: Mark Little <mark.little@jboss.com>