Mark,
I feel we should be careful about this. I'm sorry, but I'm not smart
enough to write you a short letter on this subject. Hopefully someone
else will come up with a more succinct way of expressing what follows.
The only message exchanges defined to use request-reply, and therefore
required to use message id/relates-to, in the current specs (WS-AT
specifically) are CCC/CCCR and R/RR. These exchanges are not currently
designed to act in an idempotent manner. Indeed, a conformant January
2005 implementation must have responded to a duplicate Register with a
Fault/AlreadyRegistered, i.e. behaved non-idempotently, and the issue
of retry cannot therefore have arisen.
Any retry behaviour in a Jan 2005 implementation that relied upon
id/relates-to correlation for any other exchange (e.g. the 2PC
exchanges) cannot have been interoperable other than by accident or by
excessive latitude on the part of its interlocutors.
The initiator of an exchange cannot be relied upon to supply message
id, and the responder cannot be relied upon to specify the correlation
via relates-to, because nothing in the WS-A or WS-TX specs mandates
such elements to be present.
Indeed, and this goes to the root of the whole discussion, the only way
that retriable exchanges were and are possible for e.g. 2PC exchanges,
is that the retry was identified as new, or as duplicate (i.e.
state-changing or non-state-changing) by virtue of the deemed identity
of the message sender. And in WS-TX protocols, that identity is encoded
in the EPRs exchanged during registration.
This is not a problem, because (given appropriate
Register/RegisterResponse exchanges) identity comparison is not
required for subsequent coordination protocol exchanges. Identification
of state can be handled by mapping from an EPR to its associated state.
The state machine governs duplicate processing, for any number of
retries.
Example. Coordinator C for transaction T, Participant P.
C emits RS EPR in CoordinationContext. P stores RS EPR, keyed
by context /Identifier (a unique id for T).
P generates C-to-P EPR (EPR for inbound-to-Participant coord protocol
messages). P stores CP EPR, which is a unique identifier for the tuple
{P, T}, i.e. this Participant for this Transaction, and ensures that
the state of {P,T} is accessible, given the value of CP EPR.
P sends Register to RS EPR, embedding CP EPR.
C generates P-to-C EPR (for inbound-to-Coordinator coord protocol
messages), and stores CP EPR, keyed by new PC EPR.
C sends RegisterResponse to P, embedding PC EPR. P stores
PC EPR, keyed by CP EPR.
Coord protocol messages flow. Assume Prepare/Prepared (retriable
exchange).
Prepare sent first time to CP EPR. P checks state for that EPR,
i.e. for {P, T}, logs, changes state, looks up PC EPR and sends Prepared
to PC EPR.
Prepared drops (is not received by C).
Retry: Prepare sent second time to CP EPR. P uses CP EPR to
find state of {P, T}, finds it is Prepared, looks up PC EPR and resends
Prepared to PC EPR.
You can flip this around with BA Exit/Exited, if you want to
see initiator and responder roles reversed between C and P.
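To make the Prepare/Prepared retry above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: EPRs are reduced to opaque strings, the state machine has just two states, and the names Participant, ParticipantState, on_prepare and the "sent" list (standing in for the wire) are all invented for this example and come from no spec.

```python
from dataclasses import dataclass, field


@dataclass
class ParticipantState:
    """State of the tuple {P, T}, located via the CP EPR."""
    pc_epr: str                     # PC EPR: where P sends messages to C
    state: str = "Active"           # simplified to Active -> Prepared
    log: list = field(default_factory=list)


class Participant:
    def __init__(self):
        self.by_cp_epr = {}         # CP EPR -> ParticipantState
        self.sent = []              # (destination EPR, message), stands in for the wire

    def register(self, cp_epr, pc_epr):
        # After RegisterResponse: store PC EPR, keyed by CP EPR.
        self.by_cp_epr[cp_epr] = ParticipantState(pc_epr=pc_epr)

    def on_prepare(self, cp_epr):
        # The EPR the message arrived on identifies the state; no message id is used.
        st = self.by_cp_epr[cp_epr]
        if st.state == "Active":
            st.log.append("prepared")   # log and change state only the first time
            st.state = "Prepared"
        # First delivery or duplicate, the reply is the same: Prepared to PC EPR.
        self.sent.append((st.pc_epr, "Prepared"))
        return st.state
```

Calling on_prepare twice for the same CP EPR logs once and sends Prepared twice, which is the behaviour the state machine is meant to give for any number of retries.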
Generalizing, for Initiator I and Responder R (which must recognize
duplicates).
I generates RI EPR, which identifies state of I for R, and stores IR
EPR, keyed by its generated RI EPR.
R generates IR EPR, which identifies state of R for I, and stores RI
EPR, keyed by its generated IR EPR.
I sends one or more messages to IR EPR (address of R for this exchange).
R figures out if message n is a duplicate by checking state of R for I,
identified by IR EPR, and then looks up RI EPR using IR EPR as key, to
send reply to RI EPR.
I figures out if message m is already processed by checking state of I
for R, identified by RI EPR, and so on.
Note that there is not a message id in sight.
[The whole of the scheme above, which I believe to be the spec authors'
design intent, depends on preventing duplicate registrations of P, i.e.
C must never have two PC EPRs which it treats as distinct for one P.
That brings us back to how to prevent duplicate registrations, on which
I will send a separate post.]
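As an illustration of the constraint in the bracketed note (one PC EPR per P, however many times Register arrives), here is a hedged Python sketch. It deliberately assumes that some stable "participant_key" is available to the Registration Service; how such a key would be obtained is exactly the open question in this thread, so treat it as a placeholder. The class name, the urn:example EPR format and the counter are likewise invented for the example.

```python
import itertools


class RegistrationService:
    def __init__(self):
        self._pc_by_participant = {}   # participant_key -> PC EPR already issued
        self._cp_by_pc = {}            # PC EPR -> CP EPR (for C-to-P messages)
        self._counter = itertools.count(1)

    def register(self, participant_key, cp_epr):
        """Return the PC EPR for this participant; duplicates get the same one."""
        pc_epr = self._pc_by_participant.get(participant_key)
        if pc_epr is None:
            # First Register for this participant: mint a new PC EPR.
            pc_epr = f"urn:example:pc:{next(self._counter)}"
            self._pc_by_participant[participant_key] = pc_epr
        # Store (or overlay) the CP EPR, keyed by the PC EPR.
        self._cp_by_pc[pc_epr] = cp_epr
        return pc_epr
```

A repeated register() call for the same key returns the PC EPR issued the first time, so C never ends up holding two PC EPRs that it treats as distinct for one P.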
If the spec authors' intent was to allow two separate parallel
mechanisms for identifying actors in such retriable exchanges then they
should have explicitly stated that every message has an id, and every
reply has a relates-to=<that id>. This is not stated in WS-AT
Section 9, and I therefore believe any implementation that relied upon
that scheme was interoperable by accident or by private contract, but
not by spec conformance.
Let us imagine that we do adopt the id/relates-to approach as a result
of this discussion. We now have a situation where every exchange has an
id. This id must be retained for all retries. This is at least contrary
to the spirit of WS-A, if not provably illegal. The message is supposed
to be uniquely identified by the id, as I read the WS-A spec:
    [message id] : IRI (0..1)
        An absolute IRI that uniquely identifies the message. When
        present, it is the responsibility of the sender to ensure that
        each message is uniquely identified. The behavior of a receiver
        when receiving a message that contains the same [message id] as
        a previously received message is unconstrained by this
        specification.
This does not seem a good route to follow. Unintentional duplication
(more-than-once delivery by the transport) is one thing; deliberately
sending duplicates with the same message ids seems wrong-headed.
Finally, why have two ways of doing the same thing? No-one is
suggesting we remove EPRs, and they provide a sufficient mechanism in
all cases of retriable idempotent messages, except the
Register/RegisterResponse exchange, which requires additional identity
to correctly form the bridge or channel. To interoperate in two
completely different ways cannot be good practice, and certainly
complicates implementation and interoperability testing. This is the
danger of defining conformance as "a 5 x 5 matrix worked for
5 implementations". If they all evolve to handle all variants then they
can evolve away from the spec and the design intent. A non-conformant
feature can spread by excessive tolerance.
Alastair
Mark Little wrote:
BTW, to your point of ease: the interop scenarios we had to do in
January/February this year had many situations requiring timeouts and
retries. I certainly can't say I canvassed everyone present, but I
didn't get the impression that that aspect was considered too much of
an implementation headache.
Mark.
Christopher B Ferris wrote:
I'm not sure that using ws-a messageId is the easiest... it means that
impls need to remember messageId, which can get onerous.
The WS-A WG avoided the issue of EPR equivalence mostly because of
issues related to use of EPRs to identify something. IMO, in that
spirit, EPR comparison becomes one of comparing the <Address> element,
which comes down to URI equivalence issues, which can go in a number of
directions... the namespace URI approach (straight string comparison)
or the approach which normalizes the URI first before comparing.
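The two directions mentioned above can be sketched as follows (Python, standard library only). This is an illustration, not a statement of what WS-A requires: the function names are made up, and the normalization shown (lower-casing scheme and host, dropping a default port, treating an empty path as "/") is only one plausible subset of URI normalization; it ignores userinfo, IPv6 literals and percent-encoding.

```python
from urllib.parse import urlsplit, urlunsplit


def same_address_strict(a, b):
    """Namespace-URI style: plain character-by-character comparison."""
    return a == b


def normalize(uri):
    """Apply a small, assumed subset of URI normalization before comparing."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    default_port = {"http": 80, "https": 443}.get(scheme)
    # Drop the port when it is absent or the scheme default.
    netloc = host if port in (None, default_port) else f"{host}:{port}"
    # Only hierarchical URIs (those with an authority) get "/" for an empty path.
    path = parts.path or ("/" if netloc else "")
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def same_address_normalized(a, b):
    return normalize(a) == normalize(b)
```

With these definitions, "HTTP://Example.com:80/coord" and "http://example.com/coord" differ under the strict comparison but match after normalization, which is the kind of divergence between implementations that makes EPR comparison awkward to rely on.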
Cheers,
Christopher Ferris
STSM, Emerging e-business Industry Architecture
email: chrisfer@us.ibm.com
blog: http://webpages.charter.net/chrisfer/blog.html
phone: +1 508 377 9295
Mark Little <mark.little@arjuna.com> wrote on 12/12/2005 03:20:16 PM:
> There are multiple ways of making the operation idempotent. Using WS-A
> semantics is one and IMO is probably the easiest way of doing it: it
> goes back to traditional Retained Results RPC mechanisms of the late
> 1980's, where idempotency was imposed at the comms level. If we try to
> do it higher up the stack, within the actual implementation, then we're
> going to have to address the issue of EPR comparisons: how can I ensure
> this is the same operation if I can't determine that the parameters are
> identical?
>
> So, I think we're agreed that it needs to be idempotent. But
> until/unless we address EPR comparisons, I think the WS-A retry route
> gets my vote.
>
> Mark.
>
>
> Christopher B Ferris wrote:
>
> >
> > That is one way, the other is to make the Register message idempotent.
> >
> > Seems to me that Register SHOULD be idempotent. It is much simpler to
> > simply process the Register as if it had never been received... makes
> > the implementation of the client a bit simpler.
> >
> > I also think that the "AlreadyRegistered" fault is problematic. It
> > doesn't reflect back the CoordinationProtocolService EPR that the
> > RegisterResponse message does. So, from the perspective of the
> > registrant, it ISN'T registered if it doesn't receive the
> > RegisterResponse message since it doesn't know the
> > CoordinationProtocolService EPR.
> >
> > From the perspective of the registration service, overlaying the
> > previously registered EPR is effectively an idempotent operation, and
> > the response can be the same as if it didn't have the registration
> > beforehand.
> >
> > IMO, making the operation idempotent makes the implementation much
> > simpler and more robust in the long run.
> >
> > Cheers,
> >
> > Christopher Ferris
> > STSM, Emerging e-business Industry Architecture
> > email: chrisfer@us.ibm.com
> > blog: http://webpages.charter.net/chrisfer/blog.html
> > phone: +1 508 377 9295
> >
> >
> > Mark Little <mark.little@jboss.com> wrote on 12/12/2005 11:36 AM:
> > To: ws-tx@lists.oasis-open.org
> > Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
> >
> > Actually I'll retract this. As Kevin just reminded me, we're using
> > WS-Addressing anyway, so surely lost messages and retries can be coped
> > with at that level: using the same wsa:MessageID for example, should
> > sort this.
> >
> > Mark.
> >
> >
> >
> > Mark Little wrote:
> >
> > > I think this proposal makes sense.
> > >
> > > Mark.
> > >
> > >
> > > Peter Furniss wrote:
> > >
> > >> This is hereby declared to be ws-tx Issue 007.
> > >>
> > >> Please follow-up to this message or ensure the subject line starts
> > >> Issue 007 - (ignoring Re:, [ws-tx] etc)
> > >>
> > >> The Related Issues list has been updated to show the issue numbers.
> > >>
> > >> Issue name -- WS-C: Make Register/RegisterResponse retriable
> > >>
> > >> Owner: Alastair Green [mailto:alastair.green@choreology.com]
> > >>
> > >> Target document and draft:
> > >>
> > >> Protocol: Coord
> > >>
> > >> Artifact: spec
> > >>
> > >> Draft: Coord spec working draft uploaded 2005-12-02
> > >>
> > >> Link to the document referenced:
> > >>
> > >>
> > >> http://www.oasis-open.org/committees/download.php/15738/WS-Coordination-2005-11-22.pdf
> > >>
> > >> Section and PDF line number:
> > >>
> > >> WS-Coordination spec, Section 3.2 "Registration Service" l. 294
> > >>
> > >>
> > >> Issue type: Design
> > >>
> > >>
> > >> Related issues:
> > >>
> > >> Issue 008 - WS-C: Remove fault 4.6 AlreadyRegistered
> > >> Issue 014 - WS-C: EPR equality comparison is problematic
> > >> Issue 009 - WS-C/WS-AT: Is request-reply MEP useful?
> > >>
> > >>
> > >> Issue Description:
> > >>
> > >> Register/RegisterResponse should be a retriable exchange
> > >>
> > >>
> > >> Issue Details:
> > >>
> > >> [This issue stems from Choreology Contribution issue TX-20.]
> > >>
> > >> Section 9 of WS-AT defines the WS-Coordination exchanges
> > >>
> > >> CreateCoordinationContext/CreateCoordinationContextResponse
> > >> Register/RegisterResponse
> > >>
> > >> as request-reply exchanges.
> > >>
> > >> (Whether this request-reply MEP should be used at all in the WS-TX
> > >> specs is addressed in a separate issue: see "Issue 009 - WS-C/WS-AT:
> > >> Is request-reply MEP useful?".)
> > >>
> > >> Substantively, it may be particularly misleading to think of the
> > >> Register/RegisterResponse exchange as a request-reply pattern. The
> > >> implication of using this pattern is that there is a simple one
> > >> message in, one message out exchange. The presence of a fault
> > >> (AlreadyRegistered) as a potential response to Register hardens
> > >> that implication.
> > >>
> > >> Current behaviour would lead to the service being informed it has
> > >> already registered a Participant, when it has in fact simply
> > >> succeeded in registering a Participant. Superficially, the
> > >> AlreadyRegistered fault could simply be viewed as being
> > >> unnecessarily verbose: the reaction of the service to the fault at
> > >> run-time must be to treat it as uninteresting, i.e. as equal in
> > >> effect to a successful registration.
> > >>
> > >> In fact there is a deeper problem. Consider the following scenario:
> > >>
> > >> A Coordination Service (CS) creates a Coordinator (C) for a new
> > >> atomic transaction (AT), and emits a CoordinationContext (CC).
> > >>
> > >> The CC is transmitted to an application service (AS). AS (logically)
> > >> creates a P which sends Register (R) to the Registration Service (RS)
> > >> EPR for AT, embedding the EPR for receipt of protocol messages
> > >> outbound from C to P (CP EPR).
> > >>
> > >> The RS, on receiving Register, creates an EPR for inbound protocol
> > >> messages from P to C (PC EPR), and embeds this in the
> > >> RegisterResponse (RR), which it sends to P.
> > >>
> > >> AS and P crash before the RR message is received by P, or the RR
> > >> message drops and is never received by P. Either way, AS (on
> > >> recovery, or after waiting) causes P to resend R to RS. RS examines
> > >> the inbound Register, and determines that it has come from a known P
> > >> (see "Related Issues", "WS-C: EPR equality comparison should not be
> > >> relied upon"), i.e. that it is a duplicate registration.
> > >>
> > >> Currently, RS replies with an AlreadyRegistered fault, sent to P. P
> > >> now knows that he is registered with C, but has never received the PC
> > >> EPR (/RegisterResponse/CoordinationProtocolService element). Any
> > >> further retries in which P sends R to C will result in the same
> > >> situation.
> > >>
> > >> C will never be able to receive messages from P. P will never become
> > >> Prepared. The transaction will eventually collapse through timeout.
> > >>
> > >> Therefore, the Register/RegisterResponse exchange must tolerate
> > >> duplicates. If a Register message is delivered more than once (either
> > >> by the transport, or through comms-failure- or recovery-induced
> > >> retry) then the Registration Service should respond on each occasion
> > >> with a RegisterResponse containing the same PC EPR, to ensure
> > >> reliable completion of the EPR exchange that permits the subsequent
> > >> coordination protocol to operate correctly.
> > >>
> > >> NOTE.
> > >>
> > >> This change brings the R/RR exchange in line with the behaviour of
> > >> the CreateCoordinationContext/...Response exchange. There is a
> > >> difference. R/RR is likely to be implemented as a true idempotent
> > >> operation. CCC/CCCR is not: each CCCR embeds a new RS EPR, and a new
> > >> /Context/Identifier. But each exchange can be harmlessly replayed
> > >> indefinitely, in the event of failure to receive the response
> > >> message.
> > >>
> > >>
> > >> Proposed Resolution:
> > >>
> > >> Insert the following text in WS-Coordination spec, Section 3.2
> > >> "Registration Service" immediately following current l. 294
> > >>
> > >> "[New paragraph]The requester MAY send a Register message for a given
> > >> Participant more than once, and the underlying transport could
> > >> deliver the Register message more than once. On receipt of a Register
> > >> message for a given Participant, which has already been processed
> > >> successfully, the Registration Service MUST send to the requester a
> > >> RegisterResponse containing the same CoordinationProtocolService
> > >> element (Endpoint Reference for Participant to Coordinator protocol
> > >> messages) as that contained in all previous RegisterResponses
> > >> generated by the Registration Service which relate to the
> > >> Participant's request to register for this activity.
> > >> [New paragraph]"
> > >>
> > >>
> > >>