ws-tx message

Subject: Issue 014 - WS-C: EPR equality comparison is problematic
From: "Peter Furniss" <peter.furniss@choreology.com>
To: <ws-tx@lists.oasis-open.org>
Date: Sat, 10 Dec 2005 12:44:55 -0000
This is hereby declared to be ws-tx Issue 014.

Please follow-up to this message or ensure the subject line starts Issue
014 (ignoring Re:, [ws-tx] etc)

The Related Issues list has been updated to show the issue numbers.


Issue name :  WS-C: EPR equality comparison is problematic

Owner:  Peter Furniss 
 
Target document and draft:

Protocol:  Coord

Artifact:  spec / schema

Draft: Coord spec working draft uploaded 2005-12-02

Link to the document referenced:

http://www.oasis-open.org/committees/download.php/15738/WS-Coordination-2005-11-22.pdf

WS-Coordination schema contributed by input authors, not yet uploaded to
Working Drafts folder

Section and PDF line number:

Coord: Section 4.6 "Already Registered", ll. 460-468


Issue type: Design


Related issues:

Issue 007 - WS-C: Make Register/RegisterResponse retriable
Issue 008 - WS-C: Remove fault 4.6 AlreadyRegistered


Issue Description:

EPR comparison to establish identity of participants is problematic. 
Additional mechanism required to identify Participants.


Issue Details:

[This issue stems from Choreology Contribution issues TX-15, TX-16 and 
TX-20.]

The latest, 17 August 2005, Candidate Recommendation version of 
WS-Addressing

    "Web Services Addressing Core -- 1.0", 
http://www.w3.org/TR/2005/CR-wsaddr-core-20050817

has a section 2.3 "Endpoint Reference Comparison", which reads:

"This specification provides no concept of endpoint identity and 
therefore does not provide any mechanism to
determine equality or inequality of EPRs and does not specify the 
consequences of their equality or
inequality. However, note that it is possible for other specifications 
to provide a comparison function that
is applicable within a limited scope."

Protocols using the WS-Coordination address/identity Register/Response
exchange require that Coordinator and Participant information be capable
of unambiguous comparison by the receiving party.

EPR reference parameters are explicitly defined by WS-A as being opaque:
they are intended to make sense only to the publisher of the EPR, and are
available to be used by it for purposes such as routing when a message is
received which is directed to a previously published EPR.

The reference parameters of a coordination protocol EPR will contain
information that the publisher can map unambiguously to a reference to a
Coordinator object or to a reference to a Participant object (or to the
parameters of operations on an object which lead it to work for a
particular transaction). It is quite likely that this information is an
invariant and unambiguous identity (for a given implementation) of a
transaction, but this is not guaranteed to be so, and reference parameter
information is not designed to be understood or interpreted by EPR
receivers, so reliance on byte-for-byte comparison (even if it may
frequently work) is not a reliable technique.

Currently, this is not a problem for Participants: the 
CoordinationContext that they receive contains
/CoordinationContext/Identifier, and this can be used to map the RS EPR
(/CoordinationContext/RegistrationService) to the transaction being 
worked upon. Potentially, this allows
different RS EPRs to represent a single transaction.

The same is not true for Coordinators. The Register they receive 
contains an EPR
/Register/ParticipantProtocolService which is an opaque value if it 
contains opaque reference parameters. It
is a requirement that the Coordinator be able to establish that two or 
more Register messages actually refer
to the same Participant. However, there is no guarantee, for example, 
that the Participant EPR embedded in
Register will remain stable across repeated attempted registrations.

Scenario:

[This scenario is closely related to the one used in related issue 
"WS-C: Make Register/RegisterResponse
retriable". Note that any resolution of that issue will require prior 
resolution of this issue: the ability to
correctly detect duplicate registrations is a prerequisite.]

A Coordination Service (CS) creates a Coordinator (C) for a new business
activity (BA), and emits a CoordinationContext (CC).

The CC is transmitted to an application service (AS). AS (logically) 
creates a P which sends Register (R) to
the Registration Service (RS) EPR for BA, embedding the EPR for receipt 
of protocol messages outbound from C
to P (CP EPR).

The RS, on receiving Register, creates an EPR for inbound protocol 
messages from P to C (PC EPR), and embeds
this in the RegisterResponse (RR), which it sends to P.  

AS and P crash before the RR message is received by P. The AS on 
recovery causes P to resend R to RS. RS
examines the inbound Register, and seeks to determine that it has come 
from a known P, i.e. that it is a
duplicate registration.

The CP EPR (/Register/ParticipantProtocolService) has changed. The
reference parameters denote that the recovered application is a different
instance of the application service (e.g. load balancing, cluster), and
all of its Participants are similarly repositioned in terms of their full
address. (Of course, for this to work the old address must be capable of
redirecting to the recovered Participant.)

If the RS uses a simple EPR comparison (byte stream against byte stream
for the reference parameters) then it will conclude that the second
Register relates to a different Participant than the first (pre-crash)
Register.

It will generate a new, different RegisterResponse, containing a new, 
unique PC EPR
(/RegisterResponse/CoordinatorProtocolService) so that it can 
differentiate subsequent protocol messages sent
from "P1" to C from those sent from P2 to C.
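
The failure can be reproduced in a few lines. The following is only a
sketch of the scenario, not of any real implementation: the EPR class,
the NaiveRegistrationService, the addresses and the reference parameter
values are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EPR:
    """Hypothetical stand-in for a WS-Addressing endpoint reference."""
    address: str
    reference_parameters: bytes  # opaque to everyone except the publisher


class NaiveRegistrationService:
    """Treats every distinct EPR byte stream as a distinct Participant."""

    def __init__(self):
        # CP EPR received in Register -> PC EPR returned in RegisterResponse
        self.participants = {}

    def register(self, cp_epr):
        if cp_epr not in self.participants:
            n = len(self.participants) + 1
            self.participants[cp_epr] = EPR(
                "http://coordinator.example/ba", f"pc-{n}".encode()
            )
        return self.participants[cp_epr]


rs = NaiveRegistrationService()

# Same Participant, but the recovered AS instance publishes different
# reference parameters (assumed values).
before_crash = EPR("http://as.example/participant", b"node=1;tx=42")
after_recovery = EPR("http://as.example/participant", b"node=2;tx=42")

rr1 = rs.register(before_crash)    # this RegisterResponse is lost in the crash
rr2 = rs.register(after_recovery)  # retry after recovery

# One real Participant now appears as two ("P1" and P2), each with its
# own PC EPR.
duplicate_count = len(rs.participants)
```

Here duplicate_count is 2 and rr1 differs from rr2, which is exactly the
"P1"/P2 split described above.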

Of course, P1 never received the first RR, and does not in fact exist as
a separate entity - its address is a synonym for the address of P2. If
the BA is AtomicOutcome it will expect all registered Participants to go
Completed. This will either occur because the Participant sends Completed
(Participant Completion) or because the Coordinator sent Complete
(Coordinator Completion). In the first case, the P1 Completed will never
arrive, and the activity will ultimately expire in the Active state. In
the second case the Complete will be directed twice to P2 (once to "P1"
and once to P2), and the Coordinator is liable to receive Completed twice
from P2, but never from P1. In either case the activity will end up being
ditched.

If the BA is MixedOutcome, it may be able to tolerate missing or
unwilling Participants. Its controlling application may have a business
rule that says that P1 is not vital to a successful outcome. Or it may
know that two registrations from one business service is an unexpected
situation. However, this raises a second, and related problem. How does
the controlling application know what P1 represents? How does it
correlate P1 and P2 against an initial application request to the AS,
that carried the CC in the first place? It has no way of knowing that P1
and P2 are intended to represent one and the same P. It cannot detect the
duplication at the application level, even though such duplication may
immediately violate a business rule.

This latter problem also arises in non-pathological cases. An
application which has created a MixedOutcome BA may send out contexts to
three AS: AS-Car, AS-Hotel, AS-Plane. The application response for each
must contain an identifier which a) is different for each of the
responses for -Car, -Plane and -Hotel, and b) will match some value in
the Register message, such that the controlling application can correlate
the registered participant with the relevant application response.

If the app-response Plane-Reservation contains an IRI element with value
IRI[Plane], and the Register contains a distinguishable element with the
same value, then the controlling application can use some API to say:
BA.close(IRI[Plane]). If the same pattern applies to Car-Reservation and
Hotel-Reservation, it can also say, for example: BA.close(IRI[Hotel]),
but BA.compensate(IRI[Car]) because £500 for a limo is no good to the
user. These instructions can be mapped by the BA Coordinator into "send
Close to the P EPR keyed by IRI[Plane]", "send Close to the P EPR keyed
by IRI[Hotel]" and "send Compensate to the P EPR keyed by IRI[Car]".
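
A minimal sketch of that mapping, assuming a hypothetical
BusinessActivity class and made-up IRI and EPR values (none of these
names come from the specifications; the "sent" list merely stands in for
the wire):

```python
class BusinessActivity:
    """Hypothetical controlling-application view of a MixedOutcome BA."""

    def __init__(self):
        self.participant_eprs = {}  # IRI -> Participant EPR (a plain string here)
        self.sent = []              # (protocol message, target EPR) pairs

    def register(self, iri, participant_epr):
        # The IRI is the distinguishable value carried in Register.
        self.participant_eprs[iri] = participant_epr

    def close(self, iri):
        self._send("Close", iri)

    def compensate(self, iri):
        self._send("Compensate", iri)

    def _send(self, message, iri):
        # Direct access by IRI: no EPR comparison is involved.
        self.sent.append((message, self.participant_eprs[iri]))


PLANE = "urn:example:reservation:plane"
HOTEL = "urn:example:reservation:hotel"
CAR = "urn:example:reservation:car"

ba = BusinessActivity()
ba.register(PLANE, "epr-plane")
ba.register(HOTEL, "epr-hotel")
ba.register(CAR, "epr-car")

# The same IRIs came back in the three application responses, so the
# application can pick each Participant individually.
ba.close(PLANE)
ba.close(HOTEL)
ba.compensate(CAR)
```

The point of the sketch is that the application only ever handles the
IRIs; the Participant EPRs stay opaque and are looked up by key.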

[IRI seems to be the correct type for such identities, as they are 
intimately tied to application behaviour
(and may be coined by the application service rather than the system), 
and because they must be guaranteed to
be unique, interoperably.]

For all of the above to work, it is necessary for the RS, on receiving 
the Register, to be able to distinguish
the IRI which correlates the Register (thence the Participant), and to 
pluck it out for use as a key for
direct access into a collection of Participant EPRs. But, to go full 
circle, if the Register only contains an
EPR which is differentiated by opaque reference parameters, it cannot 
properly use those ref param values as
part of the key. Once again, they may change over retries. Or the 
Participant implementation may be perverse,
and may emit EPRs which compare differently by reference parameter 
values, but which in fact map to a single
Participant. The introduction of a time-related component in the EPR 
reference parameters (perhaps to help
with auditing) is a conceivable non-perverse variant of this problem.

To avoid these problems we need the ability to specify a distinguished 
non-opaque identifier for each
Participant across multiple Participants.


Proposed Resolution:

There are three possible resolutions that come to mind.

One is to allow a brand-new IRI element in Register, e.g. 
/Register/Identifier, which mirrors the /CoordinationContext/Identifier.

The second is to define an extension IRI element, for WS-Coordination,
that can be added into the RS EPR
(/CoordinationContext/RegistrationService), to identify the Coordinator;
and into the C-P EPR (the /Register/ParticipantProtocolService) to
identify the Participant. This would be permitted, as we see it, by the
WS-Addressing statement quoted earlier:

    "However, note that it is possible for other specifications to 
provide a comparison function that is
    applicable within a limited scope."

The third is to ask WS-Addressing to provide a standard comparable
element in the EPR. This seems extremely unlikely, as the move to
Candidate Recommendation from Submission removed the capacity to compare
EPRs.
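
As a rough illustration of how the first option (a /Register/Identifier
element) would let the RS detect a duplicate registration, here is a
sketch under stated assumptions. The RegistrationService class, the
string EPRs and the identifier value are hypothetical, and replacing the
stored CP EPR on a retry is one possible policy, not something this
issue prescribes.

```python
class RegistrationService:
    """Hypothetical RS that keys Participants by a non-opaque identifier."""

    def __init__(self):
        # identifier -> {"cp_epr": ..., "pc_epr": ...}
        self.by_identifier = {}

    def register(self, identifier, cp_epr):
        entry = self.by_identifier.get(identifier)
        if entry is None:
            # First Register for this Participant: mint a PC EPR.
            n = len(self.by_identifier) + 1
            entry = {"cp_epr": cp_epr, "pc_epr": f"pc-epr-{n}"}
            self.by_identifier[identifier] = entry
        else:
            # Duplicate Register: keep the PC EPR already issued, and
            # (assumed policy) adopt the most recent CP EPR.
            entry["cp_epr"] = cp_epr
        return entry["pc_epr"]


rs = RegistrationService()
participant_id = "urn:example:participant:plane"

rr1 = rs.register(participant_id, "cp-epr;node=1")  # response lost in the crash
rr2 = rs.register(participant_id, "cp-epr;node=2")  # retry with a changed EPR

registered = len(rs.by_identifier)
current_cp_epr = rs.by_identifier[participant_id]["cp_epr"]
```

With the identifier as the key, the retry yields the same
RegisterResponse content (rr1 equals rr2) and only one Participant is
recorded, even though the CP EPR changed between the two attempts.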