ws-tx message

Subject: Re: [ws-tx] Issue 030 - Proposal 2 silence on WS-A faults

From: Alastair Green <alastair.green@choreology.com>
To: Bob Freund-Hitachi <bob.freund@hitachisoftware.com>
Date: Tue, 25 Apr 2006 08:36:10 +0100

Bob,

My motivations are: "least work", "simplicity" and "uniformity" (and not "malice"). Use from WS-Addressing what WS-TX needs, and no more. Avoid needless variations in its use.

We could "dump it" but we would have to reinvent work done by WS-Addressing. We would cause implementers to evict running code from existing implementations, if they are using WS-A toolkits. Use of WS-A makes management marginally easier. That's why I favour using [source endpoint] rather than [ws-tx invented endpoint], a choice which is incorporated in Proposal 2.

What do we need from WS-Addressing?

Every single exchange of all three WS-TX protocols (WS-C, WS-AT, WS-BA) can be successfully executed if the following two things are done:

1. Register and RegisterResponse contain "future exchange endpoint references". (This is already true.)
2. Every message contains a [source endpoint]. (Proposal 2 goes part of the way there.)

Endpoint exchange during registration creates the apparatus required for correlation. Use of message ids is unnecessary (therefore inefficient), and (in the case of some WS-A implementations) might cause garbage collection issues.
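
To illustrate the correlation point, here is a minimal Python sketch. It is not taken from any WS-TX implementation: the class names, the example addresses and the "enlistment" reference parameter are all invented for illustration. It only shows that once endpoint references have been exchanged at registration, the endpoint a message is addressed to is enough to identify the conversation, with no message id involved.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EndpointReference:
    """Stand-in for a WS-A endpoint reference: an address plus opaque parameters."""
    address: str
    reference_parameters: tuple = ()


@dataclass
class Coordinator:
    """Hypothetical coordinator that correlates by endpoint, not by message id."""
    _next_id: int = 0
    # Maps the per-enlistment EPR we handed out to the participant's own EPR.
    participants: dict = field(default_factory=dict)

    def register(self, participant_epr: EndpointReference) -> EndpointReference:
        """Handle Register: cache the participant EPR, return a dedicated EPR."""
        self._next_id += 1
        coordinator_epr = EndpointReference(
            address="http://coordinator.example/protocol",  # invented address
            reference_parameters=(("enlistment", str(self._next_id)),),
        )
        self.participants[coordinator_epr] = participant_epr
        return coordinator_epr

    def correlate(self, target_epr: EndpointReference) -> EndpointReference:
        """Find the participant from the EPR an incoming message was sent to."""
        return self.participants[target_epr]
```

A later protocol message addressed to the EPR returned by register() is matched to its participant by a dictionary lookup; nothing in the sketch stores or compares message ids.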

WS-A fault delivery is not required for a successful WS-TX implementation. All the malformation or receiver routing faults are expressions of non-conformance (logic errors), which do not need run-time expression. They will be ironed out of implementations in test and use. The only interesting ones are Destination Unavailable/Endpoint Unreachable/permanent. In the WS-TX context, the likelihood of these faults arising (which requires a concatenation of extremely improbable events) is near zero. Even if they do, given the retriable nature of the WS-TX protocols, the reporting of these faults adds vanishingly little or nothing to the responsibilities and functional richness of TX implementations.

The exception is Endpoint Unreachable/transient. It is useful to communicate the semantic: "Not yet. Come back in half an hour and I will be ready for you". This is a feature that BTP included in the 1.1 revision in November 2004, to reduce network chatter in long-running transactions. However, this is a semantic that has to be communicated and understood at the TX level: TX retry strategies cannot always be depressed to a lower layer, and I believe a new TX message (so-called "fault") should be created to convey it.
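
A rough Python sketch of why this has to live at the TX level: the retry schedule is chosen by the TX implementation, so only it can act on a "not yet" hint. The NotYet message, its retry_after_s field and the doubling back-off with a one-hour cap are assumptions made purely for illustration; no such message exists in the current drafts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NotYet:
    """Hypothetical TX-level message: 'come back after retry_after_s seconds'."""
    retry_after_s: float


def next_retry_delay(previous_delay_s: float,
                     hint: Optional[NotYet],
                     max_delay_s: float = 3600.0) -> float:
    """Pick the delay before the next resend of a protocol message.

    If the peer sent a NotYet hint, honour it as given. Otherwise fall back
    to an assumed default strategy: double the previous delay, up to a cap.
    """
    if hint is not None:
        return hint.retry_after_s
    return min(previous_delay_s * 2, max_delay_s)
```

With no hint the sender keeps retrying on its own schedule; with a hint of 1800 seconds it stays quiet for half an hour, which is the reduction in chatter described above.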

The TX "protocol faults" are really messages that express unusual but legitimate paths of conformant execution (or at least, that is the class of message that we must be able to convey). Examples include any message that conveys incapacity to process because of receiver state (which could include security breaches, resource limitations, desynchronized state shifts). These are not expressions of bugs, they are first-class protocol messages. The term "fault" is a misnomer, as I think Tom Rutt has pointed out.

Nothing in Proposal 2 or in what I have laid out here demands or assumes HTTP or TCP/IP. The ordered conversation/endless retry model which underlies WS-TX protocols allows them to operate over highly unreliable protocols where messages may be lost, misordered, duplicated, and where no notification of receipt or processing is received. If acks are available then they can be exploited, but they are not required.

WS-TX does not assume permanently available endpoints. Quite to the contrary: it tolerates (repeated) failures and recoveries. It cannot work correctly if an endpoint is permanently unavailable. "Permanent" requires a time-frame to be defined pragmatically, but that is a run-time configuration issue. In principle a WS-AT or WS-BA transaction could last for a century and the endpoints could disappear for half a century in the middle of that period, but the transaction would still be viable. (We would want the "don't bother me again for a while" semantic in that case, and we would want to deliver that, not as a fault, but as a spontaneous warning in the event of planned outage.) I've seen a use case for a transaction that spans the lifetime of a security (bond) with periodic payments (coupons) that would perdure for thirty years, but only do work every year in that time.

I agree that WS-Addressing has no template for "one way message" and "request reply". It has optional properties and rules for their use, and these need to be referenced concretely and specifically. But WS-TX does need one-way messages (and it only needs one-way messages).

An amendment to Proposal 2 that would allow an implementation to switch on WS-A fault reporting (that might be useful for testing and fault diagnosis) would be to state that if [fault endpoint] and [message id] are present, then the receiver SHOULD send WS-A faults as correlated replies. This is unenforceable, but a good implementation would want to be helpful.
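
The amended rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the MessageProperties class, the fault_reporting_enabled switch and the dictionary returned are invented names, and a real implementation would build a proper WS-A fault message rather than a dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageProperties:
    """Subset of the addressing properties of a received message (illustrative)."""
    source_endpoint: str
    fault_endpoint: Optional[str] = None
    message_id: Optional[str] = None


def plan_wsa_fault(props: MessageProperties,
                   fault_reporting_enabled: bool) -> Optional[dict]:
    """Decide whether a WS-A fault can be sent as a correlated reply.

    Returns the destination and the id to relate to, or None when the
    receiver declines or lacks the properties needed to formulate the reply.
    """
    if not fault_reporting_enabled:
        return None  # SHOULD, not MUST: the receiver may stay silent
    if props.fault_endpoint is None or props.message_id is None:
        return None  # cannot follow the fault-formulation rules
    return {"to": props.fault_endpoint, "relates_to": props.message_id}
```

A sender that wants diagnostics supplies both properties; a sender that omits either one gets the plain Proposal 2 behaviour.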

The WS-Coordination Invalid X "faults" will have to be sent in the same manner as other "protocol faults" (really, just WS-TX messages) when used with WS-AT and WS-BA. This requires either that we define two ways that they can be communicated (using WS-A fault formulation rules with WS-C, and using other rules (send to [source endpoint] or cached endpoint) with WS-AT/BA), or that we abandon "request-reply" for WS-C, treating WS-C exchanges as appositions of one-way messages (as we do in the analogous AT Completion protocol exchanges).

In summary, the bits of WS-Addressing that WS-TX needs are:

1) the endpoint reference type, and
2) the abstract notion and concrete representation of [source endpoint], being a reply address which has no WS-A-level processing rules attached to it.

The WS-A [action] is not strictly necessary (it is duplicated in the message bodies) but will help leverage WS-A infrastructure.

I see no reason to use more of WS-A than this, which is why I think Proposal 2, or some close variant, is the right way to ensure that WS-TX messages are correctly targeted.

Alastair

Bob Freund-Hitachi wrote:

In that case, you might consider removing all references to
ws-addressing, since you seem not to want to deal with it. I am curious
as to what the motivation for dumping it might be.

Does ws-tx presume that it will always be operating in a soap bound to
http environment?
Will all endpoints be addressable at all times?
Will all transports supported by the spec allow implicit success/failure
status transmission to the sender?  HTTP provides a backchannel
mechanism (the equivalent to anonymous replyto) but some don't.
I get the distinct feeling that folks are imagining an environment where
the medium is raw (or shall I say unconstrained) tcp/ip.

I often hear use of the phrase "one-way" message, and occasionally its
definition is cited as being contained within Ws-addressing (which does
no such thing). At the moment I have no idea exactly what is meant by a
"one-way" message since currently, no bindings that describe it
normatively exist.

Can these one-way messages generate a fault that might be communicated
to the sender or perhaps to someplace else?  

I do not see the distinction that is being made between "infrastructure"
faults and protocol faults.  If a fault of either sort needs to go
somewhere, is that somewhere addressable at the time the fault is
generated or will the soap/http implicit backchannel have gone away
since the fault managed to happen some time after "200/202/204" and your
one chance of returning a single http response entity (as per HTTP 1.1)?

If I were to imagine a soap over scsi environment, there is NO WAY to
send a non-encoded fault message unless you retain the id of the
initiator and establish some sort of correlation mechanism to the
message that caused the fault.

I think that the happy implementer may be finding that he has a bit of
infrastructure to replicate.

I have no trouble with tossing the stuff that ws-addressing provides,
provided that it is done with malice and forethought.

At the moment, I am completely baffled over the motivation behind
proposal 2.
Thanks
-bob

-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com] 
Sent: Friday, April 21, 2006 8:21 AM
To: Ian Robinson
Cc: ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 030 - Proposal 2 silence on WS-A faults

Leaving the cheerful implementer free to never generate [fault endpoint]
and [message id], and always to ignore them. Good stuff.

Alastair

Ian Robinson wrote:

Alastair,
Yes, these 4 points all follow - as you have stated them - from our
Proposal 2.

Regards,
Ian Robinson

From: Alastair Green <alastair.green@choreology.com>
To: Ian Robinson/UK/IBM@IBMGB
Date: 21/04/2006 09:54
cc: ws-tx@lists.oasis-open.org
Subject: [ws-tx] Issue 030 - Proposal 2 silence on WS-A faults



Ian,

Further to yesterday's call, I want to make sure of my understanding of the
deliberate "silence on infrastructure faults" in your Proposal 2.

1. Sender of a notification message may set values for [fault endpoint],
[message id], neither, or both.

2. [fault endpoint]'s value, if present, is of no concern to WS-TX at all,
and can therefore be set to none, anon, or a "real address" at the sender's
will.

3. Receiver of a notification message may send WS-A faults to the fault
endpoint using [relationship]; may send to the anon endpoint if anon
specified as [fault endpoint] value, may choose to refuse to send a WS-A
fault at will, may be unable to send a WS-A fault through lack of property
values needed to follow the fault-formulation rules in WS-A SOAP
Binding/Core (absence of either of [fault endpoint] or [message id] has
this disabling characteristic).

4. Under no circumstances is sender of "protocol messages" (including e.g.
InvalidState) to ever use or pay attention to the value of [fault
endpoint]: it can only use cached EPR or [source endpoint].
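
Point 4 can be made concrete with a small Python sketch. The function name and the order of preference (cached EPR first, then [source endpoint]) are assumptions for illustration only; the one thing the sketch is meant to show is that [fault endpoint] plays no part in the choice.

```python
from typing import Optional


def protocol_message_target(cached_epr: Optional[str],
                            source_endpoint: Optional[str],
                            fault_endpoint: Optional[str] = None) -> str:
    """Choose where a WS-TX protocol message (e.g. InvalidState) is sent.

    fault_endpoint is accepted only to show that it is deliberately ignored.
    """
    if cached_epr is not None:
        return cached_epr  # endpoint learned during registration
    if source_endpoint is not None:
        return source_endpoint  # no cached state: use the address supplied
    raise ValueError("no cached EPR and no [source endpoint]: cannot target")
```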

Is this a correct summary of the inferences that you and Max intended to be
drawn from silence in this circumstance?

Thanks,

Alastair


Alastair Green wrote:
      Ian,

      In the document on this issue that I submitted just after the last
      meeting, I raised four possible solutions:

      http://www.oasis-open.org/apps/org/workgroup/ws-tx/download.php/17588/2006-04-07.WS-Addressing.and.WS-TX.doc

      Your proposal 2 is very close to my Option 2 (Minimal Use of WS-A).
      This is the cleanest and best approach, in my view.

      My Option 3 somewhat resembles your proposal 1, but avoids active
      (non-none) use of [reply endpoint]. I believe that active use of
      [reply endpoint] has always been a source of confusion, and should be
      avoided in any resolution.

      Your Proposal 1 is probably closest to my Option 4, but deftly avoids
      the MUST use of a [reply endpoint]. I raised Options 1 and 4 as
      "strawmen" to elucidate the spectrum.

      * * *

      Your Proposal 2, while very close to my Option 2, does not fully deal
      with all the points that must be tackled.

      My Option 2 bullet points were:

      2.A) Use either WS-A [source endpoint] or a WS-TX [ws-tx amnesia
      endpoint] for non-terminal messages

      2.B) Do not mandate (but tolerate) presence of [fault endpoint] and
      [message id] on any message. Or, ban use of these two properties. Or
      mandate that they must be ignored if received.

      2.C) Treat WS-TX faults as terminal notifications, which can always be
      delivered, either to cached EPR or to supplied amnesia address. WS-A
      fault delivery rules (part of reply-processing model) do not apply.

      2.D) Set [reply endpoint] to "none", to avoid dragging in "anon"
      default. This is necessary because infrastructure fault delivery
      might pick up on an anon value in some circumstances.

      2.E) Incorporate a statement in the spec making it clear that the
      reply-processing model of WS-A is not being used. If we choose to
      process [fault endpoint] and [message id] if supplied by the sender,
      then Section 3.4 reply-formulation rules may apply to faults, and that
      should be explained.

      2.F) Treat WS-A predefined (infrastructure) faults as undeliverable
      (or potentially undeliverable), because

            i. [fault endpoint] will or may be omitted

            ii. [reply endpoint] is set to none to avoid use of anon, which
            is forbidden

            iii. WS-A does not send faults when [fault endpoint] is absent,
            and [reply endpoint] is set to "none"

            iv. [ws-tx amnesia endpoint] is unknowable to infrastructure
            (layer violation)

      I believe that your proposal 2 does not yet address bullet points
      2.B), 2.E) and 2.F).

      * * *

      If we are not going down the Option 2/Proposal 2 route, then, in my
      view, Option 3 is preferable to your proposal 1 in a couple of
      respects.

      My Option 3 bullet points are repeated here:

      3.A) Use either WS-A [source endpoint] or a WS-TX [ws-tx amnesia
      endpoint] for non-terminal messages

      3.B) Mandate presence of [fault endpoint] and [message id] on all
      messages

      3.C) Treat WS-TX faults as WS-A faults. WS-A fault delivery rules
      (part of reply-processing model) do apply. All faults are always
      deliverable, because of B).

      3.D) Set [reply endpoint] to "none", to avoid dragging in "anon"
      default. This is strictly unnecessary because the receiver will never
      use the [reply endpoint], but it does help make it clear that [reply
      endpoint] is not part of the picture, and that the "anon" endpoint
      will never be used.

      3.E) Incorporate a statement in the spec making it clear that the
      reply-processing model of WS-A is not being used, other than for
      faults

      I believe we should avoid the tangle with [reply endpoint]
      altogether: the combination of [source endpoint] and [fault endpoint]
      properly differentiates the two models for two kinds of messages.

      It is not made clear that all messages must have message ids. They
      must, to apply reply-formulation rules for faults, and this should be
      clearly said. (Equally, if your proposal 1 is adopted, it is
      impossible to follow the reply-formulation rules for "amnesia" unless
      [relationship] is used, which requires [message id].) If you omit
      message id then you can legally create an undeliverable response
      which seems unnecessary.

      * * *

      My point 3.B) does not take account of the possibility of a [fault
      endpoint] = "none". Your proposal 1 does not address the possibility
      of a [reply endpoint] = "none" in the amnesia case. Can we not
      mandate that [fault/reply/source endpoints] are non-anon, non-none
      unless specifically stated otherwise (e.g. to switch off [reply
      endpoint])? I think we may be in danger of losing an aspect of the
      original, intended content of the term "physical address" (i.e.
      repliable, usable address, not a null value, nor an anon).

      ***

      Independently of the option chosen, and in line with dropping the
      term "physical address", the WS-Addressing spec definitions of
      "request-reply" or "one-way" do not exactly line up with what WS-TX
      is up to. One-way is defined as "no indication of future
      interactions", and that is not true of our messages. "Request-reply"
      is rather loosely, or flexibly, defined, and it would be hard to
      argue that some of the behaviours we have fall cleanly outside the
      scope of that term as described in WS-A. The point here is that we
      use the WS-A properties in a complex and partial way, to describe a
      bilateral conversation. References to the terms "one way" and
      "request-reply" could simply be avoided in favour of concrete
      descriptions of how WS-A properties are actually used, and direct
      reference to use of EPRs (WS-A Core 3.3) and reply-formulation (WS-A
      Core 3.4).

      This is most significant in WS-Coordination, where greater
      explicitness than you suggest would be appropriate (specify that
      [reply endpoint] and [message id] must be present on request
      messages, and that 3.4 should apply to all responses, fault or
      otherwise).

      ***

      I also suggested a procedure for triage of the various sub-points,
      which I still think would enable the discussion to effectively
      proceed from primary to secondary points in a clear way:

      1. Which option?
            My Option 2/Your Proposal 2 (which have same broad thrust)
            My Option 3
            Your Proposal 1
            [any other proposals raised]

      [If we want to make this simple procedurally, then I would suggest
      that we vote first on a motion to adopt the thrust of my Option
      2/your Proposal 2. If that wins then the rest can fall away. That is
      the big fault line, if you will pardon the pun.]

      2. If Option 2 (Your proposal 2) selected:

            a) [source endpoint] or [ws-tx amnesia endpoint]?

            b) Permit and optionally process [fault endpoint] + [message
            id] if supplied, OR

            Permit and forcibly process [fault endpoint] + [message id] if
            supplied, OR

            Permit but ignore [fault endpoint] and [message id] if
            supplied, OR

            Ban [fault endpoint] and [message id]?

      3. If Option 3 selected:

            a) [source endpoint] or [ws-tx amnesia endpoint]?

            b) Do we set [reply endpoint] to "none", or allow it to default
            to "anon"?

      4. Remove wording on "physical addresses", replace with ban on
      "anon"? [mandate non-none values for [source/fault endpoint]?]

      [Remove refs to one-way or request-reply?]

      5. Revisit use of reply-processing model in WS-C?

      Yours,


      Alastair





      Ian Robinson wrote:


            Max and I have been working on some options for resolving issue
            030 [1].
            There has been a lot of good discussion on this issue already;
            we have suggested 2 (different) concrete resolutions that we can
            discuss on the call.
            Proposal 1 is closer to the status quo; it retains the use of
            the wsa:ReplyTo MAP for non-terminal notifications but adds a
            requirement for terminal notifications to set wsa:ReplyTo to
            'None'.

            Proposal 2 replaces wsa:ReplyTo with wsa:From to further
            emphasize that protocol messages are never replies. This
            proposal also classifies WS-TX "faults" raised during the
            agreement protocols (e.g. 2PC) as terminal notification
            messages.

            Proposal 1 (Issue30_Propsal_1_WSAT.doc) (See attached file:
            Issue30_Proposal_1__WSAT.doc)

            Proposal 2 (Issue30_Propsal_2_WSAT.doc) (See attached file:
            Issue30_Proposal_2__WSAT.doc)

            For either of these proposals, we believe WS-Coordination
            simply needs to remove text that is already stated in
            WS-Addressing:

            (See attached file: Issue30_Proposal__WSCOOR.doc)

            [1]

http://docs.oasis-open.org/ws-tx/issues/WSTransactionIssues.xml#i030

            Regards,
            Ian Robinson
            STSM, WebSphere Messaging and Transactions Architect
            IBM Hursley Lab, UK
            ian_robinson@uk.ibm.com

Follow-Ups:
- Re: [ws-tx] Issue 030 - Proposal 2 silence on WS-A faults
  - From: Mark Little <mark.little@jboss.com>

References:
- RE: [ws-tx] Issue 030 - Proposal 2 silence on WS-A faults
  - From: "Bob Freund-Hitachi" <bob.freund@hitachisoftware.com>