OASIS Mailing List Archives: ws-tx message


Subject: Re: [ws-tx] Issue 016: WS-C: ReplaceParticipant






Just to clarify one point:
Alastair wrote:
"if no coord protocol messages have flowed then a recovering participant
can re-register, to communicate a new address. If any coord protocol
messages have flowed then a recovering participant can replay, e.g. resend
Prepared, to communicate a new address."

In the case where coord protocols have started then the Participant should
resend Prepared, as you say. In the case where coord protocols have not yet
started but the participant has failed and had to be moved to a different
address then it is reasonable (certainly for short-duration activities) for
the activity as a whole to fail as a consequence of the original
participant being unavailable to respond to protocol messages. I think the
real question here is how to think about long-running activities, where
failure is more likely before the activity completion-agreement protocol
has started.
If the participant has a "stable" EPR then the problem does not occur
(certainly, there is then no need to "replace" it). But what does it mean
for an EPR to be "stable" over a long period? It might be tempting to
invent some new WS-Addressing terminology - e.g. a stable EPR is one whose
address remains coherent throughout the lifetime of the service it
references, etc - but I think this begins to stray beyond the scope of
WS-Tx and we should not do this. But it is certainly possible to build
interoperable WS-Addressing infrastructure which is fault tolerant. There
are many ways to achieve this, for example by only exposing (in the
wsa:Address) the logical address of the corporate gateway server that
typically sits in front of a Participant. Such gateways can afford to be
stateless and are typically highly available; such gateways are also part
of the environment that created the exported EPR and can be considered to
have knowledge of the structure of the exported EPR, including any
ReferenceParameters. If the server behind the gateway that hosts the
Participant state fails, and the Participant is logically moved to another
server, then it should not be necessary to update the registration in
the external Coordinator. The routing can be a detail of the WS-Addressing
function in the gateway. Gateways may suffer outages too, but they always
come back online at the same address (if you want to stay in business
:-)).
My point is to illustrate that it is not a "requirement" to be able
to specify a mechanism to replace EPRs; that is just one proposed solution
to the requirement to provide a fault-tolerant solution for
long-running activities.
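A minimal Python sketch of the gateway approach, assuming the exported EPR carries a hypothetical reference parameter named ParticipantKey and that the gateway keeps a routing table from that key to whichever backend server currently hosts the Participant state. Neither name comes from WS-Addressing or WS-Tx; the sketch only illustrates that relocation updates the gateway, not the Coordinator's registration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class EPR:
    address: str  # wsa:Address seen by the Coordinator (the gateway's logical address)
    # ReferenceParameters as (name, value) pairs; the Coordinator echoes them back opaquely.
    reference_parameters: Tuple[Tuple[str, str], ...] = ()


@dataclass
class Gateway:
    logical_address: str
    # Hypothetical routing table: ParticipantKey -> backend server hosting the state.
    routes: Dict[str, str] = field(default_factory=dict)

    def export_epr(self, participant_key: str, backend: str) -> EPR:
        """Place a Participant behind the gateway and return the EPR to register."""
        self.routes[participant_key] = backend
        return EPR(self.logical_address, (("ParticipantKey", participant_key),))

    def relocate(self, participant_key: str, new_backend: str) -> None:
        """The Participant moved to another server; only the gateway is told."""
        self.routes[participant_key] = new_backend

    def route(self, target: EPR) -> str:
        """Resolve an incoming message's target EPR to the current backend server."""
        key = dict(target.reference_parameters)["ParticipantKey"]
        return self.routes[key]


gateway = Gateway("http://gateway.example.com/ws-tx")
registered = gateway.export_epr("p-42", "http://server1.internal/participant")
gateway.relocate("p-42", "http://server2.internal/participant")
print(gateway.route(registered))  # the unchanged registered EPR now reaches server2
```

The Coordinator keeps `registered` for the whole activity; only the gateway's table changes when the Participant moves.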

Regards,
Ian Robinson
STSM, WebSphere Messaging and Transactions Architect
IBM Hursley Lab, UK
ian_robinson@uk.ibm.com


                                                                           
Alastair Green <alastair.green@choreology.com>
14/12/2005 18:37
To: Mark Little <mark.little@jboss.com>
cc: marchadr@wellsfargo.com, Ian Robinson/UK/IBM@IBMGB,
    peter.furniss@choreology.com, ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 016: WS-C: ReplaceParticipant




Ian, Dan, Mark

I had missed the relatively clear statement in Section 9 of WS-AT that you
point out. It's not half-hidden, though it is very terse.

The notion that you MAY use a new one, but don't have to (e.g. you can
build up a battery of primary, secondary etc addresses, and try to failback
etc) seems right to me. Most likely you want to use the most recent, but
there is nothing to stop you using the old one (or to use the old one
first, and then to try the new one). I think the wording of the existing
text is too restrictive in its implications -- it doesn't make it clear
that this could be used to proactively redirect on recovery -- but it's
correct normatively.
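A minimal sketch of the "battery of addresses" idea, assuming each peer address is a plain string and that a caller-supplied transport function raises ConnectionError for an unreachable address (both illustrative assumptions). It records every address learned from Register/RegisterResponse and from later ReplyTo headers, and can try the most recent first or the oldest first.

```python
from typing import Callable, List

Transport = Callable[[str, str], None]  # (address, message); raises ConnectionError if unreachable


class PeerAddresses:
    """Every address known for one peer, oldest first."""

    def __init__(self, registered: str) -> None:
        self._history: List[str] = [registered]  # from Register/RegisterResponse

    def learn(self, reply_to: str) -> None:
        """Record a wsa:ReplyTo address, moving it to the most-recent position."""
        if reply_to in self._history:
            self._history.remove(reply_to)
        self._history.append(reply_to)

    def candidates(self, newest_first: bool = True) -> List[str]:
        return self._history[::-1] if newest_first else list(self._history)

    def send(self, message: str, transport: Transport, newest_first: bool = True) -> str:
        """Try each known address in turn; return the one that accepted the message."""
        for address in self.candidates(newest_first):
            try:
                transport(address, message)
                return address
            except ConnectionError:
                continue  # fail over to the next known address
        raise ConnectionError("no known address for this peer is reachable")
```

Whether old addresses are kept at all, and in which order they are tried, remains a local choice; the WS-AT text only says the newer address MAY be used.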

I see no problem with this mechanism, so long as the retriable
Register/RegisterResponse is available with exactly the characteristics of
my revised proposed solution to 007.

Assuming that solution, if no coord protocol messages have flowed then a
recovering participant can re-register, to communicate a new address.

If any coord protocol messages have flowed then a recovering participant
can replay, e.g. resend Prepared, to communicate a new address.
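The two recovery cases can be sketched as a single decision, assuming the retriable Register/RegisterResponse proposed under issue 007 and assuming the participant's log records the last coordination protocol message it sent (None if none has flowed). The type and function names are illustrative, not taken from any spec.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RecoveredParticipant:
    new_address: str                   # where the Participant came back up
    last_sent_message: Optional[str]   # e.g. "Prepared"; None if no protocol message has flowed


def recovery_action(p: RecoveredParticipant) -> Tuple[str, str]:
    """Return (message to send, new address that message communicates)."""
    if p.last_sent_message is None:
        # No coordination protocol messages yet: re-register with the new address.
        return ("Register", p.new_address)
    # Protocol under way: replay the last message, with the new address as wsa:ReplyTo.
    return (p.last_sent_message, p.new_address)
```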

If the other end is communicable, it will respond. This will happen as fast
and as effectively as a response to ReplaceParticipant -- not messier, but
neater, because there is no special new message. The style of using replay
of a real message to stimulate a recovery of the conversation is preferable
in my mind to having special messages saying "I'm recovering, where are we
up to?". (Cf. the question, why have a Replay message in WS-AT?)

Finally, on the placement of this stuff (Coord versus the referencing
specs).

The existing issues Peter and I have raised include moving all the general
statements about notifications and terminal notifications into
WS-Coordination and out of WS-AT, and then having WS-AT and WS-BA reference
them.

This should include this section, i.e. we define that non-terminal
notifications which contain a replyTo can be responded to by a subsequent
message in the exchange using that EPR. This is incorporated by reference
when WS-AT or WS-BA say: the following messages are notifications in the
terms defined by WS-C, the following ones are terminal notifications in
those terms.

(If you wanted to harden this you could define base schema types which are
notifications and terminal notifications in WS-Coordination, and define all
coordination protocol messages as extensions of them in the referencing
spec schemas. The pros and cons of XMLery of this kind are not my
specialism, so I shall light that blue touchpaper and retire to a safe
distance.)

We could include a BTP-style Redirect, (i.e. the bilateral version of
Mark's proposed ReplaceParticipant), which becomes feasible if you have
participant and coordinator identification, but that seems heavy-handed.
The beauties of the current scheme are that it is self-identifying because
it uses the channel or link established by R/RR; that it requires no new
message; that it optimizes network traffic, and that it is a no-change
(other than perhaps minor editorial) resolution to this issue (assuming
necessary change on R/RR as discussed under 007).

Peter's point (that address replacement can lead to permanent loss of
connectivity if both sides just move and leave no "forwarding" address) is
very important: you want old addresses to be forwarding addresses, at
least. But that is a warning to implementers, not an enforceable normative
statement.

In sum I think we should ponder the existing wording to see if there is
anything normative that needs adding, and to consider whether the examples
and recommendations section should be a bit wider, to better surface and
explain the uses of this feature including the rationale behind Mark's
issue.

Alastair

Mark Little wrote:
      Yes, but that by itself does not help in the case where failure and
      recovery occur before notification messages are exchanged. The replace
      message may help in that case, except that if the coordinator hasn't
      begun the coordination protocol, the response to replay may be
      nothing, in which case we don't achieve much in the way of
      failure resiliency. Of course, the recovered participant could simply
      keep retrying replay until it triggered a response, as in the example
      Ian outlined, but that seems messy and inefficient to me.

      Mark.


      marchadr@wellsfargo.com wrote:

            Looks like this is already mentioned a bit in the WS-AT spec:

            "Notification messages are addressed by both coordinators and
            participants using the Endpoint References initially obtained
            during the Register-RegisterResponse exchange. If a wsa:ReplyTo
            header is present in a notification message it MAY be used by
            the recipient, for example in cases where a Coordinator or
            Participant has forgotten a transaction that is completed and
            needs to respond to a resent protocol message. Permanent loss of
            connectivity between a coordinator and a participant in an
            in-doubt state can result in data corruption."

            - Dan

            -----Original Message-----
            From: Marchant, Dan R.
            Sent: Wednesday, December 14, 2005 6:43 AM
            To: ian_robinson@uk.ibm.com; alastair.green@choreology.com
            Cc: peter.furniss@choreology.com; ws-tx@lists.oasis-open.org
            Subject: RE: [ws-tx] Issue 016: WS-C: ReplaceParticipant


            +1 for using the ReplyTo.

            The replyTo could be an endpoint that virtualizes the specific
            endpoints within the EPR, creating a cleaner failover and
            recovery scenario.

            My 2 cents,

            Dan


            -----Original Message-----
            From: Ian Robinson [mailto:ian_robinson@uk.ibm.com]
            Sent: Wednesday, December 14, 2005 6:15 AM
            To: Alastair Green
            Cc: Peter Furniss; ws-tx@lists.oasis-open.org
            Subject: Re: [ws-tx] Issue 016: WS-C: ReplaceParticipant






            As you say, section 9 of WS-AT deals with this situation. I
            believe the text is already appropriately worded. Essentially,
            the registered EPR is good until it isn't; if the registered EPR
            becomes "stale" in some way then the ReplyTo EPR is the means by
            which the EPR can be "refreshed". There is deliberately no
            requirement to replace the registered EPR with the ReplyTo EPR;
            this allows an implementation to log the registered EPR and to
            continue to use it throughout the transaction and across any
            failures.
            The following sequence illustrates how EPR replacement is
            supported:

            Participant A registers EPR Pa.
            Coordinator C1 sends Prepare to Pa and it responds Prepared.
            Participant A's environment suffers a disastrous failure and
            the participant is recovered at a different address.
            C1 tries to send Commit to Pa but Pa is no longer addressable.
            C1 retries the Commit.
            Meanwhile Pa is recovered at Pa' and resends Prepared to C1
            with Pa' as the ReplyTo MAP.
            C1, having determined that Pa is not responding, replaces Pa
            with Pa' and resends Commit (per the AT state table).
            The transaction proceeds to successful conclusion.
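The sequence above can be simulated with a toy Coordinator, assuming a hypothetical network function that reports whether a message reached an address, and assuming the local policy described above: keep the logged EPR, and switch to the ReplyTo EPR only once the registered one has stopped responding. Only the Commit retry is modelled; this is not the WS-AT state table itself.

```python
from typing import Callable, List, Optional

Network = Callable[[str, str], bool]  # (address, message) -> delivered?


class Coordinator:
    def __init__(self, participant_epr: str, network: Network) -> None:
        self.participant_epr = participant_epr  # logged at registration (Pa)
        self.network = network
        self.unresponsive = False               # has the current EPR stopped answering?
        self.pending: Optional[str] = None      # outcome not yet delivered
        self.log: List[str] = []

    def send(self, message: str) -> None:
        self.pending = message
        delivered = self.network(self.participant_epr, message)
        self.log.append(f"{message} -> {self.participant_epr}: {'ok' if delivered else 'failed'}")
        if delivered:
            self.pending = None
        else:
            self.unresponsive = True

    def on_prepared(self, reply_to: Optional[str] = None) -> None:
        """A resent Prepared arrived; resend any undelivered outcome."""
        if reply_to is not None and self.unresponsive:
            self.participant_epr = reply_to     # replace Pa with Pa'
            self.unresponsive = False
        if self.pending is not None:
            self.send(self.pending)


c1 = Coordinator("Pa", network=lambda address, message: address == "Pa'")
c1.send("Commit")         # Pa is no longer addressable
c1.send("Commit")         # the retry also fails
c1.on_prepared("Pa'")     # recovered participant resends Prepared with ReplyTo Pa'
print("\n".join(c1.log))
# Commit -> Pa: failed
# Commit -> Pa: failed
# Commit -> Pa': ok
```

A Coordinator that had not seen Pa fail would ignore the ReplyTo and keep using Pa, which is the freedom the text preserves: there is no requirement to replace the registered EPR.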


            Regards,
            Ian Robinson
            STSM, WebSphere Messaging and Transactions Architect
            IBM Hursley Lab, UK
            ian_robinson@uk.ibm.com


            Alastair Green <alastair.green@choreology.com>
            13/12/2005 19:04
            To: Peter Furniss <peter.furniss@choreology.com>
            cc: ws-tx@lists.oasis-open.org
            Subject: Re: [ws-tx] Issue 016: WS-C: ReplaceParticipant




            Mark,

            This is an interesting issue, and dovetails with a couple of
            questions on the Register/RegisterResponse per se.

            The first point is: we need to make it clear when you have to
            stop retrying Register. You shouldn't send it if you've received
            RegisterResponse.

            If we make R/RR a standard one-way MEP, which I favour, then we
            can use the notification/terminal notification nomenclature to
            state this.
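A minimal sketch of the stopping rule, assuming Register is sent as a one-way message and the reply is collected separately. `send_register` and `wait_for_response` are hypothetical caller-supplied hooks, and the attempt limit and doubling timeout are arbitrary choices for illustration.

```python
from typing import Callable, Optional


def register_with_retry(send_register: Callable[[], None],
                        wait_for_response: Callable[[float], Optional[str]],
                        max_attempts: int = 5,
                        first_timeout: float = 0.5) -> str:
    """Resend Register until a RegisterResponse arrives; return the coordinator EPR."""
    timeout = first_timeout
    for _ in range(max_attempts):
        send_register()
        coordinator_epr = wait_for_response(timeout)  # None means the wait timed out
        if coordinator_epr is not None:
            return coordinator_epr  # RegisterResponse received: never resend Register
        timeout *= 2                # back off before the next attempt
    raise TimeoutError(f"no RegisterResponse after {max_attempts} attempts")
```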

            Then we come to your address replacement issue per se.

            In BTP we ended up with a message, REDIRECT, which either the
            Superior (Coordinator) or Inferior (Participant) could send to
            the other, saying: this is entity Foo, please send my messages
            to this new address. To do this one needs an identity, so one
            can say: "I am Foo". If you have a Coordinator identifier and a
            Participant identifier, then this is easy.

            However, I think we already have this (bidirectional) feature in
            the WS-AT and WS-BA protocols in another form, albeit somewhat
            tucked away.

            In Section 9 on use of WS-A Headers, it is stated that a
            non-terminal notification has to have a reply-to address. I
            presume (there is no statement on this, and that needs fixing,
            for sure) that this field only makes sense if I am trying to
            redirect subsequent traffic. In other words, I send a standard
            message but qualify it with the added semantic: "I've moved". If
            the receiver sees this, I assume they should overwrite the old
            EPR they have, and continue as normal.

            Such an address replacement means that redirection is
            accomplished as a by-product of recovery-driven replay of
            messages, or because the load balancer has done a reshuffle --
            it doesn't really matter why.

            This is neat, because it avoids having to communicate
            identifiers for redirection (they are still needed for the
            original register as per other discussions).

            Therefore, I believe that this issue could be resolved by
            supplementing and expanding the WS-Coord spec's statements on
            MEPs, types of messages etc, with a statement that a
            non-terminal notification reply-to should supplant the
            previously held EPR for the next and subsequent messages in the
            conversation, and we're done.
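The proposed rule can be sketched as follows, assuming the referencing spec (WS-AT or WS-BA) supplies the set of terminal notification names, which is passed in here rather than guessed, and that EPRs are represented as plain address strings. Unlike the optional replacement described in Ian's reply, this version always adopts the newer address.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Conversation:
    peer_epr: str              # initially the EPR exchanged via Register/RegisterResponse
    terminal: FrozenSet[str]   # terminal notification names, defined by the referencing spec

    def receive(self, message: str, reply_to: Optional[str] = None) -> None:
        """Apply the rule: a non-terminal notification's reply-to supplants the held EPR."""
        if reply_to is not None and message not in self.terminal:
            self.peer_epr = reply_to  # "I've moved": use this for subsequent messages

    def next_destination(self) -> str:
        return self.peer_epr
```

Nothing here is specific to one end, so a Participant holding the Coordinator's EPR could apply the same rule.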

            It is probably obvious, but I see no very good reason why
            redirection (address replacement) should be limited to the
            Participant end.


            Alastair


            Peter Furniss wrote:
                 This is hereby identified as ws-tx issue 016

                 Please follow up to this message or otherwise ensure your
                 subject line starts "Issue 016 - "
                 (after any Re:, [ws-tx] etc)


                 Issue name -- WS-C: ReplaceParticipant

                 Owner: Mark Little [mailto:mark.little@jboss.com]

                 Target document and draft:

                 Protocol:  Coord

                 Artifact:  spec / schema

                 Draft:

                 Coord spec working draft uploaded 2005-12-02

                 Link to the document referenced:


                 http://www.oasis-open.org/committees/download.php/15738/WS-Coordination-2005-11-22.pdf


                 Issue Type

                 Design

                 Issue Details

                 In order to coordinate long running interactions, it is
                 necessary to tolerate failures and recovery situations
                 within the scope of an activity (long running activity).
                 Once a participant is registered with a coordinator, the
                 current specification implicitly mandates that recovery
                 requires it to come back up on the same EPR in order that
                 the coordinator can subsequently drive it through whatever
                 protocol is used (e.g., 2PC). However, recovery on the same
                 EPR cannot be guaranteed and is at best an implementation
                 choice. Failure to recover on the same EPR will ultimately
                 lead to more coordinated activities terminating in a failure
                 state (e.g., aborting) because participants cannot be
                 reached, even if they failed and recovered prior to the
                 start of execution of the coordinator's protocol.

                 Proposed Resolution:

                 That we add a ReplaceParticipant operation that allows a
                 registering service to instruct the coordinator service to
                 replace one EPR with another EPR. Because EPRs are not
                 currently comparable, a resolution of issue 7 or 14 is
                 relevant to this issue.
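A minimal sketch of the proposed ReplaceParticipant operation. Because EPRs are not currently comparable, it assumes registration hands back an opaque participant identifier (the kind of identification issues 7 and 14 concern); the operation name follows the proposal, but the class, method signatures and UUID-based identifiers are hypothetical.

```python
import uuid
from typing import Dict


class CoordinatorRegistry:
    def __init__(self) -> None:
        self._participants: Dict[str, str] = {}  # participant id -> current EPR

    def register(self, participant_epr: str) -> str:
        """Register a Participant and return the identifier it must quote later."""
        participant_id = str(uuid.uuid4())
        self._participants[participant_id] = participant_epr
        return participant_id

    def replace_participant(self, participant_id: str, new_epr: str) -> None:
        """Replace the EPR held for an already-registered Participant."""
        if participant_id not in self._participants:
            raise KeyError(f"unknown participant {participant_id}")
        self._participants[participant_id] = new_epr

    def epr_of(self, participant_id: str) -> str:
        return self._participants[participant_id]
```

The identifier, not the old EPR, is what lets the coordinator find the registration to replace.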












