OASIS Mailing List Archives

 



ws-tx message



Subject: Re: [ws-tx] Issue 016: WS-C: ReplaceParticipant


Ian, Dan, Mark

I had missed the relatively clear statement in Section 9 of WS-AT that you point out. It's not half-hidden, though it is very terse.

The notion that you MAY use a new one, but don't have to (e.g. you can build up a battery of primary, secondary, etc. addresses, and try to fail back, etc.) seems right to me. Most likely you want to use the most recent, but there is nothing to stop you using the old one (or using the old one first, and then trying the new one). I think the wording of the existing text is too restrictive in its implications -- it doesn't make it clear that this could be used to proactively redirect on recovery -- but it's correct normatively.
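
A minimal sketch of such a battery of addresses, in Python. The names AddressBook and send_with_fallback, and the transport callable that returns True on successful delivery, are hypothetical and defined by neither WS-AT nor WS-Coordination. The sketch tries the most recent address first and falls back to older ones; the opposite order would be equally permissible.

```python
from collections import deque


class AddressBook:
    """Hypothetical store of the addresses known for one peer, newest first."""

    def __init__(self, registered_address):
        self._addresses = deque([registered_address])

    def learn(self, reply_to):
        # Record an address seen in a ReplyTo header as the most recent one,
        # keeping older addresses as fallbacks instead of discarding them.
        if reply_to in self._addresses:
            self._addresses.remove(reply_to)
        self._addresses.appendleft(reply_to)

    def candidates(self):
        return list(self._addresses)


def send_with_fallback(book, message, transport):
    """Try each known address in order; return the one that worked, or None."""
    for address in book.candidates():
        if transport(address, message):
            return address
    return None
```

Keeping the old addresses rather than overwriting them is what allows a sender to treat the ReplyTo as optional.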

I see no problem with this mechanism, so long as the retriable Register/RegisterResponse is available with exactly the characteristics of my revised proposed solution to 007.

Assuming that solution, if no coord protocol messages have flowed then a recovering participant can re-register, to communicate a new address.

If any coord protocol messages have flowed then a recovering participant can replay, e.g. resend Prepared, to communicate a new address.
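
The two cases above can be sketched as a single decision, assuming the recovering participant has logged the last coordination protocol message it sent, if any. RecoveredState and recovery_message are hypothetical names used only for illustration, and the sketch does not model how the new address is carried on the wire.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RecoveredState:
    # Last coordination protocol message sent before the failure,
    # e.g. "Prepared"; None if no protocol message had flowed yet.
    last_protocol_message: Optional[str]
    # Address at which the participant has been recovered.
    new_address: str


def recovery_message(state: RecoveredState) -> Tuple[str, str]:
    """Return (message name, address to advertise) for a recovering participant."""
    if state.last_protocol_message is None:
        # Nothing has flowed: re-register to communicate the new address.
        return ("Register", state.new_address)
    # Otherwise replay the last message, carrying the new address.
    return (state.last_protocol_message, state.new_address)
```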

If the other end is communicable, it will respond. This will happen as fast and as effectively as a response to ReplaceParticipant -- not messier, but neater, because there is no special new message. The style of using replay of a real message to stimulate a recovery of the conversation is preferable in my mind to having special messages saying "I'm recovering, where are we up to?". (Cf. the question, why have a Replay message in WS-AT?)

Finally, on the placement of this stuff (Coord versus the referencing specs).

The existing issues Peter and I have raised include moving all the general statements about notifications and terminal notifications into WS-Coordination and out of WS-AT, and then having WS-AT and WS-BA reference them.

This should include this section: i.e. we define that non-terminal notifications which contain a replyTo can be responded to by a subsequent message in the exchange using that EPR. This is incorporated by reference when WS-AT or WS-BA say: the following messages are notifications in the terms defined by WS-C, the following ones are terminal notifications in those terms.

(If you wanted to harden this you could define base schema types which are notifications and terminal notifications in WS-Coordination, and define all coordination protocol messages as extensions of them in the referencing spec schemas. The pros and cons of XMLery of this kind are not my specialism, so I shall light that blue touchpaper and retire to a safe distance.)

We could include a BTP-style Redirect (i.e. the bilateral version of Mark's proposed ReplaceParticipant), which becomes feasible if you have participant and coordinator identification, but that seems heavy-handed. The beauties of the current scheme are that it is self-identifying because it uses the channel or link established by R/RR; that it requires no new message; that it optimizes network traffic; and that it is a no-change (other than perhaps minor editorial) resolution to this issue (assuming necessary change on R/RR as discussed under 007).

Peter's point (that address replacement can lead to permanent loss of connectivity if both sides just move and leave no "forwarding" address) is very important: you want old addresses to be forwarding addresses, at least. But that is a warning to implementers, not an enforceable normative statement.

In sum I think we should ponder the existing wording to see if there is anything normative that needs adding, and to consider whether the examples and recommendations section should be a bit wider, to better surface and explain the uses of this feature including the rationale behind Mark's issue.

Alastair

Mark Little wrote:
Yes, but that by itself does not help if the failure and recovery occur before notification messages are exchanged. The replace message may help in that case, except that if the coordinator hasn't begun the coordination protocol, the response to replay may be nothing, in which case we don't achieve much in the way of failure resiliency. Of course, the recovered participant could simply keep retrying replay until it triggered a response, as in the example Ian outlined, but that seems messy and inefficient to me.

Mark.


marchadr@wellsfargo.com wrote:

Looks like this is already mentioned a bit in the WS-AT spec:

"Notification messages are addressed by both coordinators and participants using the Endpoint
References initially obtained during the Register-RegisterResponse exchange. If a wsa:ReplyTo header
is present in a notification message it MAY be used by the recipient, for example in cases where a Coordinator or Participant has forgotten a transaction that is completed and needs to respond to a resent
protocol message. Permanent loss of connectivity between a coordinator and a participant in an in-doubt
state can result in data corruption."

- Dan

-----Original Message-----
From: Marchant, Dan R.
Sent: Wednesday, December 14, 2005 6:43 AM
To: ian_robinson@uk.ibm.com; alastair.green@choreology.com
Cc: peter.furniss@choreology.com; ws-tx@lists.oasis-open.org
Subject: RE: [ws-tx] Issue 016: WS-C: ReplaceParticipant


+1 for using the ReplyTo.

The replyTo could be an endpoint that virtualizes the specific endpoints within the EPR,
creating a cleaner failover and recover scenario.

My 2 cents,

Dan


-----Original Message-----
From: Ian Robinson [mailto:ian_robinson@uk.ibm.com]
Sent: Wednesday, December 14, 2005 6:15 AM
To: Alastair Green
Cc: Peter Furniss; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 016: WS-C: ReplaceParticipant






As you say, section 9 of WS-AT deals with this situation. I believe the
text is already appropriately worded. Essentially, the registered EPR is
good until it isn't; if the registered EPR becomes "stale" in some way then
the ReplyTo EPR is the means by which the EPR can be "refreshed". There is
deliberately no requirement to replace the registered EPR with the ReplyTo
EPR - this allows an implementation to log the registered EPR and to
continue to use it throughout the transaction and across any failures.
The following sequence illustrates how EPR replacement is supported:

Participant A registers EPR Pa.
Coordinator C1 sends Prepare to Pa and it responds Prepared.
Participant A's environment suffers a disastrous failure and the
participant is recovered at a different address.
C1 tries to send commit to Pa but Pa is no longer addressable.
C1 retries the commit.
Meanwhile Pa is recovered at Pa' and resends Prepared to C1 with Pa' as the
ReplyTo MAP.
C1, having determined that Pa is not responding, replaces Pa with Pa' and
resends commit (per the AT state table).
The transaction proceeds to successful conclusion.
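
The sequence above can be sketched in Python as follows. The Coordinator class, its method names, and the set of reachable addresses standing in for the network are all assumptions made for illustration; they are not taken from WS-AT. The sketch keeps the registered EPR on record and switches to the ReplyTo EPR only after the current one fails to respond.

```python
class Coordinator:
    """Hypothetical coordinator tracking one participant's EPR."""

    def __init__(self, registered_epr, reachable):
        self.registered_epr = registered_epr  # logged at Register time
        self.current_epr = registered_epr     # where Commit is sent
        self.last_reply_to = None             # most recent ReplyTo seen
        self.reachable = reachable            # stand-in for the network
        self.log = []                         # messages actually delivered

    def on_prepared(self, reply_to=None):
        # A (re)sent Prepared may carry a ReplyTo naming a new address.
        if reply_to is not None:
            self.last_reply_to = reply_to

    def send_commit(self):
        # Try the current EPR; if it is unreachable and a different
        # ReplyTo EPR is known, replace the current EPR and try once more.
        for _ in range(2):
            if self.current_epr in self.reachable:
                self.log.append(("Commit", self.current_epr))
                return True
            if self.last_reply_to is None or self.last_reply_to == self.current_epr:
                return False
            self.current_epr = self.last_reply_to
        return False
```

With a coordinator created as Coordinator("Pa", {"Pa"}), making only "Pa'" reachable causes send_commit() to fail until on_prepared(reply_to="Pa'") has been called, after which the retry is delivered to Pa'.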


Regards,
Ian Robinson
STSM, WebSphere Messaging and Transactions Architect
IBM Hursley Lab, UK
ian_robinson@uk.ibm.com


Alastair Green <alastair.green@choreology.com>
13/12/2005 19:04
To: Peter Furniss <peter.furniss@choreology.com>
cc: ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 016: WS-C: ReplaceParticipant



Mark,

This is an interesting issue, and dovetails with a couple of questions on
the Register/RegisterResponse per se.

The first point is:  we need to make it clear when you have to stop
retrying Register. You shouldn't send it if you've received
RegisterResponse.
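
A minimal sketch of that stop rule, assuming a hypothetical send_register callable that returns the RegisterResponse if one has arrived and None otherwise. The attempt limit is an illustrative choice rather than something the specification states, and a real sender would also wait between attempts.

```python
def register_with_retry(send_register, max_attempts=5):
    """Resend Register until a RegisterResponse arrives or attempts run out.

    Returns (response, attempts_made); response is None on giving up.
    """
    for attempt in range(1, max_attempts + 1):
        response = send_register()
        if response is not None:
            # RegisterResponse received: stop retrying Register.
            return response, attempt
    return None, max_attempts
```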

If we make R/RR a standard one-way MEP, which I favour, then we can use the
notification/terminal notification nomenclature to state this.

Then we come to your address replacement issue per se.

In BTP we ended up with a message, REDIRECT, which either the Superior
(Coordinator) or Inferior (Participant) could send to the other, saying:
this is entity Foo, please send my messages to this new address. To do this
one needs an identity, so one can say: "I am Foo". If you have a
Coordinator identifier and a Participant identifier, then this is easy.

However, I think we already have this (bidirectional) feature in the WS-AT
and WS-BA protocols in another form, albeit somewhat tucked away.

In Section 9 on use of WS-A Headers, it is stated that a non-terminal
notification has to have a reply-to address. I presume (there is no
statement on this, and that needs fixing, for sure) that this field only
makes sense if I am trying to redirect subsequent traffic. In other words,
I send a standard message but qualify it with the added semantic: "I've
moved". If the receiver sees this, I assume they should overwrite the old
EPR they have, and continue as normal.

Such an address replacement means that redirection is accomplished as a
by-product of recovery-driven replay of messages, or because the load
balancer has done a reshuffle -- it doesn't really matter why.

This is neat, because it avoids having to communicate identifiers for
redirection (they are still needed for the original register as per other
discussions).

Therefore, I believe that this issue could be resolved by supplementing and
expanding the WS-Coord spec's statements on MEPs, types of messages etc,
with a statement that a non-terminal notification reply-to should supplant
the previously held EPR for the next and subsequent messages in the
conversation, and we're done.

It is probably obvious, but I see no very good reason why redirection
(address replacement) should be limited to the Participant end.

Alastair


Peter Furniss wrote:
     This is hereby identified as ws-tx issue 016

     Please follow up to this message or otherwise ensure your subject line
     starts "Issue 016 - " (after any Re:, [ws-tx] etc)


     Issue name -- WS-C: ReplaceParticipant

     Owner: Mark Little [mailto:mark.little@jboss.com]

     Target document and draft:

     Protocol:  Coord

     Artifact:  spec / schema

     Draft:

     Coord spec working draft uploaded 2005-12-02

     Link to the document referenced:

     http://www.oasis-open.org/committees/download.php/15738/WS-Coordination-2005-11-22.pdf


     Issue Type

     Design

     Issue Details

     In order to coordinate long running interactions, it is necessary to
     tolerate failures and recovery situations within the scope of an
     activity (long running activity). Once a participant is registered
     with a coordinator, the current specification implicitly mandates that
     recovery requires it to come back up on the same EPR in order that the
     coordinator can subsequently drive it through whatever protocol is used
     (e.g., 2PC). However, recovery on the same EPR cannot be guaranteed and
     is at best an implementation choice. Failure to recover on the same EPR
     will ultimately lead to more coordinated activities terminating in a
     failure state (e.g., aborting) because participants cannot be reached,
     even if they failed and recovered prior to the start of execution of
     the coordinator's protocol.

     Proposed Resolution:

     That we add a ReplaceParticipant operation that allows a registering
     service to instruct the coordinator service to replace one EPR with
     another EPR. Because EPRs are not currently comparable, a resolution of
     issue 7 or 14 is relevant to this issue.






 




