ws-tx message

Subject: RE: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
From: "Sazi Temel" <sazi@bea.com>
To: "Max Feingold" <Max.Feingold@microsoft.com>, "Mark Little" <mark.little@jboss.com>, "Peter Furniss" <peter.furniss@choreology.com>
Date: Mon, 26 Dec 2005 21:08:48 -0800

Max,
Thanks for the clarifications. Since Andrew and I, on another thread,
were discussing the same issue, I am carrying that discussion here, so
that we can close it :)

I agree now that the protocol will work fine in both cases; I was
reading the state tables differently. It was not clear to me when the
external and internal events occur, and I was applying the internal
events 'all forgotten' and 'rollback decision' in earlier states than
those in which they actually occur, thus reaching a wrong conclusion.
Perhaps including definitions for internal events, and the sequence in
which external and internal events occur, would help to avoid such
misunderstandings?

I am now satisfied with the way the protocol works regarding the issue of
Register/RegisterResponse - it does not produce unnecessary transaction
failures as I thought earlier. 

Good holidays to everyone!
,Sazi
 

-----Original Message-----
From: Max Feingold [mailto:Max.Feingold@microsoft.com] 
Sent: Monday, December 26, 2005 9:25 PM
To: Mark Little; Peter Furniss
Cc: ws-tx@lists.oasis-open.org
Subject: RE: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse
retriable

As promised, I wanted to discuss how a WS-AT participant can behave
correctly in the face of a lossy registration transport.  Peter
mentioned [1] that he did not believe that this was feasible, and I
wanted to set the record straight.

 

In the following, P is a participant, C a coordinator and T a
transaction.

 

Let's start with the reinfected participant scenario:

 

1. P sends Register to C for T, containing EPR P1.

2. C sends RegisterResponse to P.

3. P fails.

4. P recovers.  [WS-AT presumes abort, so P will have no memory of T.]

5. An application sends P a context for T.  P registers with C again,
sending EPR P2.

6. The activity completes.

7. C sends Prepare to both P1 and P2.

 

[Note that at this point the transaction should abort.  P may have had
live participants before failing;  they will time out and abort.  P's
new participants should not receive a commit outcome.  Fortunately, P
included distinct unique identifiers in each EPR's RefParams.  P1's
identifier was lost when P failed, but P2's is still alive.]

 

8. P receives Prepare for P1, fails to recognize the enlistment, and
replies Aborted, as per the WS-AT state table.

9. The transaction aborts.  All rejoice.
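
The steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class, method and function names are hypothetical and do not come from WS-AT or WS-Coordination, the "EPR" is reduced to the unique identifier carried in its RefParams, and the coordinator is reduced to a function that aborts on any Aborted vote.

```python
import uuid


class Participant:
    """Toy model of P. All names here are illustrative assumptions."""

    def __init__(self):
        # Volatile map: enlistment id (carried in RefParams) -> transaction id.
        self.enlistments = {}

    def register(self, tx_id):
        """Create a fresh enlistment id to embed in the EPR sent with Register."""
        enlistment_id = str(uuid.uuid4())
        self.enlistments[enlistment_id] = tx_id
        return enlistment_id

    def fail_and_recover(self):
        """Presumed abort: no memory of active transactions survives a failure."""
        self.enlistments.clear()

    def on_prepare(self, enlistment_id):
        """Reply Aborted for an unrecognized enlistment, otherwise vote Prepared."""
        if enlistment_id not in self.enlistments:
            return "Aborted"
        return "Prepared"


def coordinator_outcome(votes):
    """Simplified coordinator decision: any Aborted vote aborts the transaction."""
    return "Abort" if "Aborted" in votes else "Commit"


p = Participant()
p1 = p.register("T")                           # steps 1-2
p.fail_and_recover()                           # steps 3-4
p2 = p.register("T")                           # step 5
votes = [p.on_prepare(p1), p.on_prepare(p2)]   # steps 7-8
outcome = coordinator_outcome(votes)           # step 9
```

The point of the sketch is that the abort falls out of P's own bookkeeping: C never has to compare EPRs or detect that P1 and P2 belong to the same participant.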

 

So by using a unique identifier in RefParams, we solve the reinfection
problem.  We can use the same technique to solve the lossy transport
problem without aborting:

 

1. P sends Register to C for T, containing EPR P1.  The message is lost.

2. P times out and decides to re-send Register, containing EPR P2.

3. C receives Register for P2, sends RegisterResponse.  The message is
lost again.

4. P times out again, but perseveres and decides to re-send Register,
containing EPR P3.

5. C sends RegisterResponse.  P receives the message.

 

[Note that only at this point is it correct for P to unblock its own
registrants.  Consequently, the initiating application will not complete
the activity until at least one successful registration has completed
between all Ps and Cs in a transaction.  Other coordination protocols
may vary.]

 

6. The activity completes.

7. C sends Prepare to P2 and P3.

8. P recognizes the Prepare for P2 as superfluous, as there is already
another registered enlistment (P3).

9. P sends ReadOnly to the ReplyTo address in the Prepare message for
P2.

10. P receives Prepare for P3.  P votes as usual (e.g. Prepared).

11. The transaction enters phase two and may commit if appropriate.

12. P receives Commit on P3.  P can now discard P1 as orphaned and
superfluous.

 

Consequently, a participant can choose to survive a lossy network and
send multiple Register messages to its coordinator.  The coordinator
does not need to know or care about the participant's choices.  The
participant could just as easily have decided to abort in step 2 and
report failure to its registrants.
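
A minimal sketch of the lossy transport scenario, again in Python and again with hypothetical names throughout. The network is simulated by a list of (Register delivered, RegisterResponse delivered) flags, one pair per attempt, and the decision to answer Aborted for an enlistment when no registration was ever confirmed is an assumption of this sketch rather than something stated above.

```python
import uuid


class Coordinator:
    """Toy model of C: it simply records every enlistment it hears about."""

    def __init__(self):
        self.registered = []

    def register(self, enlistment_id):
        self.registered.append(enlistment_id)
        return "RegisterResponse"


class Participant:
    """Toy model of P. All names here are illustrative assumptions."""

    def __init__(self):
        self.attempts = []      # every enlistment id P has put into a Register
        self.confirmed = None   # id whose RegisterResponse P actually received

    def register_with_retry(self, coordinator, network):
        """network: simulated (register_delivered, response_delivered) pairs."""
        for register_ok, response_ok in network:
            enlistment_id = str(uuid.uuid4())   # fresh EPR for each attempt
            self.attempts.append(enlistment_id)
            if not register_ok:
                continue                        # Register lost: time out, retry
            coordinator.register(enlistment_id)
            if response_ok:
                # Only now is it correct for P to unblock its own registrants.
                self.confirmed = enlistment_id
                return True
        return False                            # P gives up and reports failure

    def on_prepare(self, enlistment_id):
        if enlistment_id == self.confirmed:
            return "Prepared"                   # vote as usual
        if enlistment_id in self.attempts and self.confirmed is not None:
            return "ReadOnly"                   # superfluous enlistment
        return "Aborted"                        # unknown or abandoned enlistment


c = Coordinator()
p = Participant()
# Attempt 1: Register lost. Attempt 2: response lost. Attempt 3: success.
ok = p.register_with_retry(c, [(False, False), (True, False), (True, True)])
replies = [p.on_prepare(e) for e in c.registered]   # Prepare for P2, then P3
```

Here C ends up holding two enlistments (P2 and P3) and treats them as unrelated; P answers ReadOnly for the one it knows to be superfluous and votes normally on the other.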

 

In that same email [1], Peter mentioned that we might as well eliminate
RegisterResponse if the ReplyTo can be used to send protocol messages.
There are a couple of reasons why this is not an appropriate resolution
for WS-AT:

 

1. RegisterResponse is used by a WS-AT participant to determine whether
the registration step was successful.  This is important because it
allows TMs to help ensure that the transaction is not completed until
all activity is quiescent.  Applications can be smart about this if they
so choose (e.g. by leveraging volatile enlistment notifications) but in
general we want to avoid committing partial work.

 

2. Participants can use the RegisterResponse EPR to compose unsolicited
Aborted and ReadOnly messages.  These messages can be sent before
receipt of Prepare.
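
Both points can be illustrated with a small Python sketch. The names are hypothetical, the coordinator address is a placeholder, and "sending" a message is modeled as appending to a list; the only behavior taken from the discussion above is that registration counts as successful once RegisterResponse arrives, and that its EPR is what lets P address an unsolicited Aborted or ReadOnly.

```python
class Enlistment:
    """Toy model of one enlistment held by P. Names are illustrative assumptions."""

    def __init__(self):
        self.coordinator_epr = None   # filled in from RegisterResponse
        self.outbox = []              # (destination EPR, message) pairs "sent"

    def on_register_response(self, coordinator_epr):
        """Record the coordinator's protocol endpoint from RegisterResponse."""
        self.coordinator_epr = coordinator_epr

    @property
    def registered(self):
        """Point 1: registration is known to have succeeded only after the response."""
        return self.coordinator_epr is not None

    def send_unsolicited(self, message):
        """Point 2: Aborted or ReadOnly may be sent before any Prepare arrives."""
        if message not in ("Aborted", "ReadOnly"):
            raise ValueError("only Aborted and ReadOnly are modeled as unsolicited")
        if not self.registered:
            raise RuntimeError("no coordinator EPR yet: RegisterResponse not received")
        self.outbox.append((self.coordinator_epr, message))


e = Enlistment()
try:
    e.send_unsolicited("ReadOnly")    # no RegisterResponse yet: nowhere to send
    sent_early = True
except RuntimeError:
    sent_early = False

e.on_register_response("http://coordinator.example/at")   # placeholder address
e.send_unsolicited("ReadOnly")
```

Without RegisterResponse, P would have no coordinator endpoint until a Prepare arrived carrying a ReplyTo, which is exactly the case where these early messages are useful.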

 

[1]
http://www.oasis-open.org/apps/org/workgroup/ws-tx/email/archives/200512/msg00184.html


________________________________

From: Mark Little [mailto:mark.little@jboss.com]
Sent: Sun 12/18/2005 7:24 AM
To: Peter Furniss
Cc: Max Feingold; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse
retriable





Peter Furniss wrote:

>>A failure to receive a register response could trigger a completely new
>>register message with a new EPR (on the assumption a retry of the first
>>attempt caused the already-registered fault to be returned). The only
>>problem I can see at present with this mechanism is that manufacturing a
>>new EPR for the "same" participant may not be feasible in some
>>environments. However, that could be seen as an implementation problem.
>>The advantage would be that no changes to the specification are required
>>- other than a clarification of the text to call out this possibility.
>>
>
>With no change to the current texts, I don't see how you can get
>already-registered unless the coordinator does an illegal EPR
>comparison. (that is really part of 014 - whatever we specify as the
>reaction, there needs to be a sound way of detecting duplicates - no
>change is not an option).
> 
>
I'm trying to consider the issues in isolation, but I'll admit that's
difficult ;-)

>But apart from that (i.e. assume we have a duplicate detection means),
>and back to the conceptual point of this issue, why specify that a
>coordinator detecting that Register is for the same Participant as one
>already registered must fault with AlreadyRegistered? Just assume that
>the transport, or the sending implementation, has caused the duplicate
>to turn up, and reply with a RegisterResponse reflecting the
>Coordinator's endpoint.
> 
>
My intention was to point out that a solution is possible within the
scope of the current specification. Whether or not that solution is one
we wish to adopt, is the subject of this and other discussions, just as
the other proposed solutions have been.

>In 95% of cases the EPRs will be unchanged.
>If they have changed (which would only be because the endpoint owner
>"wanted" to change it), the most recent SHOULD be used for sending by
>the peer (not MUST because that would impose complications for some
>persistence strategies).
>
>
> 
>
>>>The alternative of trying to make multiple registrations for what is in
>>>fact the same participant work would seem to cause considerable
>>>complications. For atomic cases, the coordinator may not mind - it
>>>just sees two (or more) registrations and they must both be committed
>>>(or rolled back). But Max's
>>>
>>>>"The participant simply needs to behave correctly[1] by
>>>>distinguishing its multiple enlistments."
>>>
>>>is very questionable, because it will receive two Prepares (say), both
>>>delivered to the same EPR, but must reply to different coordinator
>>>endpoints, one given on the successful RegisterResponse, one on the
>>>lost one. As in Alastair's diagrams sent earlier today, it would have
>>>to use the Reply-To EPR (in which case, why not use that anyway and get
>>>rid of the RegisterResponse altogether) [this is completely impossible
>>>for coordination protocols where the first message is participant to
>>>coordinator - see Alastair's diagram 3]
>>>
>>I agree all of this is possible and may be sub-optimal in certain
>>degenerate situations. However, when weighed against the timeline
>>imposed for getting WS-C through to standardisation, it may be that the
>>"do nothing" approach I mentioned above is the best option.
>>
>>>Gosh, this has ended up rather long (and will probably now cross with
>>>other messages saying the same thing or rendering it out of date)
>>>
>>To be honest I don't have a hard stance on any solutions to this issue
>>at the moment. My only concern is time spent so far and the fact that
>>there are other issues to work through that may be equally, or more,
>>contentious. I hope we can bring this to a conclusion (a vote) soon.
>>
>
>Well, we closed a quarter of the issues list yesterday, and this one is
>related to at least two of the others, and the discussion has made good
>progress. I think it's a little early to
>be worrying about timescales.
>
> 
>
I disagree that it is too early. Several of the companies on this list
have implementations that are already interoperable and, speaking as the
representative of one of them, we'd like to reduce the amount of
time this TC takes to standardise.

Mark.