OASIS Mailing List Archives: ws-tx message


Subject: RE: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable


I've gone back to Max's detailed analysis, rather than get tangled in
several layers of > prefixes.



Comments interspersed.

Main point is towards the end

Peter

> -----Original Message-----
> From: Max Feingold [mailto:Max.Feingold@microsoft.com] 
> Sent: 27 December 2005 02:25
> To: Mark Little; Peter Furniss
> Cc: ws-tx@lists.oasis-open.org
> Subject: RE: [ws-tx] Issue 007 - WS-C: Make 
> Register/RegisterResponse retriable
> 
> 
> As promised, I wanted to discuss how a WS-AT participant can 
> behave correctly in the face of a lossy registration 
> transport.  Peter mentioned [1] that he did not believe that 
> this was feasible, and I wanted to set the record straight.
> 
>  
> 
> In the following, P is a participant, C a coordinator and T a 
> transaction.
> 
>  
> 
> Let's start with the reinfected participant scenario:
> 
>  
> 
> 1. P sends Register to C for T, containing EPR P1.
> 
> 2. C sends RegisterResponse to P.
> 
> 3. P fails.
> 
> 4. P recovers.  [WS-AT presumes abort, so P will have no memory of T.]
> 
> 5. An application sends P a context for T.  P registers with 
> C again, sending EPR P2.

Yes, it is reasonable to expect the new EPR to be "naturally" different.
P has created a new state entity, and it would likely have a different
identification if the RefParams are being used for local routing to the
state entity.
 
> 6. The activity completes.
> 
> 7. C sends Prepare to both P1 and P2.
> 
>  
> 
> [Note that at this point the transaction should abort.  P may 
> have had live participants before failing;  they will time 
> out and abort.  P's new participants should not receive a 
> commit outcome.  Fortunately, P included distinct unique 
> identifiers in each EPR's RefParams.  P1's identifier was 
> lost when P failed, but P2's is still alive.]

Yes, and indeed it should abort, as there may have been work done or
messages sent by P1.

But note we've added a requirement to the P implementation that it MUST
have a different EPR. Compare Jini transactions, where there is a "crash
count" value to ensure that such a replacement registration is
distinguished, which forces rollback of the transaction. The requirement
for different identification should be stated explicitly in the spec.

>  
> 
> 8. P receives Prepare for P1, fails to recognize the 
> enlistment, and replies Aborted, as per the WS-AT state table.
> 
> 9. The transaction aborts.  All rejoice.
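
To make the reinfection scenario above concrete, here is a minimal
sketch in Python. It is not taken from WS-AT or any implementation: the
class, the method names and the use of a UUID as the RefParams
identifier are all assumptions for illustration only.

```python
import uuid


class Participant:
    """Hypothetical WS-AT participant whose volatile state is lost on failure."""

    def __init__(self):
        # enlistment id (assumed to travel in the EPR's RefParams) -> transaction id
        self.enlistments = {}

    def register(self, txn_id):
        # Each Register carries an EPR with a fresh unique identifier.
        enlistment_id = str(uuid.uuid4())
        self.enlistments[enlistment_id] = txn_id
        return enlistment_id

    def crash_and_recover(self):
        # Presumed abort: nothing is remembered about unprepared enlistments.
        self.enlistments = {}

    def on_prepare(self, enlistment_id):
        # An unrecognised enlistment is answered with Aborted.
        if enlistment_id not in self.enlistments:
            return "Aborted"
        return "Prepared"


def coordinator_outcome(votes):
    # Illustrative decision rule: any vote other than Prepared/ReadOnly aborts.
    if all(v in ("Prepared", "ReadOnly") for v in votes):
        return "Commit"
    return "Abort"


p = Participant()
p1 = p.register("T")      # steps 1-2: first registration, EPR P1
p.crash_and_recover()     # steps 3-4: P fails and recovers with no memory of T
p2 = p.register("T")      # step 5: second registration, EPR P2
votes = [p.on_prepare(p1), p.on_prepare(p2)]   # steps 7-8
outcome = coordinator_outcome(votes)           # step 9
```

The Prepare addressed to P1 is answered with Aborted because its
identifier did not survive the failure, and that single vote is enough
to abort T even though P2 is willing to prepare.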
> 
>  
> 
> So by using a unique identifier in RefParams, we solve the 
> reinfection problem.  We can use the same technique to solve 
> the lossy transport problem without aborting:
> 
>  
> 
> 1. P sends Register to C for T, containing EPR P1.  The 
> message is lost.
> 
> 2. P times out and decides to re-send Register, containing EPR P2.
> 
> 3. C receives Register for P2, sends RegisterResponse.  The 
> message is lost again.
> 
> 4. P times out again, but perseveres and decides to re-send 
> Register, containing EPR P3.
> 
> 5. C sends RegisterResponse.  P receives the message.
> 
>  
> 
> [Note that only at this point is it correct for P to unblock 
> its own registrants.  Consequently, the initiating 
> application will not complete the activity until at least one 
> successful registration has completed between all Ps and Cs 
> in a transaction.  Other coordination protocols may vary.]

"unblock its own registrants" - I think I know what you mean, but is
this requirement on P's behaviour stated anywhere? Stating it correctly
will need careful text: what exactly is blocked, what can be permitted
to proceed in parallel, and what can finish in parallel (sending
RegisterResponse if P is itself a (sub-)coordinator, application
replies, downstream application messages, database access, etc.).

>  
> 
> 6. The activity completes.
> 
> 7. C sends Prepare to P2 and P3.
> 
> 8. P recognizes the Prepare for P2 as superfluous, as there 
> is already another registered enlistment (P3).
> 
> 9. P sends ReadOnly to the ReplyTo address in the Prepare 
> message for P2.
> 
> 10. P receives Prepare for P3.  P votes as usual (e.g. Prepared).
> 
> 11. The transaction enters phase two and may commit if appropriate.
> 
> 12. P receives Commit on P3.  P can now discard P1 as 
> orphaned and superfluous.
> 
>  
> 
> Consequently, a participant can choose to survive a lossy 
> network and send multiple Register messages to its 
> coordinator.  The coordinator does not need to know or care 
> about the participant's choices.  The participant could just 
> as easily have decided to abort in step 2 and report failure 
> to its registrants.

Yes, this does work, but now we've done something much less natural
with RefParams.  P only created one state entity (so whatever it's
using for local routing is going to stay the same), but must
now add extra discrimination in the EPR, with a "soft" identifier 
so the state entity can recognise which of its registration attempts
is being addressed.

[ aside: the ability to add "soft" identifiers like that is a good
example of why WS-A necessarily disallows comparison of others' EPRs ]

Again, we would have to add text stating that a Participant can
register more than once, provided it can distinguish which of its EPRs
is being addressed; that it SHOULD treat the first RegisterResponse
received as defining which of its EPRs is "real" and reply ReadOnly to
any others; and mandating the behaviour described above.


MAIN POINT:
I realise you were strictly just answering my claim that without
duplicate registration detection at the coordinator this wouldn't work.
But in doing so the solution has added complexity to the
implementations, which will probably need to be reflected in the
specification.

The solution works for WS-AT 2PC, but only because it is atomic AND has
a first message from C to P AND is presume-abort AND is designed to be
maximally vulnerable to transient failures.  The question has to
be taken up again for WS-AT Completion (as it has been, with a different
solution), for WS-BA, and for any other protocol that uses WS-C and
doesn't share those characteristics of WS-AT 2PC.

Whereas putting the identification of the state entity at P (which 
would be different between the Registers in the first case above, the
same for all Registers in the latter) into an overt field of Register
allows the WS-C implementation to handle duplicates for all protocols.
Register automatically becomes functionally idempotent.  There don't 
have to be special rules about use of identifiers that apply in one
case but not others.
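
As an illustration of what I mean, here is a small Python sketch of a
coordinator that deduplicates on an overt identifier. The field name
participant_id, the class and the string EPRs are invented for the
example; nothing like this exists in the current Register message.

```python
class Coordinator:
    """Hypothetical WS-C coordinator that deduplicates Register by participant id."""

    def __init__(self):
        # (transaction id, participant state-entity id) -> participant EPR
        self.registrations = {}

    def on_register(self, txn_id, participant_id, epr):
        key = (txn_id, participant_id)
        is_new = key not in self.registrations
        if is_new:
            self.registrations[key] = epr
        # A duplicate gets the same answer as the original: Register is idempotent.
        return "RegisterResponse", is_new

    def prepare_targets(self, txn_id):
        # One protocol message per distinct state entity, whatever the retry count.
        return [epr for (t, _), epr in self.registrations.items() if t == txn_id]


c = Coordinator()
first = c.on_register("T", "state-1", "epr-A")
retry = c.on_register("T", "state-1", "epr-A")   # retransmission, same state entity
retry_targets = c.prepare_targets("T")

c.on_register("T", "state-2", "epr-B")           # new state entity after a crash
all_targets = c.prepare_targets("T")
```

A retransmitted Register from the same state entity leaves a single
registration, while a recovered participant with a new state entity
shows up as a second one, so the old enlistment is still sent Prepare
and can force the abort.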

END OF MAIN POINT (one more minor below)


> 
>  
> 
> In that same email [1], Peter mentioned that we might as well 
> eliminate RegisterResponse if the ReplyTo can be used to send 
> protocol messages.  There are a couple of reasons why this is 
> not an appropriate resolution for WS-AT:

Yes, I agree with these.  Not sure whether we might need to include at
least guidance text on 1, though.

> 
> 1. RegisterResponse is used by a WS-AT participant to 
> determine whether the registration step was successful.  This 
> is important because it allows TMs to help ensure that 
> the transaction is not completed until all activity is 
> quiescent.  Applications can be smart about this if they so 
> choose (e.g. by leveraging volatile enlistment notifications) 
> but in general we want to avoid committing partial work.
> 
> 
> 2. Participants can use the RegisterResponse EPR to compose 
> unsolicited Aborted and ReadOnly messages.  These messages 
> can be sent before receipt of Prepare.
> 
>  
> 
> [1] http://www.oasis-open.org/apps/org/workgroup/ws-tx/email/archives/200512/msg00184.html
> 


pruned off the older messages

Peter

-----------------------------------
Chief Scientist
Choreology Ltd
web: www.choreology.com   <-- now with Cohesions 3.0 available for
download !

email:   peter.furniss@choreology.com
phone:   +44 20 8313 1833
mobile:  +44 7951 536168


