ws-tx message

Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
From: Mark Little <mark.little@jboss.com>
To: Max Feingold <Max.Feingold@microsoft.com>
Date: Tue, 27 Dec 2005 14:11:19 +0000
Max, it's been a week or so since I had time to follow this thread, but 
comments in line ...

Max Feingold wrote:

>As promised, I wanted to discuss how a WS-AT participant can behave correctly in the face of a lossy registration transport.  Peter mentioned [1] that he did not believe that this was feasible, and I wanted to set the record straight.
>
> 
>
>In the following, P is a participant, C a coordinator and T a transaction.
>
> 
>
>Let's start with the reinfected participant scenario:
>
> 
>
>1. P sends Register to C for T, containing EPR P1.
>
>2. C sends RegisterResponse to P.
>
>3. P fails.
>
>4. P recovers.  [WS-AT presumes abort, so P will have no memory of T.]
>
>5. An application sends P a context for T.  P registers with C again, sending EPR P2.
>
>6. The activity completes.
>
>7. C sends Prepare to both P1 and P2.
>
> 
>
>[Note that at this point the transaction should abort.  P may have had live participants before failing;  they will time out and abort.  P's new participants should not receive a commit outcome.  Fortunately, P included distinct unique identifiers in each EPR's RefParams.  P1's identifier was lost when P failed, but P2's is still alive.]
>  
>
Isn't this exactly the same as what I mentioned a couple of weeks ago: 
you re-register with a new EPR? Although it works, it does require the 
ability to manufacture different EPRs for potentially the same endpoint.

> 
>
>8. P receives Prepare for P1, fails to recognize the enlistment, and replies Aborted, as per the WS-AT state table.
>
>9. The transaction aborts.  All rejoice.
>
> 
>
>So by using a unique identifier in RefParams, we solve the reinfection problem.  
>
Again, agreed: all we're saying is that each registration attempt needs 
to be unique. However, this does come back to EPR comparisons too: the 
coordinator can't know that EPRs are unique because it can't/shouldn't 
look in the RefParams (for a start). Of course we could always punt this 
to implementations, but that affects interoperability.
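
To make the reinfection walk-through above concrete, here is a minimal sketch under stated assumptions. The Participant class, its method names and the dict-shaped EPR are hypothetical illustrations, not WS-Coordination or WS-AT APIs. Each registration attempt gets a fresh identifier in the EPR's reference parameters. A recovered participant has lost the old identifier, so it answers Prepare for that EPR with Aborted.

```python
import uuid


class Participant:
    """Hypothetical participant holding volatile state only (presumed abort)."""

    def __init__(self, address):
        self.address = address
        self.enlistments = {}  # enlistment id -> transaction id

    def make_epr(self, tx_id):
        # A new identifier for every registration attempt, carried in the
        # EPR's reference parameters (modelled here as a plain dict).
        enlistment_id = uuid.uuid4().hex
        self.enlistments[enlistment_id] = tx_id
        return {"address": self.address,
                "ref_params": {"enlistment": enlistment_id}}

    def crash_and_recover(self):
        # Nothing is logged before Prepare, so no enlistment survives.
        self.enlistments.clear()

    def on_prepare(self, epr):
        enlistment_id = epr["ref_params"]["enlistment"]
        if enlistment_id not in self.enlistments:
            return "Aborted"   # unrecognized enlistment
        return "Prepared"


def reinfection_scenario():
    p = Participant("http://example.org/participant")  # made-up address
    p1 = p.make_epr("T")       # step 1: Register with EPR P1
    p.crash_and_recover()      # steps 3-4: P fails and recovers
    p2 = p.make_epr("T")       # step 5: Register again with EPR P2
    votes = [p.on_prepare(p1), p.on_prepare(p2)]  # steps 7-8
    outcome = "Commit" if all(v == "Prepared" for v in votes) else "Abort"
    return p1, p2, votes, outcome
```

Note that both EPRs share one address and differ only in the reference parameters, which is the part the coordinator is not supposed to inspect.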

>We can use the same technique to solve the lossy transport problem without aborting:
>
> 
>
>1. P sends Register to C for T, containing EPR P1.  The message is lost.
>
>2. P times out and decides to re-send Register, containing EPR P2.
>
>3. C receives Register for P2, sends RegisterResponse.  The message is lost again.
>
>4. P times out again, but perseveres and decides to re-send Register, containing EPR P3.
>
>5. C sends RegisterResponse.  P receives the message.
>
> 
>
>[Note that only at this point is it correct for P to unblock its own registrants.  Consequently, the initiating application will not complete the activity until at least one successful registration has completed between all Ps and Cs in a transaction.  Other coordination protocols may vary.]
>
> 
>
>6. The activity completes.
>
>7. C sends Prepare to P2 and P3.
>
>8. P recognizes the Prepare for P2 as superfluous, as there is already another registered enlistment (P3).
>
>9. P sends ReadOnly to the ReplyTo address in the Prepare message for P2.
>
>10. P receives Prepare for P3.  P votes as usual (e.g. Prepared).
>
>11. The transaction enters phase two and may commit if appropriate.
>
>12. P receives Commit on P3.  P can now discard P1 as orphaned and superfluous.
>
> 
>
>Consequently, a participant can choose to survive a lossy network and send multiple Register messages to its coordinator.  The coordinator does not need to know or care about the participant's choices.  The participant could just as easily have decided to abort in step 2 and report failure to its registrants.
>
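
The lossy-transport steps quoted above can be sketched roughly as follows. This is an illustration only: the class names, the scripted `losses` list and the string labels standing in for EPRs are all assumptions made for the example, and the final Aborted branch in `on_prepare` is my own guess at a safe default rather than something stated in the thread.

```python
import itertools


class Coordinator:
    def __init__(self):
        self.registered = []   # EPR labels, in arrival order

    def register(self, epr):
        self.registered.append(epr)
        return "RegisterResponse"


class Participant:
    def __init__(self):
        self._ids = itertools.count(1)
        self.attempts = []     # every EPR label ever sent in a Register
        self.confirmed = None  # EPR whose RegisterResponse was received

    def register_with_retry(self, coordinator, losses):
        # losses scripts the transport per attempt:
        # "request" (Register lost), "response" (RegisterResponse lost), None.
        for lost in losses:
            epr = f"P{next(self._ids)}"    # fresh EPR for each attempt
            self.attempts.append(epr)
            if lost == "request":
                continue                   # time out, try again
            coordinator.register(epr)
            if lost == "response":
                continue                   # registered, but P never hears
            self.confirmed = epr
            return epr
        raise TimeoutError("gave up registering")

    def on_prepare(self, epr):
        if epr == self.confirmed:
            return "Prepared"              # vote as usual
        if epr in self.attempts and self.confirmed is not None:
            return "ReadOnly"              # superfluous enlistment
        return "Aborted"                   # assumption: anything else aborts


def lossy_scenario():
    c, p = Coordinator(), Participant()
    p.register_with_retry(c, ["request", "response", None])
    votes = {epr: p.on_prepare(epr) for epr in c.registered}
    return c.registered, votes
```

With the scripted losses shown, the coordinator ends up holding P2 and P3, and the participant answers ReadOnly for P2 and Prepared for P3, matching steps 7 to 10.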
> 
>
>In that same email [1], Peter mentioned that we might as well eliminate RegisterResponse if the ReplyTo can be used to send protocol messages.  There are a couple of reasons why this is not an appropriate resolution for WS-AT:
>
> 
>
>1. RegisterResponse is used by a WS-AT participant to determine whether the registration step was successful.  This is important because it allows TMs to help ensure that the transaction is not completed until all activity is quiescent.  Applications can be smart about this if they so choose (e.g. by leveraging volatile enlistment notifications) but in general we want to avoid committing partial work.
>
> 
>
>2. Participants can use the RegisterResponse EPR to compose unsolicited Aborted and ReadOnly messages.  These messages can be sent before receipt of Prepare.
>  
>
So as I pointed out to Peter, and you've re-iterated in more detail, 
there are solutions within the scope of the current specification (with 
some suitable text modifications). However, I'm not sure they are 
necessarily ideal and EPR comparisons need to be addressed, even if 
working purely with the current specification - how else can a 
coordinator decide to return the AlreadyRegistered fault? I'm sure 
someone else already suggested this, but maybe we should address that 
issue first, as that will then indicate at least whether the current 
WS-Coordination operation "signatures" are necessary and sufficient to 
do the job.
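
As a rough illustration of why the comparison rule matters, the sketch below runs the same two registrations through a coordinator with two different rules. Everything here is hypothetical: the Coordinator class, the two comparison functions and the dict-shaped EPRs are assumptions for the example, and neither rule is something the current text sanctions. The second rule also compares reference parameters, which is exactly the comparison questioned above.

```python
class AlreadyRegistered(Exception):
    pass


class Coordinator:
    def __init__(self, same_participant):
        self.same_participant = same_participant  # pluggable comparison rule
        self.participants = []

    def register(self, epr):
        if any(self.same_participant(epr, known) for known in self.participants):
            raise AlreadyRegistered(epr["address"])
        self.participants.append(epr)
        return "RegisterResponse"


def by_address(a, b):
    # Treats any two EPRs with the same address as the same participant.
    return a["address"] == b["address"]


def by_whole_epr(a, b):
    # Compares address and reference parameters as opaque data.
    return a == b


def try_register(rule):
    c = Coordinator(rule)
    p1 = {"address": "http://example.org/p", "ref_params": {"enlistment": "1"}}
    p2 = {"address": "http://example.org/p", "ref_params": {"enlistment": "2"}}
    results = []
    for epr in (p1, p2):
        try:
            results.append(c.register(epr))
        except AlreadyRegistered:
            results.append("AlreadyRegistered")
    return results
```

Under `by_address` the second registration faults, so re-registering with a new EPR fails; under `by_whole_epr` both succeed. Two coordinators that pick different rules would behave differently toward the same participant, which is the interoperability concern.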

Mark.

> 
>
>[1] http://www.oasis-open.org/apps/org/workgroup/ws-tx/email/archives/200512/msg00184.html
>
>
>________________________________
>
>From: Mark Little [mailto:mark.little@jboss.com]
>Sent: Sun 12/18/2005 7:24 AM
>To: Peter Furniss
>Cc: Max Feingold; ws-tx@lists.oasis-open.org
>Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
>
>
>
>
>
>Peter Furniss wrote:
>
>  
>
>>>A failure to receive a register response could trigger a completely new
>>>register message with a new EPR (on the assumption a retry of the first
>>>attempt caused the already-registered fault to be returned). The only
>>>problem I can see at present with this mechanism is that manufacturing a
>>>new EPR for the "same" participant may not be feasible in some
>>>environments. However, that could be seen as an implementation problem.
>>>The advantage would be that no changes to the specification are required
>>>- other than a clarification of the text to call out this possibility.
>>>
>>With no change to the current texts, I don't see how you can get
>>already-registered unless the coordinator does an illegal EPR
>>comparison. (that is really part of 014 - whatever we specify as the
>>reaction, there needs to be a sound way of detecting duplicates - no
>>change is not an option).
>>
>>
>>    
>>
>I'm trying to consider the issues in isolation, but I'll admit that's
>difficult ;-)
>
>  
>
>>But apart from that (i.e. assume we have a duplicate detection means),
>>and back to the conceptual point of this issue, why specify that a
>>coordinator detecting that Register is for the same Participant as one
>>already registered must fault with AlreadyRegistered? Just assume that
>>the transport, or the sending implementation, has caused the duplicate
>>to turn up, and reply with a RegisterResponse reflecting the
>>Coordinator's endpoint.
>>
>>
>>    
>>
>My intention was to point out that a solution is possible within the
>scope of the current specification. Whether or not that solution is one
>we wish to adopt, is the subject of this and other discussions, just as
>the other proposed solutions have been.
>
>  
>
>>In 95% of cases the EPRs will be unchanged.
>>If they have changed (which would only be because the endpoint owner
>>"wanted" to change it), the most recent SHOULD be used for sending by
>>the peer (not MUST because that would impose complications for some
>>persistence strategies).
>>
>>
>>
>>
>>    
>>
>>>>The alternative of trying to make multiple registrations for what is in
>>>>fact the same participant work would seem to cause considerable
>>>>complications. For atomic cases, the coordinator may not mind - it
>>>>just sees two (or more) registrations and they must both be committed
>>>>(or rolled back). But Max's
>>>>>"The participant
>>>>>>simply needs to behave correctly[1] by distinguishing its multiple
>>>>>>enlistments."
>>>>is very questionable, because it will receive two Prepares (say), both
>>>>delivered to the same EPR, but must reply to different coordinator
>>>>endpoints, one given on the successful RegisterResponse, one on the
>>>>lost one. As in Alastair's diagrams sent earlier today, it would have
>>>>to use the Reply-To EPR (in which case, why not use that anyway and get
>>>>rid of the RegisterResponse altogether) [this is completely impossible
>>>>for coordination protocols where the first message is participant to
>>>>coordinator - see Alastair's diagram 3]
>>>>
>>>I agree all of this is possible and may be sub-optimal in certain
>>>degenerate situations. However, when weighed against the timeline
>>>imposed for getting WS-C through to standardisation, it may be that the
>>>"do nothing" approach I mentioned above is the best option.
>>>
>>>  
>>>
>>>      
>>>
>>>>Gosh, this has ended up rather long (and will probably now cross with
>>>>other messages saying the same thing or rendering it out of date)
>>>>
>>>To be honest I don't have a hard stance on any solutions to this issue
>>>at the moment. My only concern is time spent so far and the fact that
>>>there are other issues to work through that may be equally, or more,
>>>contentious. I hope we can bring this to a conclusion (a vote) soon.
>>>  
>>>
>>>      
>>>
>>Well, we closed a quarter of the issues list yesterday, and this one is
>>related to at least two of the others, and the discussion has made good
>>progress. I think it's a little early to be worrying about timescales.
>>
>>
>>
>>    
>>
>I disagree that it is too early. Several of the companies on this list
>have implementations that are already interoperable and, speaking as the
>representative of one of them, we'd like to reduce the amount of
>time this TC takes to standardise.
>
>Mark.
>
>
>
>  
>