ws-tx message

Subject: RE: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
From: "Max Feingold" <Max.Feingold@microsoft.com>
To: "Alastair Green" <alastair.green@choreology.com>
Date: Mon, 9 Jan 2006 20:54:16 -0800
Alastair:
 
I think you may have misinterpreted my comment about message ids.  I am not advocating that we use message ids to solve the general problem under discussion;  I was simply observing that using transport retries as a motivation for forcing coordinators to distinguish between different registrations is a bit of a red herring.  Any SOAP stack that uses a transport that can transmit duplicate messages would be well-advised to implement some form of duplicate detection to avoid redundant and non-idempotent message processing.  Consistent with the spirit and letter of WS-A, a WS-C implementation that deliberately sends multiple registration messages should use a distinct message id for each message.
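
A minimal sketch of the stack-level duplicate detection described above, assuming an in-memory set of seen message ids. The names DuplicateFilter, accept and deliver are hypothetical and are not taken from WS-Addressing or any SOAP stack. It illustrates that a transport duplicate (same id) is dropped, while a deliberate resend with a distinct id still reaches the coordinator.

```python
class DuplicateFilter:
    """Hypothetical helper: remembers message ids that were already delivered."""

    def __init__(self):
        self._seen = set()

    def accept(self, message_id):
        # True the first time an id is seen, False for a transport duplicate.
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


def deliver(dup_filter, messages):
    # Pass each (message_id, body) pair on to the application at most once.
    return [body for message_id, body in messages if dup_filter.accept(message_id)]


incoming = [
    ("urn:uuid:1", "Register"),
    ("urn:uuid:1", "Register"),  # transport-level duplicate: same message id
    ("urn:uuid:2", "Register"),  # deliberate resend: distinct message id
]
delivered = deliver(DuplicateFilter(), incoming)
```

Note that the filter alone cannot tell the coordinator whether the second delivered Register comes from the same registrant; that question is left to the coordination protocol.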
 
Concerning completion, a reasonable implementation of a transactions API on top of WS-AT would likely perform its registration for completion immediately after creating the transaction.  A failure in either of these two initial operations would likely result in the abandonment of the transaction, because at this early stage the transaction has not been propagated or shared with any resources.  Failing early is both cheap and reasonable, as these operations will generally be performed against a transaction manager that is in some sense "local".
 
While I agree with you that it is unwise to assume reliable transports for general distributed protocols, we do have some specific knowledge concerning the semantics and usage scenarios of AT Completion.  By its very nature, AT Completion is far less likely than 2PC to be used across machines or to cross interoperability boundaries.  Consequently, it is not clear to me that we need to be as preoccupied with recovering from Completion registration failures as we are with 2PC registrations.

________________________________

From: Alastair Green [mailto:alastair.green@choreology.com]
Sent: Sat 1/7/2006 1:30 PM
To: Max Feingold
Cc: Mark Little; Peter Furniss; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable


Max,

I want to focus on two points you raise:

a) the "triviality" of avoiding duplicate terminator registration in the WS-AT Completion protocol

b) the issue of generality (WS-C versus WS-X which references WS-C)

Triviality

I think you are masking a substantive problem with the label of "triviality" applied to the WS-AT Completion protocol. The current scheme (message ids combined with an AlreadyRegistered [or CannotRegisterParticipant] fault) is not the best solution. I am also worried about reliance on SOAP stack implementation characteristics (which are not standardized), and on estimates of retry or failure likelihood, based on "topological closeness" (which seems like a contradiction in terms to me).

A coordinator receives a registration from a participant. Its view is: "I will not allow more than one registration: there must only be one participant." Restated: allow only one agent to adopt the role of transaction terminator [Initiator in WS-AT terminology] in the Completion protocol. I assume here that we need to avoid the situation where two programs register themselves to play the terminating role. 

This rule can only be fulfilled properly if the coordinator can a) handle repeat registrations of the same participant, and b) distinguish between repeat registrations of the same thing, on the one hand, and registrations of different things, on the other. It seems very easy to handle both of these requirements using participant ids. 
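
A minimal sketch of requirements (a) and (b), assuming the Register request carries a participant id. The class CompletionCoordinator, its register method, and the example URIs are hypothetical and not defined by WS-C or WS-AT. A repeat registration by the same participant receives the normal response again; a registration by a different participant is rejected.

```python
class AlreadyRegistered(Exception):
    """Raised when a different participant tries to register for Completion."""


class CompletionCoordinator:
    """Hypothetical coordinator that allows exactly one Completion registrant."""

    def __init__(self, coordinator_epr):
        self.coordinator_epr = coordinator_epr
        self.participant_id = None
        self.participant_epr = None

    def register(self, participant_id, participant_epr):
        if self.participant_id is None:
            # First registration: record the registrant.
            self.participant_id = participant_id
            self.participant_epr = participant_epr
            return self.coordinator_epr
        if participant_id == self.participant_id:
            # Repeat of the same participant: replay the normal response,
            # since the original response may have been lost.
            return self.coordinator_epr
        # A different participant: only one terminator is allowed.
        raise AlreadyRegistered(participant_id)


coordinator = CompletionCoordinator("urn:example:coordinator")
first = coordinator.register("urn:example:initiator-1", "urn:example:epr-a")
replay = coordinator.register("urn:example:initiator-1", "urn:example:epr-a")
try:
    coordinator.register("urn:example:initiator-2", "urn:example:epr-b")
    rejected = False
except AlreadyRegistered:
    rejected = True
```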

Repeat registrations of the same participant must be responded to, as there is no way of knowing if the repeat occurs because the original response was lost. Making a rule that retries will not be made, to avoid this problem, seems unnecessary, and inconsistent. Both WS-AT and WS-BA, in their coordination protocols proper, use replays to achieve failure tolerance. It is true that a "one shot" approach would "work" in WS-AT, in the sense that a transaction timeout could be used to garbage collect the transaction, but I see no good reason to have such a fragile approach.

Either we must make an unambiguous statement that retry will not occur, or we must state how retries will be handled. A SOAP stack that had a retry strategy based on replaying the same request with the same message id until a related reply was received would violate a rule prohibiting retries. What is required is a rule in this specification that does permit retries, with defined means of identifying replays: this rule can then be implemented at whatever level makes sense for a particular product. Any other approach will not define interoperable behaviour correctly.

(If we do repeat, I think we are all agreed that it is not appropriate to use a fault as the response replay: a fault is the wrong way to carry the required EPR, whose transmission is required to terminate the exchange.) 

What is the best way of identifying replays on behalf of a given Participant? Idempotence via reuse of message ids for replays is contrary to the spirit of WS-Addressing, as I have pointed out in a prior posting. A separated WS-Addressing implementation is quite likely to generate a unique message-id for each RR MEP exchange. To demand that it allow "chaining" (repetition of the same message id as a prior exchange) is to introduce a non-WS-Addressing concept. Which is fine, but then we are not relying upon another specification's approach or stipulations: we have a free hand to achieve our requirement optimally, and we must write the rules.

Your comment that it is easy to eliminate duplicates at the transport layer (ignore the second delivery of the same message id) dovetails with your view that it is unlikely that deliberate retries will be attempted. But deliberate retrying is perfectly likely -- I think you anticipate it happening in a SOAP/WS-A stack. 

An implementation may have some method such as Transaction.commit(). The implementation of this API call will logically cause a) registration for AT CP, and b) transmission of AT Commit. If the registration fails to receive a response (we assume that communications can fail) then I would want to retry (for some configurable number of times) before blowing out the client. Assumptions of deployment "closeness" ("topological closeness") have no place in a distributed interoperation protocol: we cannot rely on high hopes of reliability relating to "closeness" of two agents. If they are connected by an unreliable transport of unknown quality (which the specs otherwise assume) then any message send can fail, and we must take account of that. There is no "connection break" to inform us that the attempted exchange is out of the water: we must do that job at our level in the stack.
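
A minimal sketch of such a bounded retry, assuming a missing response is modelled as None and that the same participant id is resent on every attempt. The names register_with_retry, FlakyTransport and send_register are hypothetical; the fake transport simply drops a fixed number of responses.

```python
def register_with_retry(send_register, participant_id, max_attempts=3):
    """Resend Register with the same participant id until a response arrives."""
    for attempt in range(1, max_attempts + 1):
        response = send_register(participant_id)
        if response is not None:
            return response, attempt
    # Give up only after the configured number of attempts.
    raise TimeoutError(f"no RegisterResponse after {max_attempts} attempts")


class FlakyTransport:
    """Hypothetical transport that loses the first `failures` responses."""

    def __init__(self, failures):
        self.failures = failures
        self.sent = []

    def send_register(self, participant_id):
        self.sent.append(participant_id)
        if self.failures > 0:
            self.failures -= 1
            return None  # response lost; the caller cannot tell why
        return "urn:example:coordinator-epr"


transport = FlakyTransport(failures=2)
epr, attempts = register_with_retry(
    transport.send_register, "urn:example:initiator-1", max_attempts=5
)
```

Because every resend carries the same participant id, a coordinator can treat the repeats as replays rather than as competing registrations.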

Even if the questionable technique of reusing WS-A message ids to identify a sequence is used, it is unclear why it should be deemed to be the best solution. Participant ids are lighter weight, more obvious in their intent and purpose, and more generally useful.

The elimination of multiple terminator registrations also requires identifying the logical entity on whose behalf a Register/RegisterResponse exchange operates. Again, message ids as a means of discrimination could be used. But what is being identified here is not the message exchange, but the sequence of message exchanges required to achieve, in a reliable way, the registration of the Initiator. And (in this context) the identity of the sequence R1/RR1, R2/RR2 .. Rn/RRn is tantamount to the identity of the registrant (i.e. the Participant in WS-C terms).

Method A: We can bend the meaning of message id (stating that it must be reused for retries), and add a message id (URL) to the request, and reference it in the reply by use of the request-reply MEP.

Method B: We can add a U/IRI participant id to the request, and only use one MEP (one-way with full addressing). The reply is not affected.

One might say: on a scale of triviality, B is more trivial than A. Stone B also kills several other birds in passing.

Generality

I am much more sympathetic to your points on avoiding false generality in the "base class" of WS-C. This is a classic design choice: how many reuses justify depression to the base of a given piece of functionality? To which there is no "right" answer.

I believe that both WS-AT and WS-BA require the same feature (to be precise, all known issues relating to identification, duplicate/multiple registration of participants for both protocols can best be resolved by one solution: participant ids). I think one could put this feature in WS-C, or restate the feature in each referencing specification. Personally, I would prefer to do it in WS-C.

Alastair

Max Feingold wrote: 

	Merry Christmas and happy holidays to all!
	
	 
	
	There are a few observations I would like to make on this topic before I head out on vacation.
	
	 
	
	First, it is perfectly possible to implement WS-AT without participant identifiers in a manner that neither prohibits deliberate resends nor generates undesired transaction aborts.  There are two generally interesting cases:  one in which the participant has forgotten and is not aware that it is sending a duplicate Register, and another in which the participant has deliberately decided to resend Register.  Both can be made to work in an interoperable fashion.  I'll send a separate message containing that discussion.
	
	 
	
	Second, I do not believe that anyone in this TC wishes to prohibit the possibility of creating a coordination protocol that relies on participant identifiers or any other mechanism in order to ensure correctness.  On the other hand, it seems unwise to me to attempt to enforce a single model for registration for every coordination protocol, regardless of their specific requirements.
	
	 
	
	The design spirit of WS-Coordination, which we applied quite successfully in the last telephone discussion (concerning the appropriate definition and placement of faults), is to include two general sets of mechanisms in WS-C:
	
	 
	
	1) Those that are used by virtually all protocols
	
	2) Broad extensibility that allows derived protocols to cover their other specific needs.  That philosophy, applied to this discussion, would suggest that if a given mechanism is not needed by our current coordination protocols, it is not a good candidate for inclusion in the base specification.
	
	 
	
	Consequently, the participant identifier mechanism is a perfect example of a mechanism that should make use of WS-Coordination extensibility.  Any protocol that requires the ability to detect duplicate registrations and uniquely identify participants can simply leverage the open content that is provided in the Register message.
	
	 
	
	I think that the interesting discussion is not whether such a mechanism belongs in WS-Coordination (I think it is pretty clear that it does not), but whether specific coordination protocols need such a mechanism.  The ones that do should not be prohibited by WS-C;  the ones that don't should not suffer any additional complexity.  I believe that is the case with the current language in the specifications, although some editorial language clarifying this freedom may be appropriate (e.g. Ian's suggested text).
	
	 
	
	Third, some odds and ends in response to several previous messages: 
	
	 
	
	- AlreadyRegistered was intended mostly for protocols such as WS-AT Completion where duplicate detection is trivial.  Given that we have already determined that a RegistrationFailed fault is desirable, we can probably just delete the AlreadyRegistered fault.  For protocols that can detect duplicates, the appropriate response for a duplicate registration is likely either a RegistrationFailed fault with a specific reason, a standard RegisterResponse or some protocol-specific message.
	
	 
	
	- Register messages that are duplicated by the transport are not likely to be of concern to a coordination protocol;  duplicate detection can be trivially performed at the SOAP layer by filtering on message ids.  It's just as easy as filtering on some other identifier, and it's likely that many stacks will already do this.
	
	 
	
	- WS-AT Completion registrations are restricted to a single participant.  It is true that adding participant identifiers would allow a completion participant to re-send register.  However, Completion participants are (a) unlikely to be recoverable or tolerant of failures and (b) unlikely to be topologically distant from their coordinator.  Consequently, I do not sense a strong need to allow the Completion registrations to be re-sent.
	
	
	________________________________
	
	From: Mark Little [mailto:mark.little@jboss.com]
	Sent: Sun 12/18/2005 7:24 AM
	To: Peter Furniss
	Cc: Max Feingold; ws-tx@lists.oasis-open.org
	Subject: Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
	
	
	
	
	
	Peter Furniss wrote:
	
	  

			A failure to receive a register response could trigger a completely
			new register message with a new EPR (on the assumption a retry of the
			first attempt caused the already-registered fault to be returned). The
			only problem I can see at present with this mechanism is that
			manufacturing a new EPR for the "same" participant may not be feasible
			in some environments. However, that could be seen as an implementation
			problem. The advantage would be that no changes to the specification
			are required - other than a clarification of the text to call out this
			possibility.

		With no change to the current texts, I don't see how you can get
		already-registered unless the coordinator does an illegal EPR
		comparison. (that is really part of 014 - whatever we specify as the
		reaction, there needs to be a sound way of detecting duplicates - no
		change is not an option).
		
		
		    

	I'm trying to consider the issues in isolation, but I'll admit that's
	difficult ;-)
	
	  

		But apart from that (i.e. assume we have a duplicate detection means),
		and back to
		the conceptual point of this issue,  why specify that a coordinator
		detecting that Register is for the same Participant as one already
		registered must fault with AlreadyRegistered ? Just assume that the
		transport, or the sending implementation has caused the duplicate to
		turn up, and reply with a RegisterResponse reflecting the Coordinator's
		endpoint.
		
		
		    

	My intention was to point out that a solution is possible within the
	scope of the current specification. Whether or not that solution is one
	we wish to adopt, is the subject of this and other discussions, just as
	the other proposed solutions have been.
	
	  

		In 95% of cases the EPR's will be unchanged.
		If they have changed (which would only be because the endpoint owner
		"wanted" to change it), the most recent SHOULD be used for sending by
		the peer (not MUST because that would impose complications for some
		persistence strategies).
		
		
		
		
		    

				The alternative of trying to make multiple registrations for what is
				in fact the same participant work would seem to cause considerable
				complications. For atomic cases, the coordinator may not mind - it
				just sees two (or more) registrations and they must both be committed
				(or rolled back). But Max's

					"The participant simply needs to behave correctly[1] by
					distinguishing its multiple enlistments."

				is very questionable, because it will receive two Prepare's (say),
				both delivered to the same EPR, but must reply to different
				coordinator endpoints, one given on the successful RegisterResponse,
				one on the lost one. As in Alastair's diagrams sent earlier today, it
				would have to use the Reply-To EPR (in which case, why not use that
				anyway and get rid of the RegisterResponse altogether) [this is
				completely impossible for coordination protocols where the first
				message is participant to coordinator - see Alastair's diagram 3]

			I agree all of this is possible and may be sub-optimal in certain
			degenerate situations. However, when weighed against the timeline
			imposed for getting WS-C through to standardisation, it may be that
			the "do nothing" approach I mentioned above is the best option.

				Gosh, this has ended up rather long (and will probably now cross with
				other messages saying the same thing or rendering it out of date)

			To be honest I don't have a hard stance on any solutions to this issue
			at the moment. My only concern is time spent so far and the fact that
			there are other issues to work through that may be equally, or more,
			contentious. I hope we can bring this to a conclusion (a vote) soon.

		Well, we closed a quarter of the issues list yesterday, and this one is
		related to at least two of the others, and the discussion has made good
		progress. I think it's a little early to
		be worrying about timescales.
		
		
		
		    

	I disagree that it is too early. Several of the companies on this list
	have implementations that are already interoperable and, speaking as the
	representative of one of them, we'd like to reduce the amount of
	time this TC takes to standardise.
	
	Mark.
References:
- Re: [ws-tx] Issue 007 - WS-C: Make Register/RegisterResponse retriable
  - From: Alastair Green <alastair.green@choreology.com>