ws-tx message

Subject: RE: [ws-tx] Commentary on Issue 007 - WS-C: Make Register/RegisterResponse retriable
From: Mark Little <mark.little@jboss.com>
To: "ws-tx@lists.oasis-open.org" <ws-tx@lists.oasis-open.org>
Date: Thu, 09 Feb 2006 16:47:24 +0000
Max, I think we can agree that there's no requirement for an extension to register-request/register-response. Therefore, I don't think the discussion should continue to revolve around 007. I also agree that we need to keep WS-C as core/minimal as possible. Now, whether that means a re-registration capability, as discussed in 016, goes into WS-C or into referencing specifications is a matter for a separate debate (assuming we agree it is needed).

Mark.


----


Mark:

In general I would expect implementations to be intolerant of failures
in read-only participants during the propagation phase of an AT.
Without knowing the closure of the extent of the activity, we can't be
sure that the activity is in fact complete.  I do agree with the
documentation aspect:  even the most light-weight of volatile
participants is expected to stick around long enough to vote.

I agree that we shouldn't shape WS-C in a way that limits the freedom of
derived specifications.  As you indicate, not every consensus protocol
presumes abort or accepts frequent rollbacks.  However, I think we
should be equally wary of adding features to WS-C that are of limited
utility to the vast majority of such protocols.  Keeping the baseline
simple and free of a thousand optional details has been a design goal
from the very beginning.

-----Original Message-----
From: Mark Little [mailto:mark.little@jboss.com] 
Sent: Monday, January 30, 2006 5:33 AM
To: Max Feingold
Cc: ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Commentary on Issue 007 - WS-C: Make
Register/RegisterResponse retriable

Max, you're absolutely correct in that re-registration cannot simply be
done by the participant (or a service acting on its behalf) without
some recourse to what happened prior to the failure. However, before I 
go on, it is worth noting that many transactional applications today 
have a large proportion of read-only participants within them, and the 
inability to tolerate their failures prior to the start of the 2PC is 
something that we shouldn't ignore (and at the very least we should 
document).

Moving on, WS-AT is probably a bad example of why I still think 
participant IDs or some form of EPR comparison may continue to be 
required though (NOTE: this is not related directly to 007, but through 
issue 016 that I raised which is relevant to the 
register-request/register-response debate). In long running 
transactions, of which WS-BA is an example, persisting of information by
the participant/service as the activity progresses may well be the norm
(hence why I said that WS-AT is probably not a good example). It's 
certainly my experience that long running services do checkpoints of 
their business state as well as aspects of the transaction state in 
order that they can pick up from where they left off in the event of a 
failure; this includes subordinate coordinator information, such as
intentions lists. In that situation, it does make sense for a recovering
service to want to register another participant with the coordinator if
it can't re-establish the original participant EPR. Loss of data 
integrity is not necessarily violated, but it certainly can't be 
ignored: hence the reason I agreed at the top of this email - any 
attempt to register a new participant must be made by the application 
because only it has the necessary semantic information to determine what
is right and what is wrong. That's also precisely why I don't think we
should be preventing it.

In long running interactions, we *have* to be able to tolerate failures 
and recovery more than would be the case with the typical short-duration
WS-AT transactions. Otherwise there may be an argument as to what
benefits WS-BA really brings in these situations over something like a
workflow system that groups together short duration WS-AT transactions.

The extensibility capability in WS-C can certainly be used to achieve 
this, but then the question becomes: is this sufficiently generic for 
transaction models that we want to put it within WS-C, or do we mandate 
that each model that requires it develops its own solution? However we 
do this, if the extensibility element is used then obviously it needs to
be documented in whatever model/protocol uses it, or we lose
interoperability.

Mark.


Max Feingold wrote:
>
> Hello.
>
> I'd like to add some written content on issue 007, in order to clarify
> the point I made verbally during our last TC call: that durable
> participant identifiers in isolation do not add any value to 
> registration retry scenarios in WS-AT.
>
> In an earlier message [1] to this list, I introduced the details of 
> two important scenarios where a transaction manager (TM) will send a 
> registration retry in the context of WS-AT.
>
> The first is the positive case: a subordinate TM opts to send a second
> registration message because its first registration message did not
> receive a response (either because the Register message was lost or 
> because the RegisterResponse message was lost). In this case, the 
> expectation is that the dropped message will not prevent the 
> transaction from committing.
>
> The second is the negative case: a subordinate TM successfully 
> registers for 2PC, then fails and recovers. Because WS-AT presumes 
> abort, the recovering TM will have forgotten about its membership in 
> the existing transaction. Consequently, any participants it might have
> accumulated during the active phase before the failure will also have
> been forgotten. Because WS-AT is a disconnected protocol and the TM
> has no recollection of prior events, no immediate action will be taken
> by any node in the transaction tree. As the active phase proceeds, the
> TM may be re-infected with the same transaction. If that occurs, the
> TM will naturally attempt to register with a coordinator[2] in order 
> to (re)join the transaction. In this case, the expectation is that to 
> avoid transaction tree splits (and therefore data corruption) the 
> transaction needs to be aborted.
>
>
> In my earlier message[1], I explained how a participant implementation
> can ensure that each of these scenarios is processed correctly in the
> context of the current specifications. To summarize, the participant 
> simply needs to use a unique volatile identifier in each registration 
> EPR in order to distinguish which individual registration is being 
> targeted by subsequent protocol messages composed by its 
> coordinator(s). This technique allows the participant to send an 
> arbitrary number of registration requests while retaining correctness 
> (in the context of the two scenarios mentioned above) and imposing no 
> special requirements on the behavior of a WS-AT coordinator.
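
A minimal sketch of the per-registration identifier technique described above, as I read it. The class, method names and message shape are all hypothetical (nothing here is a WS-C or WS-AT API), and the assumption that an unrecognised identifier draws an "Aborted" reply follows from presumed abort rather than from anything stated in this thread.

```python
import uuid


class Participant:
    """Hypothetical in-memory participant; not a WS-AT API."""

    def __init__(self):
        # Volatile state only: deliberately lost when the participant fails.
        self.enlistments = {}  # enlistment id -> transaction id

    def build_registration(self, tx_id):
        """Mint a fresh volatile id for every Register message sent."""
        enlistment_id = uuid.uuid4().hex
        self.enlistments[enlistment_id] = tx_id
        # The id would travel inside the registration EPR (assumed shape).
        return {"transaction": tx_id, "enlistment_id": enlistment_id}

    def on_prepare(self, enlistment_id):
        """Vote on a Prepare addressed to one specific registration EPR."""
        if enlistment_id in self.enlistments:
            return "Prepared"
        # Unknown id: the enlistment predates a failure, so vote to abort.
        return "Aborted"

    def fail_and_recover(self):
        """Simulate a crash: all volatile enlistment state is forgotten."""
        self.enlistments.clear()
```

In the retry case both identifiers are still known to the participant, so whichever registration the coordinator recorded can be answered. After a failure, a Prepare addressed to a pre-failure identifier is not recognised and the transaction aborts, with no special behaviour needed from the coordinator.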
>
> What I would like to do now is explain why this technique _/must/_ 
> still be used even if durable participant identifiers are used in 
> registration. In other words, _/adding durable participant identifiers
> does not add value to WS-AT's registration retry semantics/_.
>
> To illustrate this argument, I've outlined the behavior that we would 
> observe when durable participant identifiers are used, but the 
> volatile identifiers mentioned above (or an equivalent) are not used.
>
> As before, P is a participant, C a coordinator and T a transaction.
>
> 1. P registers for durable 2PC on C for transaction T.
>    a. P provides C with durable identifier IP.
> 2. P accumulates a number of participants during the active phase.
> 3. P fails and recovers, and is reinfected with T by a local application.
> 4. P (re-)registers for durable 2PC on C for transaction T.
>    a. P provides C with the same durable identifier IP.
> 5. C recognizes IP and responds to P with a successful
>    RegisterResponse message.
>    a. C continues to think of P's multiple registrations as a single
>       enlistment.
>
> At this point, there is nothing preventing the transaction from 
> committing. This violates the principles outlined above for the 
> re-infection scenario. Data corruption will result.
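
The five steps above can be walked through in a small simulation. This is only an illustrative sketch: both classes and `run_scenario` are hypothetical names, the coordinator is reduced to a set of durable identifiers, and the participant's accumulated work is modelled as a plain list.

```python
class DurableIdCoordinator:
    """Hypothetical coordinator that keys enlistments by durable id only."""

    def __init__(self):
        self.enlisted = set()

    def register(self, durable_id):
        # A repeated id is treated as a retry of the same enlistment.
        already_known = durable_id in self.enlisted
        self.enlisted.add(durable_id)
        return already_known


class ForgetfulParticipant:
    """Hypothetical participant with a durable id but volatile work state."""

    def __init__(self, durable_id):
        self.durable_id = durable_id
        self.work = []  # volatile: sub-participants gathered so far

    def fail_and_recover(self):
        self.work = []  # presumed abort: nothing was logged

    def prepare(self):
        # Nothing tells the recovered participant that work was lost.
        return "Prepared"


def run_scenario():
    c = DurableIdCoordinator()
    p = ForgetfulParticipant("IP")
    c.register(p.durable_id)        # step 1
    p.work += ["r1", "r2"]          # step 2
    lost = list(p.work)
    p.fail_and_recover()            # step 3
    c.register(p.durable_id)        # steps 4 and 5: seen as a retry
    votes = [p.prepare() for _ in c.enlisted]
    outcome = "commit" if all(v == "Prepared" for v in votes) else "abort"
    return outcome, lost, p.work
```

Under these assumptions the run ends in a commit even though the work accumulated before the failure is gone, which is the split described above.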
>
> One can imagine a couple of variations on this behavior in order to 
> attempt to address the problem:
>
> 1. C could abort the transaction when it recognizes P's duplicate
>    registration in step 5.
>    a. This is self-defeating, as the entire purpose of the feature is to
>       allow registration retries. In other words, this breaks the positive
>       first scenario.
>    b. If the original C is different from the second C, the latter will
>       not know to abort the transaction.
> 2. C could send an augmented RegisterResponse indicating "already
>    registered". P could detect this condition and abort if it only
>    remembers sending one registration message.
>    a. This is not a full solution: it _/stops working/_ as soon as the
>       recovered P needs to send more than one Register message (e.g. it
>       falls into the deliberate retry pattern exemplified by the positive
>       first scenario).
>    b. If the original C is different from the second C, no augmented
>       response will be sent. Only P can truly know (or deduce) that it has
>       registered twice, but forgotten the state associated with the first
>       enlistment.
>
> The bottom line is that the participant needs to own the problem of 
> noticing that it has become amnesiac. It does not work to attempt to 
> push the problem off onto a coordinator, which is essentially what the
> durable participant id proposal does. To make these scenarios work
> correctly, some kind of per-enlistment identifier scheme needs to be
> used at the participant, in order to distinguish individual
> enlistments created by individual registrations. This is in effect the
> same scheme that was already discussed [1]. It can be implemented
> easily, cleanly and interoperably with the current specifications.
>
> I conclude therefore that WS-AT does not benefit from durable 
> participant identifiers, for the stated purpose of allowing 
> registration retries to occur safely.
>
> For those coordination protocols that do happen to need participant 
> identifiers, in order to enable some other feature, it is trivial to 
> use the extensibility provided by WS-C to add them as a feature of 
> those protocols. That is _/precisely/_ why this extensibility was 
> created: to allow the design and use of features that do not make 
> sense for all coordination protocols.
>
> Thanks,
>
> -mfeingol
>
> [1] http://lists.oasis-open.org/archives/ws-tx/200512/msg00223.html
>
> [2] Due to the vagaries of transaction propagation, the coordinator 
> may not be the same one as before.