Re: Managers, addressses and the like

business-transaction message

Subject: Re: Managers, addressses and the like

From: Mark Little <mcl@arjuna.com>

To: Alastair Green <alastair.green@choreology.com>

Date: Wed, 30 May 2001 10:00:52 +0100

Excellent. I don't believe we know what the environments are in which the protocol will actually be used, so a permissive approach is a good one. The existing draft does incorporate the atom id in all messages (i.e. it has that permissive quality: it does not assume that the address uniquely identifies a per-atom coordinator, but can allow it to have that effect).

Fine, as long as there's something to which we can direct a BEGIN, and get back a CONTEXT.

Hmm. A sub-coordinator going heuristic because of a presumed timeout, caused by time skew, is not a happy outcome.
I did not mention anything about heuristic.
No, I did. I was musing on the problem that could arise if a sub-coordinator "timed out" because of time skew, when other branches carried on. I was thinking about (as the next sentence indicated) participant timeouts affecting sub-coordinators which _have_ prepared. And, if we allow that participant timeouts can fire after prepare (otherwise they are useless), then it is not clear that we need a separate "before prepare" statement of timeout.

These are quite distinct timeouts. One is the participant saying (essentially) "OK, you've prepared me, so now you'd better confirm within X time units or I'm going to do something myself." Whereas the other is a message to the coordinator saying "if you don't hear from me within Y time units then undo automatically *iff* you haven't got past prepare". The OTS has the second timeout explicitly, whereas the first (if it exists at all) exists purely within the resource and so doesn't get communicated to the coordinator (at least not within the bounds of the protocol).

The first timeout is very useful for failure situations involving the coordinator, *and* more importantly in the environment in which BTP will exist, for denial of service attacks. The second timeout is more a business-logic decision, and some implementations of BTP could use this to configure their subsequent higher-level requests, e.g., if the first participant tells me it can only hold for 20 seconds I may be able to tell the next participant that if it can't complete within (say) 10 seconds it needn't bother.
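As a rough illustration of the 20-second/10-second example, here is a minimal Python sketch. It assumes the application gives the next participant whatever is left of the shortest advertised hold time, less a safety margin. The function name, the elapsed time and the margin are all hypothetical; nothing here is defined by BTP.

```python
def remaining_budget(hold_times, elapsed, margin):
    """Seconds a further participant may take before the tightest hold expires.

    hold_times: hold intervals (seconds) advertised by participants so far.
    elapsed:    seconds that have passed since those holds started (assumed).
    margin:     seconds kept in reserve for the confirm round trip (assumed).
    """
    tightest = min(hold_times)
    # Never hand out a negative budget; 0.0 means "don't bother".
    return max(0.0, tightest - elapsed - margin)


# First participant can hold for 20 s; 5 s have passed; keep 5 s in reserve.
budget = remaining_budget([20.0], elapsed=5.0, margin=5.0)
print(budget)  # 10.0
```

How the margin is chosen is a business-logic decision, which is really the point being made above.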

The interpretation becomes contextual. In other words, if the timeout fires before prepare, we cancel (if we have done work, or resign if we haven't); if it fires after prepare we cancel ourselves,

As I said above, it depends on which timeout we're talking about. What I'd like to see is that the coordinator timeout *always* results in a CANCEL iff it hasn't been told to prepare, i.e., after prepare starts, this timeout is ignored. The second timeout (the participant one) will have an effect that will depend on the participant, i.e., it may CANCEL or CONFIRM itself, leading to heuristics.
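To make the distinction concrete, here is a minimal Python sketch of the two rules as I've stated them. The type and function names are hypothetical and are not taken from the BTP draft. It assumes a participant outcome counts as heuristic exactly when it differs from the coordinator's decision.

```python
from enum import Enum


class Phase(Enum):
    ACTIVE = "active"        # PREPARE has not been sent yet
    PREPARING = "preparing"  # PREPARE has been sent, outcome pending


class Outcome(Enum):
    CANCEL = "cancel"
    CONFIRM = "confirm"


def coordinator_timeout(phase):
    """Coordinator timeout: CANCEL before prepare, ignored (None) afterwards."""
    return Outcome.CANCEL if phase is Phase.ACTIVE else None


def participant_timeout(own_choice, coordinator_decision):
    """Participant timeout after prepare.

    The participant applies its own choice; the result is heuristic
    if that choice collides with the coordinator's decision.
    Returns (outcome applied by the participant, heuristic flag).
    """
    heuristic = own_choice is not coordinator_decision
    return own_choice, heuristic
```

For instance, `coordinator_timeout(Phase.PREPARING)` returns `None` (the clock has effectively stopped), while `participant_timeout(Outcome.CONFIRM, Outcome.CANCEL)` reports a heuristic outcome.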

which may be a heuristic if it collides with the coordinator's decision. In each case we need to notify the coordinator, and get an ack, and then discard our knowledge of the atom. Given this, what I said about C-P timeout qualification being a negotiation of the participant timeout makes sense.
If you "stop the clock" when the VOTE has been sent then you put participants in the position of holding resources (e.g. time-sensitive quotes) open at the behest of the coordinator, which may take for ever to come back with the outcome.

No, the only clock that stops is the coordinator one once PREPARE is sent. Participant timeouts are still running.

The minuted decision of the FTF (and the whole discussion preceding it) revolved around the qualification of the VOTE by a timeout interval, coupled with a direction indicator (seconds to rollback, seconds to commit), a warning therefore of heuristic possibility. This is different from the "prepare freezes time" approach in conventional tp protocols.

The minutes are therefore inaccurate, because I distinctly remember us (Peter, myself and several of the BEA representatives) discussing this point precisely because Peter had put in his email a point about not knowing why we would ever want to send a coordinator-specific timeout with the context.

Incidentally, time may be the stimulus for participant "withdrawal", but in general if a participant "spontaneously" withdraws (with respect to the coordinator) then the same before/after prepare rules would apply.
In fact, this raises the question as to whether participant timeouts should be allowed for sub-coordinator participants, as they may drag down a whole tree, whereas leaf participants can only kill themselves. I guess you just cannot prevent that. And as you say, in a discontinuous environment, give or take an accuracy factor, it's exactly what you want to see happen. In which case coordinator timeout becomes a kind of C-P "I want you to stay up for this long" negotiation message, as you suggested at Mt Laurel? The relative time/absolute time problem gets worse here, although the use of relative time will just expand the time to expiry at the bottom of the tree (as Ed pointed out) so I guess it's safe.
Agreed.
Is this still agreed? It wasn't clear to me whether you oppose post-prepare timeouts.

Hopefully the above will have fixed this.

Mark.

-----------------------------------------------------------------------
SENDER : Dr. Mark Little, Architect (Transactions), HP Arjuna Labs
PHONE : +44 191 206 4538, FAX : +44 207 670 1964
EMAIL : mark@arjuna.com
