Mark,
Replying only to the contentious:
I'll leave this, as I think previous
emails have pretty much done-this-to-death, and we seem to be in agreement
that an application-specific impetus can stimulate an early prepare message
being sent by the BTP participant.
For the avoidance of doubt, I assume
you mean "early vote message"
No, there was an actor: it was a coordinator
manager/factory service.
Now that this actor has emerged from
the shadowy catch-all of "coordinator", there is an actor to receive BEGIN,
yes. (:-))
However, as I said, we don't
believe this is the most valid model for web services. But in order to
move things along we'd be happy with a protocol that encapsulates both.
When you consider the amount of payload that will be accompanying any BTP
message, adding the atom id to PREPARE, CONFIRM, CANCEL (UNDO) isn't going
to have much of an impact. If the receiving service only knows about a
single coordinator then it can either ignore the extra information or (and
probably more sensibly) check that it refers to itself just in case!
Excellent. I don't believe we know
what the environments are in which the protocol will actually be used,
so a permissive approach is a good one. The existing draft does incorporate
the atom id in all messages (i.e. it has that permissive quality, does
not assume that address uniquely identifies per-atom coordinator, but can
allow it to have that effect).
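
To make the "check that it refers to itself" option concrete, here is a rough sketch in Python. The class and field names (BtpMessage, SingleAtomReceiver, atom_id) are made up for illustration and are not taken from the draft; the only point is that a receiver whose address already identifies one atom can still validate the atom id it is sent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BtpMessage:
    # Hypothetical message shape: the draft's real field names may differ.
    kind: str      # e.g. "PREPARE", "CONFIRM" or "CANCEL"
    atom_id: str   # carried on every message, independent of the address


class SingleAtomReceiver:
    """Assumed receiver whose address already identifies a single atom."""

    def __init__(self, own_atom_id: str) -> None:
        self.own_atom_id = own_atom_id

    def accepts(self, msg: BtpMessage) -> bool:
        # Instead of ignoring the extra field, check that it refers to us.
        return msg.atom_id == self.own_atom_id


receiver = SingleAtomReceiver("atom-1")
ok = receiver.accepts(BtpMessage("PREPARE", "atom-1"))
misrouted = receiver.accepts(BtpMessage("CANCEL", "atom-2"))
```

A receiver that serves many atoms from one address would instead use atom_id as a lookup key, which is the permissive case the draft allows for.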
However, I believe that it is
about time we moved onto the concrete implementation of this protocol in,
say, XML and SOAP.
Yes, but in order to do that we need
to exactly define "on paper" the names and the required fields, and resolve
issues like "is this a request-response model, or is it a conversation
made up of oneways (at a logical level)", so we know who does pairing etc
etc. I don't think we're through that kind of discussion yet. Peter is
working on a more detailed, revised version of the protocol message set,
state table etc. XML can/will follow in short order.
Hmm.
A sub-coordinator going heuristic because of a presumed timeout,
caused by time skew, is not a happy outcome.
I did not mention anything about heuristic.
No, I did. I was musing on the problem
that could arise if a sub-coordinator "timed out" because of time skew,
when other branches carried on. I was thinking about (as the next sentence
indicated) participant timeouts affecting sub-coordinators which _have_
prepared. And, if we allow that participant timeouts can fire after prepare
(otherwise they are useless), then it is not clear that we need a separate
"before prepare" statement of timeout. The interpretation becomes contextual.
In other words, if the timeout fires before prepare, we cancel (if we have
done work, or resign if we haven't); if it fires after prepare we cancel
ourselves, which may be a heuristic if it collides with the coordinator's
decision. In each case we need to notify the coordinator, and get an ack,
and then discard our knowledge of the atom. Given this, what I said about
C-P timeout qualification being a negotiation of the participant timeout
makes sense.
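
The contextual interpretation above can be written down as a small decision function. This is only a sketch: the state names and the function are hypothetical, and I am assuming that "collides with the coordinator's decision" means the coordinator decided to confirm while the participant cancelled itself.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    # Hypothetical participant states, not names from the draft.
    ENROLLED_NO_WORK = 1   # enrolled, no work done yet
    ACTIVE = 2             # work done, not yet prepared
    PREPARED = 3           # has voted / is prepared


def on_participant_timeout(
    state: State, coordinator_decision: Optional[str] = None
) -> Tuple[str, bool]:
    """Return (action, is_heuristic) when the participant timeout fires."""
    if state is State.ENROLLED_NO_WORK:
        return ("RESIGN", False)    # nothing to undo
    if state is State.ACTIVE:
        return ("CANCEL", False)    # before prepare: a plain cancel
    # After prepare we cancel ourselves. Assumption: this is a heuristic
    # only if the coordinator had decided to confirm.
    return ("CANCEL", coordinator_decision == "CONFIRM")
```

In every branch the participant would then notify the coordinator, wait for the ack, and discard its knowledge of the atom; that exchange is left out of the sketch.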
If you "stop the clock" when the
VOTE has been sent then you put participants in the position of holding
resources (e.g. time-sensitive quotes) open at the behest of the coordinator,
which may take for ever to come back with the outcome. The minuted decision
of the FTF (and the whole discussion preceding it) revolved around the
qualification of the VOTE by a timeout interval, coupled with a direction
indicator (seconds to rollback, seconds to commit), a warning therefore
of heuristic possibility. This is different from the "prepare freezes time"
approach in conventional tp protocols.
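
As an illustration of the qualified VOTE, the sketch below models the timeout interval plus direction indicator and shows when a heuristic can arise. The names (VoteQualifier, on_expiry, resolve) are invented for this example and are not the minuted wording; times are relative seconds counted from the moment the VOTE is sent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VoteQualifier:
    # Hypothetical qualifier attached to a VOTE.
    seconds: int      # how long the participant will stay prepared
    on_expiry: str    # direction indicator: "CANCEL" or "CONFIRM"


def resolve(
    q: VoteQualifier, outcome: Optional[str], arrives_after: Optional[int]
) -> Tuple[str, bool]:
    """Return (applied_outcome, is_heuristic) for one participant."""
    if outcome is not None and arrives_after is not None \
            and arrives_after <= q.seconds:
        # The coordinator's outcome arrived in time: just apply it.
        return outcome, False
    # The timeout fired first: the participant goes its declared way.
    # It is a heuristic only if the coordinator decided otherwise.
    heuristic = outcome is not None and outcome != q.on_expiry
    return q.on_expiry, heuristic


q = VoteQualifier(seconds=30, on_expiry="CANCEL")
in_time = resolve(q, "CONFIRM", 10)
too_late = resolve(q, "CONFIRM", 60)
```

The qualifier is the warning: the coordinator knows in advance how long it has and which way the participant will go if that time runs out.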
Incidentally, time may be the stimulus for participant "withdrawal",
but in general if a participant "spontaneously" withdraws (with respect
to the coordinator) then the same before/after prepare rules would apply.
In
fact, this raises the question as to whether participant timeouts should
be allowed for sub-coordinator participants, as they may drag down a whole
tree, whereas leaf participants can only kill themselves. I guess you just
cannot prevent that. And as you say, in a discontinuous environment, give
or take an accuracy factor, it's exactly what you want to see happen. In
which case coordinator timeout becomes a kind of C-P "I want you to stay up
for this long" negotiation message, as you suggested at Mt Laurel? The
relative time/absolute time problem gets worse here, although the use of
relative time will just expand the time to expiry at the bottom of the
tree (as Ed pointed out) so I guess it's safe.
Agreed.
Is this still agreed? It wasn't clear
to me whether you oppose post-prepare timeouts.
Alastair