OASIS Mailing List Archives

bt-models message


Subject: Re: 2001-06-01_OASIS_BTP_abstractmessages_DRAFT_1

> Yes, there is such a risk. If a one-shot reply carries
> enroll (no-response) + app response + vote/ready/confirm/10
> but that gets lost (perhaps the initiator and coordinator crashed) and the
> timer goes off, the participant will confirm on its own.

It's not dependent on a one-shot. If the participant sends an out-of-band
VOTE and that is lost (or not acknowledged) then these issues still arise.
If I want to optimise recovery so that it is driven from both the
coordinator and the participant it would mean that the coordinator has to
record information prior to prepare, such that if it fails and recovers, it
knows who to contact.

>  It will still
> attempt to run the recovery sequence (I'm doing the state table details
> there at the moment) and detect there is a problem, even if the
> initiator/coordinator have no surviving information.

I think this comes down to what we began to discuss at JavaOne: what kind of
boxcarring is legal. Also, where does recovery get driven from? We've been
thinking quite a bit about this and I'll have the promised text on
redirection/recovery for Monday.

> This is only a risk where there are autonomous confirms (and in this case
> with spontaneous ready).

Autonomous and unacknowledged.

> I don't think there is any general protocol cure possible - by avoiding
> various features of the protocol, you can ensure it can't happen in a
> particular case (if location of participants is pre-known, spontaneous
> are not used, autonomous confirm is not allowed etc.), but I think the
> risk is inevitable for the total combination.

If we have an ack message then participants can work in a safer mode IMO. No
ack and a subsequent cancel means no heuristic. Ack and cancel/confirm means
the participant must stay around to find out what happened (and it knows the
coordinator has recorded sufficient information to eventually get back in
touch somehow).
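
To make the rule above concrete, here is a minimal sketch in Python of how a participant might behave once a VOTE ack exists. All names (`Participant`, `on_vote_ack`, `decide_autonomously`) are hypothetical and do not come from the BTP draft. The sketch also assumes that a participant in the "safer mode" restricts itself to cancelling while its VOTE is unacknowledged; that restriction is one reading of the paragraph, not something the draft states.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Decision(Enum):
    CANCEL = auto()
    CONFIRM = auto()


@dataclass
class Participant:
    # Hypothetical state: has the coordinator acknowledged our VOTE?
    vote_acked: bool = False
    # Set once the participant is obliged to learn the final outcome.
    must_await_outcome: bool = False

    def on_vote_ack(self) -> None:
        """Called when the (assumed) VOTE ack message arrives."""
        self.vote_acked = True

    def decide_autonomously(self, decision: Decision) -> bool:
        """Apply an autonomous decision.

        Returns True if the participant may forget the transaction,
        False if it must stay around to find out what happened.
        """
        if not self.vote_acked:
            # Assumption: without an ack, only cancel is allowed, because
            # the coordinator may hold no record of this participant.
            if decision is not Decision.CANCEL:
                raise ValueError("unacknowledged participant may only cancel")
            return True  # no ack + cancel: no heuristic possible
        # Ack received: the coordinator has recorded us, so any autonomous
        # decision might contradict its outcome and must be reconciled.
        self.must_await_outcome = True
        return False
```

For example, a fresh `Participant()` that cancels on timeout can be discarded immediately, while one that has seen `on_vote_ack()` has to remain reachable after either a cancel or a confirm.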

> Or at least the cure will be worse
> than the disease (essentially not allowing certain setups to use BTP
> because they can't meet the demands for business reasons)

I disagree. I don't think adding an ack for VOTE is such a major
modification to the current protocol. In some messaging implementations it
may well be implicit anyway.

> >                    So,
> > the coordinator must make persistent all information about participants
> > before it even enters the prepare phase so that it can contact them in
> > event of failures and find out what they did. I certainly wouldn't want
> to not know that an autonomous decision was made contrary to my preferred
> > outcome.
> with spontaneous VOTEs, you would have to make the information persistent
> before sending the context out - but in general you don't know the
> participants then anyway.  Hmm - perhaps that won't matter - if a
> coordinator (manager) receives an incoming INFERIOR_STATUS/confirmed for an
> atom that cancelled or is unknown, then it knows there was a problem.
> It may not know what the transaction was about (or who the participant was),
> but it knows something has gone wrong.

Agreed. It's a matter of who has the most responsibility for determining
that there has been a problem, and what has to happen in the case of recovery. A
valid implementation of recovery may want to drive everything from the
coordinator, in which case, with acks, it must record information about all
participants prior to prepare so that it can tell them to roll back.
Another implementation may want to drive recovery entirely from the
participant, in which case (irrespective of the presence of VOTE
acks) a participant that prepared first and then rolled back must record
information about the coordinator so that it can call back to find out what
happened to the transaction.

I'm not against either of these implementations (and obviously a third is
possible that combines the two). However, I'd like to see the possibility
for the following implementation: if a coordinator hasn't heard from a
participant before prepare it can roll back (cancel) without having to
record information about it, safe in the knowledge that it won't cause any
heuristics. If the coordinator got a VOTE and ack-ed it, then that participant
becomes part of the recovery case.
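
As a rough illustration of the coordinator side of this proposal, the Python sketch below records a participant only when its VOTE is acknowledged. The class and method names are hypothetical, the in-memory set merely stands in for a durable log, and the string "VOTE_ACK" is a placeholder rather than a message defined in the draft.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Coordinator:
    # Stand-in for stable storage; a real coordinator would force this
    # to disk before sending the ack.
    recovery_log: Set[str] = field(default_factory=set)

    def on_vote(self, participant_id: str) -> str:
        """Record the voter first, then acknowledge (assumed ordering)."""
        self.recovery_log.add(participant_id)
        return "VOTE_ACK"  # placeholder message name

    def cancel_before_prepare(self, enrolled: Set[str]) -> Dict[str, bool]:
        """Cancel the transaction before prepare.

        Returns, per enrolled participant, whether it belongs to the
        recovery case: True only for participants whose VOTE was
        recorded and acknowledged.
        """
        return {p: p in self.recovery_log for p in enrolled}
```

With this shape, a participant that was never heard from costs the coordinator nothing on cancel, while an acknowledged one stays in the log until its outcome has been reconciled.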

> I foresee some whiteboard diagrams next week :-)

Let's hope the diagrams can also make it into the document :-)!


Dr. Mark Little (mark@arjuna.com)
Transactions Architect, HP Arjuna Labs
Phone +44 191 2064538
Fax   +44 191 2064203
