
business-transaction message



Subject: Re: Coordinator timeouts (was Re: Managers, addressses and the like)


> I agree with your substantive point on the nature of coordinator
> timeout. It does not mix well with participant timeouts, on reflection.
>
> I have some questions and issues, however.
>
> I presume that the coordinator timeout is transmitted in the CONTEXT,
> and that it is in some way communicated to participants. I presume that
> services that forward the CONTEXT are allowed to trim the timeout to
> avoid excessive prolongation (minor point).

Yes.
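
As a rough illustration of the trimming mentioned above, a forwarding service could shrink the remaining timeout before passing the CONTEXT on. This is only a sketch under assumptions: the `Context` class, its `timeout_s` field, the `forward` helper, and the idea of subtracting locally elapsed time are all hypothetical and are not defined anywhere in this thread.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for a propagated CONTEXT."""
    coordinator_address: str
    timeout_s: float  # remaining coordinator timeout, in seconds


def forward(ctx: Context, elapsed_s: float,
            local_cap_s: Optional[float] = None) -> Context:
    """Return the CONTEXT to pass downstream.

    The timeout may only shrink: time already spent locally is subtracted
    (an assumption) and an optional local cap is applied. It is never extended.
    """
    remaining = max(0.0, ctx.timeout_s - elapsed_s)
    if local_cap_s is not None:
        remaining = min(remaining, local_cap_s)
    return replace(ctx, timeout_s=remaining)
```

The only property that matters here is that a forwarder can reduce the value but has no way to increase it.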

> Here's where it gets a little less obvious. If a participant gets the
> timeout and it expires then it is only advisory: if the participant is
> happy to hang around and hold resources after the expiry, that is its
> business. It will never go wrong, get a wrong result. It could decide to
> send an INFERIOR_STATUS to the coordinator, and get back an Active
> status, in which case it might decide to hang on. If the status messages
> included a timeout value then it could receive an updated timeout. In
> other words, this coordinator timeout could be viewed as a hint (can't
> really be viewed as anything else).

If a coordinator timeout goes off then the participant could hang around if
it so wants. However, it cannot make an independent CONFIRM choice. If it
doesn't want to hang around then it can only CANCEL. This is different to
the participant-specific timeout, where I assume it can go either way.
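
The distinction drawn above can be sketched as a small decision table. The enum and function names are hypothetical, chosen only for illustration, and the participant-timeout branch encodes the assumption stated above ("can go either way") rather than anything agreed.

```python
from enum import Enum


class TimeoutKind(Enum):
    COORDINATOR = "coordinator"   # timeout propagated from the coordinator
    PARTICIPANT = "participant"   # participant-specific timeout


class Action(Enum):
    WAIT = "wait"        # keep holding resources
    CANCEL = "cancel"
    CONFIRM = "confirm"


def allowed_actions(kind: TimeoutKind) -> set:
    """Actions a participant may take on its own once the timeout expires."""
    if kind is TimeoutKind.COORDINATOR:
        # It may hang around or give up, but never confirm independently.
        return {Action.WAIT, Action.CANCEL}
    # Participant-specific timeout: assumed to allow either outcome.
    return {Action.CANCEL, Action.CONFIRM}
```

For example, `Action.CONFIRM in allowed_actions(TimeoutKind.COORDINATOR)` evaluates to `False`.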

> The minutes from Mt Laurel are accurate. The decision concerned
> participant timeouts, and is accurately recorded. The point that you
> raised there on participant timeouts, which was agreed to, is also
> recorded, namely that the PREPARE can be qualified with a "minimum upper
> bound" on the participant timeout, sent from the Coordinator to the
> Participant.
>
> There was no decision or concrete proposal about coordinator timeouts.
> It was mentioned in discussion, but mentioning things doesn't make them
> decisions, nor did the minutes attempt to record every issue that was
> mentioned.

I disagree. Your minutes are inaccurate. Simply because you did not make a
note of it does not mean that the discussion did not take place. It did, and
it was about propagating coordinator timeouts to subcoordinators.

> Everything I minuted as agreed was read over to the meeting
> before a vote was taken, and before I noted it, sometimes verbatim,
> sometimes very close to verbatim when we hadn't arrived at the
> absolutely precise wording.

I think that it certainly was not an officially voted-on decision, but it
was discussed and should have been minuted. I apologise for not noting this
oversight in the minutes earlier.

> A side comment on participant timeouts: I think that you will not avoid
> the need for failure recovery in participants by participant timeouts,
> if that is what you have in mind.

No, it isn't for all situations, but it is for others. Remember we are using
a "presumed abort-like" protocol.

> Also, the motivation for participant
> timeouts is that a coordinator should not be able to wrest control of
> time-sensitive data or locks from its owner. Denial of service is only
> one special case of this. Expiry of offers is actually a much more
> likely use, in my view.

As I said in earlier emails, there are two different timeouts that we should
support, and both serve different roles. The coordinator is not trying to
wrest control over data from a participant with the coordinator-timeout:
it's trying to prevent a number of things:

(i) the situation where a resource hasn't been prepared and the coordinator
fails. There may well be resource implementations that would happily sit
around for years as long as they haven't reached prepare. After a week of
inactivity (for example) they may periodically probe the coordinator to find
out what's going on. If I can cut down on that message traffic by letting a
timeout mean implicit failure, then I'd like to do that (and I would expect
you would too, given the amount of email there has been on boxcarring!)

(ii) the coordinator is created by an initiator who then fails before
prepare has been sent to it. I don't want to have to write initiators that
must keep persistent information on all coordinators they are using before
they ask them to prepare. I could certainly do that but it's a performance
bottleneck, and the critical point for saving the coordinator information is
at prepare, and not before. If my initiator fails prior to prepare then I'd
quite like the system to tidy up for me, i.e., undo the coordinator
automatically. With a timeout, there's an implicit CANCEL message that never
needs to be sent.
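
A minimal sketch of case (ii), assuming a coordinator that records a deadline when it is created and treats its expiry before prepare as an implicit CANCEL. The `Coordinator` class, its state names, and the explicit `now` arguments (used instead of a real clock so the behaviour is easy to follow) are hypothetical and not taken from any specification text.

```python
class Coordinator:
    """Hypothetical coordinator that cancels itself if never asked to prepare."""

    def __init__(self, timeout_s: float, now: float):
        self._deadline = now + timeout_s
        self.state = "active"

    def _expire_if_due(self, now: float) -> None:
        # Only a coordinator that has not reached prepare can time out;
        # no CANCEL message is sent, the state simply changes.
        if self.state == "active" and now >= self._deadline:
            self.state = "cancelled"

    def prepare(self, now: float) -> bool:
        """Initiator asks for prepare; refused if the timeout already fired."""
        self._expire_if_due(now)
        if self.state != "active":
            return False
        # This is the point at which persistent information would be saved.
        self.state = "preparing"
        return True

    def status(self, now: float) -> str:
        self._expire_if_due(now)
        return self.state
```

If the initiator disappears, nothing ever calls `prepare`, and any later `status` query after the deadline reports `"cancelled"`. A participant in case (i) could apply the same deadline locally and draw the same conclusion without probing the coordinator.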

Mark.

-----------------------------------------------------------------------
SENDER : Dr. Mark Little, Architect (Transactions), HP Arjuna Labs
PHONE  : +44 191 222 8066, FAX : +44 191 222 8232
EMAIL  : mark@arjuna.com




