bt-models message

Subject: Re: Your proposal
From: Mark Little <mcl@arjuna.com>
To: "Temel, Sazi (MBS)" <Sazi.Temel@mortgagefamily.com>,alastair.green@choreology.com
Date: Tue, 09 Oct 2001 10:37:07 +0000
> I think you are assuming a wider scope for what I am suggesting. What I am
> suggesting is not trying to resolve problems of entire application logic or
> workflow etc. It simply is a help for participants of the atoms to recover
> independently, it is not a recovery of the workflow, business level
> agreements and application.

My point is that they did not participate in the application independently,
so why should they be able to compensate independently? Try looking at the
application as a whole, and not individual components of it, please!

If BEA want to do this as an added value feature, then they are free to do
so. The requirement for this is not proven, and from the general
availability of workflow systems I can probably wheel out enough counter
arguments. If you (BEA) do this and there is a mass take-up of this idea
then we should obviously revisit it. However, as I have tried to point out
time and time again, this will impose a significant requirement on users and
implementers of BTP. We should not be adding it simply at a whim.

Now, if what you want is simply to *inform* participants that the
transaction has terminated, and not specify what they can do based on this,
then that's a different matter. In the OTS specification, for example, there
are equivalents of BTP participants (called Resources), and special
participants called Synchronizations (obviously the same end-point can play
both roles if it wants). These Synchronizations are informed before the
transaction starts to complete (before_completion), and can have an effect
on that completion by forcing it to roll back. They are then informed when
the transaction completes (after_completion), and can have *no* effect on
the transaction outcome. In addition, information about them is not
maintained persistently by the coordinator, so it does not need to write
them to its intentions list, and if it fails, they don't get informed.
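
As a rough sketch of the Synchronization behaviour described above (this is
not the OTS or JTA API; the class names, the Outcome enum and the
two_phase_commit callback are all hypothetical, and calling every
before_completion even after a veto is a simplification):

```python
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"
    ROLLED_BACK = "rolled_back"


class Synchronization:
    """Hypothetical stand-in for an OTS/JTA-style Synchronization."""

    def before_completion(self) -> bool:
        """Return False to force the transaction to roll back."""
        return True

    def after_completion(self, outcome: Outcome) -> None:
        """Purely informational: nothing done here changes the outcome."""


class Coordinator:
    def __init__(self):
        # Kept in memory only. Synchronizations are never written to the
        # intentions list, so they are not informed if the coordinator fails.
        self._synchronizations = []

    def register_synchronization(self, sync: Synchronization) -> None:
        self._synchronizations.append(sync)

    def complete(self, two_phase_commit) -> Outcome:
        # before_completion runs before 2PC starts and may veto the commit.
        vetoed = False
        for sync in self._synchronizations:
            if not sync.before_completion():
                vetoed = True
        # two_phase_commit is an assumed callable that drives the Resources
        # and returns an Outcome.
        outcome = Outcome.ROLLED_BACK if vetoed else two_phase_commit()
        # after_completion only reports the result back.
        for sync in self._synchronizations:
            sync.after_completion(outcome)
        return outcome
```

The return value of after_completion is ignored on purpose: that is the
"can have *no* effect on the transaction outcome" part.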

The OTS specification (and JTA which took them on board as well) does not
say what the Synchronization can do when it receives an after_completion
invocation, and it should not. However, if I really wanted to I suppose I
could compensate in that. It would break the transaction model and the
entire application semantics, but hey, if I'm so sure the user wants that
then why not? Saves having to write/OEM a workflow system.

>
> The **fact** is that there will be some situations where some participants
> of **atoms** are confirmed some canceled.

**Agreed**

> When this happens the coordinator
> already marked this atom 'not confirming'

And this is where we may start to diverge in what we think is going on. If
PREPARE has been sent to all participants and some have CONFIRMED and then
some CANCEL, the atom is not marked as "not confirming". It will CONFIRM and
a CONTRADICTION will have occurred. It's the same as in a transaction system
where some participants raise heuristics: the transaction may well still
commit and knowledge about the heuristic may be propagated back to the
terminator.

If a participant CANCELs during PREPARE, then the other participants will be
told to CANCEL too. Obviously if some of them independently CONFIRMED then
they will have caused the CONTRADICTION, and not the ones that CANCELED.

> thus the business process or
> workflow or simply the client knows the fact, it is not going to assume
> opposite, it knows that A*azon is not going to send the book!

No it does not. See above. Please verify at what stage during the 2PC you
assume the CANCEL happens. Is it synchronous or asynchronous (i.e.,
independent)?

> so that it can
> choose another atom, e.g another book store - but by letting know A*azon
> that the transaction is failed we will be helping them not to send the book
> because you are going to p*ssed off when you get a bill for the book that
> you did not order - you already chosen another atom, another bookstore!

Hmmm, sounds like a good place for a workflow system then. Please take a
look at some of these - there are quite a few in the market, and you will
see how applicable they are.

> Are
> you suggesting that even if some participants of atom canceled coordinator
> should confirm the atom?, if so how many of cancel will be enough to let
> coordinator understand that the atom actually does exist anymore?

I think you have a misunderstanding of the way in which the atom works.
Let's take the simplest case first:

(i) During phase 1 it sends PREPARE to all participants. Now suppose they
all say PREPARED and none will act independently. Now the coordinator sends
CONFIRM and if they all CONFIRM then everything's great! Now suppose that
for one reason or another a participant could PREPARE, but couldn't CONFIRM
at the end (e.g., the hard disk crashed). If this is the first participant
the coordinator sends CONFIRM to, the coordinator will be able to change its
decision to CANCEL and send CANCEL to all other participants (let's assume
also that they do as they are told). In this case, the atom as a whole
CANCEL-ed. Now, suppose that this rogue participant is some way down the
coordinator's intentions list. The coordinator has already sent CONFIRM to
some participants and they have CONFIRMED and "gone away", so it can't undo
this. And in fact out of N
participants it may only be 1 that cannot CONFIRM after PREPARE and this may
be number N/2. So rather than CANCEL all other participants after we reach
number N/2, what TP systems typically do is continue on with the
coordinator's decision to CONFIRM the others and remember (durably) the
failed participant(s). The transaction (atom) has still CONFIRM-ed though.
What we do with the failures will depend upon the type of failure. For
example, if it was a transient failure such as a comms failure, then the
coordinator may periodically try to CONFIRM the participant. If it was a
definitive answer from the participant that it could not, and can never,
CONFIRM, then this is a heuristic, and it is reported to the terminator
application to deal with.
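
To make case (i) concrete, here is a minimal sketch of the coordinator's
second phase. It is an illustration only, not BTP message handling: the
Participant class, its can_confirm flag and run_atom are hypothetical names,
and retrying transient failures is left out.

```python
class Participant:
    def __init__(self, name, can_confirm=True):
        self.name = name
        self.can_confirm = can_confirm  # False: e.g. disk crashed after PREPARE
        self.state = "active"

    def prepare(self):
        self.state = "prepared"
        return True

    def confirm(self):
        # A participant that cannot confirm ends up canceled instead.
        self.state = "confirmed" if self.can_confirm else "canceled"
        return self.can_confirm

    def cancel(self):
        self.state = "canceled"


def run_atom(intentions):
    """Return (outcome, heuristics) for an ordered intentions list."""
    # Phase 1: every participant is asked to PREPARE.
    if not all(p.prepare() for p in intentions):
        for p in intentions:
            p.cancel()
        return "CANCELED", []

    # Phase 2: send CONFIRM in intentions-list order.
    confirmed_any = False
    heuristics = []
    for index, p in enumerate(intentions):
        if p.confirm():
            confirmed_any = True
        elif not confirmed_any:
            # Nobody has confirmed yet, so the decision can still change.
            for other in intentions[index + 1:]:
                other.cancel()
            return "CANCELED", []
        else:
            # Earlier participants have confirmed and "gone away": carry on
            # and remember the failure for the terminator.
            heuristics.append(p.name)
    return "CONFIRMED", heuristics


first_fails = [Participant("a", can_confirm=False), Participant("b")]
middle_fails = [Participant("a"), Participant("b", can_confirm=False),
                Participant("c")]
print(run_atom(first_fails))   # whole atom cancels
print(run_atom(middle_fails))  # atom confirms, "b" reported as a heuristic
```

A real coordinator would record the failed participants durably before
reporting them; the in-memory list here just stands in for that.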

Now let's take the slightly more complicated scenario of independent
confirms:

(ii) the coordinator sends PREPARE to all participants. No participant can
independently CONFIRM until it has received a PREPARE, but it can CANCEL
prior to PREPARE, in which case the atom must CANCEL too. If some
PREPARE-d participants can't CANCEL as a result then they have caused a
heuristic, and see above. When the coordinator sends CONFIRM, say, to the
participants, if some of them say they have already CANCEL-ed (because, for
example, the coordinator was too slow in making the final decision) then
it's pretty much as (i), i.e., depending upon where the participant is in
the intentions list we will either have an entirely CANCEL-ed atom, or one
which is CONFIRM-ed and has possibly got a heuristic.
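
The participant-side rules in (ii) can be sketched as a small state machine.
Again this is only an illustration under my own assumptions: the
AutonomousParticipant class and its method names are hypothetical and do not
come from the BTP specification.

```python
class AutonomousParticipant:
    """Hypothetical participant that may decide on its own."""

    def __init__(self, name):
        self.name = name
        self.state = "active"

    def act_independently(self, decision):
        # decision is "confirmed" or "canceled".
        if decision == "confirmed" and self.state != "prepared":
            # No independent CONFIRM until a PREPARE has been received.
            raise RuntimeError("cannot CONFIRM independently before PREPARE")
        if self.state in ("active", "prepared"):
            self.state = decision

    def on_prepare(self):
        # A participant that canceled before PREPARE answers "no", which
        # means the atom must cancel too.
        if self.state == "active":
            self.state = "prepared"
        return self.state == "prepared"

    def on_outcome(self, decision):
        """Apply the coordinator's decision; return True on a contradiction."""
        if self.state in ("active", "prepared"):
            self.state = decision
        # An earlier independent decision that differs is a heuristic.
        return self.state != decision


slow = AutonomousParticipant("p")
slow.on_prepare()
slow.act_independently("canceled")   # coordinator was too slow
contradiction = slow.on_outcome("confirmed")
print(contradiction)
```

What the coordinator then does with a contradiction is the same as in (i):
it depends on how far down the intentions list the participant sits.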

Can you just confirm that you agree with all of this?

>
> I am **assuming** that similar situation will occur enough that requires
> some thinking to find a solution to **reduce** the inconsistencies that may
> occur.

Yes, and that thinking has been done by various workflow and process flow
people over the years.

> This proposal, specially will **help** to the participant that is
> confirmed while the **atom** failed.

But as I keep saying, it is not up to the participant to independently
decide that it can compensate itself when it told the *application
coordinator* that it had confirmed. As far as the coordinator is concerned,
the participant has confirmed and will never un-confirm. If this is not the
case, then we need to run another completion protocol (more phases!) between
the coordinator and these participants so that the coordinator can return a
*definitive* answer to the invoker about what happened. Or is this not
important in your scenarios? I know our customers would find it really
useful to know the final outcome.

>  BTP does help the canceled participant
> by sending CONTRADICTION message already - but does not require any actions
> to eliminate the contradiction.

No, and that is the right thing for it to do because it will be highly
application/resource dependent on how to resolve this. Take a look at TP
systems.

> It is also clear that this problem may be
> attempt to be resolved by asking to the canceled participant to
> re-think/revise its decision of canceling, because there are others already
> confirmed which I think this is what Keith Weir was suggesting. I think this
> second way of resolving the contradiction is valid but more cumbersome than
> just letting the confirmed participants know the results of the atom and
> take necessary actions whatever it might be (the atom is already canceled).

That's a different situation, but one which we could consider in a revision
task force.

> The best solution would be to require all the confirmed participants keep
> the log until the 'complete' message arrived. This way it will be a complete
> recovery for the atom and how the individual participant recover is not
> concern of atom coordinator. But an optional qualifier in the CONFIRMED
> message may do the job - I am assuming no participants wants to be doing
> inconsistent work thus they all will set such qualifier (note that if such
> qualifier exist coordinator should honor the request).

I disagree, but then that shouldn't come as a surprise ;-)

It is too late to add such a significant modification IMO. Let's get this
specification adopted now, and people can then use it. That's the only way
we can resolve many of the outstanding "niggles" that people have: prove
them through use cases.

>
> Shortly,
> 1) It is a fact that this situation will occur (I feel it will
> happen more than you think),

As I said in an earlier email, this is all conjecture at the moment.

> 2) It is clear to me that it is not an attempt to alter business
> logic, it is a generic attempt to beware the consistency at atom level. We
> have relaxed/reduced 'I' and 'D' of ACID but should keep 'C' as much as
> possible - after all the protocol is to create some degree of consistency!

No, it is *exactly* an attempt to alter business logic. You tell me how the
independent compensation of a participant without recourse to what the
driver of the application wants is anything other than such an
alteration?!

> 3) There are other alternatives that address the same issue (let the
> contradicted participant revise its decision), but I think are more
> cumbersome, and at the end it may still need to let confirmed participant
> involve..

The best "attempt" is to use workflow layered on BTP.

> 4) There are some performance penalty, I am not sure how much, needs
> to be clarified, 5) The best solution would be as suggested
> originally - all the participants keep the log around until a final_outcome
> message received (not necessarily lock or not to do the job, they may
> already have done the work),

"best" is definitely subjective.

> 6) The suggestion from Alastair and Peter (with minor modifications
> for coordinator honoring every request for a final_outcome) will satisfy the
> need since I am assuming all participants will be interested in requesting
> the final outcome message!
>
> Looks like I am repeating (like a broken record!) the same explanations...
> hope I have been able to answer some of your questions and concerns.

Not quite!

Mark.

----------------------------------------------
Dr. Mark Little
Transactions Architect, HP Arjuna Labs
Email: mark@arjuna.com | mark_little@hp.com
Phone: +44 191 2064538
Fax  : +44 191 2064203