business-transaction message

Subject: Re: Your proposal

From: Sazi Temel <sazi.temel@bea.com>
To: business-transaction@lists.oasis-open.org
Date: Tue, 09 Oct 2001 20:47:24 -0700

Mark,
Although I believe I raised an issue that is still important to resolve, I am convinced that what we need to concentrate on now is finishing the spec on time, so I am freezing my proposal for the revision task force. I think, as Peter and Alastair suggested, it has at least some value, so I expect that it will be revisited later.

I have included my comments and answers for you inline.

Regards,
--Sazi

At 10:37 AM 10/9/01 +0000, you wrote:

> I think you are assuming a wider scope for what I am suggesting. What I am
> suggesting is not trying to resolve problems of entire application logic or
> workflow etc. It simply is a help for participants of the atoms to recover
> independently, it is not a recovery of the workflow, business level
> agreements and application.

My point is that they did not participate in the application independently,
so why should they be able to compensate independently? Try looking at the
application as a whole, and not individual components of it, please!

None of the participants knows the other participants. If there is any compensation (which I am not saying there will be; I mean whatever action each participant thinks is appropriate for its recovery), each participant/service pair knows what to do.

If BEA want to do this as an added value feature, then they are free to do
so.

Thanks!

The requirement for this is not proven, and from the general
availability of workflow systems I can probably wheel out enough counter
arguments. If you (BEA) do this and there is a mass take-up of this idea
then we should obviously revisit it.

The technical details of the discussions among us are not really the ideas of our companies, but rather the individuals' own (and I am enjoying such freedom at BEA) - we are trying to put together a big picture with a lot of details, namely the Business Transaction Protocol. BEA and HP, as well as all the others, whether members or non-members of OASIS, will benefit from a well-thought-out, complete, useful, robust spec. That is why we have all these discussions. I am sure one will regret it if one creates a solution that is not as perfect as it could be.

However, as I have tried to point out
time and time again, this will impose a significant requirement on users and
implementers of BTP. We should not be adding it simply at a whim.

I was not suggesting that we should add it without any discussions, in fact I was not even putting it on top of the issues list we have...

Now, if what you want is simply to *inform* participants that the
transaction has terminated, and not specify what they can do based on this,
then that's a different matter. In the OTS specification, for example, there
are equivalents of BTP participants (called Resources), and special
participants called Synchronizations (obviously the same end-point can play
both roles if it wants). These Synchronizations are informed before the
transaction starts to complete (before_completion), and can have an effect
on that completion by forcing it to roll back. They are then informed when
the transaction completes (after_completion), and can have *no* effect on
the transaction outcome. In addition, information about them is not
maintained persistently by the coordinator, so it does not need to write
them to its intentions list, and if it fails, they don't get informed.

The OTS specification (and JTA which took them on board as well) does not
say what the Synchronization can do when it receives an after_completion
invocation, and it should not. However, if I really wanted to I suppose I
could compensate in that. It would break the transaction model and the
entire application semantics, but hey, if I'm so sure the user wants that
then why not? Saves having to write/OEM a workflow system.
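
As an illustration of the Synchronization behaviour described above, here is a minimal Python sketch. It is a simplification under stated assumptions, not the OTS or JTA API: the names `Coordinator`, `register_synchronization` and `complete` are hypothetical, the veto is modelled as a boolean return value, and the two-phase commit over the Resources is omitted.

```python
class Synchronization:
    """Hypothetical callback pair modelled on the description above."""

    def before_completion(self):
        """Return False to force the transaction to roll back."""
        return True

    def after_completion(self, outcome):
        """Informed of the outcome; has no way to change it."""


class Coordinator:
    def __init__(self):
        # Held in memory only: synchronizations are not written to the
        # intentions list, so they are not informed if the coordinator fails.
        self._synchronizations = []

    def register_synchronization(self, sync):
        self._synchronizations.append(sync)

    def complete(self):
        # Ask every synchronization; a single veto forces rollback.
        votes = [s.before_completion() for s in self._synchronizations]
        outcome = "committed" if all(votes) else "rolled_back"
        # (Two-phase commit over the Resources would run here; omitted.)
        for s in self._synchronizations:
            s.after_completion(outcome)  # any return value is ignored
        return outcome
```

Note that `after_completion` receives the outcome but nothing it returns is consulted, which is the point made above: what a Synchronization does at that stage is outside the transaction model.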

>
> The **fact** is that there will be some situations where some participants
> of **atoms** are confirmed some canceled.

**Agreed**

> When this happens the coordinator
> already marked this atom 'not confirming'

And this is where we may start to diverge in what we think is going on. If
PREPARE has been sent to all participants and some have CONFIRMED and then
some CANCEL, the atom is not marked as "not confirming". It will CONFIRM and
a CONTRADICTION will have occurred. It's the same as in a transaction system
where some participants raise heuristics: the transaction may well still
commit and knowledge about the heuristic may be propagated back to the
terminator.

As far as I know, in the case of REQUEST_CONFIRM we cannot propagate a MIXED result back to the terminator. Perhaps Peter can give more insight on this. If that is so, the coordinator can only let the terminator know that the atom is either CONFIRMED or CANCELED (with the first, the terminator will assume everything was OK; with the second, it will think the transaction was canceled). But the situation above very much requires a MIXED message (with some detailed information) to be sent to the terminator.

If a participant CANCELs during PREPARE, then the other participants will be
told to CANCEL too. Obviously if some of them independently CONFIRMED then
they will have caused the CONTRADICTION, and not the ones that CANCELED.

Yes.

> thus the business process or
> workflow or simply the client knows the fact, it is not going to assume
> opposite, it knows that A*azon is not going to send the book!

No it does not. See above. Please verify at what stage during the 2PC you
assume the CANCEL happens. Is it synchronous or asynchronous (i.e.,
independent)?

I do not believe what we have here is a plain 2PC. We have a lot of provisions in BTP that let the participant send HAZARD, CANCEL, or even RESIGN after receiving a CONFIRM request, and you are putting all of these into the basket of heuristic decisions to make it plain 2PC. There is nothing wrong with that theoretically, except that now you should expect a lot more heuristic decisions.

> so that it can
> choose another atom, e.g another book store - but by letting know A*azon
> that the transaction is failed we will be helping them not to send the book
> because you are going to p*ssed off when you get a bill for the book that
> you did not order - you already chosen another atom, another bookstore!

Hmmm, sounds like a good place for a workflow system then. Please take a
look at some of these - there are quite a few in the market, and you will
see how applicable they are.

> Are
> you suggesting that even if some participants of atom canceled coordinator
> should confirm the atom?, if so how many of cancel will be enough to let
> coordinator understand that the atom actually does not exist anymore?

I think you have a misunderstanding of the way in which the atom works.
Let's take the simplest case first:

(i) During phase 1 it sends PREPARE to all participants. Now suppose they
all say PREPARED and none will act independently.

Good.

Now the coordinator sends
CONFIRM and if they all CONFIRM then everything's great!

Great!

Now suppose that
for one reason or another a participant could PREPARE, but couldn't CONFIRM
at the end (e.g., the hard disk crashed). If this is the first participant
the coordinator sends CONFIRM to, the coordinator will be able to change its
decision to CANCEL, and send CANCEL to all other participants (let's assume
also that they do as they are told). In this case, the atom as a whole
CANCEL-ed.

I am not sure about 'it being the first participant'... that looks like an implementation detail. I am not following you here.

I assume that in the situation above there will be a recovery path (perhaps when the participant comes back up it will ask the coordinator for the result of the transaction, etc.). Anyway, this situation is not the one I think will cause a problem.

Now,
suppose that this rogue participant is some way down the coordinator's
intentions list. It has sent CONFIRM to some participants and they have
CONFIRMED and "gone away", so it can't undo this. And in fact out of N
participants it may only be 1 that cannot CONFIRM after PREPARE and this may
be number N/2. So rather than CANCEL all other participants after we reach
number N/2, what TP systems typically do is continue on with the
coordinator's decision to CONFIRM the others and remember (durably) the
failed participant(s). The transaction (atom) has still CONFIRM-ed though.

Yes.

What we do with the failures will depend upon the type of failure. For
example, if it was a transient failure such as a comms failure, then the
coordinator may periodically try to CONFIRM the participant. If it was a
definitive answer from the participant that it could not, and can never,
CONFIRM, then this is a heuristic, and it is reported to the terminator
application to deal with.
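
The confirm phase of scenario (i) can be sketched roughly as follows. This is only an illustration under stated assumptions: `second_phase`, `Reply`, `send_confirm` and `send_cancel` are hypothetical names, not BTP messages or any product's API, and durable logging of the failed participants is left out.

```python
from enum import Enum


class Reply(Enum):
    CONFIRMED = "confirmed"
    TRANSIENT_FAILURE = "transient"  # e.g. comms failure; retry later
    CANNOT_CONFIRM = "cannot"        # definitive: can never confirm


def second_phase(intentions, send_confirm, send_cancel):
    """Send CONFIRM to each PREPARED participant, in intentions-list order.

    send_confirm(p) -> Reply and send_cancel(p) are caller-supplied
    stand-ins for the messaging layer.
    Returns (outcome, retry_later, heuristics).
    """
    retry_later, heuristics = [], []
    for index, participant in enumerate(intentions):
        reply = send_confirm(participant)
        if reply is Reply.CONFIRMED:
            continue
        if reply is Reply.TRANSIENT_FAILURE:
            # The coordinator may periodically retry these.
            retry_later.append(participant)
            continue
        # Definitive refusal from here on.
        if index == 0:
            # Nobody else has been told to CONFIRM yet, so the
            # decision can still be changed to CANCEL.
            for other in intentions[1:]:
                send_cancel(other)
            return "CANCELED", [], []
        # Earlier participants may already have confirmed and gone away:
        # carry on confirming and remember the failure as a heuristic.
        heuristics.append(participant)
    return "CONFIRMED", retry_later, heuristics
```

In this sketch the atom still reports CONFIRMED when a participant part-way down the list refuses; the `heuristics` list is what would be reported to the terminator application.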

Yes. What we are assuming here is that under normal conditions there will be an OK (as the response to a COMMIT request) in 2PC, but in BTP there are HAZARD, CANCEL, and RESIGN as responses to a CONFIRM request, which indicates that we expect such messages less frequently but still as expected responses, and we send a CONTRADICTION message back. I am not sure, and you may have better information on this, but I do not think there is a CONTRADICTION message or anything like it sent back to participants in plain 2PC.

Another difference between plain 2PC and BTP's 2PO is that in BTP the participants do not know each other, so any recovery should go through the coordinator, which is the only actor that may know the participants.

Now let's take the slightly more complicated scenario of independent
confirms:

(ii) the coordinator sends PREPARE to all participants. No participant can
independently CONFIRM until it has received a PREPARE, but it can CANCEL
prior to PREPARE and in which case the atom must CANCEL too.

I am not sure, but I think there actually are situations in BTP where participants send CONFIRMED prior to receiving CONFIRM. I will check the state diagrams for this.

If some
PREPARE-d participants can't CANCEL as a result then they have caused a
heuristic,

Yes.

and see above. When the coordinator sends CONFIRM, say, to the
participants, if some of them say they have already CANCEL-ed (because, for
example, the coordinator was too slow in making the final decision) then
it's pretty much as (i),

Yes.

i.e., depending upon where the participant is in
the intentions list we will either have an entirely CANCEL-ed atom, or one
which is CONFIRM-ed and has possibly got a heuristic.

Can you just confirm that you agree with all of this?

See above.

>
> I am **assuming** that similar situation will occur enough that requires
> some thinking to find a solution to **reduce** the inconsistencies that may
> occur.

Yes, and that thinking has been done by various workflow and process flow
people over the years.

> This proposal, specially will **help** to the participant that is
> confirmed while the **atom** failed.

But as I keep saying, it is not up to the participant to independently
decide that it can compensate itself when it told the *application
coordinator* that it had confirmed.

Of course not. The participant is not deciding independently - it will do so only if the coordinator sends it 'NOT_COMPLETED'.

As far as the coordinator is concerned,
the participant has confirmed and will never un-confirm. If this is not the
case, then we need to run another completion protocol (more phases!) between
the coordinator and these participants so that the coordinator can return a
*definitive* answer to the invoker about what happened. Or is this not
important in your scenarios? I know our customers would find it really
useful to know the final outcome.

> BTP does help the canceled participant
> by sending CONTRADICTION message already - but does not require any actions
> to eliminate the contradiction.

No, and that is the right thing for it to do because it will be highly
application/resource dependent on how to resolve this. Take a look at TP
systems.

I am not saying anything against it.

> It is also clear that this problem may be
> attempt to be resolved by asking to the canceled participant to
> re-think/revise its decision of canceling, because there are others already
> confirmed which I think this is what Keith Weir was suggesting. I think this
> second way of resolving the contradiction is valid but more cumbersome than
> just letting the confirmed participants know the results of the atom and
> take necessary actions whatever it might be (the atom is already canceled).

That's a different situation, but one which we could consider in a revision
task force.

Well, we can also consider the issue we are discussing here in a revision task force...

> The best solution would be to require all the confirmed participants keep
> the log until the 'complete' message arrived. This way it will be a complete
> recovery for the atom and how the individual participant recover is not
> concern of atom coordinator. But an optional qualifier in the CONFIRMED
> message may do the job - I am assuming no participants wants to be doing
> inconsistent work thus they all will set such qualifier (note that if such
> qualifier exist coordinator should honor the request).

I disagree, but then that shouldn't come as a surprise ;-)

It is too late to add such a significant modification IMO. Let's get this
specification adopted now, and people can then use it. That's the only way
we can resolve many of the outstanding "niggles" that people have: prove
them through use cases.

>
> Shortly,
> 1) It is a fact that this situation will occur (I feel it will
> happen more than you think),

As I said in an earlier email, this is all conjecture at the moment.

> 2) It is clear to me that it is not an attempt to alter business
> logic, it is a generic attempt to beware the consistency at atom level. We
> have relaxed/reduced 'I' and 'D' of ACID but should keep 'C' as much as
> possible - after all the protocol is to create some degree of consistency!

No, it is *exactly* an attempt to alter business logic. You tell me how the
independent compensation of a participant without recourse to what the
driver of the application wants is not anything other than such an
alteration?!

> 3) There are other alternatives that address the same issue (let the
> contradicted participant revise its decision), but I think are more
> cumbersome, and at the end it may still need to let confirmed participant
> involve..

The best "attempt" is to use workflow layered on BTP.

> 4) There are some performance penalty, I am not sure how much, needs
> to be clarified,
> 5) The best solution would be as suggested
> originally - all the participants keep the log around until a final_outcome
> message received (not necessarily lock or not to do the job, they may
> already have done the work),

"best" is definitely subjective.

> 6) The suggestion from Alastair and Peter (with minor modifications
> for coordinator honoring every request for a final_outcome) will satisfy the
> need since I am assuming all participants will be interested in requesting
> the final outcome message!
>
> Looks like I am repeating (like a broken record!) the same explanations...
> hope I have been able to answer some of your questions and concerns.

Not quite!

Thanks for the discussions!

--Sazi

Mark.

----------------------------------------------
Dr. Mark Little
Transactions Architect, HP Arjuna Labs
Email: mark@arjuna.com | mark_little@hp.com
Phone: +44 191 2064538
Fax : +44 191 2064203

Sazi Temel
Principal Consultant,
eCommerce Services, bea Systems, inc. [www.bea.com]