business-transaction message

Subject: Re: Heuristics in BTP atoms

From: Sazi Temel <sazi.temel@bea.com>
To: Mark Little <mcl@arjuna.com>, Alastair Green <alastair.green@choreology.com>
Date: Wed, 03 Oct 2001 22:39:52 -0700

Mark,

At 02:14 PM 10/3/01 +0100, Mark Little wrote:

Sazi, just a few comments on your (essentially 3-phase) protocol extensions:

I think there are two different but somehow related issues here: 1) the issue of qualifying PREPARE that was raised by Pal and 2) the proposal to include a new message (a final state) in the protocol, which I proposed (at the conf call last week and partially via prior e-mails).

First of all I have to say that I'm against making any more significant modifications to the protocol at this final stage that aren't strictly necessary.

It is difficult to argue against this; I guess the question is then whether what I suggest is really a significant modification or not. I believe what I suggest will improve the protocol, but there may be many valid reasons not to include it in the protocol at this time.

I am not even trying to put this high on the agenda in a way that might interrupt ongoing work on finalizing the spec... Obviously if we all agreed at once we would do it; if we think it might be useful but not urgent at this time, we may consider it later; or if we think it does not make sense we will forget it :( .

We should all realise that no single protocol can ever be the solution to all of the world's problems. If we were to try to make BTP do this then it would become a bloated protocol that no-one would ever use. The best we can really do is try for the 80% case, whereby we have a protocol that works perfectly for 80% of the applications that want to use it, and either doesn't work, or works less efficiently for the other 20%.

Very well, agreed.

I'd be satisfied with this. In my opinion BTP already does this.

I think the reason I suggest this modification is that I 'feel' the failure cases might exceed 20%... and I think with the suggested modification we may reduce the failure situations. Again, at this stage both your suggestions and mine are best guesses based on our experiences and assumptions.

I'd also like to point out that way back at the start of this I suggested a three-phase protocol had some merits for *some* applications, but not all. Then was the time to discuss it in more detail if people really felt strongly about it (I didn't), not now.

I think my proposal boils down to sending a final message to a participant of an atomic transaction that has already confirmed but is unaware that the transaction has not reached a final outcome because some other participant cancelled (whether because of a PREPARED time-out or for other reasons).

So what you're saying is that a participant that confirms shouldn't really do the work until it has received this other message? So, for example, I shouldn't really dispatch my books to the purchaser until I find out from the coordinator whether or not the insurance was actually able to confirm as well. I can see applications where this might be a good idea. However, how long do I wait for this "actual confirm" message to happen? What if I can't actually confirm when it turns up? Add a fourth message? Why can't the application programmer, or service provider, simply implement for this, and use compensation if that's the case? In this situation, for example, the bookshop could have an "insurance failed to complete" method that is invoked at the *application* level within *another* atom/cohesion, that either stops the books, uses an insurance company associated with the bookshop, or something else that might make application sense.

No, I am not suggesting at all that the participant should hold off doing the work until this confirm is received. I suggest that the participant keeps the log around until the confirm is received (just as the cancelled participant waits for a contradiction message from the coordinator), so that it can do compensation, or whatever other recovery it sees as appropriate, when it receives a 'not completed' message; otherwise it receives 'completed' and can remove the log.
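
As an illustration only, the behaviour proposed above could be sketched roughly as follows. This is not part of the BTP specification: the names Participant, on_confirm and on_final, the state names, and the in-memory list standing in for a durable log are all hypothetical, and the "compensation" is just a string.

```python
from enum import Enum


class State(Enum):
    PREPARED = "prepared"
    CONFIRMED = "confirmed"      # work done, log retained
    COMPLETED = "completed"      # final outcome known, log removed
    COMPENSATED = "compensated"  # atom failed, recovery attempted from log


class Participant:
    """Hypothetical participant following the proposed extension."""

    def __init__(self, name):
        self.name = name
        self.state = State.PREPARED
        self.log = []  # stands in for a durable log (assumption)

    def on_confirm(self, work):
        # Do the work immediately: nothing is blocked or locked.
        self.log.append(work)
        self.state = State.CONFIRMED

    def on_final(self, completed):
        # Hypothetical final message from the coordinator.
        if self.state is not State.CONFIRMED:
            return []
        if completed:
            # 'completed': the log is no longer needed.
            self.log.clear()
            self.state = State.COMPLETED
            return []
        # 'not completed': use the retained log to attempt recovery.
        actions = [f"compensate:{w}" for w in reversed(self.log)]
        self.log.clear()
        self.state = State.COMPENSATED
        return actions
```

The point of the sketch is that the work happens at confirm time; only the log removal is deferred until the final message arrives.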

In such a situation, it is clear that the participant that has already confirmed will experience much more severe consequences than the one that canceled.

First of all that's for the application to sort out. If I want to have these kind of guarantees then I should probably be looking at ACID transactions without heuristics, rather than BTP. One protocol for one job, not one protocol for all jobs.

I think it is not difficult to guess that a participant that confirmed (when the transaction failed) will experience more trouble. Again, your assumption is that the participant will wait to do the work until a 'completed' message is received; I suggest it will do the work but keep the log until it receives a 'completed' message.

Secondly, how am I as a service provider supposed to program now? I get a confirm, and it's not really a confirm. In fact, it's very similar to a prepare because I can't do any real work on the basis of its reception until I get this third message. How long do I wait? Do I keep the resources blocked/locked until I get this third phase message?

Again, you are assuming that the participant will block and not do the work until it receives a 'completed' message. All I suggest is to let the participant do the work and keep the log (of what has been done) until it receives 'completed', so that in case of failure it may 'have a chance' to try to recover, perhaps by compensation, perhaps by other means.

It's starting to look like ACID again, so what are the benefits to me from using this rather than, say, an OTS implementation layered on SOAP? It's starting to look hideously complicated as a protocol now, so perhaps I won't bother using BTP at all. Web Service users expect things to be simple: we shouldn't disappoint them by making BTP more complicated than it really needs to be.

I agree that we should not make BTP complicated, but what is suggested does not make it so.

In fact, there is no penalty for canceling,

Well, in fact we have to make a distinction between a participant that definitely cancelled, and one which had failed by the time the confirm message came along.

plus the canceling participant receives a CONTRADICTION message from the superior which will most probably be ignored.

Well, if BEA wants to put that statement into their product documentation then that's a company matter. However, I certainly won't be encouraging HP or anyone else to tell programmers that they should ignore CONTRADICTION messages. It's the same as telling them to ignore heuristics in CICS, OTS, ... Not a good idea if you want to even attempt to maintain consistency. They are hard things to resolve, that's true, but simply ignoring them is looking for trouble.

Tell me, what will a participant do when it receives a 'contradiction'? The participant received the contradiction because it perhaps timed out on prepare, or perhaps it knowingly issued cancel (remember it is allowed to time out and send cancel for whatever reason before receiving a confirm). There is no assumption in BTP about what a contradicted participant will do. BTP does not assume the participant will revise its own decision; besides, it most probably sent cancelled because it timed out. Are we not allowing it to time out?

No comments on the BEA and HP stuff... BTP will still be fine without my suggestion included; we are all trying to make BTP better, whether or not we agree on the details.

On the other hand the confirmed participant will continue to take the necessary business actions to complete its promise (of sending goods! without a transportation arranged!) when it confirmed.

And this will either be dealt with by an application specific compensation (e.g., a workflow style), or even at the physical level when the books wait at the warehouse for a shipper to turn up and no one does. I'm not saying that a three-phase protocol isn't useful in *some* situations; only that it isn't required in the majority and we shouldn't consider it for this round of BTP.

Since I do not suggest that the participant wait to do the work until it receives 'completed' (although perhaps I used the term 3PC in my previous e-mail), but only that it keep logs to use in case of failure, there is no locking involved, so it is not a 3PC. You may call it 2.5PC!

Now, my question is: why does the superior bother to send a CONTRADICTION message to the Cancelled participant? There is not much to do since we do not expect the canceling participant will re-consider its decision,

It's not so that the cancelling participant can re-consider, because in all likelihood it won't be able to. It's more so that some administration system/person can use this for a number of reasons, e.g., look at why the participant "failed", see if some compensation can be fired off transparently, ...

Ok, this is a good point. At least there is a way (although I think it is more complicated than what I suggest) to initiate a recovery and hopefully help the other participants to recover... but then participants are not aware of each other (only the coordinator knows all the participants!)... so at best this recovery is very cumbersome.

Coordinator should, in fact, send a CONTRADICTION to the confirmed participant so that it can take any necessary corrective action (whatever it might be; the protocol does not mandate any action to be taken, it just informs the participant that the 'deal is off.') I don't think this is a business level agreement at all.

But the confirmed participant hasn't contradicted the decision. It has done *exactly* what the coordinator asked of it.

You may find a better wording for this; in fact I would suggest a word other than contradiction in this situation. So I agree, the participant did **exactly** what was asked. But still, when the transaction has failed, the participant that did exactly what was asked is in worse shape than the participant that did not do what was asked!

It's like saying that in an OTS implementation a Resource that throws a heuristic exception from commit shouldn't be told to forget (rough equivalent of CONTRADICTION), but all of the other "committed" participants should be. They (and their BTP equivalents) have finished. They may well have gone away and tidied up. There may be no end-point for them anymore. I don't believe CONTRADICTIONS are going to happen that often, so I as a service implementer don't want to have to program compensations into *every single* resource I write just on the off chance that it may be needed.

A 'good' Atom coordinator is trying to reach a common outcome for its participants - no more than the BTP protocol mandates. Also, in the case of VT, a Coordinator cannot inform the VT of this 'MIXED' situation (it cannot send MIXED to the VT per protocol rules), so even if it is considered a business level agreement the Coordinator cannot help the upper layer to make its decision, at least for the VT case. I think the simplest approach would be for the Coordinator to send the 'final message' to the participant, not to the terminator.

So the coordinator sees nothing of this? What happens if the "committed" participants can't uncommit? It seems like you're trying to make the entire protocol atomic by removing the possibility of heuristics. Unfortunately this isn't possible unless we tell implementers that they aren't allowed to produce them, i.e., if they have a resource that says it will prepare, then it *must* prepare, no matter how long it takes for the final commit message to come in. A participant isn't allowed to make a unilateral decision at all.

I think my comments above clarified this... this is not a suggestion to make atomic transactions in BTP more than what they are now. I am just suggesting letting the confirmed participant know that the transaction did not complete successfully so that it can do whatever it can.

That's certainly one protocol that some applications would find useful. It's not, however, a protocol that HP would be interested in supporting for Web Services, since it is no different from using true ACID transactions.

I disagree with you that what I suggested is an ACID transaction protocol at all.

I agree with you and others who argued that BTP is a protocol for coordination of loosely coupled distributed services etc., but disagree with the argument that a Web Services participant cannot/won't keep the logs (which is necessary to do anything about a 'past' transaction) for a long time after it has confirmed.

But what is your definition of "long time"? The logs you refer to are optional, and we make no call as to how long they have to be maintained for anyway.

We know that BTP already requires keeping the logs around for a while for a participant that cancelled: a participant that cancelled will keep the log until it hears from the superior (in fact every participant that makes an autonomous decision, whether it is CANCEL or CONFIRM, is required to keep logs until it hears from the Coordinator, per BTP rule). Also, a Cohesion requires the logs to be around for a long time (as long as the Cohesion is around), so keeping the logs around for a long time is not an issue.

Yes, but you are extending this to require participants who make decisions *only* at the behest of the coordinator to also keep their logs. That is a different scenario, and one which I would not want to see. The performance penalty of this is quite significant.

Ok, another good point. I agree that there may be some performance penalty (in the end it is another message sent and the participant keeping the log).

No longer can a committed participant "simply" commit its work, it now also needs to update a log to say it has done so, even though the coordinator knows this by virtue of the CONFIRMED message that the participant is going to send to the coordinator. That's at least two disk writes and syncs, compared to only one.
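
To make the cost argument above concrete, here is a rough sketch that simply counts forced log writes on the confirm path in the two cases. It is an illustration under assumptions: CountingLog, confirm_current and confirm_with_final_message are hypothetical names, and a real implementation may batch or order its writes differently.

```python
class CountingLog:
    """Toy log that counts forced (synced) writes; no real I/O is done."""

    def __init__(self):
        self.records = []
        self.syncs = 0

    def force(self, record):
        # Each call stands in for one disk write plus sync.
        self.records.append(record)
        self.syncs += 1


def confirm_current(log):
    # Participant just commits its work on confirm.
    log.force("commit work")
    return log.syncs


def confirm_with_final_message(log):
    # Participant commits its work and also records that it has
    # confirmed, so that it can recover if told 'not completed'.
    log.force("commit work")
    log.force("confirmed; awaiting final outcome")
    return log.syncs
```

With these assumptions, confirm_current(CountingLog()) reports one forced write and confirm_with_final_message(CountingLog()) reports two, which is the difference being argued over.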

Although my proposal is not in any way based on the issue of qualifying PREPARE, I think prepare time-out will create situations in which the final-outcome message may be used to help recovering of the participants and reach

Firstly participants don't have to qualify their prepared message with any timeout,

They do not have to, but it is in the protocol and thus it may be used.

so it's quite valid for a participant to decide to never unilaterally take a decision. In fact, lots of participants could well do the same thing, and such services may well want to publish this kind of qualifier in, say, a UDDI service. That way, clients who never want to end up in a non-ACID situation (ignoring failures for now) can determine who to talk to beforehand. Now, failures do occur, and it's just possible for one of these participants to find that despite its best efforts it still can't confirm even if it wants to, e.g., the disk has failed catastrophically. So, heuristics are still possible, but we've narrowed this "window of vulnerability" somewhat.
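
A small sketch of the distinction drawn above between a participant that qualifies its prepared message with a timeout and one that never decides unilaterally. The function name autonomous_decision, its parameters, and the choice of "cancel" as the autonomous outcome are assumptions made for illustration; nothing here is taken from the BTP message set.

```python
from typing import Optional


def autonomous_decision(prepared_at: float, now: float,
                        timeout: Optional[float]) -> Optional[str]:
    """Return the unilateral decision a prepared participant would take.

    timeout=None models a participant that did not qualify its prepared
    message and therefore never decides on its own.
    """
    if timeout is None:
        return None  # wait for the coordinator indefinitely
    if now - prepared_at >= timeout:
        return "cancel"  # assumed autonomous outcome for this sketch
    return None  # still inside the promised window
```

A participant created with timeout=None corresponds to the "never unilaterally take a decision" service described above; failures can still force a heuristic outcome, which this sketch does not model.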

<Original email deleted.>

I think there are two ways that Coordinator can help to bring both participants to a common outcome

No one can ever guarantee to do this, no matter how many messages we use, or rounds of protocol we have.

I am not suggesting a guarantee; that's why I used the word 'help' ...

Failures of media, business logic, or whatever, can still happen and prevent "committed" participants from undoing, or "cancelled" participants from committing.

Agreed.

Let's deal with these situations at the application level, or charter a new working group to resolve this in a domain specific manner.

This is not a domain specific issue. Also, as I mentioned earlier, in the case of VT, the coordinator cannot really tell the terminator which participant 'contradicted' and which one 'committed'.

The easier we make it for people to use BTP, the quicker its take-up will be. A bloated protocol isn't the way to go.

Agreed with its meaning... respectfully disagree with what might be implied. I do not think my suggestion makes BTP less useful. I agree that it will have some performance penalty, but I think it makes the protocol close to 90% perfectly functional instead of 80% ;)

Regards,

Mark.

----------------------------------------------
Dr. Mark Little
Transactions Architect, HP Arjuna Labs
Email: mark@arjuna.com | mark_little@hp.com
Phone: +44 191 2064538
Fax : +44 191 2064203

Follow-Ups:
- Re: Heuristics in BTP atoms
  - From: Mark Little <mcl@arjuna.com>

References:
- Re: Heuristics in BTP atoms
  - From: Sazi Temel <sazi.temel@bea.com>
- Re: Heuristics in BTP atoms
  - From: Mark Little <mcl@arjuna.com>