RE: Heuristics in BTP atoms

Subject: RE: Heuristics in BTP atoms

From: Peter Furniss <peter.furniss@choreology.com>

To: Sazi Temel <sazi.temel@bea.com>, business-transaction@lists.oasis-open.org

Date: Fri, 31 Aug 2001 13:08:22 +0100

Sazi,

I agree with your statement of the problem, but I think we've already got this in the CONTRADICTION mechanism, and the requirements on making autonomous decisions. Alastair is just sending out the current working draft of the document, which includes the revised state tables and their explanation. I'm not sure how much on contradiction was in the earlier drafts.

Yes, an expired Participant timeout is the same as a heuristic. In fact the state tables don't distinguish the two, since expiry of the timeout just causes a quantitative change in the probability of the Participant cancelling: it is possible, though we hope unlikely, that a Participant makes an autonomous (truly heuristic) decision before the timeout. But once the timeouts have passed (if the client is so foolish as to let the clock run out), inconsistencies will probably be more common than in classic TP systems, ultimately because the resource is owned by, and exists for the benefit of, someone other than the client.

A Participant that makes an autonomous decision is required to keep a persistent record of it, and will retain that record until it receives either the matching "right" answer or a CONTRADICTION message. At the coordinator, the contradiction is certain to be detected, despite failures. We don't absolutely guarantee that it will be known at the participant, as some failure sequences can lose the messages, as in this trace (view it in a fixed-pitch font if your mailer doesn't use one):

trial 1
superior                                              inferior
I1 :B1 <--------------------------ENROL/no-rsp-req <-- i1 :b1
                              decide to be prepared === b1 :e1
B1 :E1 <----------------------------------PREPARED <-- e1 :e1
E1 :F1 === decide to confirm
                      decide to cancel autonomously === e1 :j1
F1 :F1 --> CONFIRM
F1 :K1 <---------------------------------CANCELLED <-- j1 :j1
K1 :R1 === record contradiction
R1 :R2 --> CONTRADICTION
                                       disruption 0 XXX j1 :j1
                CONFIRM---X
                CONTRADICTION---X
R2 :Z   === remove persistent information
Z :Y1 <---------------------------------CANCELLED <-- j1 :j1
Y1 :Z   --> SUP_STATE/unknown-------------------------> j1 :j2
                      remove persistent information === j2 :z
Superior confirms, inferior cancelled - contradiction reported (+!:-)

The only way to avoid this would be another round of messages before the superior was allowed to remove the persistent information (e.g. a CONTRADICTED message from the inferior). But you can more easily avoid it in practice by retaining the superior's record of the contradiction for longer (how long is obviously a management decision - but that R2:Z remove persistent information is a "lazy delete", so it can be postponed as long as you like).
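A minimal sketch of that "lazy delete" on the superior side, to make the trade-off concrete. The class name, the retention parameter and the string replies are illustrative assumptions, not taken from the BTP draft; the only behaviour borrowed from the trace is that a CANCELLED arriving while the record exists is answered with CONTRADICTION, and one arriving after removal is answered with SUP_STATE/unknown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SuperiorLog:
    """Hypothetical superior-side store of contradiction records."""

    retention_seconds: float  # management decision: how long to keep records
    _recorded_at: Dict[str, float] = field(default_factory=dict, init=False)

    def record_contradiction(self, inferior_id: str, now: float) -> None:
        # Corresponds to "record contradiction" (K1 -> R1) in the trace.
        self._recorded_at[inferior_id] = now

    def purge(self, now: float) -> List[str]:
        # Lazy delete (R2 -> Z): remove only records older than the retention.
        expired = [
            inferior_id
            for inferior_id, recorded in self._recorded_at.items()
            if now - recorded >= self.retention_seconds
        ]
        for inferior_id in expired:
            del self._recorded_at[inferior_id]
        return expired

    def on_cancelled(self, inferior_id: str) -> str:
        # Reply to a (possibly repeated) CANCELLED from an inferior.
        if inferior_id in self._recorded_at:
            return "CONTRADICTION"
        return "SUP_STATE/unknown"


log = SuperiorLog(retention_seconds=3600.0)
log.record_contradiction("inferior-1", now=0.0)
early_purge = log.purge(now=10.0)
reply_while_retained = log.on_cancelled("inferior-1")
late_purge = log.purge(now=3600.0)
reply_after_purge = log.on_cancelled("inferior-1")
```

The longer the retention, the smaller the window in which a recovering inferior gets SUP_STATE/unknown instead of CONTRADICTION.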

Without that exceptional case, the inferior (Participant) *will* know it made the wrong decision, and could reconsider. As you suggest, the coordinator cannot *require* the reconsideration, since again the resource is owned by someone else. (In fact, if the original autonomous decision was made for good reason, it would seem unlikely that the decision will be reversed: if it could have waited until the right answer was known, why didn't it wait in the first place?)

As it says in the spec, the persisting of the "decide to cancel autonomously" might not actually require a disk write. The participant had to write the "decide to be prepared" record, and if this contains the timeout information, then the mere presence of an expired record means the autonomous decision must have been made. (The implementation has to "know" that it would have removed the record if it had confirmed.)
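A rough sketch of how a participant might read its state back from the log on recovery under that scheme. It assumes a participant that always cancels once its timeout expires and always removes the record when it confirms; the record layout and the state names are hypothetical, not from the spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreparedRecord:
    """Hypothetical "decide to be prepared" record written before PREPARED."""

    transaction_id: str
    expires_at: float  # the timeout the participant attached to PREPARED


def recover_state(record: Optional[PreparedRecord], now: float) -> str:
    """Infer the participant's state from the log alone, with no extra write."""
    if record is None:
        # Either never prepared, or confirmed and the record was removed.
        return "no-record"
    if now >= record.expires_at:
        # An expired record still present implies the autonomous cancel.
        return "cancelled-autonomously"
    return "prepared"


record = PreparedRecord(transaction_id="tx-1", expires_at=100.0)
state_before_expiry = recover_state(record, now=50.0)
state_after_expiry = recover_state(record, now=150.0)
state_without_record = recover_state(None, now=150.0)
```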

Peter

-----Original Message-----
From: Sazi Temel [mailto:sazi.temel@bea.com]
Sent: 30 August 2001 05:36
To: business-transaction@lists.oasis-open.org
Cc: temels2000@yahoo.com
Subject: Fwd: Heuristics in BTP atoms

FYI.

-----------------------//---------------------------------------------
Folks,
As I mentioned in an earlier conf-call I am bringing
this issue to your attention. I have some concerns
regarding handling of heuristics in atomic BTP
transactions. I would like to hear your opinion before
I set it to rest. Since I have no access to my BEA
e-mail during the day, messages I send to the OASIS
list do not go through. I am sending this to a few
people who were involved in discussions with me
earlier. Please feel free to distribute it to the
list.

Below is an example (shown as a message sequence of an
atom coordinator with two participants) of such a
heuristic situation that may occur:

Coordinator ----PREPARE-------> Participant_1
Coordinator ----PREPARE-------> Participant_2
Coordinator <---PREPARED------- Participant_2
Coordinator <---PREPARED------- Participant_1

Coordinator ----CONFIRM--------> Participant_1
Coordinator ----CONFIRM--------> Participant_2
Coordinator <---CANCEL---------- Participant_1 (*)
Coordinator <---CONFIRMED------- Participant_2

Now, one participant is confirmed and one is canceled.
The assumption here is that a PREPARE is only valid
for a certain period of time. When a participant is
PREPARED it may include a condition such as "PREPARED
is valid for 5 sec only". And after that it may take
either an optimistic or a pessimistic heuristic action.

My concern is that this will yield a non-atomic
outcome and BTP does not attempt to correct this
inconsistency. Although this situation looks like the
heuristics in 2PC I think it is different since 2PC
participants are usually in the same administrative
domain thus a corrective administrative action may be
easily taken. My concern is mostly based on my "guess"
that this situation (taking a heuristic action) will
occur more often in atomic BTP transactions than in 2PC,
mostly because a timeout for PREPARED will occur often
given the distributed and loosely coupled systems that
are involved.

Note that a message sent to Participant_2 to let it
UNDO its CONFIRM may help to bring the participants
into a consistent state with respect to the transaction.
The action to take (undoing) depends on the
participant but BTP can let the participants know that
the transaction is not completed! Although BTP cannot
guarantee a common outcome (and will not be a
block-forever protocol), it should not let the participants
be in an inconsistent state, "knowingly" (ordered good
but no shipment is possible!)

It looks like there is another (final outcome) "state"
after CONFIRM in which a coordinator informs the
participants that the transaction is completed and
they can forget any log that they may have on this
particular transaction.

I have not considered the performance and other
implications for the participants (such as keeping the
log - how long?) of including such a final-message
into the protocol.

Do you think we need such a message? Is there such a
message in the protocol?

Thanks.
--Sazi

----------------------------------//------------------------------------

Sazi Temel
Principal Consultant,
eCommerce Services, bea Systems, inc. [www.bea.com]
