business-transaction message

Subject: Fwd: RE: Heuristics in BTP atoms

From: Sazi Temel <sazi.temel@bea.com>
To: business-transaction@lists.oasis-open.org
Date: Fri, 31 Aug 2001 22:31:04 -0700

FYI.

--Sazi

Date: Fri, 31 Aug 2001 15:23:48 -0700 (PDT)
From: Sazi Temel <temels2000@yahoo.com>
Subject: RE: Heuristics in BTP atoms
To: Peter Furniss <peter.furniss@choreology.com>,
        Sazi Temel <sazi.temel@bea.com>,
        business-transaction@lists.oasis-open.org
Cc: temels2000@yahoo.com

Peter,

My comments inline...

Peter Furniss <peter.furniss@choreology.com> wrote:

Sazi,

I agree with your statement of the problem, but I think we've already got this in the CONTRADICTION mechanism, and the requirements on making autonomous decisions. Alastair is just sending out the current working draft of the document, which includes the revised state tables and their explanation. I'm not sure how much on contradiction was in the earlier drafts.

Something to read over the long holiday weekend, thanks Alastair!

Yes, the expired Participant timeout is the same as a heuristic (in fact the state tables don't distinguish these, since the expiry of the timeout just causes a quantitative change in the probability of the Participant cancelling - it is possible, though we hope unlikely, that a Participant makes an autonomous (truly heuristic) decision before the timeout). But, following the timeouts (if the client is so foolish as to let the clock run out), it is probable that inconsistencies will be more common than in classic tp systems - ultimately because the resource is owned by and exists for the benefit of someone other than the client.

Agreed.

A Participant that makes an autonomous decision is required to have a persistent record of this, and will retain that record until either receiving the matching "right" answer, or receiving a CONTRADICTION message. At the coordinator, the contradiction is certain to be detected, despite failures. We don't absolutely guarantee that it will be known at the participant, as some failure sequences can lose the messages: (reformat to fixed-pitch font if your mailer doesn't)

When a time-out occurs, the participant decides whether to wait longer or to cancel. Is this considered an autonomous decision? If so, should the participant retain the log records in this case? What about a participant that has already confirmed: should it also retain the records, or does it simply forget the transaction?

trial 1
superior                                              inferior
I1 :B1 <--------------------------ENROL/no-rsp-req <-- i1 :b1
                              decide to be prepared === b1 :e1
B1 :E1 <----------------------------------PREPARED <-- e1 :e1
E1 :F1 === decide to confirm
                      decide to cancel autonomously === e1 :j1
F1 :F1 --> CONFIRM
F1 :K1 <---------------------------------CANCELLED <-- j1 :j1
K1 :R1 === record contradiction
R1 :R2 --> CONTRADICTION
                                       disruption 0 XXX j1 :j1
                CONFIRM---X
                CONTRADICTION---X
R2 :Z   === remove persistent information
Z :Y1 <---------------------------------CANCELLED <-- j1 :j1
Y1 :Z   --> SUP_STATE/unknown-------------------------> j1 :j2
                      remove persistent information === j2 :z
Superior confirms, inferior cancelled - contradiction reported (+!:-)

The only way to avoid this would be another round of messages before the superior was allowed to remove the persistent information (e.g. a CONTRADICTED message from the inferior). But you can more easily avoid it in practice by retaining the superior's record of the contradiction for longer (how long is obviously a management decision - but that R2:Z remove persistent information is a "lazy delete", so it can be postponed as long as you like).
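
To make the "lazy delete" point concrete, here is a minimal sketch of a superior that keeps its contradiction records for a configurable period. The class name, method names and the retention parameter are all hypothetical (nothing here comes from the BTP draft); only the message names CONTRADICTION and SUP_STATE/unknown are taken from the trace above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ContradictionLog:
    """Hypothetical superior-side store of recorded contradictions."""

    retention_seconds: float  # a management decision, not something BTP fixes
    _recorded_at: Dict[str, float] = field(default_factory=dict)

    def record(self, inferior_id: str, now: float) -> None:
        # Corresponds to "record contradiction" (K1 -> R1) in the trace.
        self._recorded_at[inferior_id] = now

    def purge(self, now: float) -> None:
        # The lazy delete (R2 -> Z): drop only records older than the retention period.
        self._recorded_at = {
            inferior_id: t
            for inferior_id, t in self._recorded_at.items()
            if now - t < self.retention_seconds
        }

    def reply_to_cancelled(self, inferior_id: str) -> str:
        # A repeated CANCELLED gets a useful answer only while the record survives.
        if inferior_id in self._recorded_at:
            return "CONTRADICTION"
        return "SUP_STATE/unknown"
```

With a long retention period, a CANCELLED that is re-sent after the disruption is still answered with CONTRADICTION; once the record has been purged, the inferior only gets SUP_STATE/unknown, as in trial 1.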

Ok.

Without that exceptional case, the inferior (Participant) *will* know it made the wrong decision, and could reconsider. As you suggest, the coordinator cannot *require* the reconsideration, since again the resource is owned by someone else. (In fact, if the original autonomous decision was made for good reason, it would seem unlikely that the decision will be reversed - if it could have waited until the right answer was known, why didn't it wait in the first place?)

I was thinking of the usual case, where the participant knows that it made the "wrong" decision (actually nothing is wrong: it just followed the protocol and canceled after the time-out, or for some other reason), but meanwhile the other participant has already confirmed and perhaps removed its logs of this transaction... As you pointed out, the coordinator cannot assume that participants will reconsider their decisions. It knows that the transaction is now in a contradictory state (one confirmed, one canceled), and it has already informed the participant that sent the cancel by sending a CONTRADICTION message. I think it should also send the same message to the participant that confirmed, so that it can take the necessary actions (undo, compensate, etc.), assuming that the participant retained its logs for this particular transaction.

It looks like a BTP (atomic) transaction requires a final-outcome message to be sent to the participants (whether the transaction committed or not), since we do not want the participants to wait for a while and then assume everything went OK.

For the failure case above, the contradiction message is already sent to the failing participant. Does the protocol include such a message for the confirmed participant too? It looks like that is the one that needs it... Since participants cannot take the absence of a contradiction message from the coordinator within a certain time period to mean that all is OK, the final-outcome message should be sent in both the failure and the successful cases...

As it says in the spec, the persisting of the "decide to cancel autonomously" might not actually require a disk write. The participant had to write the "decide to be prepared" record, and if this contains the timeout information, then the mere presence of an expired record means the autonomous decision must have been made. (the implementation has to "know" that it would have removed the record if it had confirmed).
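
A minimal sketch of that inference, assuming a participant that always cancels once its timeout expires and always removes the record when it confirms. The record layout and the state names returned below are hypothetical, not taken from the spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreparedRecord:
    """Hypothetical "decide to be prepared" log record, including the timeout."""

    transaction_id: str
    expires_at: float  # time after which the participant cancels autonomously


def recover_state(record: Optional[PreparedRecord], now: float) -> str:
    """Infer the participant's state from the log alone, e.g. after a restart."""
    if record is None:
        # Assumption: the record is removed on confirm, so no record
        # means there is nothing left to decide for this transaction.
        return "no-record"
    if now >= record.expires_at:
        # An expired record still on disk implies the autonomous cancel,
        # without a separate "decided to cancel" write.
        return "cancelled-autonomously"
    return "prepared"
```

The point of the design is that the single write made at prepare time carries enough information to reconstruct the later autonomous decision.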

What about the confirmed participant? Does it also retain the logs, and for how long?

I know that adding a final-outcome message has a lot of implications for the protocol... I will read the new spec that Alastair is sending before going into details... and perhaps some of the issues are already covered...

Peter

Have a nice weekend.

--Sazi

-----Original Message-----
From: Sazi Temel [mailto:sazi.temel@bea.com]
Sent: 30 August 2001 05:36
To: business-transaction@lists.oasis-open.org
Cc: temels2000@yahoo.com
Subject: Fwd: Heuristics in BTP atoms

-----------------------//---------------------------------------------
Folks,
As I mentioned in an earlier conf-call, I am bringing
this issue to your attention. I have some concerns
regarding the handling of heuristics in atomic BTP
transactions. I would like to hear your opinion before
I set it to rest. I have no access to my BEA
e-mail during the day, so messages I send to the OASIS
list do not go through. I am sending this to a few
people who were involved in discussions with me
earlier. Please feel free to distribute it on the
list.

Below is an example (shown as a message sequence of an atom
coordinator with two participants) of such a heuristic
situation that may occur:

Coordinator ----PREPARE-------> Participant_1
Coordinator ----PREPARE-------> Participant_2
Coordinator <---PREPARED------- Participant_2
Coordinator <---PREPARED------- Participant_1

Coordinator ----CONFIRM--------> Participant_1
Coordinator ----CONFIRM--------> Participant_2
Coordinator <---CANCELLED------- Participant_1 (*)
Coordinator <---CONFIRMED------- Participant_2

Now, one participant is confirmed and one is canceled.
The assumption here is that a PREPARED is only valid
for a certain period of time. When a participant sends
PREPARED it may include a condition such as "PREPARED
is valid for 5 sec only". After that it may take
either an optimistic or a pessimistic heuristic action.
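
As a rough illustration of the sequence above, the following sketch shows how a coordinator could spot the participants whose reported outcome contradicts its own decision. The function name and the dictionary representation are hypothetical; only the message names come from the example.

```python
from typing import Dict, List


def find_contradictions(decision: str, replies: Dict[str, str]) -> List[str]:
    """Return the participants whose reply does not match the coordinator's decision."""
    # The reply the coordinator expects for each decision it can send.
    expected = {"CONFIRM": "CONFIRMED", "CANCEL": "CANCELLED"}[decision]
    return sorted(name for name, reply in replies.items() if reply != expected)


# The example above: CONFIRM was sent, but Participant_1 had already cancelled.
replies = {"Participant_1": "CANCELLED", "Participant_2": "CONFIRMED"}
contradicting = find_contradictions("CONFIRM", replies)
```

In this run `contradicting` holds only Participant_1, so the coordinator knows the outcome is mixed, while Participant_2 has no way to learn that from the messages shown.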

My concern is that this will yield a non-atomic
outcome and BTP does not attempt to correct this
inconsistency. Although this situation looks like the
heuristics in 2PC, I think it is different, since 2PC
participants are usually in the same administrative
domain, so a corrective administrative action may be
easily taken. My concern is mostly based on my "guess"
that this situation (taking a heuristic action) will
occur more often in atomic BTP transactions than in 2PC,
mostly because a timeout for PREPARED will occur often
given the distributed and loosely coupled systems that
are involved.

Note that a message sent to Participant_2 to let it
UNDO its CONFIRM may help to bring the participants
into a consistent state with respect to the transaction.
The action to take (undoing) depends on the
participant, but BTP can let the participants know that
the transaction is not completed! Although BTP cannot
guarantee a common outcome (and will not be a
block-forever protocol), it should not "knowingly" let
the participants be in an inconsistent state (goods
ordered but no shipment is possible!)

It looks like there needs to be another (final outcome)
"state" after CONFIRM, in which the coordinator informs
the participants that the transaction is completed and
that they can forget any log that they may have on this
particular transaction.
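
A minimal sketch of what such a final-outcome step could look like on the coordinator side. This is only an illustration of the proposal: the notice names OUTCOME/consistent and OUTCOME/contradiction are made up for this example and are not BTP messages.

```python
from typing import Dict


def final_outcome_notices(replies: Dict[str, str]) -> Dict[str, str]:
    """Build one (hypothetical) final-outcome notice per participant."""
    # The outcome is consistent only if every participant reported the same result.
    consistent = len(set(replies.values())) <= 1
    notice = "OUTCOME/consistent" if consistent else "OUTCOME/contradiction"
    # Every participant is told, in the successful case and in the failure case,
    # so nobody has to infer success from silence.
    return {participant: notice for participant in replies}
```

A participant receiving the consistent notice could forget its log; one receiving the contradiction notice would know it may need to undo or compensate.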

I have not considered the performance and other
implications for the participants (such as keeping the
log - how long?) of including such a final-message
into the protocol.

Do you think we need such a message? Is there such a
message in the protocol?

Thanks.
--Sazi

----------------------------------//------------------------------------