business-transaction message

Subject: RE: Heuristics in BTP atoms

From: Sazi Temel <sazi.temel@bea.com>
To: Peter Furniss <peter.furniss@choreology.com>,Sazi Temel <temels2000@yahoo.com>, business-transaction@lists.oasis-open.org
Date: Sat, 01 Sep 2001 21:38:28 -0700

Peter,
I am attempting to reply to your e-mail. It looks like I sent several copies (and perhaps versions) of my previous e-mail, thinking that they had not been sent, and kept sending... Anyway, my comments are inline.

--Sazi

At 12:23 PM 9/1/01 +0100, Peter Furniss wrote:

A Participant that makes an autonomous decision is required to have a persistent record of this, and will retain that record until either receiving the matching "right" answer, or receiving a CONTRADICTION message. At the coordinator, the contradiction is certain to be detected, despite failures. We don't absolutely guarantee that it will be known at the participant, as some failure sequences can lose the messages: (reformat to fixed-pitch font if your mailer doesn't)

When a time-out occurs, the participant decides whether it will wait longer or cancel. Is this considered an autonomous decision? If so, should the participant retain the log records in this case? What about the participant that has already confirmed: should it also retain the records, or does it simply forget the transaction?

This is the "promise or threat" timeout question - if the timeout goes off, has the participant said it WILL apply its own decision, or just that it reserves the right to do so? We currently have it that the standard qualifier timeout is a threat, not a promise - it is up to the participant to determine exactly when to make the decision. (However, we do have the default-to-cancel flag as well, which means there is no further exchange if both sides do cancel.) It would be essentially unenforceable to require the Participant to make the decision on the dot.

So "autonomous decision" means that the cancel (or confirm) decision is made and applied, not just that it is thinking about it. (At the crunch, the timeout is advisory, warning the superior that it should get its decision there before the timeout goes off, but not making any absolute statements - unlike CANCELLED or CONFIRMED which announce what has happened).
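A minimal Python sketch of the "threat, not promise" reading, assuming a participant-local policy function decides when to act. The names `autonomous_decision` and `patient_cancel` are hypothetical, not from the BTP spec, and the sketch also assumes, for illustration only, that the participant does not decide before its advertised timeout. A non-None return stands for a decision that has been made and applied, which is what counts as autonomous here.

```python
from typing import Callable, Optional

# Hypothetical policy type: given how far past the timeout we are (seconds),
# return "cancel"/"confirm" to decide now, or None to keep waiting.
Policy = Callable[[float], Optional[str]]


def autonomous_decision(timeout_at: float, now: float, policy: Policy) -> Optional[str]:
    """Return the decision the participant makes and applies, or None to wait."""
    if now < timeout_at:
        # Assumption for this sketch: no autonomous decision before the
        # advertised timeout; the participant waits for its superior.
        return None
    # After the timeout the participant *may* decide, but the policy chooses
    # when: the timeout is a threat, not a promise to act on the dot.
    return policy(now - timeout_at)


def patient_cancel(overdue: float) -> Optional[str]:
    """Example policy: tolerate up to 5 seconds of lateness, then cancel."""
    return "cancel" if overdue > 5 else None


print(autonomous_decision(10.0, 5.0, patient_cancel))   # None: before timeout
print(autonomous_decision(10.0, 12.0, patient_cancel))  # None: late, still waiting
print(autonomous_decision(10.0, 20.0, patient_cancel))  # cancel
```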

Either way, the Participant is required to retain log records until it receives a message from the Superior - exactly which message depends on the combination of decisions. If the Inferior's decision is in line with the Superior's, it will be the CONFIRM or CANCEL; if contrary, the CONTRADICTION (except in the case of failures like the example, where it is the SUP_STATE/unknown).
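The log-release rule above can be sketched as a small lookup, assuming the participant has already decided autonomously. `releasing_message` and its parameters are hypothetical names; the message names are the ones used in this thread.

```python
VALID = {"confirm", "cancel"}


def releasing_message(inferior_decision: str, superior_decision: str,
                      superior_record_gone: bool = False) -> str:
    """Name the message from the superior after which a participant that
    decided autonomously may remove its persistent record."""
    if inferior_decision not in VALID or superior_decision not in VALID:
        raise ValueError("decisions must be 'confirm' or 'cancel'")
    if superior_record_gone:
        # Failure case: the superior already removed its information and can
        # only answer that it knows nothing about the transaction.
        return "SUP_STATE/unknown"
    if inferior_decision == superior_decision:
        # In line with the superior: the ordinary CONFIRM or CANCEL suffices.
        return superior_decision.upper()
    # Contrary to the superior: only CONTRADICTION releases the record.
    return "CONTRADICTION"


print(releasing_message("cancel", "cancel"))         # CANCEL
print(releasing_message("cancel", "confirm"))        # CONTRADICTION
print(releasing_message("cancel", "confirm", True))  # SUP_STATE/unknown
```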

trial 1
superior                                              inferior
I1 :B1 <--------------------------ENROL/no-rsp-req <-- i1 :b1
                              decide to be prepared === b1 :e1
B1 :E1 <----------------------------------PREPARED <-- e1 :e1
E1 :F1 === decide to confirm
                      decide to cancel autonomously === e1 :j1
F1 :F1 --> CONFIRM
F1 :K1 <---------------------------------CANCELLED <-- j1 :j1
K1 :R1 === record contradiction
R1 :R2 --> CONTRADICTION
                                       disruption 0 XXX j1 :j1
                CONFIRM---X
                CONTRADICTION---X
R2 :Z   === remove persistent information
Z :Y1 <---------------------------------CANCELLED <-- j1 :j1
Y1 :Z   --> SUP_STATE/unknown-------------------------> j1 :j2
                      remove persistent information === j2 :z
Superior confirms, inferior cancelled - contradiction reported (+!:-)

The only way to avoid this would be another round of messages before the superior was allowed to remove the persistent information (e.g. a CONTRADICTED message from the inferior). But you can more easily avoid it in practice by retaining the superior's record of the contradiction for longer (how long is obviously a management decision - but that R2:Z remove persistent information is a "lazy delete", so it can be postponed as long as you like).
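To make the trial and the "lazy delete" point concrete, here is a toy Python replay from the superior's side. The `Superior` class, its `retention` parameter and the fake clock are assumptions for illustration, not the BTP state tables; the replay only shows that keeping the contradiction record longer turns the late SUP_STATE/unknown into a repeated CONTRADICTION.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Superior:
    """Toy superior for trial 1; states and names are illustrative only."""

    def __init__(self, retention: float, clock: Callable[[], float]) -> None:
        self.retention = retention  # how long a "removed" record lingers
        self.clock = clock
        # txid -> (record, purge-after time, or None while still live)
        self.records: Dict[str, Tuple[str, Optional[float]]] = {}
        self.sent: List[str] = []

    def decide(self, txid: str, decision: str) -> None:
        self.records[txid] = (decision, None)   # persist, then send
        self.sent.append(decision.upper())       # CONFIRM or CANCEL

    def remove(self, txid: str) -> None:
        # Lazy delete: "remove persistent information" only schedules a purge.
        record, _ = self.records[txid]
        self.records[txid] = (record, self.clock() + self.retention)

    def _lookup(self, txid: str) -> Optional[str]:
        entry = self.records.get(txid)
        if entry is None:
            return None
        record, purge_after = entry
        if purge_after is not None and self.clock() >= purge_after:
            del self.records[txid]               # retention over: really gone
            return None
        return record

    def on_outcome(self, txid: str, outcome: str) -> None:
        """Handle CONFIRMED or CANCELLED from the inferior."""
        record = self._lookup(txid)
        if record is None:
            self.sent.append("SUP_STATE/unknown")  # nothing left to consult
        elif (record, outcome) in {("confirm", "CONFIRMED"), ("cancel", "CANCELLED")}:
            self.remove(txid)                    # consistent: schedule removal
        else:
            self.records[txid] = ("contradiction", None)  # record it, then send
            self.sent.append("CONTRADICTION")


def replay_trial_1(retention: float, retry_at: float) -> List[str]:
    now = [0.0]                                  # fake clock for determinism
    sup = Superior(retention, clock=lambda: now[0])
    sup.decide("t1", "confirm")                  # CONFIRM is lost
    sup.on_outcome("t1", "CANCELLED")            # CONTRADICTION is lost too
    sup.remove("t1")
    now[0] = retry_at
    sup.on_outcome("t1", "CANCELLED")            # inferior retries later
    return sup.sent


print(replay_trial_1(retention=0.0, retry_at=10.0))
# ['CONFIRM', 'CONTRADICTION', 'SUP_STATE/unknown']
print(replay_trial_1(retention=3600.0, retry_at=10.0))
# ['CONFIRM', 'CONTRADICTION', 'CONTRADICTION']
```

How long `retention` should be is, as noted above, a management decision.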

Ok.

Without that exceptional case, the inferior (Participant) *will* know it made the wrong decision, and could reconsider. As you suggest, the coordinator cannot *require* the reconsideration, since again the resource is owned by someone else. (In fact, if the original autonomous decision was made for good reason, it seems unlikely that the decision will be reversed - if it could have waited until the right answer was known, why didn't it wait in the first place?)

I was thinking of the usual case where the participant knows that it made the "wrong" decision (actually nothing is wrong; it just followed the protocol and cancelled after the time-out, or for some other reason), but meanwhile the other participant has already confirmed and perhaps removed its logs for this transaction.

"Wrong" meaning contrary to the decision of the Superior (or of those above it) - yes, that's ok for the protocol.

I know :)

The superior can't remove its logs until it has the reply back from the inferior.

I do not mean the coordinator (or superior) but the participant in the atom that had already CONFIRMED (in the sequence diagram in my first posting)...

As you pointed out, the coordinator cannot assume that the participants will reconsider their decisions. It knows that the transaction is now in a contradicting state (one confirmed, one cancelled), and it has already informed the participant that cancelled by sending a CONTRADICTION message. I think it should also send the same message to the participant that confirmed, so that it can take the necessary actions (undo, compensate, etc.), assuming that the participant retained its logs for this particular transaction.

That rather goes against the general assumption that Service/Participants are independent, and linked only at the instigation of the client. If the hotel cancelled and the airline confirmed, I have a problem, but neither of them cares. I'm going to have to do something new at application (or management) level. Obviously there will be scenarios where it can be useful to tell all the parties that someone made a contrary decision, and they could respond variously, but I'm not sure the circumstances will be sufficiently regular to carry in BTP messages.

I think your last sentence, the one that starts with "Obviously there will be scenarios ...", describes what I am assuming will occur often enough. I think it will occur often because, at least when there is a PREPARE time-out, it is possible (if not common) for the coordinator and participants to get out of sync and send contradicting messages across each other (CONFIRM --> <-- CANCEL). The question, then, is whether we need to consider this case in the protocol.

It looks like a BTP (atomic) transaction requires a final-outcome message to be sent to the participants, whether the transaction committed or not, since we do not want the participants to wait for a while and then assume everything went ok.

For the failure case above, the contradiction message has already been sent to the failing participant. Does the protocol include such a message to the confirmed participant too? It looks like that is the one that needs it... Since participants cannot assume that all is ok just because no contradiction message has arrived from the coordinator within a certain time period, the final-outcome message should be sent in both the failure and the successful cases...

As it is at present, the contradiction is a bilateral matter between the superior (coordinator) and inferior (participant) - and that is the only relationship the participant is aware of (plus any lower relationships it may have if it is a subcoordinator) - the Participant is unaware of its siblings. It seriously changes the implicit contract if we make the participant aware of the siblings - and in a direction away from the inter-organisational target of BTP, back towards classic transaction systems (where the whole lot is "owned" by one entity, linked in a transaction for their mutual benefit).

I don't think I said or implied in any way that a participant (unless it is a subcoordinator) will be aware of its siblings... The coordinator knows all its participants and their decisions. Normally, when the coordinator sends a CONFIRM message to all the participants, it expects to get CONFIRMED back. If some CANCELLED and some CONFIRMED, there is a contradicting situation (the transaction is in trouble), and the coordinator is the only actor that can help the participants recover. It sends CONTRADICTION to the participant that CANCELLED, which is good, but I think it should also send a NOT_COMPLETED message (or something like that, to indicate that the transaction "you" CONFIRMED is actually not completed; recover if you can) to the participants that have already CONFIRMED.
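A sketch of that proposal, assuming the coordinator has collected every participant's reply. NOT_COMPLETED is the name floated in this message, not an existing BTP message, and `outcome_notifications` is a hypothetical helper. Each message is addressed to one participant, so no participant learns anything about its siblings.

```python
from typing import Dict


def outcome_notifications(decision: str, replies: Dict[str, str]) -> Dict[str, str]:
    """Messages the coordinator would send after collecting replies.

    decision: "confirm" or "cancel"; replies: participant -> "CONFIRMED" | "CANCELLED".
    NOT_COMPLETED is the message proposed in this thread, not a BTP message.
    """
    expected = "CONFIRMED" if decision == "confirm" else "CANCELLED"
    if all(reply == expected for reply in replies.values()):
        return {}  # consistent outcome: nothing extra to send
    notes: Dict[str, str] = {}
    for participant, reply in replies.items():
        if reply != expected:
            notes[participant] = "CONTRADICTION"   # what BTP already sends
        elif reply == "CONFIRMED":
            notes[participant] = "NOT_COMPLETED"   # proposed addition
    return notes


print(outcome_notifications("confirm", {"hotel": "CANCELLED", "airline": "CONFIRMED"}))
# {'hotel': 'CONTRADICTION', 'airline': 'NOT_COMPLETED'}
```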

I cannot recall whether there is such a message in the protocol. Perhaps there is (if so, I think we all agree!); I will read the latest spec.

As it says in the spec, the persisting of the "decide to cancel autonomously" might not actually require a disk write. The participant had to write the "decide to be prepared" record, and if this contains the timeout information, then the mere presence of an expired record means the autonomous decision must have been made. (the implementation has to "know" that it would have removed the record if it had confirmed).
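A minimal sketch of that recovery inference, assuming the prepared record carries the timeout and that the implementation cancels as soon as the timeout passes, so no separate cancel record is written. `PreparedRecord` and `recovered_state` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreparedRecord:
    txid: str
    timeout_at: float  # timeout carried in the "decide to be prepared" record


def recovered_state(record: Optional[PreparedRecord], now: float) -> str:
    """Work out the participant's state on recovery without a cancel record.

    Assumes the policy is to cancel once the timeout passes, and that a
    confirm would have removed (or modified) the prepared record.
    """
    if record is None:
        return "no prepared record"      # never prepared, or already completed
    if now >= record.timeout_at:
        return "cancelled autonomously"  # implied by the expired record
    return "prepared"                    # still in doubt: wait for the superior


rec = PreparedRecord("t1", timeout_at=100.0)
print(recovered_state(rec, 50.0))   # prepared
print(recovered_state(rec, 150.0))  # cancelled autonomously
```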

What about the confirmed participant? Does it also retain the logs, and for how long?

It is required to retain logs until just before it sends back a reply to the superior (CONFIRMED/response in the tables) - it must not have an unmodified prepared log when it sends that. (However, that is a logical log-removal, not necessarily a real one. The critical point is that, if the participant were to crash and recover, any log record would NOT cause the participant to query the superior and, if receiving a SUP_STATE/unknown, then treat that as a cancel.

It sounds like the log is not retained (at the participant) after CONFIRMED is sent. Could that easily be changed, if required?

Depending on how the underlying resources work, it is possible this doesn't need any real changes to the log records as such.)
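A toy Python sketch of the confirm path and the recovery check, assuming a simple key-value log; `Participant`, its log states and the `ask_superior` callback are hypothetical. The record is modified rather than deleted before CONFIRMED goes out, which is one way to keep it (as asked above) without it ever triggering the query-and-cancel path on recovery.

```python
from typing import Callable, Dict, List


class Participant:
    """Toy participant; log states are illustrative, not BTP-defined."""

    def __init__(self) -> None:
        self.log: Dict[str, str] = {}  # txid -> "prepared" | "confirmed" | "cancelled"
        self.sent: List[str] = []

    def prepare(self, txid: str) -> None:
        self.log[txid] = "prepared"    # must be persistent before PREPARED
        self.sent.append("PREPARED")

    def on_confirm(self, txid: str) -> None:
        # ... apply the confirm to the underlying resources here ...
        # Logical log removal: the record is modified (it could also be
        # deleted) so no *unmodified* prepared record exists once CONFIRMED goes.
        self.log[txid] = "confirmed"
        self.sent.append("CONFIRMED")

    def recover(self, txid: str, ask_superior: Callable[[str], str]) -> str:
        if self.log.get(txid) != "prepared":
            return "no action"         # confirmed, cancelled or unknown: don't ask
        reply = ask_superior(txid)
        if reply == "SUP_STATE/unknown":
            self.log[txid] = "cancelled"
            return "cancel"            # superior knows nothing: treat as cancel
        return reply                   # e.g. CONFIRM or CANCEL to act on


def forgetful(txid: str) -> str:
    """Stand-in superior that has already removed its information."""
    return "SUP_STATE/unknown"


in_doubt = Participant()
in_doubt.prepare("t1")
print(in_doubt.recover("t1", forgetful))  # cancel

done = Participant()
done.prepare("t2")
done.on_confirm("t2")
print(done.recover("t2", forgetful))      # no action
```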

I know there are a lot of implications for the protocol in adding a final-outcome message... I will read the new spec that Alastair is sending before going into details... and perhaps some of the issues are already covered...

Peter

Have a nice weekend.

And you

Sazi Temel
Principal Consultant,
eCommerce Services, bea Systems, inc. [www.bea.com]

References:

RE: Heuristics in BTP atoms
From: Peter Furniss <peter.furniss@choreology.com>
