ws-tx message

Subject: Re: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors

From: Alastair Green <alastair.green@choreology.com>
To: Mark Little <mark.little@jboss.com>
Date: Fri, 12 May 2006 17:03:15 +0100

Hi Mark,

Comments interleaved:

Mark Little wrote:

Alastair Green wrote:

I'm sorry, but I don't get it.

1. Replay is never sent from the Coordinator to the Participant.

I never said that, did I?

No, you didn't. I was establishing a premise: that we are only interested in participant-driven recovery, and (in the following points) that the principle of coordinator-driven recovery is infinite retry of the same messages until they get through. All of which begs the question: "why does AT not use the same principles for participant-driven recovery?".

I spent time on this because you said the following:

"Different type of failure. I interpret the resend of Prepared on comms failure to be in the case where the sender (the participant) knows that the original Prepared wasn't delivered. [AG: that is, a participant-detected failure.] My original statement above referring to comms failures is more: there was a network partition after Prepared was successfully delivered and this partition has been healed. In the meantime, the coordinator committed, couldn't contact the participant because of the network partition and so must go into some form of recovery mode. [AG: which is to say, coordinator-side recovery] From the coordinator's perspective, there is no way for it to distinguish between a network partition and the failure of the machine on which the participant resides. From the participants perspective, there is a difference, though the resolution is the same: it initiates a Replay message. [AG: there is no difference between the perception of failure of the C by the P, or the P by the C. If the net is down or divided, either sender may get a comm failure.]

My point was: what is the relevance of being a C or a P? The behaviour needed (indefinite retry) is identical for Prepare/Prepared, and Prepared/Commit | Rollback. Both sides should have the same retry behaviour.

2. If the Coordinator never receives Prepared, it resends Prepare. If it never gets Committed back, it resends Commit (your scenario). In each case it does so as often as it wants until it gets the *ed back.

Sure. But that's top-down (coordinator driven recovery). I thought what we were discussing was bottom-up (participant driven recovery). Can you confirm that is your reading of the original issue too?

Absolutely. I am trying to establish we all agree on the principle of infinite retry-driven guaranteed message delivery. It's followed C to P, but it isn't followed in P to C, which is the precise point of this issue.

3. If the Participant fails and recovers, it knows that it may not have sent Prepared (it could fail between the log write and the message send), and must communicate the semantic "prepared". A message exists that carries exactly that semantic: Prepared.

Or, it could send Replay ;-)?

Yes, we could, but why on earth would we define a new message for the same semantic?

If the Participant tries to send Prepared (before or after crash recovery) and the message send fails to its knowledge (one interpretation of comms time out), it resends Prepared.

Sure. No argument there: if it knows the Prepared failed to be delivered, then it can obviously resend for an implementation-defined (potentially infinite) time. It could then periodically keep retrying. Or, it could send Replay later.

Having an extra message requires justification. Saying it exists is not a justification. We don't have two messages with subtly differing semantics (or odder still, with identical semantics) for coordinator-driven recovery.

If the Participant never receives Commit or Rollback (another interpretation of comms time out), it again resends Prepared.

Or Replay.

In other words, the Participant sends and resends Prepared until it gets Commit or Rollback, across all failures and for all time.
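
To make the retry principle above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not anything defined by WS-AT: `LossyCoordinator`, `resend_until_outcome` and the `max_attempts` bound are hypothetical names introduced only for this example, and the bound exists only so the demonstration terminates (the argument above is for unbounded retry).

```python
from typing import Callable, Optional, Tuple


class LossyCoordinator:
    """Hypothetical stand-in for C: loses the first `drop` Prepared messages."""

    def __init__(self, drop: int) -> None:
        self.drop = drop
        self.outcome: Optional[str] = None

    def on_prepared(self) -> None:
        if self.drop > 0:
            self.drop -= 1  # message lost in transit; C never sees it
            return
        self.outcome = "Commit"  # C has decided; duplicates leave this unchanged


def resend_until_outcome(send_prepared: Callable[[], None],
                         read_outcome: Callable[[], Optional[str]],
                         max_attempts: int) -> Tuple[int, Optional[str]]:
    """Resend Prepared until Commit or Rollback arrives, or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        send_prepared()
        outcome = read_outcome()
        if outcome in ("Commit", "Rollback"):
            return attempt, outcome
    return max_attempts, None  # still in doubt: a real P would keep retrying


coordinator = LossyCoordinator(drop=2)
attempts, outcome = resend_until_outcome(
    coordinator.on_prepared, lambda: coordinator.outcome, max_attempts=5)
print(attempts, outcome)  # 3 Commit
```

The participant does not need to know why the earlier sends produced no outcome (lost message, partition, or its own crash); the same message is repeated in every case.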

4. The OTS replay_completion is not a precedent. OTS uses RPCs, not one-way messages. This makes retry behaviour more difficult to model. But if we strip that aside, we see that OTS does exactly the opposite of AT: it does not tolerate communications failure if the prepared semantic fails to get through, and it does not cause premature abort after a recoverable failure in the prepared state. In my view, both OTS and AT are wrong: *there is no reason to treat comms failure and crash recovery differently, either in mechanism of retrying or in effect on transaction outcome.*

In OTS we say Vote vote = resource.prepare(), and the Vote enumeration tells us whether it's prepared, readonly or rollback. The operation is not idempotent -- a communications failure that prevents the vote returning will cause transaction abort. I think this is wrong and arbitrary, i.e. it is a bad precedent and should not be copied. Correctly, AT does not copy this feature, and tolerates this failure (comms time out = resend Prepared).

I think you're definitely misinterpreting my reference to replay_completion: I'm talking only about the bottom-up recovery scenario, which is exactly the same scenario this issue describes.

In OTS the failure to send (or a failure to deliver) the vote from P to C will cause a comms time out at the coordinator end. OTS treats a comms failure from P to C as causing an abort. AT treats a comms failure from P to C (failure to send or deliver, if there is enough acking going on in the transport) as the occasion for a resend of Prepared. That is the difference I was pointing out. The point of detection in OTS is different, but in both cases we are talking about a message failing to get from P to C.

I agree that in OTS a C to P message failure could also cause transaction abort (a product of the decision to prohibit C to P retries of prepare in OTS). I wasn't trying to comment on that.

If the participant fails in OTS then it can't tell when it failed (did it ever return from the prepare operation, i.e. send back the Vote?) So, it has to send a message to say: "I am prepared" (replay_completion), and it will receive a status. It may also get a replay of commit or rollback, as these operations can be duplicated (they are idempotent).

replay_completion is defined as being "a hint to the coordinator" that the prepared participant has never received commit or rollback. As a hint it cannot affect the state or the behaviour of the coordinator, other than to stimulate a replay of commit or rollback, speeding things up. Its semantic is: "I am prepared". (The additional semantic "And once I failed" is irrelevant.). Correctly, in OTS replaying the prepared semantic never causes transaction abort, as it wrongly can in AT.

I think you're mixing issues, which can only lead to confusion. Let's keep this strictly at the issue in hand. It'll make it easier for everyone else to follow.

I think you missed the point of what this section said. This issue 052 concerns the fact that replay of Prepared causes different C behaviour than sending Replay. OTS correctly makes the Vote returned on prepare (the normal send) have exactly the same semantics as replay_completion (the retry send). AT does not follow this good precedent of OTS.

The only reason for the existence of replay_completion as a distinct operation is because you can't return the response/return value of an RPC twice.

If OTS had modelled this using one ways, it would have ended up with two interfaces (simplified, and forgetting my IDL syntax, and changing the real names to save looking them up):

interface coordinator
{
    oneway void vote(in Vote v); // Vote is an enum: Commit (= Prepared), Readonly, Rollback
};

interface resource
{
    oneway void prepare();
    oneway void commit();
    oneway void rollback();
};

Our failure scenario would then logically be:

C invokes resource.prepare()
P invokes coordinator.vote (Vote.Commit)
P fails
P invokes coordinator.vote (Vote.Commit)

In AT this appears as

C sends Prepare
P sends Prepared
P fails
P resends Prepared
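
The AT sequence above can be sketched in Python as follows. This is a hedged illustration only: the `Participant` class, its in-memory `log` list (standing in for durable storage) and the `outbox` list (standing in for one-way sends to C) are assumptions made for the example, not part of any specification.

```python
class Participant:
    """Hypothetical participant; `log` stands in for durable storage."""

    def __init__(self, log, outbox):
        self.log = log        # survives a crash
        self.outbox = outbox  # one-way messages sent to the coordinator

    def on_prepare(self, crash_before_send=False):
        self.log.append("prepared")           # durable log write comes first
        if crash_before_send:
            raise RuntimeError("simulated crash between log write and send")
        self.outbox.append("Prepared")

    def on_outcome(self, outcome):
        self.log.append(outcome)              # "Commit" or "Rollback"

    def recover(self):
        in_doubt = "prepared" in self.log and not (
            "Commit" in self.log or "Rollback" in self.log)
        if in_doubt:
            self.outbox.append("Prepared")    # same message as normal operation


def run(crash_before_send):
    log, outbox = [], []
    try:
        Participant(log, outbox).on_prepare(crash_before_send)
    except RuntimeError:
        pass                                  # process dies; only the log survives
    Participant(log, outbox).recover()        # restarted process reads the log
    return outbox


print(run(crash_before_send=True))   # ['Prepared']
print(run(crash_before_send=False))  # ['Prepared', 'Prepared']
```

After recovery the participant cannot tell which of the two cases it was in, so it sends Prepared either way; in the second case the coordinator simply sees a duplicate.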

The separate message replay_completion is an artefact of RPC, not of the requirements of the transaction protocol.

The correct behaviour for AT is to resend Prepared in the face of comms failures, and after crash recovery.

I disagree. The correct behaviour is to send Replay.

It seems to me that this is just an assertion: you haven't yet presented a single argument as to why this should be the case. I think you agree that Replay should not have a different semantic than Prepared (the point of this issue 052). If that is true we will create a Replay row in the CV state table that is identical in all respects to the Prepared row, other than the fact that it has the label Replay. Why should we do this? It is truly pointless.

This is the relevance of the OTS comparison. OTS has to have two ways of sending the semantic, because it uses an RPC to get back the semantic in normal operation, and must define a separate operation (message) for recovery. But the reaction to the semantic "prepared" is identical.

AT does not use RPC; every message is a one-way message. It does not need two messages for one semantic. The only semantic difference between Prepared and Replay, if you accept that early abort is wrong, is that Replay additionally conveys the secondary meaning: "I have failed and recovered". As this is truly secondary (it has no effect on the receiver, which acts identically on receipt), it is truly irrelevant.
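
The "identical row" argument can be illustrated with a small Python sketch. The table below is hypothetical and deliberately simplified; its state names and actions are assumptions chosen for the example and it is not the CV state table from the specification.

```python
# Hypothetical, simplified coordinator reactions to Prepared, keyed by the
# coordinator's state. This is NOT the CV state table from the specification.
PREPARED_ROW = {
    "Preparing": ("record vote", "Prepared"),
    "Committing": ("resend Commit", "Committing"),
    "Aborting": ("resend Rollback", "Aborting"),
}

# If Replay carries no extra meaning for the receiver, its row is a copy.
REPLAY_ROW = dict(PREPARED_ROW)

TABLE = {"Prepared": PREPARED_ROW, "Replay": REPLAY_ROW}


def react(state, message):
    """Return (action, next_state) for a message received in a given state."""
    return TABLE[message][state]


def rows_identical():
    """True when C reacts the same way to Prepared and Replay in every state."""
    return all(react(s, "Prepared") == react(s, "Replay") for s in PREPARED_ROW)


print(rows_identical())  # True
```

Under that assumption the second row adds a label and nothing else, which is the redundancy being objected to.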

This can be proved. If a P implementation resends Prepared on recovery (and no-one can stop it doing that) then transactions will complete correctly and with no diminution in QoS.

Having to cater for Replay is either a waste of implementers' time, or it creates an inconsistency, which I think we both agree has no evident rationale. Maybe someone else can provide a rationale, but I haven't seen one yet.

Alastair

Mark.

Alastair

Mark Little wrote:

Alastair Green wrote:

Hi Mark,

Just one point:

Mark Little wrote:

Since it crashed in Prepared Success state we should be able to assume that the participant obeyed the rules and made its decision to be able to commit durable. Hence, this Replay message should be interpreted as a), though the semantic of "have recovered" shouldn't exclude the fact that the failure may have been in the network and not the participant service itself (for instance).

One might think so, but in fact when the Participant experiences a comms time out it Resends Prepared (PV state table).

Which begs the question: if that works for comm failures, why do we do something different for process failures which are recovered?

Different type of failure. I interpret the resend of Prepared on comms failure to be in the case where the sender (the participant) knows that the original Prepared wasn't delivered. My original statement above referring to comms failures is more: there was a network partition after Prepared was successfully delivered and this partition has been healed. In the meantime, the coordinator committed, couldn't contact the participant because of the network partition and so must go into some form of recovery mode. From the coordinator's perspective, there is no way for it to distinguish between a network partition and the failure of the machine on which the participant resides. From the participant's perspective, there is a difference, though the resolution is the same: it initiates a Replay message.

I just wanted to make sure our definition of failure didn't preclude partitions.

The implication of the two events for the Coordinator, as you point out, should be identical (we are ensuring that the Prepared Success state is communicated to the Coordinator).

But these are different scenarios. As a slight (related) aside: the OTS works fine with replay_completion on the RecoveryCoordinator, so there is precedent for Replay.

Mark.

Alastair

Alastair
