ws-tx message

Subject: Re: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors

From: Alastair Green <alastair.green@choreology.com>
To: Mark Little <mark.little@jboss.com>
Date: Fri, 12 May 2006 10:46:56 +0100

I'm sorry, but I don't get it.

1. Replay is never sent from the Coordinator to the Participant.

2. If the Coordinator never receives Prepared, it resends Prepare. If it never gets Committed back, it resends Commit (your scenario). In each case it does so as often as it wants until it gets the corresponding *ed message (Prepared or Committed) back.

3. If the Participant fails and recovers, it knows that it may not have sent Prepared (it could fail between the log write and the message send), and must communicate the semantic "prepared". A message exists that carries exactly that semantic: Prepared.

If the Participant tries to send Prepared (before or after crash recovery) and the message send fails to its knowledge (one interpretation of comms time out), it resends Prepared.

If the Participant never receives Commit or Rollback (another interpretation of comms time out), it again resends Prepared.

In other words, the Participant sends and resends Prepared until it gets Commit or Rollback, across all failures and for all time.

4. The OTS replay_completion is not a precedent. OTS uses RPCs, not one-way messages. This makes retry behaviour more difficult to model. But if we set that aside, we see that OTS does exactly the opposite of AT: it does not tolerate communications failure if the prepared semantic fails to get through, and it does not cause premature abort after a recoverable failure in the prepared state. In my view, both OTS and AT are wrong: there is no reason to treat comms failure and crash recovery differently, either in the mechanism of retrying or in the effect on transaction outcome.

In OTS we say Vote vote = resource.prepare(), and the Vote enumeration tells us whether it's prepared, readonly or rollback. The operation is not idempotent -- a communications failure that prevents the vote returning will cause transaction abort. I think this is wrong and arbitrary, i.e. it is a bad precedent and should not be copied. Correctly, AT does not copy this feature, and tolerates this failure (comms time out = resend Prepared).

If the participant fails in OTS then it can't tell when it failed (did it ever return from the prepare operation, i.e. send back the Vote?). So it has to send a message to say: "I am prepared" (replay_completion), and it will receive a status. It may also get a replay of commit or rollback, as these operations can be duplicated (they are idempotent).

replay_completion is defined as being "a hint to the coordinator" that the prepared participant has never received commit or rollback. As a hint it cannot affect the state or the behaviour of the coordinator, other than to stimulate a replay of commit or rollback, speeding things up. Its semantic is: "I am prepared". (The additional semantic "And once I failed" is irrelevant.) Correctly, in OTS replaying the prepared semantic never causes transaction abort, as it wrongly can in AT.

The only reason replay_completion exists as a distinct operation is that you can't return the response/return value of an RPC twice.

If OTS had modelled this using one-way messages, it would have ended up with two interfaces (simplified, and forgetting my IDL syntax, and changing the real names to save looking them up):

enum Vote { Commit, Readonly, Rollback }; // Commit = Prepared

interface coordinator
{
    oneway void vote (in Vote v); // one-way: nothing is returned to the caller
};

interface resource
{
    oneway void prepare();
    oneway void commit();
    oneway void rollback();
};

Our failure scenario would then logically be:

C invokes resource.prepare()
P invokes coordinator.vote (Vote.Commit)
P fails
P invokes coordinator.vote (Vote.Commit)

In AT this appears as

C sends Prepare
P sends Prepared
P fails
P resends Prepared

The separate message replay_completion is an artefact of RPC, not of the requirements of the transaction protocol.
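
To make the one-way model concrete, here is a minimal Python sketch of a coordinator whose vote operation returns nothing and can therefore be invoked any number of times. The class, the `outbox` list and the `decide` method are hypothetical names chosen for illustration; none of this is taken from OTS or WS-AT.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "prepared"    # Commit = Prepared, as in the simplified interface
    READONLY = "readonly"
    ROLLBACK = "rollback"


class Coordinator:
    """Hypothetical one-way coordinator: vote() returns nothing to the caller."""

    def __init__(self):
        self.votes = {}      # participant id -> first recorded Vote
        self.outbox = []     # one-way messages sent back to participants
        self.outcome = None  # "commit" or "rollback" once decided

    def decide(self, outcome):
        self.outcome = outcome

    def vote(self, participant, vote):
        # The first delivery records the vote; repeats leave state untouched.
        recorded = self.votes.setdefault(participant, vote)
        if recorded is Vote.COMMIT and self.outcome is not None:
            # A repeated "prepared" only stimulates a replay of the outcome.
            self.outbox.append((participant, self.outcome))


c = Coordinator()
c.vote("P", Vote.COMMIT)  # P invokes coordinator.vote(Vote.Commit)
c.decide("commit")        # the coordinator reaches its decision
c.vote("P", Vote.COMMIT)  # P fails, recovers, and votes again
```

In this sketch the second vote changes nothing in the coordinator's state; its only effect is that the outcome is sent to P again, which is the "hint" behaviour without a separate operation.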

The correct behaviour for AT is to resend Prepared in the face of comms failures, and after crash recovery.
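
A rough Python sketch of that behaviour follows, under stated assumptions: the `log` dictionary stands in for durable storage, `send` and `receive` are hypothetical transport callbacks, and `receive` returning `None` models a comms time out. The retry bound exists only so the example terminates; the argument above calls for resending indefinitely.

```python
class Participant:
    """Hypothetical participant; `log` stands in for durable storage."""

    def __init__(self, log):
        self.log = log  # survives crashes; everything else is volatile

    def prepare(self):
        # Durable write first; the process may fail before Prepared is sent.
        self.log["state"] = "prepared"

    def run(self, send, receive, max_attempts=10):
        # The same loop serves after a comms time out and after crash recovery.
        for _ in range(max_attempts):
            if self.log.get("state") != "prepared":
                break
            send("Prepared")
            reply = receive()  # None models a time out
            if reply in ("Commit", "Rollback"):
                self.log["state"] = reply
        return self.log.get("state")


sent = []
replies = iter([None, None, "Commit"])  # two time outs, then the outcome

log = {}
Participant(log).prepare()  # log write succeeds, then the process crashes
recovered = Participant(log)  # a new process starts from the surviving log
final = recovered.run(sent.append, lambda: next(replies))
```

The recovered participant does not need to know whether its earlier Prepared was ever sent or delivered: it simply sends Prepared until Commit or Rollback arrives.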

Alastair

Mark Little wrote:

Alastair Green wrote:

Hi Mark,

Just one point:

Mark Little wrote:

Since it crashed in Prepared Success state we should be able to assume that the participant obeyed the rules and made its decision to be able to commit durable. Hence, this Replay message should be interpreted as a), though the semantic of "have recovered" shouldn't exclude the fact that the failure may have been in the network and not the participant service itself (for instance).

One might think so, but in fact when the Participant experiences a comms time out it Resends Prepared (PV state table).

Which begs the question: if that works for comm failures, why do we do something different for process failures which are recovered?

Different type of failure. I interpret the resend of Prepared on comms failure to be in the case where the sender (the participant) knows that the original Prepared wasn't delivered. My original statement above referring to comms failures is more: there was a network partition after Prepared was successfully delivered and this partition has been healed. In the meantime, the coordinator committed, couldn't contact the participant because of the network partition and so must go into some form of recovery mode. From the coordinator's perspective, there is no way for it to distinguish between a network partition and the failure of the machine on which the participant resides. From the participant's perspective, there is a difference, though the resolution is the same: it initiates a Replay message.

I just wanted to make sure our definition of failure didn't preclude partitions.

The implication of the two events for the Coordinator, as you point out, should be identical (we are ensuring that the Prepared Success state is communicated to the Coordinator).

But these are different scenarios. As a slight (related) aside: the OTS works fine with replay_completion on the RecoveryCoordinator, so there is precedent for Replay.

Mark.

Alastair

Alastair
