Re: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors

Alastair,

As we discussed below, the Replay message is typically sent by a participant that is in doubt. It should not be defined, as the specification currently defines it, as a request to replay a previous protocol message.

The definition of the Replay message should read along these lines:

“Upon receipt of this notification, the coordinator may assume the participant has suffered a recoverable failure. It should resend the transaction outcome (commit or rollback protocol notification) to the in-doubt participant.”
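Read this way, Replay becomes a request for the outcome rather than a trigger for rollback. A minimal sketch of the coordinator side, assuming a hypothetical in-memory decision log (`on_replay` and `decisions` are illustrative names, not WS-AT terms). Answering Rollback for a transaction the coordinator has no record of is an additional assumption (a presumed-abort convention), not something stated in this thread:

```python
from enum import Enum
from typing import Dict, Optional


class Outcome(Enum):
    COMMIT = "Commit"
    ROLLBACK = "Rollback"


def on_replay(decisions: Dict[str, Optional[Outcome]], tx_id: str) -> Optional[str]:
    """Return the notification to resend to an in-doubt participant, or None.

    `decisions` is a hypothetical coordinator log: transaction id -> logged
    outcome, or None while the outcome has not been decided yet.
    """
    if tx_id not in decisions:
        # Assumption (presumed abort): no record means the transaction
        # never committed, so Rollback is the safe answer.
        return Outcome.ROLLBACK.value
    outcome = decisions[tx_id]
    if outcome is None:
        # Nothing to resend yet; the participant receives the outcome once it is decided.
        return None
    # Resend the outcome; the Replay itself is never a reason to abort.
    return outcome.value


log = {"tx-1": Outcome.COMMIT, "tx-2": None}
print(on_replay(log, "tx-1"))  # Commit
print(on_replay(log, "tx-2"))  # None
print(on_replay(log, "tx-9"))  # Rollback
```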

From: Alastair Green [mailto:alastair.green@choreology.com]
Sent: Thursday, May 18, 2006 4:14 AM
To: Ram Jeyaraman
Cc: Mark Little; Peter Furniss; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors

Ram,

Absolutely. So if Replay is a pure synonym for a resent Prepared (i.e. it carries no additional semantic relevant to the outcome; e.g. it does not imply that the participant is aborting), then there is no reason for the coordinator to abort the transaction on receiving the message. Put another way, there is no reason for Replay to induce behaviour different from that of a resent Prepared (which would not cause an abort).

If Replay/Preparing is corrected to become identical to Prepared/Preparing, then the unnecessary abort problem goes away (this issue).

If the two rows become identical, then there is no need to have a separate Replay message (it is redundant) -- the related issue.

Alastair

Ram Jeyaraman wrote:

Alastair,

A participant that had successfully prepared when it sends out a Replay (after a crash) genuinely wants to know what the outcome is, so that it can complete the in-doubt transaction during its recovery.

-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com]
Sent: Wednesday, May 10, 2006 4:15 AM
To: Mark Little
Cc: Peter Furniss; Ram Jeyaraman; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors

Ram, Mark --

I've got quite a few thoughts on this, but I want to check a couple of premises with the TC, in case I am misunderstanding some unwritten piece of design intent.

1. The text at l.221 of the spec defines the Replay message thus:

"Upon receipt of this notification, the coordinator may assume the participant has suffered a recoverable failure. It should resend the last appropriate protocol notification."

Does a Replay message from a Participant that crashed in the Prepared Success state and then recovered carry the semantic:

a) "Have recovered, am in good state to proceed, i.e. am still prepared", or
b) "Have recovered, was prepared, but am now aborting", or
c) "Have recovered, and may be prepared successfully, or may be aborting", or
d) some other semantic that I haven't thought of?

2. Is a Participant that crashed in the Prepared Success state, has recovered from the failure and is still prepared (i.e. is in the same state as it was before the crash) allowed to re-send Prepared? Or, better: can its decision to do so damage the consistency of the transaction outcome, or slow down arriving at the outcome decision?

Alastair

Mark Little wrote:

Peter Furniss wrote:

I think it is likely the state table is being misinterpreted. I'm not sure by whom :-)

If you treat the state as referring to just one participant, you either get some very convoluted definitions of the internal events (cf. issue 048, though more convoluted than the ones proposed there) or you violate atomicity.

We already agreed prior to the last f2f (in telecons) and at the last f2f (during the meeting) that the state table is not referring to just one participant.

Receiving a 'Prepared' message doesn't move the state to PreparedSuccess; that's done by "Commit Decision", and until then 'Replay' would cause an abort. You could define "Commit Decision" as meaning "receipt of an OK vote from just this one participant", and take the state for this participant to PreparedSuccess. But the only way to leave PreparedSuccess is through "WriteDone" or "WriteFailed". Since an 'Aborted' from another participant should certainly cause this participant to be rolled back, that 'Aborted' would have to trigger "WriteFailed", which is not an obvious interpretation.

But I think this issue, together with 053 (eliminate Replay), is more about whether Replay need ever force an abort. We may be looking at a carry-over from connection-centric protocols, where it made sense to force an abort if the connection broke before commit time. In those worlds (more or less all transaction protocols that weren't using XML and/or web services, I think), receipt of a recovery message before the connection was observed to break could only mean the connection break was about to happen. But with WS-AT (especially because we have said all messages go on the underlying request) there is no connection to be monitored anyway. The coordinator hasn't noticed that the participant was out of communication for a while, and now the participant says it is ready for the commit. Why *require* the coordinator to abort?

Agreed.

Of course that's not to say the coordinator cannot *choose* to abort, as an implementation option, if Replay is received (or in any other circumstance that leads the coordinator to suspect a failure somewhere). It can always do that if it hasn't progressed too far; it would appear in the tables as a User Rollback or Write Failed.

Yes, I'd like to see this as an implementation specific choice.

Mark.

Peter
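The position Peter and Mark agree on (never *required* to abort, but free to do so before the outcome is decided) can be sketched as a handler with an opt-in policy. This is a hypothetical illustration: `ReplayPolicy`, `abort_before_decision` and `handle_replay` are invented names, only the Active and Preparing states are modelled, and the non-abort cells are taken from the resolution proposed in this issue:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ReplayPolicy:
    # Hypothetical setting: may the coordinator abort when Replay arrives
    # before the commit decision? Off by default, so Replay never forces an abort.
    abort_before_decision: bool = False


def handle_replay(state: str, policy: ReplayPolicy) -> Tuple[str, str]:
    """Return (action, next_state) for an inbound Replay (illustrative only)."""
    if state not in ("Active", "Preparing"):
        raise ValueError(f"state {state!r} is outside this sketch")
    if policy.abort_before_decision:
        # Implementation choice; the tables would show it as a User Rollback.
        return ("Send Rollback", "Aborting")
    # Default: no abort. Cells follow the resolution proposed in this issue.
    if state == "Active":
        return ("Ignore", "Active")
    return ("Resend Prepare", "Preparing")


print(handle_replay("Preparing", ReplayPolicy()))  # ('Resend Prepare', 'Preparing')
print(handle_replay("Preparing", ReplayPolicy(abort_before_decision=True)))  # ('Send Rollback', 'Aborting')
```

Keeping the abort behind an explicit flag makes it an implementation option rather than a protocol requirement, which is the distinction being argued for.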

-----Original Message-----
From: Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
Sent: 06 May 2006 02:09
To: ws-tx@lists.oasis-open.org
Subject: RE: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors

Section 10 (AT specification) states "These tables present the view of a coordinator or participant with respect to a single partner". Thus, the coordinator states correspond to interactions with a single participant.

The receipt of a participant vote "PreparedSuccess" moves the coordinator state to "PreparedSuccess" with respect to that particular participant, even though the coordinator may not have completed the prepare phase for the rest of the participants.

Is it possible that the state table is being misinterpreted?
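Under the per-partner reading described here, the coordinator keeps one table state per participant, and a vote changes only the voter's entry. A minimal sketch, assuming a plain dictionary from hypothetical participant names to state names (`on_prepared` and `can_decide_commit` are illustrative, not spec names):

```python
from typing import Dict


def on_prepared(states: Dict[str, str], participant: str) -> Dict[str, str]:
    """Per-partner reading: a Prepared vote changes only that participant's entry."""
    updated = dict(states)
    if updated.get(participant) == "Preparing":
        updated[participant] = "PreparedSuccess"
    return updated


def can_decide_commit(states: Dict[str, str]) -> bool:
    # The overall commit decision still needs every participant to have voted.
    return all(state == "PreparedSuccess" for state in states.values())


states = {"P1": "Preparing", "P2": "Preparing"}
states = on_prepared(states, "P1")
print(states)                     # {'P1': 'PreparedSuccess', 'P2': 'Preparing'}
print(can_decide_commit(states))  # False
```

Peter's objection in this thread is that this reading leaves no obvious transition for rolling back an entry already in PreparedSuccess when a different participant reports Aborted.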

-----Original Message-----
From: Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
Sent: Thursday, April 06, 2006 10:50 AM
To: ws-tx@lists.oasis-open.org
Subject: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors

This is identified as WS-TX issue 052.

Please ensure follow-ups have a subject line starting "Issue 052 - WS-AT: Replay message generates protocol errors".

-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com]
Sent: Wednesday, April 05, 2006 5:07 PM
To: ws-tx@lists.oasis-open.org
Subject: [ws-tx] New Issue: WS-AT: Replay message generates protocol errors

Issue name -- WS-AT: Replay message generates protocol errors

PLEASE DO NOT REPLY TO THIS EMAIL OR START A DISCUSSION THREAD UNTIL THE ISSUE IS ASSIGNED A NUMBER.

The issues coordinators will notify the list when that has occurred.

Target document and draft:

Protocol:  WS-AT

Artifact:  spec

Draft:

WS-AT CD 0.1 uploaded

Link to the document referenced:

http://www.oasis-open.org/apps/org/workgroup/ws-tx/download.php/17325/wstx-wsat-1.1-spec-cd-01.pdf

Section and PDF line number:

Coordinator View State Table, after l. 503

Issue type:

Design

Related issues:

New issue: WS-AT: Eliminate Replay message. New issue: WS-AT: Is logging mandatory?

Issue Description:

Replay reactions defined in the current CV (Coordinator View) state table will cause unnecessary transaction aborts.

Issue Details:

The cells in row (Inbound Messages) Replay, columns (States) Active and Preparing read:

Active: Send Rollback --> Aborting

Preparing: Send Rollback --> Aborting

Replay message means: "play it again Sam", not "demolish the piano".

Case A. If the last thing they sent was Prepared, and it got through (we're Preparing and we've recorded their vote), and they've recovered, and they're waiting for a Commit or a Rollback, then we need to Ignore the Replay (just as we would if they sent it after we had done our own housekeeping and moved to Prepared Success).

Case B. If the message didn't get through, and we've processed User Commit, then we could be in the Preparing state but have no record of their vote. In that case we'd have to resend Prepare, to tell them: send us your vote again.

Case C. If the last thing we received was Register, and we haven't processed User Commit, then we're still Active and they won't have logged. Replay won't happen on crash recovery (there is no log record to recover from), but it could be used to say to the coordinator "Are you still there? Should I crap out?" (i.e., out of impatience). We can't stop them using Replay in that fashion. Our only sensible response would have to be silence (we don't have a blank ack to a ping), i.e. Ignore. There is no harm in them doing this, even though it is pointless. You could argue that this should be N/A, but that seems heavy-handed.
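Cases A to C amount to a small decision function. A sketch, assuming a hypothetical `vote_recorded` flag that the current tables do not model (`replay_response` is likewise an illustrative name):

```python
def replay_response(state: str, vote_recorded: bool) -> str:
    """Coordinator reaction to Replay following Cases A-C (illustrative).

    `vote_recorded` is a hypothetical flag; the current tables do not
    distinguish Preparing with and without a recorded vote.
    """
    if state == "Active":
        # Case C: the participant has not logged, so Replay is at most a ping.
        return "Ignore"
    if state == "Preparing":
        # Case A: vote already recorded. Case B: vote lost, so ask for it again.
        return "Ignore" if vote_recorded else "Resend Prepare"
    if state == "PreparedSuccess":
        # Case A: a Replay arriving after we moved to Prepared Success is ignored too.
        return "Ignore"
    raise ValueError(f"state {state!r} is not covered by Cases A-C")


print(replay_response("Preparing", vote_recorded=False))  # Resend Prepare
```

The Preparing branch depends on `vote_recorded`, which the tables cannot express; that is why the Proposed Resolution collapses it to always resending Prepare.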

Proposed Resolution:

As the state tables do not differentiate between Preparing/no vote recorded and Preparing/vote recorded, it seems easiest to always resend Prepare in the Preparing state. Therefore:

Replace the current text in the cells in row (Inbound Messages) Replay, columns (States) Active and Preparing with:

Active: Ignore --> Active

Preparing: Resend Prepare --> Preparing
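The before and after cells can be written as lookup tables to check the property the issue is after: no Replay cell should lead to Aborting. A sketch with illustrative names (`CURRENT`, `PROPOSED`, `forces_abort`); the cell contents are copied from the issue text:

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (action, next state)

# Replay row of the Coordinator View table, as quoted in this issue.
CURRENT: Dict[str, Cell] = {
    "Active": ("Send Rollback", "Aborting"),
    "Preparing": ("Send Rollback", "Aborting"),
}
PROPOSED: Dict[str, Cell] = {
    "Active": ("Ignore", "Active"),
    "Preparing": ("Resend Prepare", "Preparing"),
}


def forces_abort(row: Dict[str, Cell], state: str) -> bool:
    # A cell forces an abort if its next state is Aborting.
    return row[state][1] == "Aborting"


for state in ("Active", "Preparing"):
    print(state, forces_abort(CURRENT, state), forces_abort(PROPOSED, state))
# Active True False
# Preparing True False
```

Both proposed cells are self-loops, so a Replay leaves the coordinator where it was instead of moving it towards rollback.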
