Subject: Re: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors


Ram,

I very much appreciate your responsiveness, but I'm afraid that it seems you are arguing simply by assertion (that which is, shall be).

1) Why should the coordinator abort in some circumstances purely because of participant crash and recovery, despite the fact that the participant has recovered in an identical state? That is the subject of this issue 052. No-one has attempted to justify this behaviour thus far. Please can someone explain why this should be so?

2) Anything that leaves the participant in doubt as to the outcome requires it to communicate (again) with the C. If this occurs as a result of detecting comms failure, this leads to a resend of Prepared. In undefined circumstances, we are supposed to send Replay. The spec is entirely silent on which circumstances, incidentally: it merely says that if it is received then it is "after recoverable failure". If we resolve that aggressive abort is wrong (unjustified) then there will be no reason for having two messages. Can someone please provide an argument for having two messages, if they carry identical effective semantics?

3) There is nothing in the spec to prevent replaying of Prepared. Where is this prohibited, defined, circumscribed etc? Why should it be? Is it dangerous or slow?

Alastair

Ram Jeyaraman wrote:

Alastair,

 

As we discussed below, the replay message is typically sent by a participant that is in an in-doubt situation. It should not be used for replaying a previous protocol message as the specification currently states.

 

The definition of Replay message should read along these lines:

 

“Upon receipt of this notification, the coordinator may assume the participant has suffered a recoverable failure. It should resend the transaction outcome (commit or rollback protocol notification) to the in-doubt participant.”
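
A minimal sketch of the coordinator behaviour that the proposed wording above describes, in Python. The function name `reply_to_replay` and the string values are hypothetical and are not taken from the WS-AT specification. The proposed wording does not say what the coordinator sends when no outcome has been decided yet, so the sketch returns nothing in that case rather than guessing.

```python
from typing import Optional


def reply_to_replay(outcome: Optional[str]) -> Optional[str]:
    """Pick the notification to resend to an in-doubt participant.

    `outcome` is "commit", "rollback", or None when the coordinator has
    not yet decided (hypothetical encoding, for illustration only).
    """
    if outcome == "commit":
        return "Commit"
    if outcome == "rollback":
        return "Rollback"
    # Undecided: the proposed definition is silent here, so nothing is resent.
    return None
```

Under this reading a Replay never changes the outcome; it only asks for it to be repeated once it is known.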


From: Alastair Green [mailto:alastair.green@choreology.com]
Sent: Thursday, May 18, 2006 4:14 AM
To: Ram Jeyaraman
Cc: Mark Little; Peter Furniss; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors

 

Ram,

Absolutely. So if Replay is a pure synonym for a resent Prepared (i.e. carries no additional semantic relevant to the outcome, e.g. does not imply that P is aborting) then there is no reason for C to abort the transaction on receiving the message. Put another way, there is no reason for Replay to induce a different behaviour from a resent Prepared (which would not cause abortion).

If Replay/Preparing is corrected to become identical to Prepared/Preparing, then the unnecessary abort problem goes away (this issue).

If the two rows become identical, then there is no need to have a separate Replay message (it is redundant) -- the related issue.

Alastair

Ram Jeyaraman wrote:

Alastair,
 
A participant that has successfully prepared when it sends out a replay
(after a crash) genuinely wants to know what the outcome is, so it can
complete the in-doubt transaction during its recovery.
 
-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com] 
Sent: Wednesday, May 10, 2006 4:15 AM
To: Mark Little
Cc: Peter Furniss; Ram Jeyaraman; ws-tx@lists.oasis-open.org
Subject: Re: [ws-tx] Issue 052 - WS-AT: Replay message generates
protocol errors
 
Ram, Mark --
 
I've got quite a few thoughts on this, but I want to check with the TC
on a couple of premises, in case I am misunderstanding some unwritten
piece of design intent.
 
1. The text at l.221 of the spec defines the Replay message thus:
 
"Upon receipt of this notification, the coordinator may assume the 
participant has suffered a recoverable failure. It should resend the 
last appropriate protocol notification."
 
Does a Replay message for a Participant that crashed in the Prepared
Success state and then recovered carry the semantic:
 
a) "Have recovered, am in good state to proceed, i.e. am still
prepared", or
b) "Have recovered, was prepared, but am now aborting", or
c) "Have recovered, and may be prepared successfully, or may be 
aborting", or
d) some other semantic, that I haven't thought of?
 
2. Is a Participant which crashed in the Prepared Success state, has
recovered from a failure and is still prepared (i.e. is in the same
state as it was prior to crash recovery) allowed to re-send Prepared? Or
better, can its decision to do so damage the consistency of the
transaction outcome, or slow down arriving at the outcome decision?
 
Alastair
 
 
Mark Little wrote:
  
 
Peter Furniss wrote:
    
I think it is likely the state table is being misinterpreted. I'm not
sure by who :-)

If you treat the state as referring to just one participant, you either
get some very convoluted definitions of the internal events (c.f. issue
048 - but more convoluted than the ones proposed there) or you violate
atomicity.
  
      
We already agreed prior to the last f2f (in telecons) and at the last
f2f (during the meeting) that the state table is not referring to just
one participant.
 
    
Receiving a 'Prepared' message doesn't move the state to PreparedSuccess
- that's done by "Commit Decision", and until then 'Replay' would cause
an abort. You could define "Commit Decision" as meaning "receipt of ok
vote for just this one participant", and take the state for this
participant to PreparedSuccess. But the only way to leave
PreparedSuccess is from "WriteDone" or "WriteFailed". Since an 'Aborted'
from another participant should certainly cause this participant to be
rolled back, that 'Aborted' will have to trigger "WriteFailed", which is
not an obvious interpretation.
 
 
But I think this issue, with 053 (eliminate Replay) is more about
whether Replay need ever force an abort. We may be looking at a
carry-over from connection-centric protocols, where it made sense to
force an abort if the connection broke before commit-time. In those
worlds (more or less all transaction protocols that weren't using xml
and/or web-services, I think), receipt of a recovery message before the
connection was observed to break could only mean the connection break
was about to happen. But with WS-AT (especially because we have said all
messages go on the underlying request) there is no connection to be
monitored anyway. The coordinator hasn't noticed that participant was
out of communication for a while, and now the participant says it is
ready for the commit. Why *require* the coordinator to abort ?
  
      
Agreed.
 
    
Of course that's not to say the coordinator cannot *choose* to abort by
implementation option if replay is received (or any other circumstance
that leads the coordinator to suspect a failure somewhere). It can
always do that if it hasn't progressed too far - it would appear in the
tables as a User Rollback or Write Failed.
      
Yes, I'd like to see this as an implementation specific choice.
 
Mark.
 
    
Peter
 
-----Original Message-----
From: Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
Sent: 06 May 2006 02:09
To: ws-tx@lists.oasis-open.org
Subject: RE: [ws-tx] Issue 052 - WS-AT: Replay message generates
protocol errors

Section 10 (AT specification) states "These tables present the view of a
coordinator or participant with respect to a single partner".  Thus, the
coordinator states correspond to interactions with a single participant.
The receipt of a participant vote "PreparedSuccess" triggers the
coordinator state to "PreparedSuccess" with respect to that particular
participant, even though the coordinator may not have completed the
prepare phase for the rest of the participants.
 
Is it possible that the state table is being misinterpreted?
 
-----Original Message-----
From: Ram Jeyaraman [mailto:Ram.Jeyaraman@microsoft.com]
Sent: Thursday, April 06, 2006 10:50 AM
To: ws-tx@lists.oasis-open.org
Subject: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol
errors

This is identified as WS-TX issue 052.

Please ensure follow-ups have a subject line starting "Issue 052 -
WS-AT: Replay message generates protocol errors".
 
-----Original Message-----
From: Alastair Green [mailto:alastair.green@choreology.com]
Sent: Wednesday, April 05, 2006 5:07 PM
To: ws-tx@lists.oasis-open.org
Subject: [ws-tx] New Issue: WS-AT: Replay message generates protocol
errors

Issue name -- WS-AT: Replay message generates protocol errors

PLEASE DO NOT REPLY TO THIS EMAIL OR START A DISCUSSION THREAD UNTIL
THE ISSUE IS ASSIGNED A NUMBER.
 
The issues coordinators will notify the list when that has occurred.
 
Target document and draft:
 
Protocol:  WS-AT
 
Artifact:  spec
 
Draft:
 
WS-AT CD 0.1 uploaded
 
Link to the document referenced:

http://www.oasis-open.org/apps/org/workgroup/ws-tx/download.php/17325/wstx-wsat-1.1-spec-cd-01.pdf
 
Section and PDF line number:
 
Coordinator View State Table, after l. 503
 
 
Issue type:
 
Design
 
 
Related issues:
 
New issue: WS-AT: Eliminate Replay message. New issue: WS-AT: Is 
logging mandatory?
 
 
Issue Description:
 
Replay reactions defined in current CV state table will cause
unnecessary transaction aborts.
 
 
Issue Details:
 
The cells in row (Inbound Messages) Replay, columns (States) Active 
and Preparing read:
 
Active: Send Rollback --> Aborting
Preparing: Send Rollback --> Aborting
 
Replay message means: "play it again Sam", not "demolish the piano".
 
Case A. If the last thing they sent was Prepared, and it got through 
(we're Preparing and we've recorded their vote), and they've 
recovered, and they're waiting for a Commit or a Rollback, then we 
need to Ignore the Replay (just like if they send it when we've done 
our own housekeeping, and moved to Prepared Success).
 
Case B. If the message didn't get through, and we've processed User
Commit then we could be in the Preparing state, but have no record of
their vote. In that case we'd have to replay Prepare to indicate to
them, send us your vote again.
 
Case C. If the last thing we received was Register, and we haven't
processed User Commit, then we're still Active and they won't have
logged. Replay won't happen on crash recovery (no log record to
recover off), but it could be used to say to the coordinator "Are you
still there? Should I crap out?" (i.e., because of impatience). We
can't stop them using Replay in that fashion. Our only sensible
response would have to be: silence (we don't have a blank ack to a
ping), i.e. to Ignore. There is no harm in them doing this, even though
it is pointless. You could argue that this should be a N/A but that
seems heavy-handed.
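
Cases A to C above can be sketched as a single dispatch on the coordinator's view of one participant. This is an illustration in Python, not text from the specification: the class `CoordinatorView`, the `vote_recorded` flag and the function `handle_replay` are hypothetical names. In particular, the state tables do not carry a vote-recorded distinction, so the flag is an assumption made purely to separate Case A from Case B.

```python
from dataclasses import dataclass


@dataclass
class CoordinatorView:
    state: str                   # "Active", "Preparing" or "PreparedSuccess"
    vote_recorded: bool = False  # hypothetical: not tracked by the state tables


def handle_replay(view: CoordinatorView) -> str:
    """Return the coordinator's action on an inbound Replay; state is unchanged."""
    if view.state == "Active":
        # Case C: nothing has been logged by the participant, so stay silent.
        return "Ignore"
    if view.state == "Preparing":
        # Case A: vote already recorded, nothing to repeat yet.
        # Case B: vote never arrived, so ask for it again.
        return "Ignore" if view.vote_recorded else "Resend Prepare"
    if view.state == "PreparedSuccess":
        # Mentioned under Case A: Replay is ignored once we are Prepared Success.
        return "Ignore"
    raise ValueError(f"state not covered by cases A-C: {view.state}")
```

None of the branches sends Rollback, which is the point of the issue: a Replay on its own gives the coordinator no reason to abort.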
 
 
Proposed Resolution:
 
As the state tables do not differentiate between Preparing/no vote 
recorded and Preparing/vote recorded, it seems easiest to always 
resend Prepare in the Preparing state. Therefore:
 
Replace the current text in the cells in row (Inbound Messages) 
Replay, columns (States) Active and Preparing with:
 
Active: Ignore --> Active
Preparing: Resend Prepare --> Preparing
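
For comparison, the current and proposed Replay cells can be written side by side as lookup tables. This is only a sketch in Python: the cell contents are copied from the issue text above, while the dictionary layout and the helper names `on_replay` and `causes_abort` are assumptions made for illustration.

```python
# Replay row of the coordinator-view state table, Active and Preparing columns.
# Each cell is (action, next_state). Layout is hypothetical; contents are from the issue.
CURRENT_REPLAY = {
    "Active": ("Send Rollback", "Aborting"),
    "Preparing": ("Send Rollback", "Aborting"),
}

PROPOSED_REPLAY = {
    "Active": ("Ignore", "Active"),
    "Preparing": ("Resend Prepare", "Preparing"),
}


def on_replay(table, state):
    """Return (action, next_state) for an inbound Replay received in `state`."""
    return table[state]


def causes_abort(table, state):
    """True if receiving Replay in `state` moves the coordinator to Aborting."""
    return on_replay(table, state)[1] == "Aborting"
```

With the proposed cells, `causes_abort(PROPOSED_REPLAY, state)` is false for both columns, and the coordinator stays in the state it was in when the Replay arrived.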
 
 
  
      
 
  

