OASIS Mailing List Archives

 



ws-tx message



Subject: Re: [ws-tx] Issue 052 - WS-AT: Replay message generates protocol errors




Alastair Green wrote:
> Hi Mark,
>
> Just one point:
>
> Mark Little wrote:
>> Since it crashed in Prepared Success state we should be able to 
>> assume that the participant obeyed the rules and made its decision to 
>> be able to commit durable. Hence, this Replay message should be 
>> interpreted as a), though the semantic of "have recovered" shouldn't 
>> exclude the fact that the failure may have been in the network and 
>> not the participant service itself (for instance).
>>
> One might think so, but in fact when the Participant experiences a 
> comms time out it Resends Prepared (PV state table).
>
> Which begs the question: if that works for comm failures, why do we do 
> something different for process failures which are recovered?

Different type of failure. I interpret the resend of Prepared on comms 
failure to be in the case where the sender (the participant) knows that 
the original Prepared wasn't delivered. My original statement above 
referring to comms failures is more: there was a network partition after 
Prepared was successfully delivered and this partition has been healed. 
In the meantime, the coordinator committed, couldn't contact the 
participant because of the network partition and so must go into some 
form of recovery mode. From the coordinator's perspective, there is no 
way for it to distinguish between a network partition and the failure of 
the machine on which the participant resides. From the participant's 
perspective, there is a difference, though the resolution is the same: 
it initiates a Replay message.

I just wanted to make sure our definition of failure didn't preclude 
partitions.
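
To make the distinction concrete, here is a minimal Python sketch of the 
argument. It is an illustration only and is not taken from the WS-AT state 
tables: the names Trigger, participant_message and coordinator_reply are 
hypothetical, and the coordinator behaviour shown (answering Prepared and 
Replay identically once an outcome exists) is the reading proposed in this 
thread, not a statement of what the specification mandates.

```python
from enum import Enum, auto


class Trigger(Enum):
    """What the participant experienced while in Prepared Success (hypothetical names)."""
    COMMS_TIMEOUT = auto()      # participant believes Prepared was not delivered
    PROCESS_RECOVERED = auto()  # participant crashed and restarted
    PARTITION_HEALED = auto()   # Prepared was delivered, then the network split


def participant_message(trigger: Trigger) -> str:
    """Message sent by a participant in Prepared Success, as described in this thread."""
    if trigger is Trigger.COMMS_TIMEOUT:
        return "Prepared"  # resend of the original vote
    return "Replay"        # recovery or healed partition: ask for the outcome


def coordinator_reply(decision: str, message: str):
    """Assumed coordinator behaviour: it cannot tell why the participant was
    silent, so both messages are treated as 'participant is prepared'."""
    if message not in ("Prepared", "Replay"):
        raise ValueError(f"unexpected message: {message}")
    if decision == "committed":
        return "Commit"
    if decision == "aborted":
        return "Rollback"
    return None  # no outcome yet; nothing to send


if __name__ == "__main__":
    for trigger in Trigger:
        msg = participant_message(trigger)
        print(trigger.name, "->", msg, "->", coordinator_reply("committed", msg))
```

The participant chooses its message based on what it knows locally, while the 
coordinator's reply depends only on its own decision, which is the sense in 
which the resolution is the same for a crash and a partition.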
>
> The implication of the two events for the Coordinator, as you point 
> out, should be identical (we are ensuring that the Prepared Success 
> state is communicated to the Coordinator).

But these are different scenarios. As a slight (related) aside: the OTS 
works fine with replay_completion on the RecoveryCoordinator, so there 
is precedent for Replay.

Mark.

>
> Alastair

