Subject: Re: [wsrm] treatment of aborted out-of-order seq
Before replying in-line, I provide the rationale for the receiver sending the "abort ordered delivery" fault. Note that I am proposing that this fault be sent when the receiver gives up ordered delivery due to a local resource issue, not when a previously received out-of-sequence message expires. For a telecom/networking person, this is not an optimization but a mandatory requirement.
1. Much quicker recovery from this failure: the user of WS-Reliability can take some type of recovery action after the fault is received and conveyed.
2. Avoid wasting communications resources (and associated costs) by preventing repeated retransmissions of one or more messages that will never be delivered. Otherwise, retransmissions will continue until the retry counter reaches its terminal count.
3. The sender can distinguish the reason the receiver has aborted ordered delivery. It could be due to:
a] message expiration (the sender can deduce this by not receiving an ACK for a message that has expired, so no Fault need be sent). Note that the underlying cause may be c] below.
b] a local resource issue: the Fault sent by the receiver tells the sender why ordered delivery has been abandoned/aborted.
c] another problem at the receiver (no ACK or Fault received): power failure, processing failure, higher-layer failure, etc. In this case, the sender would poll the receiver and nothing would come back (i.e., no Poll Response).
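The three cases above can be told apart from what the sender observes. A minimal Python sketch, assuming the sender tracks whether an ACK arrived, whether the message has expired, whether the proposed abort Fault arrived, and whether a Poll went unanswered; the enum and function names are hypothetical.

```python
from enum import Enum


class AbortReason(Enum):
    NONE = "no abort detected"
    MESSAGE_EXPIRED = "a] message expiration"
    LOCAL_RESOURCES = "b] local resource issue at the receiver"
    RECEIVER_FAILURE = "c] other problem at the receiver"


def classify_abort(ack_received: bool, message_expired: bool,
                   abort_fault_received: bool,
                   poll_unanswered: bool) -> AbortReason:
    """Infer why ordered delivery stopped, from the sender's point of view."""
    if abort_fault_received:
        # b] The receiver said so explicitly with the proposed Fault.
        return AbortReason.LOCAL_RESOURCES
    if ack_received:
        return AbortReason.NONE
    if poll_unanswered:
        # c] No ACK, no Fault and no Poll Response: the receiver itself failed.
        return AbortReason.RECEIVER_FAILURE
    if message_expired:
        # a] The receiver answers polls, but the expired message was never ACKed.
        return AbortReason.MESSAGE_EXPIRED
    return AbortReason.NONE  # still waiting: keep retrying
```

Case a] can mask case c]: an expired, unacknowledged message only points to a receiver failure once a Poll also goes unanswered.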
----- Original Message -----
From: Jacques Durand
Date: Tue, 13 Apr 2004 17:31:30 -0700
To: "WSRM (E-mail)"
Subject: [wsrm] treatment of aborted out-of-order seq
We talked about two related issues,
yet after review, I believe only one remains:
----- Issue 1:
Should the Receiver warn the Sender that an out-of-order sequence has been
aborted on the Receiver side for a reason other than message expiration?
(In the case of message expiration, the Sender can generally deduce the failure.)
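To make the "reason other than message expiration" concrete, here is a minimal receiver-side sketch in Python. It assumes a hypothetical per-group limit, `max_pending`, on how many out-of-order messages the receiver will hold; no such limit is defined (quantitative aspects are out of scope for V1), so the class and its return values are illustrative only.

```python
from typing import Dict, List


class OrderedDeliveryBuffer:
    """Hypothetical receiver state for one group with ordered delivery."""

    def __init__(self, max_pending: int) -> None:
        self.max_pending = max_pending      # assumed local resource limit
        self.next_expected = 0              # next sequence number to deliver
        self.pending: Dict[int, str] = {}   # out-of-order messages held back
        self.delivered: List[str] = []
        self.aborted = False

    def receive(self, seq: int, payload: str) -> str:
        """Return "ack", "duplicate", "abort" or "fault" for this message."""
        if self.aborted:
            return "fault"  # ordered delivery already given up for this group
        if seq < self.next_expected or seq in self.pending:
            return "duplicate"
        self.pending[seq] = payload
        # Deliver every message that is now in sequence.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        # Local resource issue (not expiration): abandon the sequence.
        if len(self.pending) > self.max_pending:
            self.aborted = True
            self.pending.clear()  # held messages will never be delivered
            return "abort"        # where the proposed Fault would be sent
        return "ack"
```

In this sketch, once a group is aborted, every later message for it is answered with a fault rather than buffered.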
Position 1: This can't be more than an optimization, as the reliability
contract would not be broken anyway (it would not cause delivery of unordered messages).
Also, we never rely on error messages for critical RM logic.
Note also that any further "quantitative" aspect of RM (e.g., the maximum length of an
out-of-order sequence) is out of scope for V1 (line 103, sec 1.2).
Position 2: This is more than an optimization, as it is a special failure case
with a potentially high cost if not dealt with as efficiently as possible:
without knowledge that a sequence has been given up, cumulative and useless
resending (and new sending) may overwhelm the Receiver.
We fault individual messages for various reasons, including lack of storage resources,
so why not fault such a multi-message failure?
AW: agree with this logic
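The cost argument in Position 2 (and in point 2 at the top of this message) comes down to simple arithmetic. A hypothetical Python sketch, assuming each retransmission round resends every undeliverable message once and that the sender stops as soon as the abort Fault arrives:

```python
from typing import Optional


def wasted_sends(undeliverable: int, retry_limit: int,
                 fault_after_round: Optional[int]) -> int:
    """Count resends of messages that the receiver will never deliver.

    fault_after_round is the retransmission round after which the abort
    Fault reaches the sender, or None if the receiver never sends one.
    """
    sends = 0
    for round_no in range(1, retry_limit + 1):
        if fault_after_round is not None and round_no > fault_after_round:
            break  # sender knows the sequence is dead: stop resending
        sends += undeliverable  # one resend per undeliverable message
    return sends
```

With ten dropped messages (as in the range 6 to 15 of the poll example in Issue 2) and an assumed retry limit of 5, `wasted_sends(10, 5, None)` counts 50 resends, while `wasted_sends(10, 5, 1)` counts 10.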
Opinion: If the sender uses a Poll request, it costs little to give notice of the
failure in the poll response (see the example in Issue #2 below).
So even if we see this notification as no more than an "improvement" (Position #1),
we can fault the dropped messages in the response.
But when callback is possible (and requested), should the Receiver send:
(a) nothing at all
(b) a special PermanentProcessingFailure for the entire group (would that look like the responses we do for "singleton" groups?)
(c) a PermanentProcessingFailure for each dropped message?
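The three callback options differ only in how many notifications the Receiver emits. A hypothetical Python sketch, representing each callback Fault as a plain dict (the field names are assumptions, not the spec's message format):

```python
from typing import Dict, List, Union

PERMANENT_FAILURE = "wsrm:PermanentProcessingFailure"

Notification = Dict[str, Union[str, int]]


def callback_faults(option: str, group_id: str,
                    dropped: List[int]) -> List[Notification]:
    """Build the callback Fault notifications for an aborted sequence."""
    if option == "a":
        return []  # (a) nothing at all
    if option == "b":
        # (b) a single Fault covering the entire group
        return [{"groupId": group_id, "fault": PERMANENT_FAILURE}]
    if option == "c":
        # (c) one Fault per dropped message
        return [{"groupId": group_id, "seq": seq, "fault": PERMANENT_FAILURE}
                for seq in dropped]
    raise ValueError(f"unknown option: {option!r}")
```

Option (b) keeps the callback cost constant regardless of how many messages were dropped.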
I personally favor (b) as a consistent complement to my opinion in the poll request
case. A MAY or a MUST? Again, I think it's just an optimization, so a MAY is OK.
AW: A MAY causes ambiguity, as one implementation will send the Fault and another will not. Hence, you have not necessarily provided the sender with the information that would precipitate quicker recovery action. My position is: MUST send the Fault, or NEVER send the Fault.
DOES IT REALLY REQUIRE A LOT OF EFFORT TO SEND A SMALL FAULT MESSAGE? I don't see why this point is so controversial.
----- Issue 2:
This concerns generating Faults and how to get them to the Sender when neither the
callback nor the response pattern is possible. In that case, polling is the only way a
Sender can get faults generated on the Receiver.
Well, it appears that the Poll response can actually give the fault code,
so the issue is moot (so much for me).
AW: Agree! It would be OK to mandate a Poll after message expiration to determine why the receiver has given up on ordered delivery
A response to a poll about an aborted sequence (assuming Position 2)
would look like:
<SequenceReplies groupId="mid://20040202. email@example.com/">
  <ReplyRange from="0" to="5"/>
  <ReplyRange from="6" to="15" fault="wsrm:PermanentProcessingFailure"/>
</SequenceReplies>
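A poll response like this can be produced with the Python standard library. The sketch below assumes the element and attribute names shown in the example and ignores XML namespaces; the function name and the (from, to, fault) tuple layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Range = Tuple[int, int, Optional[str]]  # (from, to, fault code or None)


def sequence_replies(group_id: str, ranges: List[Range]) -> str:
    """Serialize a SequenceReplies poll response for one group."""
    root = ET.Element("SequenceReplies", {"groupId": group_id})
    for start, end, fault in ranges:
        # "from" is a Python keyword, so attributes are passed as a dict.
        reply = ET.SubElement(root, "ReplyRange",
                              {"from": str(start), "to": str(end)})
        if fault is not None:
            reply.set("fault", fault)
    return ET.tostring(root, encoding="unicode")
```

For the aborted sequence above, the call would be `sequence_replies(group_id, [(0, 5, None), (6, 15, "wsrm:PermanentProcessingFailure")])`.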
NOTE: Should we have a new fault code, "InvalidGroupId", for requests
concerning groups that do not exist (possibly because they have already been terminated)?
AW: That would be fine. I would prefer that a new fault code (not a Processing Failure) be sent. That would provide specificity and be useful for failure analysis/correlation.
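On the receiver, the proposal amounts to one extra branch in poll handling. A minimal Python sketch, assuming the receiver keeps its per-group reply ranges in a dict; `wsrm:InvalidGroupId` is only the fault code proposed in this thread, not a defined one, and the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int, Optional[str]]  # (from, to, fault code or None)

# Proposed in this thread; not (yet) a defined WS-Reliability fault code.
INVALID_GROUP_ID = "wsrm:InvalidGroupId"


def answer_poll(groups: Dict[str, List[Range]], group_id: str) -> dict:
    """Answer a Poll for one group, faulting unknown or terminated groups."""
    ranges = groups.get(group_id)
    if ranges is None:
        # The group does not exist, possibly because it was already terminated.
        return {"groupId": group_id, "fault": INVALID_GROUP_ID}
    return {"groupId": group_id, "ranges": ranges}
```

A distinct code lets the sender separate "no such group" from a processing failure when correlating faults, which is the specificity argued for above.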
Alan Weissberger DCT 2013 Acacia Ct Santa Clara, CA 95050-3482 1 408 863 6042 voice 1 408 863 6099 fax