Doug,
Some more comments and thoughts on
your proposal:
<dug>... When
or why an RMS uses CloseSequence is up to it to decide.
All we know is that
it wants to shut things down and get an accurate ACK from the RMD.</dug>
I still have not heard a plausible scenario in which an RMS “wants to shut
things down” and the current spec presents a problem. Comparing the spec as it
stands today vs. the spec + this proposal:
- TODAY: RMS wants to end the sequence so it sends a LastMessage and must wait for a
complete set of acks; this might require retransmitting messages. Once a
full set of acks is received RMS sends TerminateSequence.
- TODAY + THIS PROPOSAL: RMS wants to end the sequence so it sends Close, waits
for a CloseResponse, possibly retransmitting the Close. Once a
CloseResponse is received RMS sends TerminateSequence.
The problem with the TODAY
scenario, as I’ve heard it in this forum, is that the RMS might have to
wait unacceptably long between sending LastMessage and getting a full ack
range. But if getting some messages or acks across proves difficult, why would
the RMS expect that getting Close across would be any easier?
<dug>The case
that I keep thinking about is one where the RMD is actually a cluster of
machines and when a sequence gets created it has an affinity to a certain
server in the cluster - meaning it processes all of the messages for that
sequence. If that server starts to have problems, and for some reason it just
can't seem to process any new app messages then the RMS can close down the
sequence and start up a new one. Hopefully, the new sequence will be directed
to a different server in the cluster. </dug>
There are two problems with this
scenario and the proposed solution.
1. If an RMD has sequence-to-machine affinity, that should be strictly the
   RMD’s decision and the RMD’s problem. The RMS is autonomous; this proposal
   puts expectations on the RMS’ behavior based on particularities of the RMD
   implementation. To be clear, I’ll note that affinity can be achieved in
   two ways:
   i. By performing stateful routing at the RMD; basically the RMD has to
      remember every active sequence and what machine it has affinity to. In
      this case it would be simple to change the RMD’s routing table when a
      machine fails.
   ii. By generating different EPRs for each machine. For affinity to
      function this way two things are necessary:
      1. Some sort of endpoint resolution mechanism would have to be devised
         for the RMS to learn the EPR that it should target.
      2. A mechanism for migrating that EPR.
   Clearly 1) and 2) are outside the scope of the TC and, in my view, this
   proposal might be defining 2) in an informal way that is specific to WS-RM.
2. If the RMS somehow guesses that there is a problem on the EPR to which it
   is sending its messages and somehow decides that Closing the sequence and
   starting a new one is the right course of action, ordering guarantees are
   compromised.
Finally, I agree with you that
considering a gap-filling mechanism would be a good thing for this TC to do.
--Stefan
From: Doug Davis [mailto:dug@us.ibm.com]
Sent: Tuesday, August 30, 2005 7:56 AM
To: ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Stefan,
comments inline
thanks
-Doug
"Stefan Batres" <stefanba@microsoft.com>
08/29/2005 09:07 PM
To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
cc:
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Doug,
I have some questions, more than comments. I’m having trouble
imagining when one would use this mechanism given the issues this proposal
addresses and I’d like to hear how you think about this. For example,
i0019 says:
The RM Destination imperatively terminates a sequence due to
one of these unrecoverable errors:
- wsrm:SequenceTerminated
- wsrm:MessageNumberRollover
- wsrm:LastMessageNumberExceeded
Then any pending non-acknowledged message will be lost for
the sequence.
If an RMS receives either wsrm:MessageNumberRollover or
wsrm:LastMessageNumberExceeded it means it has a bug in its implementation of
the protocol, no?
<dug> Not necessarily. LastMessageNumberExceeded
could be as a result of the RMS not knowing the RMD doesn't support the max
unsigned long </dug>
We’re not trying to help it recover from that are
we?
<dug> Maybe, maybe not. When or why an RMS uses
CloseSequence is up to it to decide. All we know is that it wants to shut
things down and get an accurate ACK from the RMD.</dug>
W.R.T. wsrm:SequenceTerminated, it seems what we’re trying
to do is define a way for the receiver to gracefully end the sequence –
but we’re calling it a non-fatal fault. I think it might be clearer if we
say that faults are fatal and leave entire sequences in doubt; and add to that
consideration of this mechanism explicitly as a way to allow the destinations
to initiate a graceful sequence termination.
<dug> I didn't change SequenceTerminated Fault to be
non-fatal - so getting/generating that Fault does still end the sequence
</dug>
i0028 says:
An RMS (or SA) may decide to stop using a sequence even
though some messages were not received (not acked)….
Why would an RMS “decide” to stop using a
sequence? Here is what I’ve heard so far:
<dug> w/o saying that I agree with some of these reasons I'll try to
answer each one... </dug>
a) Because it is going down for some reason (e.g. maintenance).
I don’t see this as a reason for ending an otherwise
perfectly good sequence; if you are doing this you could certainly end the
session as per the current spec – or if you have durability there should
be no problem at all.
<dug> if the RMS is going down and can't wait for all of the sequence to
be ACKd then it must get an accurate accounting of the sequence before it shuts
down. W/o CloseSequence() how can it do that? </dug>
b) Because it implements a message expiration scheme and some of the
messages have expired.
There certainly is an issue with the gaps left on the
sequence as per the current spec, but the mechanism to deal with this
can’t be to end the sequence since ordering guarantees could be lost
(e.g. some of the messages expired but it has many more messages to send).
<dug> here you're talking about the notion of
filling-in gaps in a sequence. Which may be a good thing for the TC to
examine but I don't see it as being the same issue. However, w/o a
gap-filling-solution, if one message never gets ACKd and your DA is
InOrder+ExactlyOnce, you're screwed. :-) Unless you use CloseSequence()
to get an accurate state of the Seq. W/o CloseSequence() if you just send
a TerminateSequence() there still could be a message or ack floating around the
network that could impact the true state of the sequence. Guess it
depends on this 'message expiration' thingy. </dug>
c) Because it has suffered some sort of partial state loss. For
instance, an RMS is multiplexing messages from several sources over a
single sequence and one of those sources fails.
I see this as having the same problem as b. I see how
Close/FinalAck enables the protocol to not doom the entire sequence (bad thing
since many apps are using it). But ordering guarantees for those applications
are lost.
<dug> To be honest, I dunno. I
don't know what it means for one of these sources to fail since in my head the
message is already in the RM logic and implies any failure within the AS will
not impact RM's job </dug>
Do you see this mechanism helping in other scenarios that
I’m just not thinking about?
<dug> The case that I keep thinking about is one where
the RMD is actually a cluster of machines and when a sequence gets created it
has an affinity to a certain server in the cluster - meaning it processes all
of the messages for that sequence. If that server starts to have
problems, and for some reason it just can't seem to process any new app
messages then the RMS can close down the sequence and start up a new one.
Hopefully, the new sequence will be directed to a different server in the
cluster. But even w/o the notion of the RMS trying to do some kind of
recovery thru the use of a 2nd sequence (which might be controversial to some
people), I still believe it is valuable for the RMS to be able to reliably
obtain the state of an incomplete Sequence before a TerminateSequence is sent
</dug>
--Stefan
From: Doug Davis
[mailto:dug@us.ibm.com]
Sent: Monday, August 29, 2005 5:07 PM
To: ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Additional comments inline.
All - any additional comments? I need to send out 'take 3' tomorrow.
thanks,
-Doug
Jacques Durand <JDurand@us.fujitsu.com>
08/26/2005 08:44 PM
To: "'Giovanni Boschi'" <gboschi@sonicsoftware.com>, Doug Davis/Raleigh/IBM@IBMUS, ws-rx@lists.oasis-open.org
cc:
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Giovanni:
I believe there is more in what you say below than what is needed to resolve
i0019 and i0028.
I am commenting on some of your points below - but I believe they can be largely
dissociated from the current issues at hand, and be treated separately.
Regards,
Jacques
From: Giovanni Boschi [mailto:gboschi@sonicsoftware.com]
Sent: Friday, August 26, 2005 9:01 AM
To: Jacques Durand; Doug Davis; ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
I don't see the current draft as directly specifying that acks are "on
receipt", although clearly an implementation could take that approach, and
it's probably the more intuitive one - but, specifically I think the current
draft allows an RMD to defer acking until the messages are "in order"
i.e. not acking those messages that are still sitting "behind gaps".
<JD> this is a very important point to clarify. What you are in effect
suggesting is an Ack on delivery (since once in order, they can be
delivered). But the spec is clear: "Acknowledgement: The communication from the RM
Destination to the RM Source indicating the successful receipt of a message."
(And in the messaging model, there is a clear distinction between
Receipt and Delivery; see Fig 1.)
<dug> +1, current spec is for Ack on receipt not delivery </dug>
There is a specific benefit to the "ack when deliverable" (note deliverable,
not delivered) approach for low-resource situations (I can elaborate if needed,
let me know), so I would hesitate to assume that ack-on-receipt is the model
used by all implementations at all times.
<JD> Join the club. I have been favoring "ack on delivery" from
the start, but that seems to clash with the WS-RM model: the protocol would
become involved in the RMD-AD delivery assurance. Please try to convince
Chris...
<dug> I'm staying out of this one for now :-) </dug>
Now, of course, if my "ack when deliverable" approach is in use by
the RMD, then the final ack will be accurate: all the messages that have
been acked are safely deliverable after a close. It's the "ack
immediately on receipt" approach that has that problem - but to be clear,
I do not want the spec to impose an ack strategy, I think the freedom the spec
gives the RMD on choosing an ack strategy is one of the coolest things in the
current spec.
<JD> well... too much choice is not necessarily good here: an RMS must preferably
know what Ack means to the other party. I prefer the spec to be clear one way
or the other about this. I think it is now. Just not the way I prefer - because
as drafted today, I maintain, it is OK to acknowledge a message that will never
be delivered, and there is no provision for the RMS to know about this. But
that is another issue.
<dug> I seem to recall the notion of having some sort of handshaking
going on during the CreateSequence where the RMD would communicate the DA
in-use back to the RMS. I suppose the spec could also communicate the ACK
strategy as well, if there was more than one choice. Not really sure I'd
want to offer more than one but it's something to think about </dug>
I think the general way out of this may be the following: If the original
use case was "I want to close the sequence and have an accurate final ack
so I know which ones to resend in a different sequence later", then it
seems to me that this is really only viable for sequences that do not have
InOrder requirements: If I will send some of them in sequence S2 later
there is no guarantee that they will be delivered in order with respect to the
ones I sent in sequence S1 earlier, and I am going to break the InOrder
requirement anyway.
<JD> I think if InOrder is required but not AtLeastOnce, that means we
accept message loss - and therefore we would not have any qualms not resending
these in S2. Even if InOrder+ AtLeastOnce is required, some gap may still be
there when closing the sequence S1. But again, S2 can forget about the missing
messages in S1: the DA is still satisfied if a delivery failure has been
notified for the missing messages.
<dug> A while ago there was a discussion about how to handle the linking
of sequences for cases where the MaxMsgNum was hit. While there wasn't a
formal decision by the TC, quite a few people said that that notion was
something that should be done at a higher level. I believe the notion of
how to recover a sequence when it is closed prematurely fits into that category
as well. So, while I can see your point about there still being an issue
of how to safely do some recovery, I think it's another issue. This
current proposal simply focuses on how the RMS can get an accurate accounting
of the 'current' sequence when it is closed down early. What it does with
that info - if anything at all - is something else. And personally, while
I do agree there is some higher-level processing that can/should take place in
some situations, I do think the RM protocol could help make that processing easier
- but as I said, that's another issue. </dug>
The RMS knows from the final ack which messages the RMD "has"; if it
knew the RMD<->AD DA, then it would know what to do:
- If the DA is InOrder, it knows that it cannot close and then restart a new
  sequence at all without violating the underlying ordering requirements.
- If the DA is not InOrder then it can close and restart a new sequence later,
  and if so it should resend all messages not in the final ack.
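To make the two cases above concrete, here is a minimal sketch of that RMS-side decision. It assumes the RMS somehow learns whether the DA is InOrder (the thread notes there is currently no way to do so in-band), and the function and parameter names are hypothetical, not taken from the spec.

```python
from typing import List, Optional, Tuple

# Inclusive (lower, upper) message-number range, as in an acknowledgement.
AckRange = Tuple[int, int]


def plan_after_close(highest_sent: int,
                     final_ack: List[AckRange],
                     da_is_in_order: bool) -> Optional[List[int]]:
    """Return message numbers to resend in a new sequence, or None when a
    restart would break ordering (hypothetical helper, for illustration)."""
    if da_is_in_order:
        # First case: closing and restarting cannot preserve InOrder.
        return None
    acked = set()
    for lower, upper in final_ack:
        acked.update(range(lower, upper + 1))
    # Second case: resend everything that is not in the final ack.
    return [n for n in range(1, highest_sent + 1) if n not in acked]
```

For example, with five messages sent and a final ack covering 1-2 and 4-5, a non-InOrder DA yields `[3]`, while an InOrder DA yields `None`.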
<JD> These behaviors are somehow out of scope of the spec: there is no
requirement on dealing with missing messages across sequences. That is an
optimization that can indeed rely on out-of-band knowledge of the DA.
<dug> yup - currently out of scope or that 'higher level' thing I mentioned
</dug>
But, the RMS does not know the RMD-AD DA; I guess we could propose that the
target endpoint publish its DA in its policy (or createSequence, whatever), and
I personally think it would be a good thing even for unrelated reasons - but I
suspect there could be a lot of opposition - You have to go back to a 2002
version of the member submission to find DA in the policy, and I think this was
removed very much intentionally. But maybe we could propose it and see?
A minor point on wording: I think rather than "MUST NOT accept"
we should say "MUST NOT deliver to the AD" as in the original text
below - "accepting" is not something that we define anywhere and it
could be misconstrued. Not delivering is what matters.
<JD> Right for the loose terminology. But again, you are opening a can of
worms: we do NOT want these messages to be acknowledged (not just "not
delivered") as soon as the closing is effective. Maybe this would do:
"...RM Destination MUST NOT acknowledge nor deliver any received
messages with a Sequence header for the specified sequence, other than those
already received at the time the <wsrm:Close> element is processed by the
RMD"
-Jacques
<dug> Well, it can still deliver old messages to the AD it just can't ACK
new ones. For example, if msg 3 out of 5 is missing and a Close() comes in, the
RMD can still deliver 1 and 2 to the AD (if it hasn't done so already).
It just can't deliver 4 and 5. I think 'accept' is the right
choice.</dug>
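Doug's 3-out-of-5 example can be sketched as a toy InOrder RMD. This is only an illustration under stated assumptions: the class and method names are hypothetical, acks are taken to be "on receipt", and "accept" is modelled as recording a message so that it appears in later acks.

```python
from typing import List, Set, Tuple


def to_ranges(numbers: Set[int]) -> List[Tuple[int, int]]:
    """Collapse message numbers into inclusive (lower, upper) ack ranges."""
    ranges: List[Tuple[int, int]] = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current range
        else:
            ranges.append((n, n))            # start a new range
    return ranges


class InOrderDestinationSketch:
    """Hypothetical InOrder RMD; not an implementation of the spec."""

    def __init__(self) -> None:
        self.received: Set[int] = set()
        self.closed = False

    def receive(self, message_number: int) -> bool:
        """Accept a message unless the sequence has been closed."""
        if self.closed:
            return False  # not accepted, so it never shows up in an ack
        self.received.add(message_number)
        return True

    def close(self) -> Tuple[List[int], List[int], List[Tuple[int, int]]]:
        """Close the sequence; return (deliverable, held_back, final_ack)."""
        self.closed = True
        deliverable: List[int] = []
        n = 1
        while n in self.received:  # contiguous run starting at message 1
            deliverable.append(n)
            n += 1
        held_back = sorted(self.received - set(deliverable))
        return deliverable, held_back, to_ranges(self.received)
```

With messages 1, 2, 4 and 5 received before the Close, `close()` reports 1 and 2 as deliverable, 4 and 5 as held back, and a final ack of 1-2 and 4-5; a late message 3 is then refused. Note that 4 and 5 are acked but never delivered, which is exactly the discrepancy Jacques and Giovanni are pointing at.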
G.
From: Jacques Durand [mailto:JDurand@us.fujitsu.com]
Sent: Thursday, August 25, 2005 9:36 PM
To: 'Doug Davis'; ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Inline <JD>
From: Doug Davis [mailto:dug@us.ibm.com]
Sent: Thursday, August 25, 2005 5:59 PM
To: ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
When InOrder DA is used the RMS knows that all messages after the first gap
were not delivered to the RMD's application - even if they were ACKed.
<JD> InOrder DA in itself does allow delivery of non-contiguous messages
( "...it says nothing about duplications or omission..." Section 2,
Core spec)
So, getting an ACK+Final guarantees to the RMS which messages were not just ACKed
but delivered - and any messages after the first gap can be recovered (e.g.
resent in a new sequence if it wants) without fear of them being processed
twice by the RMD's app.
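As a rough sketch of that reasoning: under Doug's reading (an InOrder RMD delivers only the contiguous run before the first gap), the RMS could split a final ack as below. The function name is hypothetical, and Jacques' inline comment above disputes the premise, since InOrder by itself says nothing about omissions.

```python
from typing import List, Tuple


def split_at_first_gap(final_ack: List[Tuple[int, int]],
                       highest_sent: int) -> Tuple[int, List[int]]:
    """Return (last message assumed delivered, messages that could be resent).

    Assumes acks are inclusive (lower, upper) ranges and that nothing after
    the first gap was delivered, even if it was acked.
    """
    acked = set()
    for lower, upper in final_ack:
        acked.update(range(lower, upper + 1))
    delivered_up_to = 0
    while delivered_up_to + 1 in acked:
        delivered_up_to += 1
    # Everything from the first gap onward, acked or not, is recoverable.
    recoverable = list(range(delivered_up_to + 1, highest_sent + 1))
    return delivered_up_to, recoverable
```

For a final ack of 1-2 and 4-5 with five messages sent, this gives message 2 as the last one delivered and 3, 4 and 5 as candidates for a new sequence.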
Actually, thinking about it more, perhaps some of the text should remain, like:
When a Sequence is closed and there are messages at the RM Destination
that are waiting for lower-numbered messages to arrive (such as the
case when InOrder delivery is being enforced) before they can be
processed by the RM Destination's application, the RM Destination
MUST NOT deliver those messages.
Just to ensure that the RMD does not interpret the Close() as the trigger
to let all messages after the gap thru to the app.
thanks,
<JD> but again, because the semantics of Ack is just "on
receipt" and not "on delivery", an honest RMD developer may
decide to Ack these late messages, rendering the final Ack incorrect (or
unstable, depending when it is requested...). Another way to avoid adding this
text is to make the statement below more general, not limited to "new
application messages":
"...can send a <wsrm:Close>
element, in the body of a message, to the RM Destination to indicate that RM
Destination MUST NOT accept any new application messages for the specified
sequence."
Replace with:
"...can send a <wsrm:Close>
element, in the body of a message, to the RM Destination to indicate that RM
Destination MUST NOT accept any application messages for the specified
sequence, other than those already received at the time the <wsrm:Close>
element is interpreted by the RMD."
-jacques
-Doug
"Giovanni Boschi" <gboschi@sonicsoftware.com>
08/25/2005 08:48 PM
To: Doug Davis/Raleigh/IBM@IBMUS, "Jacques Durand" <JDurand@us.fujitsu.com>
cc: <ws-rx@lists.oasis-open.org>
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
If the RMD has already acked the out-of-order messages (and the spec at this
point doesn't say it can't or shouldn't), and we then preclude the RMD from delivering
them, then the final Ack is not accurate, which I thought was the original
goal. Even if we leave it undefined, the RMD may choose not to deliver
them, and the problem remains.
G.
From: Doug Davis [mailto:dug@us.ibm.com]
Sent: Thursday, August 25, 2005 7:23 PM
To: Jacques Durand
Cc: ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Jacques Durand <JDurand@us.fujitsu.com> wrote on 08/25/2005 02:10:04 PM:
> When a Sequence is closed and there are messages at the RM Destination
> that are waiting for lower-numbered messages to arrive (such as the
> case when InOrder delivery is being enforced) before they can be
> processed by the RM Destination's application, the RM Destination
> MUST NOT deliver those messages and a SequenceClosed fault MUST
> be generated for each one.
> <JD> it is important to also say that it should not acknowledge them either.
If we change it so that it says nothing about those messages instead,
as Anish and Chris are suggesting, would that be ok with you?
So, basically, the semantics of undelivered messages would be undefined by
removing the above paragraph.
thanks,
-Doug