ws-rx message
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
- From: Doug Davis <dug@us.ibm.com>
- To: ws-rx@lists.oasis-open.org
- Date: Mon, 5 Sep 2005 20:15:44 -0400
Stefan,
I disagree with the premise of
your note. The use cases for this feature are not limited to the
cases you've mentioned, nor are they limited to the cases I or anyone else
has mentioned. So trying to fit all possible use cases into the scope
you defined just doesn't fly for me. The reason behind why the RMS
wants to get an accurate and final ack state could be just about anything
- and as tempting as it is to ramble off yet another possible reason
why this feature would be useful I'd prefer to not let the conversation
get bogged down in an attempt to limit the scope of this feature. As
I've mentioned, if as an implementor you don't think you'll ever need this
_optional_ feature then don't send it.
thanks
-Doug
"Stefan Batres" <stefanba@microsoft.com>
09/05/2005 07:30 PM
To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
cc:
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
A quick correction to my comment
below:
Note that thus
far, we’ve managed to describe exactly one scenario that fits the #2 description:
[RMD] has separate state stores for session state and messages – the latter
fails but the former is still operable.
The scenario we’ve talked
about is where the RMD uses separate state stores, not the RMS.
--Stefan
From: Stefan Batres [mailto:stefanba@microsoft.com]
Sent: Thursday, September 01, 2005 10:40 AM
To: Doug Davis; ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Doug,
I apologize
if my rant below is a bit too cryptic; let me try again:
1. When a catastrophic
failure occurs (e.g. RMD amnesia), an RMS has to react in some way; it
could return an error to the user or it can engage in a recovery mechanism
of some sort. I don’t believe you are trying to prescribe what the RMS’s
reaction ought to be.
2. As you’ve
said time and again, this proposal is about getting the RMS an accurate
ack set in cases where: (a) a full ack set will never be possible (or at
least not in a reasonable amount of time), (b) there are messages that have
been sent and for which no ack has been received, and (c) the problem that
prevents a full ack set doesn’t prevent the exchange of protocol messages.
The point I
was trying to make is that given #1 above, #2 is an optimization for a
case that will be relatively rare. Note that I don’t question for a second
the correctness of your proposal – what concerns me is adding elements
to the protocol for this specific case, #2, especially since apps will
have to deal with #1 anyway.
Note that thus
far, we’ve managed to describe exactly one scenario that fits the #2 description:
RMS has separate state stores for session state and messages – the latter
fails but the former is still operable.
--Stefan
From: Doug Davis [mailto:dug@us.ibm.com]
Sent: Wednesday, August 31, 2005 3:58 AM
To: ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
I'm having a hard time following this. It sounds like you're
saying because the proposal does not solve all RM related problems you
don't want to have it in our 'bag of tricks' at all. Following that
logic, why should we distinguish between SequenceTerminated Fault and any
other Fault? We do it because we want to provide as much information
back to the RMS as possible. What it uses this information for is
up to it.
As I've said many times before, this proposal does not suggest ANY
recovery scheme. What I've done (outside of the proposal itself)
is discuss how I _think_ an RMS might use this information in some error
recovery mechanism but this proposal itself does not suggest one. This
proposal simply provides a mechanism for the RMS to get an accurate accounting
of the state of the sequence - that's it. How the RMS uses this information
is up to it. If for nothing else it may choose to simply log the
information - that alone is invaluable to someone trying to figure out
what's going on. And I'm having a hard time understanding why providing
an _optional_ mechanism that could aid in the RMS getting an accurate
accounting of the state of the sequence (without having to call up the
RMD's admin) is a bad thing.
thanks,
-Doug
"Stefan Batres" <stefanba@microsoft.com>
08/31/2005 01:48 AM
To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
cc:
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Doug,
You mention a specific situation: An RMD experiences a failure that prevents
it from receiving application messages. I agree in so far as saying that
in such a failure case this proposal could be helpful in that it helps
the RMS to engage in recovery of some sort (either inform applications
that a specific message was not sent or open a new sequence, assuming ordering
is not important). But this is not the only failure case that applications
will want to deal with (with or without help from the protocol).
Consider the case where connectivity is lost for long enough for both sequences
to expire or consider the case where the destination suffers a loss of
session state. In such failure modes this solution is not helpful – yet
applications will need a recovery strategy of some sort. It might be that
it is application specific, or it might be that a general failure recovery
specification is created and ratified at some point. The important idea
is that the only way to deal with all failure modes is at a higher level.
This proposal leverages the protocol to optimize recovery in specific circumstances
that should be relatively rare. RM implementations should not be required
to support failure mode recovery mechanisms that either don’t apply to
them or that they choose to implement in a uniform way at a higher level.
Thanks
--Stefan
From: Doug Davis [mailto:dug@us.ibm.com]
Sent: Tuesday, August 30, 2005 1:08 PM
To: ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Yet more comments. :-)
-Doug
"Stefan Batres" <stefanba@microsoft.com>
08/30/2005 03:35 PM
To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
cc:
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Doug,
Some more comments and thoughts on your proposal:
<dug>... When or why an RMS uses CloseSequence is up to it to decide.
All we know is that it wants to shut things down and get an accurate ACK
from the RMD.</dug>
I still have not heard of a plausible reason why an RMS “wants to shut
things down” and the current spec presents a problem. Comparing the spec
as it stands today vs. the spec + this proposal:
- TODAY: RMS wants to end the sequence so
it sends a LastMessage and must wait for a complete set of acks; this might
require retransmitting messages. Once a full set of acks is received RMS
sends TerminateSequence.
- TODAY + THIS PROPOSAL: RMS wants to end
the sequence so it sends Close, waits for a CloseResponse, possibly retransmitting
the Close. Once a CloseResponse is received RMS sends TerminateSequence.
The problem with the TODAY scenario, as I’ve heard it in this forum, is
that the RMS might have to wait unacceptably long between sending LastMessage
and getting a full ack range. But if getting some messages or acks across
proves difficult; why would the RMS expect that getting Close across would
be any easier?
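A minimal sketch of the two shutdown flows compared above, not taken from the spec or the proposal text. `FakeRMD`, `shutdown_today`, and `shutdown_with_close` are hypothetical names, and the sketch assumes the contested case in which protocol messages still get through while some application messages never do.

```python
class FakeRMD:
    """Hypothetical RMD stand-in: app messages listed in `stuck` are never accepted."""

    def __init__(self, stuck):
        self.stuck = set(stuck)
        self.received = set()
        self.closed = False

    def deliver(self, msg_no):
        # Application message path; assumed to fail for the "stuck" messages.
        if not self.closed and msg_no not in self.stuck:
            self.received.add(msg_no)
        return sorted(self.received)  # ack state returned with the response

    def close(self):
        # Protocol message path; assumed to keep working.
        self.closed = True
        return sorted(self.received)  # final ack state carried by CloseResponse


def shutdown_today(rmd, sent, max_rounds=3):
    """TODAY: (re)transmit until every message is acked, then terminate."""
    acked = set()
    for _ in range(max_rounds):  # the first round doubles as the initial send
        for msg_no in sent:
            if msg_no not in acked:
                acked = set(rmd.deliver(msg_no))
        if acked == set(sent):
            return True, sorted(acked)  # full ack set: TerminateSequence can go out
    return False, sorted(acked)  # gave up waiting for a full ack set


def shutdown_with_close(rmd):
    """TODAY + THIS PROPOSAL: send Close, take the final ack state, terminate."""
    return True, rmd.close()


rmd = FakeRMD(stuck={3})
print(shutdown_today(rmd, [1, 2, 3, 4]))  # (False, [1, 2, 4])
print(shutdown_with_close(rmd))           # (True, [1, 2, 4])
```

Under these assumptions the Close flow ends with a known, final ack state even though message 3 was never accepted. If Close itself cannot get across, the sketch does not apply.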
<dug> 1 - I don't believe your text is accurate in that Close is
supposed to be used in cases where the sequence needs to end due to something
going wrong. You've described a case where the sequence is functioning
just fine - and while Close can be used in those cases as well, it provides
no additional value. 2 - Sending a Close and sending application data
can have quite a different set of features executed so I don't think it's
hard to imagine cases where RM messages can get processed just fine but
application messages run into problems. I believe Chris mentioned
on some call the notion of two different persistent stores - one for RM
data and one for app-data. It's possible that the app-data one is
running into problems. 3 - Using the CloseSequence operation is optional
- if you feel that, as an RMS implementor, you'll never see its usefulness
then you're free to never implement/send it. However, I'd hate to remove
this option for those of us who do see value in it. </dug>
<dug>The case that I keep thinking about is one where the RMD is
actually a cluster of machines and when a sequence gets created it has
an affinity to a certain server in the cluster - meaning it processes all
of the messages for that sequence. If that server starts to have problems,
and for some reason it just can't seem to process any new app messages
then the RMS can close down the sequence and start up a new one. Hopefully,
the new sequence will be directed to a different server in the cluster.
</dug>
There are two problems with this scenario and the proposed solution.
1. If an RMD has sequence-to-machine affinity, that should be strictly the RMD’s
decision and the RMD’s problem. The RMS is autonomous; this proposal puts
expectations on the RMS’ behavior based on particularities of the RMD
implementation. To be clear, I’ll note that affinity can be achieved in
two ways:
   i. By performing stateful routing at the RMD; basically the RMD has to remember
   every active sequence and what machine it has affinity to. In this case
   it would be simple to change the RMD’s routing table when a machine fails.
   ii. By generating different EPRs for each machine. For affinity to function
   this way two things are necessary:
      1. Some sort of endpoint resolution mechanism would have to be devised for the
      RMS to learn the EPR that it should target.
      2. A mechanism for migrating that EPR.
   Clearly 1) and 2) are outside the scope
   of the TC and, in my view, this proposal might be defining 2) in an informal
   way that is specific to WS-RM.
2. If the RMS somehow guesses that there is a problem on the EPR to which it
is sending its messages and somehow decides that Closing the sequence and
starting a new one is the right course of action, ordering guarantees are
compromised.
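As a rough illustration of the stateful-routing option (i), the sketch below keeps a sequence-to-machine table at the RMD and re-points affected sequences when a machine fails. `SequenceRouter` and its methods are hypothetical names, and the sketch assumes that per-sequence state is reachable from the surviving machines, which a real RMD would have to arrange separately.

```python
class SequenceRouter:
    """Hypothetical RMD-side routing table mapping sequence IDs to machines."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.table = {}

    def route(self, sequence_id):
        # Assign a machine on first sight; afterwards keep the affinity.
        if sequence_id not in self.table:
            index = len(self.table) % len(self.machines)
            self.table[sequence_id] = self.machines[index]
        return self.table[sequence_id]

    def fail(self, machine):
        # Drop the failed machine and re-point its sequences to a survivor.
        self.machines.remove(machine)
        if not self.machines:
            raise RuntimeError("no machines left to route to")
        for sequence_id, owner in list(self.table.items()):
            if owner == machine:
                self.table[sequence_id] = self.machines[0]


router = SequenceRouter(["a", "b"])
print(router.route("seq-1"))  # a
router.fail("a")
print(router.route("seq-1"))  # b
```

In this arrangement the RMS keeps sending to the same sequence and never learns that the machine behind it changed.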
<dug> I probably didn't state the problem very well. I didn't
intend to claim that the RMS knew about this affinity, but instead it knew
that something was wrong with the current sequence and in order to try
to fix the situation it decided to try another sequence. The affinity
bit was thrown in there to explain why starting a new sequence _might_
fix the problem.
I should also point out that while a lot of these discussions have focused
on InOrder+ExactlyOnce DA, this feature is still useful in other DAs. For
example, if the DA is just ExactlyOnce - having an accurate accounting
of the ACKs allows a subsequent sequence to send just the gaps from the
first, so getting an accurate list of the gaps becomes critical. And
this of course leads us to the discussion of how to determine the DA in
use - which I think might be part of issues 6, 9, 24 and 27.
</dug>
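To make the gap idea concrete, here is a small sketch, assuming the final ack state arrives as a list of inclusive (lower, upper) message-number ranges and that message numbers start at 1. `find_gaps` and the sample values are hypothetical and not part of the proposal.

```python
def find_gaps(ack_ranges, highest_sent):
    """Return the message numbers in 1..highest_sent that no ack range covers."""
    acked = set()
    for lower, upper in ack_ranges:
        acked.update(range(lower, upper + 1))  # ranges are inclusive
    return [n for n in range(1, highest_sent + 1) if n not in acked]


# Final ack state of the closed sequence (illustrative values).
final_acks = [(1, 2), (4, 6), (9, 9)]
print(find_gaps(final_acks, 10))  # [3, 7, 8, 10]
```

With an ExactlyOnce DA, an RMS could resend only those messages on a subsequent sequence. How it pairs the two sequences is outside this sketch.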
Finally, I agree with you that considering a gap-filling mechanism would
be a good thing for this TC to do.
--Stefan