ws-rx message
Subject: Re: [ws-rx] i0019 - a formal proposal - take 2
- From: Doug Davis <dug@us.ibm.com>
- To: ws-rx@lists.oasis-open.org
- Date: Wed, 7 Sep 2005 12:18:08 -0400
Doug.Bunting@Sun.COM wrote on 09/07/2005 11:23:44 AM:
> DougD,
>
> This discussion has been useful and may indicate we lack a requirement
> for the proposed change. The issues addressed are based on the
> assumption of that requirement.
>
> I may not be as certain as Stefan that no general case exists. However,
> your examples are starting to sound less convincing.
>
> The example below mixes delivery problems of individual messages with
> the RMS having an incorrect accounting of those failures. While large
> messages may fail, small messages will still make it through and the RMS
> will get its NACK.
Individual message delivery problems w.r.t. the final outcome of the
sequence are the entire point. Without a mechanism through which the RMS
can be assured that late-arriving messages or ACKs will not mess up the
final accounting, it can never be sure. If you have any messages that,
for whatever reason, never seem to get ACKed, as an RMS you have no
choice but to continually retry and pray that it eventually works. With
this proposal an RMS can, at its own will, decide to give up on this
sequence and be guaranteed of the final outcome. W/o it, it must try
forever or live with this uncertainty. Your assertion that the RMS will
get the NACK is not valid. The RMS can never be assured, w/o a feature
like this, that it has received all of the ACKs and NACKs.

I'm actually very confused by the assertion you and Stefan seem to be
making that all sequences would be able to be successfully fully ACKed.
Even Stefan has mentioned [1] that a "fill in a gap" feature might be
useful for the TC to consider, so clearly even he admits there will be
times when gaps can exist and not be filled. So, I'm lost as to why
there would be a lack of a requirement.

thanks
-Doug
[1] http://www.oasis-open.org/apps/org/workgroup/ws-rx/email/archives/200508/msg00302.html
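
The "final accounting" argument above can be illustrated with a minimal Python sketch. It is not taken from the proposal or the spec: the `Destination` class and the `deliver`, `close`, and `ack_ranges` names are hypothetical, and the model simply assumes that a closed sequence refuses further application messages, so the acknowledgement ranges returned at close time can no longer change.

```python
from dataclasses import dataclass, field


def ack_ranges(numbers: set[int]) -> list[tuple[int, int]]:
    """Collapse message numbers into inclusive (lower, upper) ranges."""
    ranges: list[tuple[int, int]] = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current range
        else:
            ranges.append((n, n))            # start a new range
    return ranges


@dataclass
class Destination:
    """Toy RMD (hypothetical): records message numbers until closed."""
    received: set[int] = field(default_factory=set)
    closed: bool = False

    def deliver(self, number: int) -> bool:
        """Accept an application message unless the sequence is closed."""
        if self.closed:
            return False  # late arrivals cannot alter the accounting
        self.received.add(number)
        return True

    def close(self) -> list[tuple[int, int]]:
        """Close the sequence and report the final acknowledgement ranges."""
        self.closed = True
        return ack_ranges(self.received)


rmd = Destination()
for n in (1, 2, 4, 5):    # message 3 never gets through
    rmd.deliver(n)

final = rmd.close()        # [(1, 2), (4, 5)]
late_ok = rmd.deliver(3)   # False: the late copy of message 3 is refused
```

Without the `closed` flag, the late copy of message 3 would be accepted after the sender had already given up, which is the uncertainty described above.
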
> thanx,
> dougb
>
> On 07/09/05 05:49, Doug Davis wrote:
>
> >
> > Stefan,
> > The proposal attempts to address the two issues, i019 and i028.
> > Perhaps you're looking to close the issues as invalid? During the
> > course of these discussions several use cases have been mentioned as
> > possible situations in which the issues described in i019 and i028
> > will occur. If you think those use cases fit into the "#1" you
> > mentioned, and you believe that case to be rare, then ok. I don't see
> > the situations i019 and i028 talk about as being rare, nor do I think
> > your "#1" is the only case. I believe people have mentioned cases much
> > less catastrophic, such as extremely large messages that just cannot
> > be delivered due to some network issues (sadly something I run into
> > quite a bit), that would still warrant the need for this solution.
> > But, I don't see the need to iterate all of them since the entire
> > point of the spec is that networks are not reliable and problems will
> > occur. So running into one that prevents us from getting 100%
> > guaranteed complete delivery every time isn't hard for me to imagine.
> > But that's just me.
> > thanks,
> > -Doug
> >
> > "Stefan Batres" <stefanba@microsoft.com>
> > 09/06/2005 01:13 PM
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > What I’m trying to do is to identify the set of use cases this
> > feature attempts to address – you might disagree with the set I’ve
> > identified and that is perfectly valid. It is our job though to
> > motivate changes to the contributed specs. If you disagree with the
> > way I’ve characterized the set of use cases for this feature then it
> > would really help if you could write down for me how you characterize
> > the use cases vs. the protocol as submitted. I hope you can take doing
> > this seriously; I don’t think it is a good design process to add
> > features to the protocol simply because we think they are helpful and
> > refuse to do the leg work of 1) defining the characteristics of the
> > use cases when the features are helpful, 2) comparing that against the
> > contributed documents and 3) going through the exercise of identifying
> > real world use cases that match said characteristics.
> >
> > --Stefan
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > From: Doug Davis [mailto:dug@us.ibm.com]
> > Sent: Monday, September 05, 2005 5:16 PM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> >
> > Stefan,
> > I disagree with the premise of your note. The use cases for this
> > feature are not limited to the cases you've mentioned, nor are they
> > limited to the cases I or anyone else has mentioned. So trying to fit
> > all possible use cases into the scope you defined just doesn't fly for
> > me. The reason behind why the RMS wants to get an accurate and final
> > ack state could be just about anything - and as tempting as it is to
> > rattle off yet another possible reason why this feature would be
> > useful I'd prefer to not let the conversation get bogged down in an
> > attempt to limit the scope of this feature. As I've mentioned, if as
> > an implementor you don't think you'll ever need this _optional_
> > feature then don't send it.
> > thanks
> > -Doug
> >
> > "Stefan Batres" <stefanba@microsoft.com>
> > 09/05/2005 07:30 PM
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > A quick correction to my comment below:
> >
> >   Note that thus far, we’ve managed to describe exactly one scenario
> >   that fits the #2 description: [RMD] has separate state stores for
> >   session state and messages – the latter fails but the former is
> >   still operable.
> >
> > The scenario we’ve talked about is where the RMD uses separate state
> > stores, not the RMS.
> >
> > --Stefan
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > From: Stefan Batres [mailto:stefanba@microsoft.com]
> > Sent: Thursday, September 01, 2005 10:40 AM
> > To: Doug Davis; ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > I apologize if my rant below is a bit too cryptic, let me try again:
> >
> > 1. When a catastrophic failure occurs (e.g. RMD amnesia), an RMS has
> >    to react in some way; it could return an error to the user or it
> >    can engage in a recovery mechanism of some sort. I don’t believe
> >    you are trying to prescribe what the RMS’s reaction ought to be.
> > 2. As you’ve said time and again, this proposal is about getting the
> >    RMS an accurate ack set in cases where: 1. A full ack set will
> >    never be possible (or at least not in a reasonable amount of time),
> >    2. There are messages that have been sent and for which no ack has
> >    been received and 3. The problem that prevents a full ack set
> >    doesn’t prevent the exchange of protocol messages.
> >
> > The point I was trying to make is that given #1 above, #2 is an
> > optimization for a case that will be relatively rare. Note that I
> > don’t question for a second the correctness of your proposal – what
> > concerns me is adding elements to the protocol for this specific case,
> > #2, especially since apps will have to deal with #1 anyway.
> >
> > Note that thus far, we’ve managed to describe exactly one scenario
> > that fits the #2 description: RMS has separate state stores for
> > session state and messages – the latter fails but the former is still
> > operable.
> >
> > --Stefan
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > From: Doug Davis [mailto:dug@us.ibm.com]
> > Sent: Wednesday, August 31, 2005 3:58 AM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> >
> > I'm having a hard time following this. It sounds like you're saying
> > that because the proposal does not solve all RM related problems you
> > don't want to have it in our 'bag of tricks' at all. Following that
> > logic, why should we distinguish between SequenceTerminated Fault and
> > any other Fault? We do it because we want to provide as much
> > information back to the RMS as possible. What it uses this information
> > for is up to it.
> > As I've said many times before, this proposal does not suggest ANY
> > recovery scheme. What I've done (outside of the proposal itself) is
> > discuss how I _think_ an RMS might use this information in some error
> > recovery mechanism but this proposal itself does not suggest one.
> > This proposal simply provides a mechanism for the RMS to get an
> > accurate accounting of the state of the sequence - that's it. How the
> > RMS uses this information is up to it. If for nothing else it may
> > choose to simply log the information - that alone is invaluable to
> > someone trying to figure out what's going on. And I'm having a hard
> > time understanding why providing an _optional_ mechanism that could
> > aid in the RMS getting an accurate accounting of the state of the
> > sequence (without having to call up the RMD's admin) is a bad thing.
> > thanks,
> > -Doug
> >
> > "Stefan Batres" <stefanba@microsoft.com>
> > 08/31/2005 01:48 AM
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > You mention a specific situation: An RMD experiences a failure that
> > prevents it from receiving application messages. I agree in so far as
> > saying that in such a failure case this proposal could be helpful in
> > that it helps the RMS to engage in recovery of some sort (either
> > inform applications that a specific message was not sent or open a new
> > sequence, assuming ordering is not important). But this is not the
> > only failure case that applications will want to deal with (with or
> > without help from the protocol).
> >
> > Consider the case where connectivity is lost for long enough for both
> > sequences to expire or consider the case where the destination suffers
> > a loss of session state. In such failure modes this solution is not
> > helpful – yet applications will need a recovery strategy of some sort.
> > It might be that it is application specific, or it might be that a
> > general failure recovery specification is created and ratified at some
> > point. The important idea is that the only way to deal with all
> > failure modes is at a higher level. This proposal leverages the
> > protocol to optimize recovery in specific circumstances that should be
> > relatively rare. RM implementations should not be required to support
> > failure mode recovery mechanisms that either don’t apply to them or
> > that they choose to implement in a uniform way at a higher level.
> >
> > Thanks
> >
> > --Stefan
> >
> >
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > From: Doug Davis [mailto:dug@us.ibm.com]
> > Sent: Tuesday, August 30, 2005 1:08 PM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> >
> > Yet more comments. :-)
> > -Doug
> >
> > "Stefan Batres" <stefanba@microsoft.com>
> > 08/30/2005 03:35 PM
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > Some more comments and thoughts on your proposal:
> >
> > <dug>... When or why an RMS uses CloseSequence is up to it to decide.
> > All we know is that it wants to shut things down and get an accurate
> > ACK from the RMD.</dug>
> >
> > I still have not heard of a plausible reason why an RMS “wants to shut
> > things down” and the current spec presents a problem. Comparing the
> > spec as it stands today vs. the spec + this proposal:
> >
> >    * TODAY: RMS wants to end the sequence so it sends a LastMessage
> >      and must wait for a complete set of acks; this might require
> >      retransmitting messages. Once a full set of acks is received RMS
> >      sends TerminateSequence.
> >
> >    * TODAY + THIS PROPOSAL: RMS wants to end the sequence so it sends
> >      Close, waits for a CloseResponse, possibly retransmitting the
> >      Close. Once a CloseResponse is received RMS sends
> >      TerminateSequence.
> >
> > The problem with the TODAY scenario, as I’ve heard it in this forum,
> > is that the RMS might have to wait unacceptably long between sending
> > LastMessage and getting a full ack range. But if getting some messages
> > or acks across proves difficult, why would the RMS expect that getting
> > Close across would be any easier?
> >
> >
> > <dug> 1 - I don't believe your text is accurate in that Close is
> > supposed to be used in cases where the sequence needs to end due to
> > something going wrong. You've described a case where the sequence is
> > functioning just fine - and while Close can be used in those cases as
> > well, it provides no additional value. 2 - Sending a Close and sending
> > application data can have quite a different set of features executed
> > so I don't think it's hard to imagine cases where RM messages can get
> > processed just fine but application messages run into problems. I
> > believe Chris mentioned on some call the notion of two different
> > persistent stores - one for RM data and one for app-data. It's
> > possible that the app-data one is running into problems. 3 - Using
> > the CloseSequence operation is optional - if you feel that, as an RMS
> > implementor, you'll never see its usefulness then you're free to never
> > implement/send it. However, I'd hate to remove this option for those
> > of us who do see value in it. </dug>
> >
> >
> >
> > <dug>The case that I keep thinking about is one where the RMD is
> > actually a cluster of machines and when a sequence gets created it has
> > an affinity to a certain server in the cluster - meaning it processes
> > all of the messages for that sequence. If that server starts to have
> > problems, and for some reason it just can't seem to process any new
> > app messages then the RMS can close down the sequence and start up a
> > new one. Hopefully, the new sequence will be directed to a different
> > server in the cluster. </dug>
> >
> > There are two problems with this scenario and the proposed solution.
> >
> > 1. If an RMD has sequence-to-machine affinity that should be strictly
> >    the RMD’s decision and the RMD’s problem. The RMS is autonomous;
> >    this proposal puts expectations on the RMS’ behavior based on
> >    particularities of the RMD implementation. To be clear, I’ll note
> >    that affinity can be achieved in two ways:
> >
> >    i.  By performing stateful routing at the RMD; basically the RMD
> >        has to remember every active sequence and what machine it has
> >        affinity to. In this case it would be simple to change the
> >        RMD’s routing table when a machine fails.
> >
> >    ii. By generating different EPRs for each machine. For affinity to
> >        function this way two things are necessary:
> >        1. Some sort of endpoint resolution mechanism would have to be
> >           devised for the RMS to learn the EPR that it should target.
> >        2. A mechanism for migrating that EPR.
> >
> >    Clearly 1) and 2) are outside the scope of the TC and, in my view,
> >    this proposal might be defining 2) in an informal way that is
> >    specific to WS-RM.
> >
> > 2. If the RMS somehow guesses that there is a problem on the EPR to
> >    which it is sending its messages and somehow decides that Closing
> >    the sequence and starting a new one is the right course of action,
> >    ordering guarantees are compromised.
> >
> >
> > <dug> I probably didn't state the problem very well. I didn't intend
> > to claim that the RMS knew about this affinity, but instead it knew
> > that something was wrong with the current sequence and in order to try
> > to fix the situation it decided to try another sequence. The affinity
> > bit was thrown in there to explain why starting a new sequence _might_
> > fix the problem.
> >
> > I should also point out that while a lot of these discussions have
> > focused on InOrder+ExactlyOnce DA, this feature is still useful in
> > other DAs. For example, if the DA is just ExactlyOnce - having an
> > accurate accounting of the ACKs allows a subsequent sequence to send
> > just the gaps from the first, so getting an accurate list of the gaps
> > becomes critical. And this of course leads us to the discussion of
> > how to determine the DA in use - which I think might be part of issues
> > 6, 9, 24 and 27.
> > </dug>
> >
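
The ExactlyOnce point above (a subsequent sequence sending just the gaps from the first) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `find_gaps` and `resend_plan` are hypothetical names, the final acknowledgement state is assumed to arrive as inclusive (lower, upper) ranges, and message numbers are assumed to start at 1.

```python
def find_gaps(acked: list[tuple[int, int]], highest_sent: int) -> list[int]:
    """Return the message numbers that were sent but never acknowledged."""
    acked_numbers: set[int] = set()
    for lower, upper in acked:
        acked_numbers.update(range(lower, upper + 1))
    return [n for n in range(1, highest_sent + 1) if n not in acked_numbers]


# Final ack state of the closed sequence; messages 1..7 had been sent.
gaps = find_gaps([(1, 2), (4, 5)], highest_sent=7)   # [3, 6, 7]

# Pair each unacknowledged message with its number in the new sequence:
# (new message number, old message number).
resend_plan = list(enumerate(gaps, start=1))         # [(1, 3), (2, 6), (3, 7)]
```

The plan is only safe to act on if the ack ranges are final; if late messages could still be accepted on the first sequence, resending the gaps could produce duplicates.
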
> > Finally, I agree with you that considering a gap-filling mechanism
> > would be a good thing for this TC to do.
> >
> >
> > --Stefan
> >
> >
> >