ws-rx message
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
- From: Doug Davis <dug@us.ibm.com>
- To: ws-rx@lists.oasis-open.org
- Date: Wed, 7 Sep 2005 19:59:24 -0400
Stefan,
(I should have waited a bit longer before sending my previous note - I think your note helps me understand what you're thinking.)
To answer your question "is this...problem something that this proposal is suited for?": Yes.
The problem isn't how to resend the large message; it's how the RMS can reliably clean up the sequence. You seem to be focused on the higher-level recovery aspect, which isn't part of the issues or the proposal. That is a very important distinction. All of the solutions you talk about may be valid for that higher-level processing, but that isn't the focus of this proposal. Regardless of what kind of error recovery (if any) takes place afterwards, the RMS still needs to resolve this current bad state. And knowing the final ack state is key (IMO) to doing that.
Now, you may say that the sequence should never have gotten into this position to begin with by not sending such a large message, but that would assume that the RMS knows about this 'large message issue' in advance - and we can't assume that. Remember, this is just one of possibly many reasons why a message could not be delivered - it's impossible for the RMS to prevent or know about all of them in advance - if it could then we wouldn't need RM :-)
thanks
-Doug
"Stefan Batres" <stefanba@microsoft.com>
wrote on 09/07/2005 07:38:07 PM:
> Doug et al,
>
> This, IMHO, is an example of what I’m talking about. Is the very-large-message
> problem something that this proposal is suited for? I don’t think so (note that it
> doesn’t fit with “#1”); we’re speculating that it might be that an RMS wants to
> gracefully complete a sequence on which there are messages that have not been
> acknowledged because transmission failed due to the size of the messages. In my
> view, the problem in this case is not how to complete a sequence with holes, but
> rather, dealing with very-large-messages. For instance, a way to address that
> problem would be with a message fragmentation protocol that would work on top of
> WS-RM. Another way might be to fragment the messages at the app layer and reflect
> that in the contract. Yet another way could be to limit the size of messages via
> policy. Like the use case given before involving machine affinity and machine
> failure[1], there are more appropriate ways to address this use case.
>
> [1] http://www.oasis-open.org/apps/org/workgroup/ws-rx/email/archives/200508/msg00303.html
>
> --Stefan
>
>
>
> From: Doug Davis [mailto:dug@us.ibm.com]
> Sent: Wednesday, September 07, 2005 5:50 AM
> To: ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
>
> Stefan,
> The proposal attempts to address the two issues, i019 and i028. Perhaps you're
> looking to close the issues as invalid? During the course of these discussions
> several use cases have been mentioned as possible situations in which the issues
> described in i019 and i028 will occur. If you think those use cases fit into the "#1"
> you mentioned, and you believe that case to be rare then ok. I don't see the
> situations i019 and i028 talk about as being rare nor do I think your "#1" is the
> only case. I believe people have mentioned cases much less catastrophic, such as
> extremely large messages that just cannot be delivered due to some network issues
> (sadly something I run into quite a bit), that would still warrant the need for
> this solution. But, I don't see the need to enumerate all of them since the entire
> point of the spec is that networks are not reliable and problems will occur. So
> running into one that prevents us from getting 100% guaranteed complete delivery
> every time isn't hard for me to imagine. But that's just me.
> thanks,
> -Doug
>
>
> "Stefan Batres" <stefanba@microsoft.com>
> 09/06/2005 01:13 PM
> To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> cc:
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Doug,
>
> What I’m trying to do is to identify the set of use cases this feature attempts to
> address – you might disagree with the set I’ve identified and that is perfectly
> valid. It is our job though to motivate changes to the contributed specs. If you
> disagree with the way I’ve characterized the set of use cases for this feature then
> it would really help if you could write down for me how you characterize the use
> cases vs. the protocol as submitted. I hope you can take doing this seriously; I
> don’t think it is a good design process to add features to the protocol simply
> because we think they are helpful and refuse to do the legwork of 1) defining the
> characteristics of the use cases where the features are helpful, 2) comparing that
> against the contributed documents and 3) going through the exercise of identifying
> real world use cases that match said characteristics.
>
> --Stefan
>
>
>
>
>
> From: Doug Davis [mailto:dug@us.ibm.com]
> Sent: Monday, September 05, 2005 5:16 PM
> To: ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
>
> Stefan,
> I disagree with the premise of your note. The use cases for this feature are not
> limited to the cases you've mentioned, nor are they limited to the cases I or
> anyone else has mentioned. So trying to fit all possible use cases into the scope
> you defined just doesn't fly for me. The reason behind why the RMS wants to get an
> accurate and final ack state could be just about anything - and as tempting as it
> is to rattle off yet another possible reason why this feature would be useful I'd
> prefer to not let the conversation get bogged down in an attempt to limit the scope of
> this feature. As I've mentioned, if as an implementor you don't think you'll ever
> need this _optional_ feature then don't send it.
> thanks
> -Doug
>
> "Stefan Batres" <stefanba@microsoft.com>
> 09/05/2005 07:30 PM
> To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> cc:
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> A quick correction to my comment below:
>
> Note that thus far, we’ve managed to describe exactly one scenario that fits the #2
> description: [RMD] has separate state stores for session state and messages – the
> latter fails but the former is still operable.
>
> The scenario we’ve talked about is where the RMD uses separate state stores, not the RMS.
>
> --Stefan
>
>
>
>
>
>
> From: Stefan Batres [mailto:stefanba@microsoft.com]
> Sent: Thursday, September 01, 2005 10:40 AM
> To: Doug Davis; ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Doug,
>
> I apologize if my rant below is a bit too cryptic, let me try again:
>
> 1. When a catastrophic failure occurs (e.g. RMD amnesia), an RMS has to react in
> some way; it could return an error to the user or it can engage in a recovery
> mechanism of some sort. I don’t believe you are trying to prescribe what the RMS’s
> reaction ought to be.
> 2. As you’ve said time and again, this proposal is about getting the RMS an
> accurate ack set in cases where: (a) a full ack set will never be possible (or at
> least not in a reasonable amount of time), (b) there are messages that have been sent
> and for which no ack has been received and (c) the problem that prevents a full ack
> set doesn’t prevent the exchange of protocol messages.
>
> The point I was trying to make is that given #1 above, #2 is an optimization for a
> case that will be relatively rare. Note that I don’t question for a second the
> correctness of your proposal – what concerns me is adding elements to the protocol
> for this specific case, #2, especially since apps will have to deal with #1 anyway.
>
> Note that thus far, we’ve managed to describe exactly one scenario that fits the #2
> description: RMS has separate state stores for session state and messages – the
> latter fails but the former is still operable.
>
> --Stefan
>
>
>
>
>
>
> From: Doug Davis [mailto:dug@us.ibm.com]
> Sent: Wednesday, August 31, 2005 3:58 AM
> To: ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
>
> I'm having a hard time following this. It sounds like you're saying because the
> proposal does not solve all RM related problems you don't want to have it in our
> 'bag of tricks' at all. Following that logic, why should we distinguish between
> SequenceTerminated Fault and any other Fault? We do it because we want to provide
> as much information back to the RMS as possible. What it uses this information for
> is up to it.
> As I've said many times before, this proposal does not suggest ANY recovery scheme.
> What I've done (outside of the proposal itself) is discuss how I _think_ an RMS
> might use this information in some error recovery mechanism but this proposal
> itself does not suggest one. This proposal simply provides a mechanism for the RMS
> to get an accurate accounting of the state of the sequence - that's it. How the
> RMS uses this information is up to it. If for nothing else it may choose to simply
> log the information - that alone is invaluable to someone trying to figure out
> what's going on. And I'm having a hard time understanding why providing an
> _optional_ mechanism that could aid in the RMS getting an accurate accounting of
> the state of the sequence (without having to call up the RMD's admin) is a bad thing.
> thanks,
> -Doug
>
> "Stefan Batres" <stefanba@microsoft.com>
> 08/31/2005 01:48 AM
> To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> cc:
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Doug,
>
> You mention a specific situation: An RMD experiences a failure that prevents it
> from receiving application messages. I agree insofar as saying that in such a
> failure case this proposal could be helpful in that it helps the RMS to engage in
> recovery of some sort (either inform applications that a specific message was not
> sent or open a new sequence, assuming ordering is not important). But this is not
> the only failure case that applications will want to deal with (with or without
> help from the protocol).
> Consider the case where connectivity is lost for long enough for both sequences to
> expire or consider the case where the destination suffers a loss of session state.
> In such failure modes this solution is not helpful – yet applications will need a
> recovery strategy of some sort. It might be that it is application specific, or it
> might be that a general failure recovery specification is created and ratified at
> some point. The important idea is that the only way to deal with all failure modes
> is at a higher level. This proposal leverages the protocol to optimize recovery in
> specific circumstances that should be relatively rare. RM implementations should
> not be required to support failure mode recovery mechanisms that either don’t apply
> to them or that they choose to implement in a uniform way at a higher level.
>
> Thanks
>
> --Stefan
>
>
>
>
>
>
>
>
>
> From: Doug Davis [mailto:dug@us.ibm.com]
> Sent: Tuesday, August 30, 2005 1:08 PM
> To: ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
>
> Yet more comments. :-)
> -Doug
>
> "Stefan Batres" <stefanba@microsoft.com>
> 08/30/2005 03:35 PM
> To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> cc:
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Doug,
>
> Some more comments and thoughts on your proposal:
>
>
> <dug>... When or why an RMS uses CloseSequence is up to it to decide.
> All we know is that it wants to shut things down and get an accurate ACK from the RMD.</dug>
>
> I still have not heard of a plausible reason why an RMS “wants to shut things down”
> and the current spec presents a problem. Comparing the spec as it stands today vs.
> the spec + this proposal:
>
> TODAY: RMS wants to end the sequence so it sends a LastMessage and must wait for a
> complete set of acks; this might require retransmitting messages. Once a full set
> of acks is received RMS sends TerminateSequence.
>
> TODAY + THIS PROPOSAL: RMS wants to end the sequence so it sends Close, waits for a
> CloseResponse, possibly retransmitting the Close. Once a CloseResponse is received
> RMS sends TerminateSequence.
>
> The problem with the TODAY scenario, as I’ve heard it in this forum, is that the
> RMS might have to wait unacceptably long between sending LastMessage and getting a
> full ack range. But if getting some messages or acks across proves difficult, why
> would the RMS expect that getting Close across would be any easier?
>
> <dug> 1 - I don't believe your text is accurate in that Close is supposed to be
> used in cases where the sequence needs to end due to something going wrong. You've
> described a case where the sequence is functioning just fine - and while Close can
> be used in those cases as well, it provides no additional value. 2 - Sending a
> Close and sending application data can have quite a different set of features
> executed so I don't think it's hard to imagine cases where RM messages can get
> processed just fine but application messages run into problems. I believe Chris
> mentioned on some call the notion of two different persistent stores - one for RM
> data and one for app-data. It's possible that the app-data one is running into
> problems. 3 - Using the CloseSequence operation is optional - if you feel that, as
> an RMS implementor, you'll never see its usefulness then you're free to never
> implement/send it. However, I'd hate to remove this option for those of us who do see
> value in it. </dug>
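
The situation in point 2 above, where protocol messages get through but some application messages never do, can be modelled with a toy simulation. Everything here is an assumption made for illustration: `StuckRMD`, `end_today` and `end_with_close` are hypothetical names, acks are modelled by reading the RMD's received set directly, and `close()` merely stands in for the Close/CloseResponse exchange discussed in this thread.

```python
from typing import Optional, Set


class StuckRMD:
    """Toy RMD: handles protocol messages, but some app messages never get in."""

    def __init__(self, undeliverable: Set[int]) -> None:
        self.undeliverable = undeliverable  # message numbers that always fail
        self.received: Set[int] = set()

    def deliver(self, msg_no: int) -> None:
        # An application message is recorded only if it can be processed.
        if msg_no not in self.undeliverable:
            self.received.add(msg_no)

    def close(self) -> Set[int]:
        # Stand-in for Close/CloseResponse: report the final ack state.
        return set(self.received)


def end_today(rmd: StuckRMD, last_msg: int, max_retries: int) -> Optional[Set[int]]:
    """Retransmit unacked messages until all of 1..last_msg are acked, or give up."""
    wanted = set(range(1, last_msg + 1))
    for _ in range(max_retries):
        for msg_no in sorted(wanted - rmd.received):
            rmd.deliver(msg_no)
        if rmd.received == wanted:
            return set(rmd.received)  # full ack set, TerminateSequence can follow
    return None  # retry budget exhausted without a full ack set


def end_with_close(rmd: StuckRMD) -> Set[int]:
    """Ask for the final ack state instead of waiting for a full ack set."""
    return rmd.close()


rmd = StuckRMD(undeliverable={3})
print(end_today(rmd, last_msg=4, max_retries=5))  # None
print(sorted(end_with_close(rmd)))                # [1, 2, 4]
```

In this model retransmission alone never completes the sequence, while the close path still returns a usable, final picture of what was received. What the RMS does with that picture is outside the sketch.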
>
>
>
>
> <dug>The case that I keep thinking about is one where the RMD is actually a cluster
> of machines and when a sequence gets created it has an affinity to a certain server
> in the cluster - meaning it processes all of the messages for that sequence. If
> that server starts to have problems, and for some reason it just can't seem to
> process any new app messages then the RMS can close down the sequence and start up
> a new one. Hopefully, the new sequence will be directed to a different server in
> the cluster. </dug>
>
> There are two problems with this scenario and the proposed solution.
> 1. If an RMD has sequence-to-machine affinity that should be strictly the RMD’s
> decision and the RMD’s problem. The RMS is autonomous; this proposal puts
> expectations on the RMS’ behavior based on particularities of the RMD
> implementation. To be clear, I’ll note that affinity can be achieved in two ways:
>
>    i. By performing stateful routing at the RMD; basically the RMD has to remember every
>       active sequence and what machine it has affinity to. In this case it would be simple
>       to change the RMD’s routing table when a machine fails.
>
>    ii. By generating different EPRs for each machine. For affinity to function this way
>       two things are necessary:
>       1. Some sort of endpoint resolution mechanism would have to be devised for the
>          RMS to learn the EPR that it should target.
>       2. A mechanism for migrating that EPR.
>       Clearly 1) and 2) are outside the scope of the TC and, in my view, this proposal
>       might be defining 2) in an informal way that is specific to WS-RM.
>
> 2. If the RMS somehow guesses that there is a problem on the EPR to which it
> is sending its messages and somehow decides that Closing the sequence and starting
> a new one is the right course of action, ordering guarantees are compromised.
>
> <dug> I probably didn't state the problem very well. I didn't intend to claim that
> the RMS knew about this affinity, but instead it knew that something was wrong with
> the current sequence and in order to try to fix the situation it decided to try
> another sequence. The affinity bit was thrown in there to explain why starting a
> new sequence _might_ fix the problem.
>
> I should also point out that while a lot of these discussions have focused on
> InOrder+ExactlyOnce DA, this feature is still useful in other DAs. For example, if
> the DA is just ExactlyOnce - having an accurate accounting of the ACKs allows a
> subsequent sequence to send just the gaps from the first, so getting an accurate
> list of the gaps becomes critical. And this of course leads us to the discussion
> of how to determine the DA in use - which I think might be part of issues 6, 9, 24 and 27.
> </dug>
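
The gap accounting described above can be sketched in a few lines. This is only an illustration, not part of the proposal: `find_gaps`, `ack_ranges` and `highest_sent` are hypothetical names, and the sketch assumes the final ack state arrives as inclusive (lower, upper) ranges of message numbers that start at 1.

```python
from typing import List, Tuple

AckRange = Tuple[int, int]  # inclusive (lower, upper) message numbers


def find_gaps(ack_ranges: List[AckRange], highest_sent: int) -> List[AckRange]:
    """Return the inclusive ranges in 1..highest_sent that were never acked."""
    gaps: List[AckRange] = []
    next_expected = 1  # lowest message number not yet accounted for
    for lower, upper in sorted(ack_ranges):
        if lower > next_expected:
            # Everything between the previous range and this one is missing.
            gaps.append((next_expected, min(lower - 1, highest_sent)))
        next_expected = max(next_expected, upper + 1)
        if next_expected > highest_sent:
            break
    if next_expected <= highest_sent:
        # Messages sent after the last acked range are also missing.
        gaps.append((next_expected, highest_sent))
    return gaps


# Acks for 1-2, 4-6 and 9, out of 10 messages sent.
print(find_gaps([(1, 2), (4, 6), (9, 9)], highest_sent=10))
# [(3, 3), (7, 8), (10, 10)]
```

Under an ExactlyOnce DA, the returned ranges are what a follow-on sequence would need to resend. The result is only as trustworthy as the ack state fed into it, which is why a final, accurate ack matters.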
>
> Finally, I agree with you that considering a gap-filling mechanism would be a good
> thing for this TC to do.
>
>
> --Stefan
>
>