Marc:
Let me put on my T-shirt "Gaps happen" ;-)
When I said that the use case is about sequences with gaps (regardless of
how they cease to be used, i019 or i028), I was only describing a situation
that, unfortunate as it is, we have to live with: some sequences will have
gaps regardless of how much effort we deploy to get missing messages through
*within this sequence*.
If we really wanted, we could make sure a sequence is always complete by the
time it is terminated, but at a very high cost: an RMS would necessarily have
to close any sequence for which it gave up retransmission efforts on a
missing message. That is not realistic: will it stop sending further messages
until it gets acks for all previous ones, just to be sure not to create gaps
in the first place?
Given this, the proposal on the table allows a Source to at least know
precisely what these gaps are at the time the sequence is terminated.
Nothing more.
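To make "know precisely what these gaps are" concrete, here is a minimal
sketch (not part of the proposal) of how a Source might derive the gaps from
the final acknowledgement ranges. The function name find_gaps, the tuple
representation of ranges, and the highest_sent parameter are all assumptions
made for illustration; the sketch only assumes that message numbers start at
1 and that acknowledgements arrive as inclusive (lower, upper) ranges.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive (lower, upper) range of message numbers.
AckRange = Tuple[int, int]


def find_gaps(ack_ranges: List[AckRange], highest_sent: int) -> List[AckRange]:
    """Return the inclusive ranges in 1..highest_sent not covered by ack_ranges."""
    if highest_sent < 1:
        return []  # nothing was sent, so nothing can be missing

    gaps: List[AckRange] = []
    next_expected = 1  # lowest message number not yet known to be acknowledged
    for lower, upper in sorted(ack_ranges):
        if lower > next_expected:
            # Everything between next_expected and lower - 1 was never acked.
            gaps.append((next_expected, min(lower - 1, highest_sent)))
        next_expected = max(next_expected, upper + 1)
        if next_expected > highest_sent:
            break
    if next_expected <= highest_sent:
        # Trailing messages after the last acknowledged range.
        gaps.append((next_expected, highest_sent))
    return gaps


if __name__ == "__main__":
    # Final acks cover 1-3, 5 and 8-10 out of 10 messages sent.
    print(find_gaps([(1, 3), (5, 5), (8, 10)], 10))
```

What the Source then does with this list (resend, raise an error, nothing)
is deliberately left out, as it is outside this issue.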
Now, this information can be dealt with in different ways that are beyond
this issue:
- Some RMS may decide to resend these in a new sequence, later.
- Some AS may just be happy to get the error raised and know about it (that
  will surely help fulfill a DA like AtLeastOnce, which requires that
  delivery failures be notified at one endpoint, preferably the one that can
  do something about it).
But this "dealing with the failure" phase is beyond the issues at stake.
Jacques
From: Doug Davis [mailto:dug@us.ibm.com]
Sent: Thursday, September 01, 2005 5:47 AM
To: ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Just to make sure it's said yet again, the proposal does not suggest a
mechanism through which gaps can be filled. A solution to fill gaps is a
totally different issue.
-Doug
"Marc Goodner" <mgoodner@microsoft.com> wrote on 09/01/2005 02:45:10 AM:
> Jacques, you said:
>
> "The general use case is the one where gaps exist and persist and, for a
> variety of reasons, were not / could not be filled at the time the sequence
> is no longer to be used (for whatever reason) and needs to be disposed of."
>
> In following the many proposals being made here, it continued to strike me
> that what you are looking for is a way to fill gaps, so it is nice to see
> that confirmed. Couldn't filling gaps in a sequence be done in a much
> simpler manner than the current proposal?
>
>
> From: Jacques Durand [mailto:JDurand@us.fujitsu.com]
> Sent: Wednesday, August 31, 2005 1:24 PM
> To: Stefan Batres; Doug Davis; ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> 2 comments inline <JD>
>
>
> From: Stefan Batres [mailto:stefanba@microsoft.com]
> Sent: Tuesday, August 30, 2005 10:48 PM
> To: Doug Davis; ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Doug,
>
> You mention a specific situation: an RMD experiences a failure that
> prevents it from receiving application messages. I agree insofar as, in
> such a failure case, this proposal could be helpful in that it helps the
> RMS to engage in recovery of some sort (either inform applications that a
> specific message was not sent, or open a new sequence, assuming ordering is
> not important). But this is not the only failure case that applications
> will want to deal with (with or without help from the protocol).
>
> Consider the case where connectivity is lost for long enough for both
> sequences to expire, or consider the case where the destination suffers a
> loss of session state. In such failure modes this solution is not helpful,
> yet applications will need a recovery strategy of some sort. It might be
> that it is application specific, or it might be that a general failure
> recovery specification is created and ratified at some point. The important
> idea is that the only way to deal with all failure modes is at a higher
> level. This proposal leverages the protocol to optimize recovery in
> specific circumstances that should be relatively rare. RM implementations
> should not be required to support failure mode recovery mechanisms that
> either don't apply to them or that they choose to implement in a uniform
> way at a higher level.
>
>
> <JD> I do not see better recovery as the main driver behind resolving
> i019, though enhanced recovery can certainly be a byproduct of it, yet in
> no different way than, say, the recovery made possible by the mechanisms
> behind the AtLeastOnce DA ("... or else an error will be raised on at least
> one endpoint"). Such error-raising serves a purpose, whatever use is made
> of these "errors" (and indeed in many cases they require application-level
> handling as you said; sometimes just application awareness may also have
> great value). But precisely because of this, we want errors to be raised as
> accurately as possible. I believe the proposal for i019 allows for
> achieving greater awareness of delivery failure on the RMS / AS side at no
> greater cost, and that applies not just to i019 but to i028 as well, where
> the sequence is not faulted. The general use case is the one where gaps
> exist and persist and, for a variety of reasons, were not / could not be
> filled at the time the sequence is no longer to be used (for whatever
> reason) and needs to be disposed of.
>
> Thanks
>
> --Stefan
>
>
>
> From: Doug Davis [mailto:dug@us.ibm.com]
> Sent: Tuesday, August 30, 2005 1:08 PM
> To: ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
>
> Yet more comments. :-)
> -Doug
>
> From: "Stefan Batres" <stefanba@microsoft.com>
> Sent: 08/30/2005 03:35 PM
> To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Doug,
>
> Some more comments and thoughts on your proposal:
>
>
> <dug>... When or why an RMS uses CloseSequence is up to it to decide.
> All we know is that it wants to shut things down and get an accurate ACK
> from the RMD.</dug>
>
> I still have not heard of a plausible reason why an RMS "wants to shut
> things down" for which the current spec presents a problem. Comparing the
> spec as it stands today vs. the spec + this proposal:
>
> TODAY: RMS wants to end the sequence, so it sends a LastMessage and must
> wait for a complete set of acks; this might require retransmitting
> messages. Once a full set of acks is received, RMS sends TerminateSequence.
>
> TODAY + THIS PROPOSAL: RMS wants to end the sequence, so it sends Close and
> waits for a CloseResponse, possibly retransmitting the Close. Once a
> CloseResponse is received, RMS sends TerminateSequence.
>
> The problem with the TODAY scenario, as I've heard it in this forum, is
> that the RMS might have to wait unacceptably long between sending
> LastMessage and getting a full ack range. But if getting some messages or
> acks across proves difficult, why would the RMS expect that getting Close
> across would be any easier?
>
> <JD> Some messages may not have made it to the RMD for various reasons that
> do not necessarily apply to the Close op. You may also have the option of
> resending the Close op in a way (say, over 24h) that you could not afford
> on a large scale via a policy that has to apply to all regular messages,
> due to network bandwidth or due to the time-bound value of these messages
> (a message may lose value if untimely, yet RMS and AS want to be sure which
> ones were lost). So even a delayed closing still has value for the accuracy
> of acknowledgements.
>
> <dug> 1 - I don't believe your text is accurate, in that Close is supposed
> to be used in cases where the sequence needs to end due to something going
> wrong. You've described a case where the sequence is functioning just fine,
> and while Close can be used in those cases as well, it provides no
> additional value. 2 - Sending a Close and sending application data can have
> quite a different set of features executed, so I don't think it's hard to
> imagine cases where RM messages can get processed just fine but application
> messages run into problems. I believe Chris mentioned on some call the
> notion of two different persistent stores: one for RM data and one for
> app-data. It's possible that the app-data one is running into problems.
> 3 - Using the CloseSequence operation is optional; if you feel that, as an
> RMS implementor, you'll never see its usefulness, then you're free to never
> implement/send it. However, I'd hate to remove this option for those of us
> who do see value in it. </dug>
>
>
>
> <dug>The case that I keep thinking about is one where the RMD is actually a
> cluster of machines, and when a sequence gets created it has an affinity to
> a certain server in the cluster, meaning that server processes all of the
> messages for that sequence. If that server starts to have problems, and for
> some reason it just can't seem to process any new app messages, then the
> RMS can close down the sequence and start up a new one. Hopefully, the new
> sequence will be directed to a different server in the cluster. </dug>
>
> There are two problems with this scenario and the proposed solution.
> 1. If an RMD has sequence-to-machine affinity, that should be strictly the
>    RMD's decision and the RMD's problem. The RMS is autonomous; this
>    proposal puts expectations on the RMS's behavior based on
>    particularities of the RMD implementation. To be clear, I'll note that
>    affinity can be achieved in two ways:
>    i.  By performing stateful routing at the RMD; basically the RMD has to
>        remember every active sequence and what machine it has affinity to.
>        In this case it would be simple to change the RMD's routing table
>        when a machine fails.
>    ii. By generating different EPRs for each machine. For affinity to
>        function this way two things are necessary:
>        1. Some sort of endpoint resolution mechanism would have to be
>           devised for the RMS to learn the EPR that it should target.
>        2. A mechanism for migrating that EPR.
>        Clearly 1) and 2) are outside the scope of the TC and, in my view,
>        this proposal might be defining 2) in an informal way that is
>        specific to WS-RM.
>
> 2. If the RMS somehow guesses that there is a problem on the EPR to which
>    it is sending its messages and somehow decides that closing the sequence
>    and starting a new one is the right course of action, ordering
>    guarantees are compromised.
>
> <dug> I probably didn't state the problem very well. I didn't intend to
> claim that the RMS knew about this affinity, but instead that it knew that
> something was wrong with the current sequence and, in order to try to fix
> the situation, it decided to try another sequence. The affinity bit was
> thrown in there to explain why starting a new sequence _might_ fix the
> problem.
>
> I should also point out that while a lot of these discussions have focused
> on the InOrder+ExactlyOnce DA, this feature is still useful in other DAs.
> For example, if the DA is just ExactlyOnce, having an accurate accounting
> of the ACKs allows a subsequent sequence to send just the gaps from the
> first, so getting an accurate list of the gaps becomes critical. And this
> of course leads us to the discussion of how to determine the DA in use,
> which I think might be part of issues 6, 9, 24 and 27.
> </dug>
>
> Finally, I agree with you that considering a gap-filling mechanism would be
> a good thing for this TC to do.
>
>
> --Stefan
>
>