Jacques, you said:
“The general use case is the one where gaps
exist and persist and, for a variety of reasons, were not / could not be filled
at the time the sequence is no longer to be used (for whatever reason) and
needs to be disposed of.”
In following the many proposals made
here, it has continued to strike me that what you are looking for is a way to fill
gaps, so it is nice to see that confirmed. Couldn’t filling gaps in a
sequence be done in a much simpler manner than the current proposal?
From: Jacques Durand
Sent: Wednesday, August 31, 2005
To: Stefan Batres; Doug Davis;
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Two comments inline.
From: Stefan Batres
Sent: Tuesday, August 30, 2005
To: Doug Davis;
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
mention a specific situation: An RMD experiences a failure that prevents it
from receiving application messages. I agree insofar as saying that in such a
failure case this proposal could be helpful in that it helps the RMS to engage
in recovery of some sort (either inform applications that a specific message
was not sent or open a new sequence, assuming ordering is not important). But
this is not the only failure case that applications will want to deal with
(with or without help from the protocol).
Consider the case where connectivity is lost long enough for both sequences to
expire, or the case where the destination suffers a loss of session
state. In such failure modes this solution is not helpful, yet applications
will still need a recovery strategy of some sort. It might be application-specific,
or a general failure-recovery specification might be
created and ratified at some point. The important idea is that the only way to
deal with all failure modes is at a higher level. This proposal leverages the
protocol to optimize recovery in specific circumstances that should be
relatively rare. RM implementations should not be required to support
failure-mode recovery mechanisms that either don't apply to them or that they
choose to implement in a uniform way at a higher level.
I do not see better recovery as the main driver behind resolving i019, though
enhanced recovery can certainly be a byproduct of it, in no different way
than, say, the recovery made possible by the mechanisms behind the AtLeastOnce DA
("... or else an error will be raised on at least one
endpoint"). Such error-raising serves a purpose, whatever use is
made of these "errors" (and indeed in many cases they require application-level
handling, as you said; sometimes mere application awareness can also have great
value). Precisely because of this, we want errors to be raised as accurately as
possible. I believe the proposal for i019 allows for greater awareness
of delivery failure on the RMS / AS side at no greater cost, and that applies not
just to i019 but to i028 as well, where the sequence is not faulted. The
general use case is the one where gaps exist and persist and, for a variety of
reasons, were not / could not be filled at the time the sequence is no longer to
be used (for whatever reason) and needs to be disposed of.
From: Doug Davis
Sent: Tuesday, August 30, 2005
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Yet more comments. :-)
More comments and thoughts on your proposal:
<dug>... When or why an RMS uses CloseSequence is up to
it to decide.
All we know is that it wants to shut things down and get an
accurate ACK from the RMD.</dug>
I still have not heard of a plausible reason why an RMS "wants to shut
things down" and the current spec presents a problem. Comparing the spec
as it stands today vs. the spec + this proposal:
- TODAY: RMS wants to end the sequence so it
sends a LastMessage and must wait for a complete set of acks; this might
require retransmitting messages. Once a full set of acks is received RMS
- TODAY + THIS PROPOSAL: RMS wants to end the
sequence so it sends Close, waits for a CloseResponse, possibly
retransmitting the Close. Once a CloseResponse is received RMS sends
The problem with the TODAY scenario, as I've heard it in
this forum, is that the RMS might have to wait unacceptably long between
sending LastMessage and getting a full ack range. But if getting some messages
or acks across proves difficult, why would the RMS expect that getting Close
across would be any easier?
messages may not have made it to the RMD for various reasons that do not
necessarily apply to the Close operation. You may also have the option of resending
Close in a way (say, over 24h) that you could not afford to apply at large scale
through a policy that must cover all regular messages, whether because of network
bandwidth or because of the time-bound value of these messages (a message may lose
value if untimely, yet the RMS and AS want to be sure which ones were lost).
So even a delayed close still has value for the accuracy of acknowledgements.
<dug> 1 - I don't
believe your text is accurate, in that Close is supposed to be used in cases
where the sequence needs to end due to something going wrong. You've
described a case where the sequence is functioning just fine - and while Close
can be used in those cases as well, it provides no additional value. 2 -
Sending a Close and sending application data can have quite a different set of
features executed, so I don't think it's hard to imagine cases where RM messages
are processed just fine but application messages run into problems. I
believe Chris mentioned on a call the notion of two different persistent
stores - one for RM data and one for app data. It's possible that the
app-data one is running into problems. 3 - Using the CloseSequence
operation is optional - if you feel that, as an RMS implementor, you'll never see
its usefulness, then you're free to never implement/send it. However, I'd
hate to remove this option for those of us who do see value in it.
<dug>The case that I keep thinking about is one where
the RMD is actually a cluster of machines and when a sequence gets created it
has an affinity to a certain server in the cluster - meaning it processes all
of the messages for that sequence. If that server starts to have problems, and
for some reason it just can't seem to process any new app messages, then the RMS
can close down the sequence and start up a new one. Hopefully, the new sequence
will be directed to a different server in the cluster. </dug>
There are two problems with this scenario and the proposed solution.
1. If an
RMD has sequence-to-machine affinity that should be strictly the RMDs decision
and the RMDs problem. The RMS is autonomous; this proposal puts expectations on
the RMS' behavior based on particularities of the RMD implementation. To be
clear, I'll note that affinity can be achieved in two ways:
stateful routing at the RMD; basically the RMD has to remember every active
sequence and what machine it has affinity to. In this case it would be simple
to change the RMD's routing table when a machine fails.
different EPRs for each machine. For affinity to function this way, two things
1. Some sort
of endpoint resolution mechanism would have to be devised for the RMS to learn
the EPR that it should target.
mechanism for migrating that EPR.
1) and 2) are outside the scope of the TC and, in my view, this proposal might
be defining 2) in an informal way that is specific to WS-RM.
If the RMS somehow guesses that
there is a problem on the EPR to which it is sending its messages and somehow
decides that Closing the sequence and starting a new one is the right course of
action, ordering guarantees are compromised.
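To make the first affinity option concrete, here is a minimal Python sketch of stateful routing at the RMD: the RMD remembers which cluster server each active sequence is pinned to and repoints those sequences when a server fails. The class and method names (`AffinityRouter`, `route`, `fail_over`) and the round-robin assignment policy are hypothetical, not part of WS-RM or any product. The sketch also assumes sequence state lives in a store every server can reach, since it moves only the routing entry, not the state.

```python
from typing import Dict, List


class AffinityRouter:
    """Hypothetical RMD-side routing table: sequence ID -> cluster server."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._table: Dict[str, str] = {}
        self._next = 0  # round-robin cursor (assumed assignment policy)

    def _pick(self) -> str:
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        return server

    def route(self, sequence_id: str) -> str:
        # The first message of a sequence fixes its affinity; later ones reuse it.
        if sequence_id not in self._table:
            self._table[sequence_id] = self._pick()
        return self._table[sequence_id]

    def fail_over(self, failed: str) -> None:
        # Drop the failed server and repoint its sequences at the survivors.
        self._servers.remove(failed)
        if not self._servers:
            raise RuntimeError("no servers left")
        for sequence_id, server in self._table.items():
            if server == failed:
                self._table[sequence_id] = self._pick()
```

Nothing here involves the RMS: the failover is a local table update at the RMD, which is the sense in which affinity is the RMD's decision and the RMD's problem.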
<dug> I probably didn't
state the problem very well. I didn't intend to claim that the RMS knew
about this affinity, but instead it knew that something was wrong with the
current sequence and in order to try to fix the situation it decided to try
another sequence. The affinity bit was thrown in there to explain why
starting a new sequence _might_ fix the problem.
I should also point out that
while a lot of these discussions have focused on InOrder+ExactlyOnce DA, this
feature is still useful in other DAs. For example, if the DA is just
ExactlyOnce - having an accurate accounting of the ACKs allows a subsequent
sequence to send just the gaps from the first, so getting an accurate list of
the gaps becomes critical. And this of course leads us to the discussion
of how to determine the DA in use - which I think might be part of issues 6, 9,
24 and 27.
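A minimal Python sketch of the gap accounting described above. Given the acknowledgement ranges reported by the RMD (modeled here as inclusive (lower, upper) pairs, loosely after the Lower/Upper attributes of a WS-RM AcknowledgementRange) and the number of the last message sent, it lists the message numbers a follow-up sequence would need to resend. The function name `missing_message_numbers` is hypothetical, and the sketch assumes message numbers start at 1.

```python
from typing import Iterable, List, Tuple


def missing_message_numbers(ack_ranges: Iterable[Tuple[int, int]],
                            last_message_number: int) -> List[int]:
    """List message numbers in 1..last_message_number that no ack range covers.

    ack_ranges holds inclusive (lower, upper) pairs as reported by the RMD;
    they may arrive unsorted or overlapping.
    """
    gaps: List[int] = []
    next_expected = 1  # smallest message number not yet known to be acked
    for lower, upper in sorted(ack_ranges):
        if lower > next_expected:
            # Everything between the previous range and this one is a gap,
            # clipped so we never report numbers past the last message.
            gaps.extend(range(next_expected, min(lower, last_message_number + 1)))
        next_expected = max(next_expected, upper + 1)
        if next_expected > last_message_number:
            break
    # Anything after the highest acknowledged number is also missing.
    gaps.extend(range(next_expected, last_message_number + 1))
    return gaps
```

For example, `missing_message_numbers([(1, 3), (5, 7), (10, 10)], 12)` returns `[4, 8, 9, 11, 12]`. Sorting first and sweeping with `max` lets the function tolerate out-of-order or overlapping ranges, which is why an accurate final ack (whether obtained via LastMessage or Close) matters: any range the RMD omits turns directly into a spurious resend.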
I agree with you that considering a gap-filling mechanism would be a good thing
for this TC to do.