Doug, you said below:
“I don't believe your text is accurate in that Close is supposed to be used in cases where the sequence needs to end due to something going wrong.”
i0019 is titled “Sequence termination on Fault” and is concerned with the RMD terminating a sequence and the RMS wanting to know the final state of sent messages. So if Close is not for addressing situations where something has gone wrong, such as the faults this issue is concerned with, then what is it for? Furthermore, this proposal is all about actions taken by the RMS; how does that solve the issue of problems originating at the RMD?
You then go on to say:
“You've described a case where the sequence is functioning just fine - and while Close can be used in those cases as well, it provides no additional value.”
So if Close is not for addressing the fault issues in i0019 and it isn’t for use when a sequence is fine, then what is it for?
You also describe a use of the Close operation when there is a problem. It is unclear to me whether the problem you describe is at the RMS or the RMD. I’m also now even more confused about what problem this is solving; based on your own descriptions, it seems we have wandered away from the issue this proposal is supposed to address.
I also share the concern that this doesn’t work with all of the DAs. That makes it unique in a bad way: I can’t think of another feature in RM that would break with one DA in place but not another.
From: Doug Davis [mailto:dug@us.ibm.com]
Sent: Tuesday, August 30, 2005 1:08 PM
To: ws-rx@lists.oasis-open.org
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Yet more comments. :-)
-Doug
"Stefan Batres"
<stefanba@microsoft.com>
08/30/2005 03:35 PM
To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
Doug,
Some more comments and thoughts on your proposal:
<dug>... When or why an RMS uses CloseSequence is up to it to decide. All we know is that it wants to shut things down and get an accurate ACK from the RMD.</dug>
I still have not heard a plausible reason why an RMS that “wants to shut things down” would find the current spec a problem. Comparing the spec as it stands today with the spec plus this proposal:
- TODAY: The RMS wants to end the sequence, so it sends a LastMessage and must wait for a complete set of acks; this might require retransmitting messages. Once a full set of acks is received, the RMS sends TerminateSequence.
- TODAY + THIS PROPOSAL: The RMS wants to end the sequence, so it sends Close and waits for a CloseResponse, possibly retransmitting the Close. Once a CloseResponse is received, the RMS sends TerminateSequence.
The problem with the TODAY scenario, as I’ve heard it in this forum, is that the RMS might have to wait unacceptably long between sending LastMessage and getting a full ack range. But if getting some messages or acks across proves difficult, why would the RMS expect that getting Close across would be any easier?
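To make the comparison concrete, here is a toy simulation of the two RMS shutdown procedures running over the same unreliable link. It is only a sketch under stated assumptions: `LossyChannel`, `end_with_last_message` and `end_with_close` are hypothetical names, not WS-RM APIs; the channel simply drops the transmissions whose ordinal appears in `drops`; and a delivered message is treated as acknowledged at once (lost acks are not modeled).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LossyChannel:
    """Hypothetical link: the i-th transmission (1-based) is lost if i is in `drops`."""
    drops: set[int] = field(default_factory=set)
    sent: int = 0

    def transmit(self) -> bool:
        self.sent += 1
        return self.sent not in self.drops


def end_with_last_message(channel: LossyChannel, n: int) -> tuple[int, set[int]]:
    """TODAY: retransmit until messages 1..n are all acked, then TerminateSequence."""
    acked: set[int] = set()
    rounds = 0
    while len(acked) < n:
        rounds += 1
        for msg in range(1, n + 1):
            # Simplification (assumption): a delivered message is acked immediately.
            if msg not in acked and channel.transmit():
                acked.add(msg)
    return rounds, acked  # TerminateSequence would be sent at this point


def end_with_close(channel: LossyChannel, delivered: set[int]) -> tuple[int, set[int]]:
    """PROPOSAL: resend Close until a CloseResponse arrives, then TerminateSequence.

    `delivered` stands in for what the RMD actually holds; the CloseResponse
    reports that final ack state back to the RMS.
    """
    rounds = 0
    while True:
        rounds += 1
        if channel.transmit():
            return rounds, set(delivered)


if __name__ == "__main__":
    print(end_with_last_message(LossyChannel(drops={2, 4}), 3))  # (3, {1, 2, 3})
    print(end_with_close(LossyChannel(drops={1, 2}), {1, 3}))     # (3, {1, 3})
```

In this model both loops stall on the same losses; the difference is that the Close loop needs a single transmission to get through, while the LastMessage loop needs every missing message to get through.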
<dug> 1 - I don't believe your text is accurate in that Close is supposed to be used in cases where the sequence needs to end due to something going wrong. You've described a case where the sequence is functioning just fine - and while Close can be used in those cases as well, it provides no additional value.
2 - Sending a Close and sending application data can involve quite different sets of features being executed, so I don't think it's hard to imagine cases where RM messages get processed just fine but application messages run into problems. I believe Chris mentioned on a call the notion of two different persistent stores, one for RM data and one for app data. It's possible that the app-data store is running into problems.
3 - Using the CloseSequence operation is optional. If you feel that, as an RMS implementor, you'll never see its usefulness, then you're free to never implement or send it. However, I'd hate to remove this option for those of us who do see value in it.
</dug>
<dug>The case that I keep thinking about is one where the RMD is actually a cluster of machines, and when a sequence gets created it has an affinity to a certain server in the cluster, meaning that server processes all of the messages for that sequence. If that server starts to have problems and for some reason just can't seem to process any new app messages, then the RMS can close down the sequence and start up a new one. Hopefully, the new sequence will be directed to a different server in the cluster. </dug>
There are two problems with this scenario and the proposed solution.
1. If an RMD has sequence-to-machine affinity, that should be strictly the RMD’s decision and the RMD’s problem. The RMS is autonomous; this proposal puts expectations on the RMS’ behavior based on particularities of the RMD implementation. To be clear, I’ll note that affinity can be achieved in two ways:
i. By performing stateful routing at the RMD: the RMD has to remember every active sequence and the machine it has affinity to. In this case it would be simple to change the RMD’s routing table when a machine fails.
ii. By generating different EPRs for each machine. For affinity to function this way, two things are necessary:
   1. Some sort of endpoint resolution mechanism would have to be devised for the RMS to learn the EPR that it should target.
   2. A mechanism for migrating that EPR.
Clearly 1) and 2) are outside the scope of the TC, and in my view this proposal might be defining 2) in an informal way that is specific to WS-RM.
2. If the RMS somehow guesses that there is a problem with the EPR to which it is sending its messages and decides that closing the sequence and starting a new one is the right course of action, ordering guarantees are compromised.
<dug> I probably didn't state the problem very well. I didn't intend to claim that the RMS knew about this affinity, but rather that it knew something was wrong with the current sequence and, in order to try to fix the situation, decided to try another sequence. The affinity bit was thrown in there to explain why starting a new sequence _might_ fix the problem.
I should also point out that while a lot of these discussions have focused on the InOrder+ExactlyOnce DA, this feature is still useful with other DAs. For example, if the DA is just ExactlyOnce, having an accurate accounting of the ACKs allows a subsequent sequence to send just the gaps from the first, so getting an accurate list of the gaps becomes critical. And this of course leads us to the discussion of how to determine the DA in use, which I think might be part of issues 6, 9, 24 and 27.
</dug>
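The ExactlyOnce case above hinges on turning the final acknowledgement into a list of gaps that a subsequent sequence would carry. A minimal sketch, assuming the final ack is available as (Lower, Upper) pairs mirroring the `AcknowledgementRange` elements of a WS-RM `SequenceAcknowledgement`, that no range extends past the RMS's last message number, and that `find_gaps` and `messages_to_resend` are hypothetical helpers rather than part of any spec or toolkit:

```python
from __future__ import annotations


def find_gaps(ack_ranges: list[tuple[int, int]], last: int) -> list[tuple[int, int]]:
    """Return (lower, upper) ranges of message numbers in 1..last that were never acked.

    `ack_ranges` holds (Lower, Upper) pairs; they may arrive unsorted or overlapping,
    but are assumed not to extend beyond `last`.
    """
    gaps: list[tuple[int, int]] = []
    next_expected = 1  # lowest message number not yet known to be acked
    for lower, upper in sorted(ack_ranges):
        if lower > next_expected:
            gaps.append((next_expected, lower - 1))
        next_expected = max(next_expected, upper + 1)
    if next_expected <= last:
        gaps.append((next_expected, last))
    return gaps


def messages_to_resend(gaps: list[tuple[int, int]]) -> list[int]:
    """Expand gap ranges into the individual messages a new sequence would carry."""
    return [n for lower, upper in gaps for n in range(lower, upper + 1)]


if __name__ == "__main__":
    gaps = find_gaps([(1, 3), (5, 5), (8, 10)], last=12)
    print(gaps)                      # [(4, 4), (6, 7), (11, 12)]
    print(messages_to_resend(gaps))  # [4, 6, 7, 11, 12]
```

Sorting first lets one pass handle overlapping or out-of-order ranges; the result is only as good as the final ack it is computed from, which is why an accurate final ack matters here.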
Finally, I agree with you that considering a gap-filling mechanism would be a good thing for this TC to do.
--Stefan