ws-rx message

Subject: RE: [ws-rx] i0019 - a formal proposal - take 2

From: Doug Davis <dug@us.ibm.com>
To: ws-rx@lists.oasis-open.org
Date: Thu, 1 Sep 2005 08:46:55 -0400

Just to make sure it's said yet again: the proposal does not suggest a mechanism through which gaps can be filled.
A solution to fill gaps is a totally different issue.
-Doug

"Marc Goodner" <mgoodner@microsoft.com> wrote on 09/01/2005 02:45:10 AM: > Jaques, you said:
> “The general use case is the one where gaps exist and persist and, for a variety of reasons, were not / could not be filled at the time the sequence is no longer to be used (for whatever reason) and needs to be disposed of.”
>
> In following the many proposals being made here, it continued to strike me that what you are looking for is a way to fill gaps, so it is nice to see that confirmed. Couldn’t filling gaps in a sequence be done in a much simpler manner than the current proposal?
>
> From: Jacques Durand [mailto:JDurand@us.fujitsu.com]
> Sent: Wednesday, August 31, 2005 1:24 PM
> To: Stefan Batres; Doug Davis; ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> 2 comments Inline <JD>
>
> From: Stefan Batres [mailto:stefanba@microsoft.com]
> Sent: Tuesday, August 30, 2005 10:48 PM
> To: Doug Davis; ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Doug,
>
> You mention a specific situation: an RMD experiences a failure that prevents it from receiving application messages. I agree insofar as, in such a failure case, this proposal could be helpful in that it helps the RMS engage in recovery of some sort (either inform applications that a specific message was not sent, or open a new sequence, assuming ordering is not important). But this is not the only failure case that applications will want to deal with (with or without help from the protocol).
> Consider the case where connectivity is lost for long enough for both sequences to expire, or consider the case where the destination suffers a loss of session state. In such failure modes this solution is not helpful, yet applications will need a recovery strategy of some sort. It might be that it is application specific, or it might be that a general failure recovery specification is created and ratified at some point. The important idea is that the only way to deal with all failure modes is at a higher level. This proposal leverages the protocol to optimize recovery in specific circumstances that should be relatively rare. RM implementations should not be required to support failure mode recovery mechanisms that either don't apply to them or that they choose to implement in a uniform way at a higher level.
>
> <JD> I do not see better recovery as the main driver behind resolving i019, though enhanced recovery can certainly be a byproduct of it, in no different way than, say, the recovery made possible by the mechanisms behind the AtLeastOnce DA ("... or else an error will be raised on at least one endpoint"). Such error-raising serves a purpose, whatever usage is made of these "errors" (and indeed in many cases they require application-level handling, as you said; sometimes application awareness alone may have great value). But just because of this, we want errors to be raised as accurately as possible. I believe the proposal for i019 allows for achieving greater awareness of delivery failure on the RMS / AS side at no greater cost, and that applies not just to i019 but to i028 as well, where the sequence is not faulted. The general use case is the one where gaps exist and persist and, for a variety of reasons, were not / could not be filled at the time the sequence is no longer to be used (for whatever reason) and needs to be disposed of.
>
> Thanks
>
> --Stefan
>
>
> From: Doug Davis [mailto:dug@us.ibm.com]
> Sent: Tuesday, August 30, 2005 1:08 PM
> To: ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Yet more comments. :-)
> -Doug
>
> "Stefan Batres" <stefanba@microsoft.com>
> 08/30/2005 03:35 PM
> To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
>
> Doug,
>
> Some more comments and thoughts on your proposal:
>
> <dug>... When or why an RMS uses CloseSequence is up to it to decide. All we know is that it wants to shut things down and get an accurate ACK from the RMD.</dug>
>
> I still have not heard of a plausible reason why an RMS "wants to shut things down" and the current spec presents a problem. Comparing the spec as it stands today vs. the spec + this proposal:
>
> TODAY: RMS wants to end the sequence so it sends a LastMessage and must wait for a complete set of acks; this might require retransmitting messages. Once a full set of acks is received RMS sends TerminateSequence.
>
> TODAY + THIS PROPOSAL: RMS wants to end the sequence so it sends Close, waits for a CloseResponse, possibly retransmitting the Close. Once a CloseResponse is received RMS sends TerminateSequence.
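
[Editorial sketch, not part of the original message.] The two shutdown flows compared above can be modelled in a few lines of Python. This is only a toy model under stated assumptions: the message names (LastMessage, Close, CloseResponse, TerminateSequence) come from the thread, but the function names, the trace format, and the assumption that every retransmission eventually succeeds are hypothetical and are not taken from the spec or the proposal.

```python
def end_today(sent, received_by_rmd):
    """TODAY: send LastMessage, retransmit until everything is acked, then terminate.

    Assumption (hypothetical): each retransmission eventually gets through.
    """
    trace = ["LastMessage"]
    missing = sorted(set(sent) - set(received_by_rmd))
    for number in missing:
        # The RMS cannot terminate until this message is acknowledged.
        trace.append(f"Retransmit({number})")
    trace.append("TerminateSequence")
    return trace, []  # by the time we terminate, no gaps remain


def end_with_close(sent, received_by_rmd):
    """TODAY + PROPOSAL: send Close, learn the final ack state, then terminate."""
    trace = ["Close", "CloseResponse", "TerminateSequence"]
    # The gaps are reported to the RMS rather than filled.
    gaps = sorted(set(sent) - set(received_by_rmd))
    return trace, gaps


if __name__ == "__main__":
    sent = [1, 2, 3, 4]
    received = [1, 2, 4]
    print(end_today(sent, received))
    print(end_with_close(sent, received))
```

In this model the difference between the flows is what the RMS knows at termination: in the first it has paid for retransmission until no gaps remain, in the second it ends with an accurate list of the messages that never arrived.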
>
> The problem with the TODAY scenario, as I've heard it in this forum, is that the RMS might have to wait unacceptably long between sending LastMessage and getting a full ack range. But if getting some messages or acks across proves difficult, why would the RMS expect that getting Close across would be any easier?
> <JD> Some messages may not have made it to the RMD for various reasons that do not necessarily apply to the Close op. You may also have the option of resending the Close op in a way (say, over 24h) that you could not afford to do on a large scale via a policy that has to apply to all regular messages, due to network bandwidth or due to the time-bound value of these messages (a message may lose value if untimely, yet the RMS and AS want to be sure which ones were lost). So even a delayed close still has value for the accuracy of acknowledgements.
>
> <dug> 1 - I don't believe your text is accurate, in that Close is supposed to be used in cases where the sequence needs to end due to something going wrong. You've described a case where the sequence is functioning just fine - and while Close can be used in those cases as well, it provides no additional value.
> 2 - Sending a Close and sending application data can have quite a different set of features executed, so I don't think it's hard to imagine cases where RM messages can get processed just fine but application messages run into problems. I believe Chris mentioned on some call the notion of two different persistent stores - one for RM data and one for app-data. It's possible that the app-data one is running into problems.
> 3 - Using the CloseSequence operation is optional - if you feel that, as an RMS implementor, you'll never see its usefulness then you're free to never implement/send it. However, I'd hate to remove this option for those of us who do see value in it. </dug>
>
> <dug>The case that I keep thinking about is one where the RMD is actually a cluster of machines and when a sequence gets created it has an affinity to a certain server in the cluster - meaning it processes all of the messages for that sequence. If that server starts to have problems, and for some reason it just can't seem to process any new app messages, then the RMS can close down the sequence and start up a new one. Hopefully, the new sequence will be directed to a different server in the cluster. </dug>
>
> There are two problems with this scenario and the proposed solution.
> 1. If an RMD has sequence-to-machine affinity, that should be strictly the RMD's decision and the RMD's problem. The RMS is autonomous; this proposal puts expectations on the RMS' behavior based on particularities of the RMD implementation. To be clear, I'll note that affinity can be achieved in two ways:
>    i. By performing stateful routing at the RMD; basically the RMD has to remember every active sequence and what machine it has affinity to. In this case it would be simple to change the RMD's routing table when a machine fails.
>    ii. By generating different EPRs for each machine. For affinity to function this way two things are necessary:
>       1. Some sort of endpoint resolution mechanism would have to be devised for the RMS to learn the EPR that it should target.
>       2. A mechanism for migrating that EPR.
> Clearly 1) and 2) are outside the scope of the TC and, in my view, this proposal might be defining 2) in an informal way that is specific to WS-RM.
>
> 2. If the RMS somehow guesses that there is a problem on the EPR to which it is sending its messages and somehow decides that Closing the sequence and starting a new one is the right course of action, ordering guarantees are compromised.
>
> <dug> I probably didn't state the problem very well. I didn't intend to claim that the RMS knew about this affinity, but instead it knew that something was wrong with the current sequence and in order to try to fix the situation it decided to try another sequence. The affinity bit was thrown in there to explain why starting a new sequence _might_ fix the problem.
>
> I should also point out that while a lot of these discussions have focused on InOrder+ExactlyOnce DA, this feature is still useful in other DAs. For example, if the DA is just ExactlyOnce - having an accurate accounting of the ACKs allows a subsequent sequence to send just the gaps from the first, so getting an accurate list of the gaps becomes critical. And this of course leads us to the discussion of how to determine the DA in use - which I think might be part of issues 6, 9, 24 and 27.
> </dug>
>
> Finally, I agree with you that considering a gap-filling mechanism would be a good thing for this TC to do.
>
> --Stefan
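
[Editorial sketch, not part of the original message.] The ExactlyOnce point in the thread (an accurate final ack lets a later sequence resend only the gaps) comes down to subtracting the acknowledged ranges from the messages sent. A minimal Python sketch follows, assuming the final acknowledgement is available as a list of inclusive (lower, upper) message-number ranges and that message numbers start at 1. The function name and argument shapes are hypothetical, and the proposal itself defines no gap-filling mechanism.

```python
def gaps_from_ack_ranges(ack_ranges, highest_sent):
    """Return the message numbers in 1..highest_sent not covered by any ack range.

    ack_ranges: iterable of inclusive (lower, upper) pairs (assumed shape).
    highest_sent: the highest message number the RMS sent on the sequence.
    """
    acked = set()
    for lower, upper in ack_ranges:
        acked.update(range(lower, upper + 1))
    return [n for n in range(1, highest_sent + 1) if n not in acked]


if __name__ == "__main__":
    # Final ack covers 1-3, 5 and 8-10 out of 10 messages sent.
    print(gaps_from_ack_ranges([(1, 3), (5, 5), (8, 10)], 10))
```

What the RMS then does with that list (resend on a new sequence, report to the application) is left open here, as it is in the thread.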

References:
- RE: [ws-rx] i0019 - a formal proposal - take 2
  - From: "Marc Goodner" <mgoodner@microsoft.com>