Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
+1

Christopher Ferris
STSM, Emerging e-business Industry Architecture
email: chrisfer@us.ibm.com
blog: http://webpages.charter.net/chrisfer/blog.html
phone: +1 508 377 9295

Doug Davis/Raleigh/IBM@IBMUS wrote on 09/07/2005 07:59:24 PM:

>
> Stefan,
> (I should have waited a bit longer before sending my previous note - I think your note helps me understand what you're thinking.)
> To answer your question "is this...problem something that this proposal is suited for?": Yes. The problem isn't how to resend the large message but instead it's how the RMS can reliably clean up the sequence. You seem to be focused on the higher-level recovery aspect, which isn't part of the issues or the proposal. That is a very important distinction. All of the solutions you talk about may be valid for that higher-level processing, but that isn't the focus of this proposal. Regardless of what kind of error recovery (if any) takes place afterwards, the RMS still needs to resolve this current bad state, and knowing the final ack state is key (IMO) to doing that. Now, you may say that the sequence should never have gotten into this position to begin with, by not sending such a large message, but that would assume that the RMS knows about this 'large message issue' in advance - and we can't assume that. Remember, this is just one of possibly many reasons why a message could not be delivered - it's impossible for the RMS to prevent or know about all of them in advance - if it could then we wouldn't need RM :-)
> thanks
> -Doug
>
> "Stefan Batres" <stefanba@microsoft.com> wrote on 09/07/2005 07:38:07 PM:
>
> > Doug et al,
> >
> > This, IMHO, is an example of what I'm talking about. Is the very-large-message problem something that this proposal is suited for?
> > I don't think so (note that it doesn't fit with "#1"); we're speculating that it might be that an RMS wants to gracefully complete a sequence on which there are messages that have not been acknowledged because transmission failed due to the size of the messages. In my view, the problem in this case is not how to complete a sequence with holes, but rather dealing with very large messages. For instance, one way to address that problem would be a message fragmentation protocol that works on top of WS-RM. Another way might be to fragment the messages at the app layer and reflect that in the contract. Yet another way could be to limit the size of messages via policy. Like the use case given before involving machine affinity and machine failure[1], there are more appropriate ways to address this use case.
> >
> > [1] http://www.oasis-open.org/apps/org/workgroup/ws-rx/email/archives/200508/msg00303.html
> >
> > --Stefan
> >
> > From: Doug Davis [mailto:dug@us.ibm.com]
> > Sent: Wednesday, September 07, 2005 5:50 AM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Stefan,
> > The proposal attempts to address the two issues, i019 and i028. Perhaps you're looking to close the issues as invalid? During the course of these discussions several use cases have been mentioned as possible situations in which the issues described in i019 and i028 will occur. If you think those use cases fit into the "#1" you mentioned, and you believe that case to be rare, then ok. I don't see the situations i019 and i028 talk about as being rare, nor do I think your "#1" is the only case. I believe people have mentioned cases much less catastrophic, such as extremely large messages that just cannot be delivered due to some network issues (sadly something I run into quite a bit), that would still warrant the need for this solution.
> > But I don't see the need to enumerate all of them, since the entire point of the spec is that networks are not reliable and problems will occur. So running into one that prevents us from getting 100% guaranteed complete delivery every time isn't hard for me to imagine. But that's just me.
> > thanks,
> > -Doug
> >
> > "Stefan Batres" <stefanba@microsoft.com>
> > 09/06/2005 01:13 PM
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > cc:
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > What I'm trying to do is to identify the set of use cases this feature attempts to address - you might disagree with the set I've identified and that is perfectly valid. It is our job, though, to motivate changes to the contributed specs. If you disagree with the way I've characterized the set of use cases for this feature, then it would really help if you could write down for me how you characterize the use cases vs. the protocol as submitted. I hope you can take doing this seriously; I don't think it is a good design process to add features to the protocol simply because we think they are helpful and refuse to do the legwork of 1) defining the characteristics of the use cases when the features are helpful, 2) comparing that against the contributed documents and 3) going through the exercise of identifying real-world use cases that match said characteristics.
> >
> > --Stefan
> >
> > From: Doug Davis [mailto:dug@us.ibm.com]
> > Sent: Monday, September 05, 2005 5:16 PM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Stefan,
> > I disagree with the premise of your note. The use cases for this feature are not limited to the cases you've mentioned, nor are they limited to the cases I or anyone else has mentioned.
> > So trying to fit all possible use cases into the scope you defined just doesn't fly for me. The reason why the RMS wants to get an accurate and final ack state could be just about anything - and as tempting as it is to rattle off yet another possible reason why this feature would be useful, I'd prefer not to let the conversation get bogged down in an attempt to limit the scope of this feature. As I've mentioned, if as an implementor you don't think you'll ever need this _optional_ feature then don't send it.
> > thanks
> > -Doug
> >
> > "Stefan Batres" <stefanba@microsoft.com>
> > 09/05/2005 07:30 PM
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > cc:
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > A quick correction to my comment below:
> >
> > Note that thus far, we've managed to describe exactly one scenario that fits the #2 description: [RMD] has separate state stores for session state and messages - the latter fails but the former is still operable.
> >
> > The scenario we've talked about is where the RMD uses separate state stores, not the RMS.
> >
> > --Stefan
> >
> > From: Stefan Batres [mailto:stefanba@microsoft.com]
> > Sent: Thursday, September 01, 2005 10:40 AM
> > To: Doug Davis; ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > I apologize if my rant below is a bit too cryptic; let me try again:
> >
> > 1. When a catastrophic failure occurs (e.g. RMD amnesia), an RMS has to react in some way; it could return an error to the user or it could engage in a recovery mechanism of some sort. I don't believe you are trying to prescribe what the RMS's reaction ought to be.
> > 2. As you've said time and again, this proposal is about getting the RMS an accurate ack set in cases where:
> >    1. A full ack set will never be possible (or at least not in a reasonable amount of time),
> >    2. There are messages that have been sent and for which no ack has been received, and
> >    3. The problem that prevents a full ack set doesn't prevent the exchange of protocol messages.
> >
> > The point I was trying to make is that, given #1 above, #2 is an optimization for a case that will be relatively rare. Note that I don't question for a second the correctness of your proposal - what concerns me is adding elements to the protocol for this specific case, #2, especially since apps will have to deal with #1 anyway.
> >
> > Note that thus far, we've managed to describe exactly one scenario that fits the #2 description: RMS has separate state stores for session state and messages - the latter fails but the former is still operable.
> >
> > --Stefan
> >
> > From: Doug Davis [mailto:dug@us.ibm.com]
> > Sent: Wednesday, August 31, 2005 3:58 AM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > I'm having a hard time following this. It sounds like you're saying that because the proposal does not solve all RM-related problems you don't want to have it in our 'bag of tricks' at all. Following that logic, why should we distinguish between the SequenceTerminated fault and any other fault? We do it because we want to provide as much information back to the RMS as possible. What it uses this information for is up to it.
> > As I've said many times before, this proposal does not suggest ANY recovery scheme. What I've done (outside of the proposal itself) is discuss how I _think_ an RMS might use this information in some error recovery mechanism, but this proposal itself does not suggest one. This proposal simply provides a mechanism for the RMS to get an accurate accounting of the state of the sequence - that's it. How the RMS uses this information is up to it.
> > If for nothing else, it may choose to simply log the information - that alone is invaluable to someone trying to figure out what's going on. And I'm having a hard time understanding why providing an _optional_ mechanism that could aid the RMS in getting an accurate accounting of the state of the sequence (without having to call up the RMD's admin) is a bad thing.
> > thanks,
> > -Doug
> >
> > "Stefan Batres" <stefanba@microsoft.com>
> > 08/31/2005 01:48 AM
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > cc:
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > You mention a specific situation: an RMD experiences a failure that prevents it from receiving application messages. I agree insofar as saying that in such a failure case this proposal could be helpful, in that it helps the RMS to engage in recovery of some sort (either inform applications that a specific message was not sent or open a new sequence, assuming ordering is not important). But this is not the only failure case that applications will want to deal with (with or without help from the protocol).
> > Consider the case where connectivity is lost for long enough for both sequences to expire, or consider the case where the destination suffers a loss of session state. In such failure modes this solution is not helpful - yet applications will need a recovery strategy of some sort. It might be that it is application specific, or it might be that a general failure recovery specification is created and ratified at some point. The important idea is that the only way to deal with all failure modes is at a higher level. This proposal leverages the protocol to optimize recovery in specific circumstances that should be relatively rare.
> > RM implementations should not be required to support failure mode recovery mechanisms that either don't apply to them or that they choose to implement in a uniform way at a higher level.
> >
> > Thanks
> >
> > --Stefan
> >
> > From: Doug Davis [mailto:dug@us.ibm.com]
> > Sent: Tuesday, August 30, 2005 1:08 PM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Yet more comments. :-)
> > -Doug
> >
> > "Stefan Batres" <stefanba@microsoft.com>
> > 08/30/2005 03:35 PM
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > cc:
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > Some more comments and thoughts on your proposal:
> >
> > <dug>... When or why an RMS uses CloseSequence is up to it to decide. All we know is that it wants to shut things down and get an accurate ACK from the RMD.</dug>
> >
> > I still have not heard of a plausible reason why an RMS "wants to shut things down" and the current spec presents a problem. Comparing the spec as it stands today vs. the spec + this proposal:
> >
> > TODAY: RMS wants to end the sequence so it sends a LastMessage and must wait for a complete set of acks; this might require retransmitting messages. Once a full set of acks is received, RMS sends TerminateSequence.
> >
> > TODAY + THIS PROPOSAL: RMS wants to end the sequence so it sends Close, waits for a CloseResponse, possibly retransmitting the Close. Once a CloseResponse is received, RMS sends TerminateSequence.
> >
> > The problem with the TODAY scenario, as I've heard it in this forum, is that the RMS might have to wait unacceptably long between sending LastMessage and getting a full ack range.
> > But if getting some messages or acks across proves difficult, why would the RMS expect that getting Close across would be any easier?
> >
> > <dug> 1 - I don't believe your text is accurate, in that Close is supposed to be used in cases where the sequence needs to end due to something going wrong. You've described a case where the sequence is functioning just fine - and while Close can be used in those cases as well, it provides no additional value. 2 - Sending a Close and sending application data can have quite a different set of features executed, so I don't think it's hard to imagine cases where RM messages can get processed just fine but application messages run into problems. I believe Chris mentioned on some call the notion of two different persistent stores - one for RM data and one for app data. It's possible that the app-data one is running into problems. 3 - Using the CloseSequence operation is optional - if you feel that, as an RMS implementor, you'll never see its usefulness, then you're free to never implement/send it. However, I'd hate to remove this option for those of us who do see value in it. </dug>
> >
> > <dug>The case that I keep thinking about is one where the RMD is actually a cluster of machines, and when a sequence gets created it has an affinity to a certain server in the cluster - meaning that server processes all of the messages for that sequence. If that server starts to have problems, and for some reason it just can't seem to process any new app messages, then the RMS can close down the sequence and start up a new one. Hopefully, the new sequence will be directed to a different server in the cluster. </dug>
> >
> > There are two problems with this scenario and the proposed solution.
> > 1. If an RMD has sequence-to-machine affinity, that should be strictly the RMD's decision and the RMD's problem. The RMS is autonomous; this proposal puts expectations on the RMS'
> >    behavior based on particularities of the RMD implementation. To be clear, I'll note that affinity can be achieved in two ways:
> >       i. By performing stateful routing at the RMD; basically the RMD has to remember every active sequence and what machine it has affinity to. In this case it would be simple to change the RMD's routing table when a machine fails.
> >       ii. By generating different EPRs for each machine. For affinity to function this way two things are necessary:
> >          1. Some sort of endpoint resolution mechanism would have to be devised for the RMS to learn the EPR that it should target.
> >          2. A mechanism for migrating that EPR.
> >       Clearly 1) and 2) are outside the scope of the TC and, in my view, this proposal might be defining 2) in an informal way that is specific to WS-RM.
> >
> > 2. If the RMS somehow guesses that there is a problem on the EPR to which it is sending its messages and somehow decides that Closing the sequence and starting a new one is the right course of action, ordering guarantees are compromised.
> >
> > <dug> I probably didn't state the problem very well. I didn't intend to claim that the RMS knew about this affinity, but instead that it knew something was wrong with the current sequence, and in order to try to fix the situation it decided to try another sequence. The affinity bit was thrown in there to explain why starting a new sequence _might_ fix the problem.
> >
> > I should also point out that while a lot of these discussions have focused on the InOrder+ExactlyOnce DA, this feature is still useful with other DAs. For example, if the DA is just ExactlyOnce, having an accurate accounting of the ACKs allows a subsequent sequence to send just the gaps from the first, so getting an accurate list of the gaps becomes critical. And this of course leads us to the discussion of how to determine the DA in use - which I think might be part of issues 6, 9, 24 and 27.
> > </dug>
> >
> > Finally, I agree with you that considering a gap-filling mechanism would be a good thing for this TC to do.
> >
> > --Stefan
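Doug's ExactlyOnce point above (an accurate final ack state lets a follow-on sequence resend only the gaps from the first) and the gap-filling mechanism Stefan mentions can be made concrete with a short sketch. It is not part of the proposal or of WS-RM. It assumes the RMS holds the final acknowledgement state as inclusive (lower, upper) message-number pairs, loosely modeled on the Lower/Upper attributes of WS-RM AcknowledgementRange elements, and that it knows the highest message number it sent; find_gaps and messages_to_resend are hypothetical helpers.

```python
from typing import Iterable, List, Tuple

# Hypothetical model: each acknowledgement range is an inclusive
# (lower, upper) pair of message numbers, in the spirit of the
# Lower/Upper attributes on a WS-RM AcknowledgementRange element.
AckRange = Tuple[int, int]


def find_gaps(ack_ranges: Iterable[AckRange], last_sent: int) -> List[AckRange]:
    """Return the inclusive ranges of message numbers in 1..last_sent
    that are not covered by any acknowledgement range."""
    gaps: List[AckRange] = []
    next_expected = 1  # lowest message number not yet known to be acked
    for lower, upper in sorted(ack_ranges):
        if lower < 1 or lower > upper:
            raise ValueError(f"invalid ack range ({lower}, {upper})")
        if lower > last_sent:
            break  # this and all later (sorted) ranges lie beyond what was sent
        if upper < next_expected:
            continue  # already covered by an earlier, overlapping range
        if lower > next_expected:
            # Messages next_expected..lower-1 were never acknowledged.
            gaps.append((next_expected, lower - 1))
        next_expected = max(next_expected, upper + 1)
    if next_expected <= last_sent:
        # Everything after the highest acked number is also a gap.
        gaps.append((next_expected, last_sent))
    return gaps


def messages_to_resend(gaps: Iterable[AckRange]) -> List[int]:
    """Expand gap ranges into the individual message numbers to resend."""
    return [n for lower, upper in gaps for n in range(lower, upper + 1)]


if __name__ == "__main__":
    # Example final ack state: 1-3, 5 and 8-10 acked; 12 messages sent.
    acks = [(1, 3), (5, 5), (8, 10)]
    gaps = find_gaps(acks, last_sent=12)
    print(gaps)                      # [(4, 4), (6, 7), (11, 12)]
    print(messages_to_resend(gaps))  # [4, 6, 7, 11, 12]
```

Sorting the ranges first lets the walk tolerate overlapping or out-of-order ranges. How (or whether) the listed messages are resent on a new sequence is left open here, as it is in the thread.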