
Subject: RE: [ws-rx] i0019 - a formal proposal - take 2


+1

Christopher Ferris
STSM, Emerging e-business Industry Architecture
email: chrisfer@us.ibm.com
blog: http://webpages.charter.net/chrisfer/blog.html
phone: +1 508 377 9295

Doug Davis/Raleigh/IBM@IBMUS wrote on 09/07/2005 07:59:24 PM:

>
> Stefan,
>   (I should have waited a bit longer before sending my previous note - I
> think your note helps me understand what you're thinking.)
>   To answer your question "is this...problem something that this proposal
> is suited for?": Yes. The problem isn't how to resend the large message;
> it's how the RMS can reliably clean up the sequence. You seem to be focused
> on the higher-level recovery aspect, which isn't part of the issues or the
> proposal. That is a very important distinction.
>   All of the solutions you talk about may be valid for that higher-level
> processing, but that isn't the focus of this proposal. Regardless of what
> kind of error recovery (if any) takes place afterwards, the RMS still needs
> to resolve this current bad state. And knowing the final ack state is key
> (IMO) to doing that. Now, you may say that the sequence should never have
> gotten into this position to begin with, by not sending such a large
> message, but that would assume that the RMS knows about this 'large message
> issue' in advance - and we can't assume that. Remember, this is just one of
> possibly many reasons why a message could not be delivered. It's impossible
> for the RMS to prevent or know about all of them in advance - if it could
> then we wouldn't need RM :-)
> thanks
> -Doug
> 
> 
> "Stefan Batres" <stefanba@microsoft.com> wrote on 09/07/2005 07:38:07 PM:
> 
> > Doug et al,
> >
> > This, IMHO, is an example of what I'm talking about. Is the
> > very-large-message problem something that this proposal is suited for? I
> > don't think so (note that it doesn't fit with "#1"); we're speculating
> > that an RMS might want to gracefully complete a sequence on which there
> > are messages that have not been acknowledged because transmission failed
> > due to the size of the messages. In my view, the problem in this case is
> > not how to complete a sequence with holes, but rather how to deal with
> > very large messages. For instance, one way to address that problem would
> > be a message fragmentation protocol that works on top of WS-RM. Another
> > way might be to fragment the messages at the app layer and reflect that in
> > the contract. Yet another way could be to limit the size of messages via
> > policy. As with the use case given before involving machine affinity and
> > machine failure [1], there are more appropriate ways to address this use
> > case.
> >
> > [1] http://www.oasis-open.org/apps/org/workgroup/ws-rx/email/archives/200508/msg00303.html
> >
> > --Stefan
> > 
> > 
> > 
> > From: Doug Davis [mailto:dug@us.ibm.com] 
> > Sent: Wednesday, September 07, 2005 5:50 AM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2 
> > 
> > 
> > Stefan,
> >  The proposal attempts to address the two issues, i019 and i028. Perhaps
> > you're looking to close the issues as invalid? During the course of these
> > discussions several use cases have been mentioned as possible situations
> > in which the issues mentioned in i019 and i028 will occur. If you think
> > those use cases fit into the "#1" you mentioned, and you believe that case
> > to be rare, then ok. I don't see the situations i019 and i028 talk about
> > as being rare, nor do I think your "#1" is the only case. I believe people
> > have mentioned cases much less catastrophic, such as extremely large
> > messages that just cannot be delivered due to some network issues (sadly
> > something I run into quite a bit), that would still warrant the need for
> > this solution. But I don't see the need to enumerate all of them, since
> > the entire point of the spec is that networks are not reliable and
> > problems will occur. So running into one that prevents us from getting
> > 100% guaranteed complete delivery every time isn't hard for me to imagine.
> > But that's just me.
> > thanks,
> > -Doug
> > 
> 
> > 
> > "Stefan Batres" <stefanba@microsoft.com>
> > 09/06/2005 01:13 PM
> >
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > What I'm trying to do is to identify the set of use cases this feature
> > attempts to address - you might disagree with the set I've identified and
> > that is perfectly valid. It is our job, though, to motivate changes to the
> > contributed specs. If you disagree with the way I've characterized the set
> > of use cases for this feature then it would really help if you could write
> > down for me how you characterize the use cases vs. the protocol as
> > submitted. I hope you can take doing this seriously; I don't think it is a
> > good design process to add features to the protocol simply because we
> > think they are helpful and refuse to do the leg work of 1) defining the
> > characteristics of the use cases where the features are helpful, 2)
> > comparing that against the contributed documents and 3) going through the
> > exercise of identifying real-world use cases that match said
> > characteristics.
> >
> > --Stefan
> > 
> > 
> > 
> > 
> > 
> > From: Doug Davis [mailto:dug@us.ibm.com] 
> > Sent: Monday, September 05, 2005 5:16 PM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2 
> > 
> > 
> > Stefan,
> >  I disagree with the premise of your note. The use cases for this feature
> > are not limited to the cases you've mentioned, nor are they limited to the
> > cases I or anyone else has mentioned. So trying to fit all possible use
> > cases into the scope you defined just doesn't fly for me. The reason why
> > the RMS wants to get an accurate and final ack state could be just about
> > anything - and as tempting as it is to rattle off yet another possible
> > reason why this feature would be useful, I'd prefer not to let the
> > conversation get bogged down in an attempt to limit the scope of this
> > feature. As I've mentioned, if as an implementor you don't think you'll
> > ever need this _optional_ feature then don't send it.
> > thanks
> > -Doug
> > 
> > "Stefan Batres" <stefanba@microsoft.com>
> > 09/05/2005 07:30 PM
> >
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > A quick correction to my comment below:
> >
> > Note that thus far, we've managed to describe exactly one scenario that
> > fits the #2 description: [RMD] has separate state stores for session state
> > and messages - the latter fails but the former is still operable.
> >
> > The scenario we've talked about is where the RMD uses separate state
> > stores, not the RMS.
> >
> > --Stefan
> > 
> > 
> > 
> > 
> > 
> > 
> > From: Stefan Batres [mailto:stefanba@microsoft.com] 
> > Sent: Thursday, September 01, 2005 10:40 AM
> > To: Doug Davis; ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2 
> > 
> > Doug,
> >
> > I apologize if my rant below is a bit too cryptic; let me try again:
> >
> > 1. When a catastrophic failure occurs (e.g. RMD amnesia), an RMS has to
> > react in some way; it could return an error to the user or it could engage
> > in a recovery mechanism of some sort. I don't believe you are trying to
> > prescribe what the RMS's reaction ought to be.
> > 2. As you've said time and again, this proposal is about getting the RMS
> > an accurate ack set in cases where: (a) a full ack set will never be
> > possible (or at least not in a reasonable amount of time), (b) there are
> > messages that have been sent and for which no ack has been received and
> > (c) the problem that prevents a full ack set doesn't prevent the exchange
> > of protocol messages.
> >
> > The point I was trying to make is that given #1 above, #2 is an
> > optimization for a case that will be relatively rare. Note that I don't
> > question for a second the correctness of your proposal - what concerns me
> > is adding elements to the protocol for this specific case, #2, especially
> > since apps will have to deal with #1 anyway.
> >
> > Note that thus far, we've managed to describe exactly one scenario that
> > fits the #2 description: RMS has separate state stores for session state
> > and messages - the latter fails but the former is still operable.
> >
> > --Stefan
> > 
> > 
> > 
> > 
> > 
> > 
> > From: Doug Davis [mailto:dug@us.ibm.com] 
> > Sent: Wednesday, August 31, 2005 3:58 AM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2 
> > 
> > 
> > I'm having a hard time following this. It sounds like you're saying that
> > because the proposal does not solve all RM-related problems you don't want
> > to have it in our 'bag of tricks' at all. Following that logic, why should
> > we distinguish between SequenceTerminated Fault and any other Fault? We do
> > it because we want to provide as much information back to the RMS as
> > possible. What it uses this information for is up to it.
> > As I've said many times before, this proposal does not suggest ANY
> > recovery scheme. What I've done (outside of the proposal itself) is
> > discuss how I _think_ an RMS might use this information in some error
> > recovery mechanism, but this proposal itself does not suggest one. This
> > proposal simply provides a mechanism for the RMS to get an accurate
> > accounting of the state of the sequence - that's it. How the RMS uses this
> > information is up to it. If nothing else, it may choose to simply log the
> > information - that alone is invaluable to someone trying to figure out
> > what's going on. And I'm having a hard time understanding why providing an
> > _optional_ mechanism that could aid the RMS in getting an accurate
> > accounting of the state of the sequence (without having to call up the
> > RMD's admin) is a bad thing.
> > thanks,
> > -Doug
> > 
> > "Stefan Batres" <stefanba@microsoft.com>
> > 08/31/2005 01:48 AM
> >
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug,
> >
> > You mention a specific situation: an RMD experiences a failure that
> > prevents it from receiving application messages. I agree insofar as saying
> > that in such a failure case this proposal could be helpful in that it
> > helps the RMS to engage in recovery of some sort (either inform
> > applications that a specific message was not sent or open a new sequence,
> > assuming ordering is not important). But this is not the only failure case
> > that applications will want to deal with (with or without help from the
> > protocol).
> > Consider the case where connectivity is lost for long enough for both
> > sequences to expire, or consider the case where the destination suffers a
> > loss of session state. In such failure modes this solution is not helpful,
> > yet applications will need a recovery strategy of some sort. It might be
> > that it is application specific, or it might be that a general failure
> > recovery specification is created and ratified at some point. The
> > important idea is that the only way to deal with all failure modes is at a
> > higher level. This proposal leverages the protocol to optimize recovery in
> > specific circumstances that should be relatively rare. RM implementations
> > should not be required to support failure mode recovery mechanisms that
> > either don't apply to them or that they choose to implement in a uniform
> > way at a higher level.
> >
> > Thanks
> >
> > --Stefan
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > From: Doug Davis [mailto:dug@us.ibm.com] 
> > Sent: Tuesday, August 30, 2005 1:08 PM
> > To: ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2 
> > 
> > 
> > Yet more comments. :-) 
> > -Doug 
> > 
> > "Stefan Batres" <stefanba@microsoft.com>
> > 08/30/2005 03:35 PM
> >
> > To: Doug Davis/Raleigh/IBM@IBMUS, <ws-rx@lists.oasis-open.org>
> > Subject: RE: [ws-rx] i0019 - a formal proposal - take 2
> >
> > Doug, 
> > 
> > Some more comments and thoughts on your proposal: 
> > 
> > 
> > <dug>... When or why an RMS uses CloseSequence is up to it to decide.
> > All we know is that it wants to shut things down and get an accurate ACK
> > from the RMD.</dug>
> > 
> > I still have not heard of a plausible case in which an RMS "wants to shut
> > things down" and the current spec presents a problem. Comparing the spec
> > as it stands today vs. the spec + this proposal:
> >
> > TODAY: RMS wants to end the sequence so it sends a LastMessage and must
> > wait for a complete set of acks; this might require retransmitting
> > messages. Once a full set of acks is received RMS sends TerminateSequence.
> >
> > TODAY + THIS PROPOSAL: RMS wants to end the sequence so it sends Close,
> > waits for a CloseResponse, possibly retransmitting the Close. Once a
> > CloseResponse is received RMS sends TerminateSequence.
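
[Editorial illustration] The two flows compared above can be sketched as a toy model. This is only a sketch under stated assumptions: ToyRMD, end_today, end_with_close and the "deliverable" set are hypothetical names invented for illustration, no SOAP or WS-RM wire format is modelled, and the retry limit is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRMD:
    """Hypothetical destination that records which message numbers arrived."""
    received: set = field(default_factory=set)
    closed: bool = False

    def deliver(self, number: int) -> bool:
        # Once the sequence is closed, no new application messages are accepted.
        if self.closed:
            return False
        self.received.add(number)
        return True

    def close(self) -> frozenset:
        # Close: stop accepting messages and report the final ack state.
        self.closed = True
        return frozenset(self.received)


def end_today(rmd, sent, deliverable, max_retries=3):
    """TODAY: retransmit until every sent message has been acknowledged."""
    for _ in range(max_retries):
        for number in sent:
            # Assumption: only numbers in `deliverable` can get across at all.
            if number in deliverable:
                rmd.deliver(number)
        if rmd.received >= set(sent):
            return "TerminateSequence"
    return "still waiting for full ack set"


def end_with_close(rmd, sent):
    """TODAY + PROPOSAL: obtain the final ack state, then terminate."""
    final_acks = rmd.close()
    unacked = sorted(set(sent) - final_acks)
    return "TerminateSequence", unacked
```

In this model, if message 2 can never be delivered, end_today never reaches TerminateSequence, whereas end_with_close terminates and reports message 2 as unacknowledged. Whether the Close itself gets across is exactly the question raised in the next paragraph; the sketch simply assumes it does.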
> > 
> > The problem with the TODAY scenario, as I've heard it in this forum, is
> > that the RMS might have to wait unacceptably long between sending
> > LastMessage and getting a full ack range. But if getting some messages or
> > acks across proves difficult, why would the RMS expect that getting Close
> > across would be any easier?
> > 
> > <dug> 1 - I don't believe your text is accurate, in that Close is supposed
> > to be used in cases where the sequence needs to end due to something going
> > wrong. You've described a case where the sequence is functioning just
> > fine - and while Close can be used in those cases as well, it provides no
> > additional value. 2 - Sending a Close and sending application data can
> > have quite a different set of features executed, so I don't think it's
> > hard to imagine cases where RM messages can get processed just fine but
> > application messages run into problems. I believe Chris mentioned on some
> > call the notion of two different persistent stores - one for RM data and
> > one for app-data. It's possible that the app-data one is running into
> > problems. 3 - Using the CloseSequence operation is optional - if you feel
> > that, as an RMS implementor, you'll never see its usefulness then you're
> > free to never implement/send it. However, I'd hate to remove this option
> > for those of us who do see value in it. </dug>
> > 
> > 
> > 
> > 
> > <dug>The case that I keep thinking about is one where the RMD is actually
> > a cluster of machines and when a sequence gets created it has an affinity
> > to a certain server in the cluster - meaning that server processes all of
> > the messages for that sequence. If that server starts to have problems,
> > and for some reason it just can't seem to process any new app messages,
> > then the RMS can close down the sequence and start up a new one.
> > Hopefully, the new sequence will be directed to a different server in the
> > cluster. </dug>
> > 
> > There are two problems with this scenario and the proposed solution.
> > 1. If an RMD has sequence-to-machine affinity, that should be strictly the
> > RMD's decision and the RMD's problem. The RMS is autonomous; this proposal
> > puts expectations on the RMS's behavior based on particularities of the
> > RMD implementation. To be clear, I'll note that affinity can be achieved
> > in two ways:
> >    i. By performing stateful routing at the RMD; basically the RMD has to
> >    remember every active sequence and what machine it has affinity to. In
> >    this case it would be simple to change the RMD's routing table when a
> >    machine fails.
> >    ii. By generating different EPRs for each machine. For affinity to
> >    function this way two things are necessary:
> >       1. Some sort of endpoint resolution mechanism would have to be
> >       devised for the RMS to learn the EPR that it should target.
> >       2. A mechanism for migrating that EPR.
> > Clearly 1) and 2) are outside the scope of the TC and, in my view, this
> > proposal might be defining 2) in an informal way that is specific to
> > WS-RM.
> > 
> > 2. If the RMS somehow guesses that there is a problem on the EPR to which
> > it is sending its messages and somehow decides that closing the sequence
> > and starting a new one is the right course of action, ordering guarantees
> > are compromised.
> > 
> > <dug> I probably didn't state the problem very well. I didn't intend to
> > claim that the RMS knew about this affinity, but instead that it knew that
> > something was wrong with the current sequence and, in order to try to fix
> > the situation, it decided to try another sequence. The affinity bit was
> > thrown in there to explain why starting a new sequence _might_ fix the
> > problem.
> >
> > I should also point out that while a lot of these discussions have focused
> > on the InOrder+ExactlyOnce DA, this feature is still useful in other DAs.
> > For example, if the DA is just ExactlyOnce, having an accurate accounting
> > of the ACKs allows a subsequent sequence to send just the gaps from the
> > first, so getting an accurate list of the gaps becomes critical. And this
> > of course leads us to the discussion of how to determine the DA in use -
> > which I think might be part of issues 6, 9, 24 and 27.
> > </dug>
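
[Editorial illustration] The gap computation mentioned above can be sketched as follows, assuming the final ack state arrives as a list of inclusive (lower, upper) message-number ranges and that message numbers start at 1. The function name find_gaps and its parameters are hypothetical, chosen only for this example.

```python
def find_gaps(ack_ranges, highest_sent):
    """Return the message numbers in 1..highest_sent not covered by any ack range.

    ack_ranges is a list of inclusive (lower, upper) pairs taken from the
    final acknowledgement; highest_sent is the last message number the RMS sent.
    """
    acked = set()
    for lower, upper in ack_ranges:
        acked.update(range(lower, upper + 1))
    return [n for n in range(1, highest_sent + 1) if n not in acked]
```

For example, with final ack ranges (1, 3) and (6, 7) and eight messages sent, the gaps are messages 4, 5 and 8, which are the only ones a follow-up sequence would need to carry under an ExactlyOnce DA.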
> > 
> > Finally, I agree with you that considering a gap-filling mechanism would
> > be a good thing for this TC to do.
> > 
> > 
> > --Stefan 
> > 
> > 

