ws-rx message

Subject: RE: [ws-rx] Issue i022, RM Assertions
From: "Gilbert Pilz" <Gilbert.Pilz@bea.com>
To: <tom@coastin.com>, <vikas@sonoasystems.com>
Date: Tue, 25 Oct 2005 20:44:19 -0700
It's easy to pick some values for which the protocol won't work:

InactivityTimeout = {no value}
BaseRetransmissionInterval = {no value}

Since "no value" does not imply any default, it's possible that the RMS
will settle on a retransmission interval of 5 seconds while the RMD
decides to use an inactivity timeout of 3 seconds. The protocol will now
work only if there are no lost messages. On the first lost message the
RMS will wait 5 seconds before retransmitting, by which time the RMD will
have terminated the sequence due to inactivity.
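
To make the arithmetic concrete, here is a minimal Python sketch of the
scenario. The function name, and the simplification that the RMD's
inactivity timer starts at the moment the lost message was sent, are
illustrative assumptions and not anything taken from the spec.

```python
def sequence_survives_first_loss(retransmission_interval_s: float,
                                 inactivity_timeout_s: float) -> bool:
    """Return True if the RMS retransmits before the RMD gives up.

    Hypothetical model: a message is sent at t=0 and lost, the RMD last
    saw activity at t=0, and each side chose its own timer value because
    the policy carried no value and implied no default.
    """
    retransmit_at = retransmission_interval_s   # when the RMS retries
    rmd_terminates_at = inactivity_timeout_s    # when the RMD times out
    return retransmit_at < rmd_terminates_at


# RMS picked 5 seconds, RMD picked 3 seconds: the retry arrives too late.
print(sequence_survives_first_loss(5, 3))
# If the RMS had picked 2 seconds, the retry would arrive in time.
print(sequence_survives_first_loss(2, 3))
```

Under these assumptions the first call reports that the sequence does
not survive and the second reports that it does.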

- g

> -----Original Message-----
> From: Tom Rutt [mailto:tom@coastin.com] 
> Sent: Thursday, October 20, 2005 3:23 PM
> To: vikas@sonoasystems.com
> Cc: 'Bob Freund'; ws-rx@lists.oasis-open.org
> Subject: Re: [ws-rx] Issue i022, RM Assertions
> 
> Vikas Deolaliker wrote:
> 
> My comments are inline:
> 
> > Bob,
> >
> > We are all pessimistic that is why we are trying to add a layer of 
> > reliability on top of TCP.
> >
> TCP is under an HTTP request/response. However, if the TCP 
> connection goes down before the response is received, HTTP 
> has no way to recover. This is where WS-ReliableMessaging 
> comes into play.
> 
> > BTW, we are in agreement, but it looks like it is a violent one.
> >
> > I agree with you that these parameters are not static, and so the
> > assertion mechanism is the wrong way to introduce them into any
> > reliable exchange system. Where they should be introduced is in the
> > mechanisms of the system which are dynamic. So ideally this should
> > be part of the protocol.
> >
> The protocol works regardless of the parameter values.
> 
> As long as the RMS re-transmits until it gets an ack response 
> for a message, the protocol will work.
> 
> Tom Rutt
> 
> > If we agree to the above, again I agree with you that the two ends
> > cannot declare these parameters statically but must continuously
> > adjust them based on traffic pattern. RFC 1323 is a good starting
> > point, but the mechanisms that we borrow from it should be part of
> > the core protocol.
> >
> > So I guess I am agreeing with you on everything, but I get the
> > feeling people think our views are divergent.
> >
> > Vikas
> >
> > 
> > ------------------------------------------------------------------------
> >
> > *From:* Bob Freund [mailto:bob.freund@hitachisoftware.com]
> > *Sent:* Thursday, October 20, 2005 1:42 PM
> > *To:* vikas@sonoasystems.com; ws-rx@lists.oasis-open.org
> > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> >
> > Vikas,
> >
> > Are you optimistic or pessimistic?
> >
> > If you are optimistic, and you expect that messages will usually be
> > received and acknowledgements not lost, then you might consider that
> > re-transmissions are part of error recovery.
> >
> > Examples of optimistic systems might include non-blocking crossbar
> > connected multiprocessors, or even what we have come to expect via
> > normal high speed wired network connectivity. Examples of
> > pessimistic systems include most radio based communications.
> >
> > A pessimistic system might operate with a high re-try ratio.
> >
> > It is also true that the optimal parameters may change during the 
> > duration of a connection.
> >
> > The problem is that nobody knows, in the general case, what is the
> > nature of the stuff between sender and receiver. This implies that
> > what is needed is a mechanism that can cover the range of
> > possibilities.
> >
> > As far as I know, no static parameter set can cover these
> > eventualities. That is why I propose that a mechanism similar to
> > that which is used in RFC 1323 is indicated. In that mechanism, the
> > parameters are learned through a moving average mechanism based on
> > actual measured response times.
> >
> > Other systems, such as my cross-bar interconnected multiprocessors, 
> > would use possibly a hardware assisted mechanism.
> >
> > Thanks
> >
> > -bob
> >
> > 
> > ------------------------------------------------------------------------
> >
> > *From:* Vikas Deolaliker [mailto:vikas@sonoasystems.com]
> > *Sent:* Thursday, October 20, 2005 4:13 PM
> > *To:* Bob Freund; ws-rx@lists.oasis-open.org
> > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> >
> > Bob,
> >
> > I realize it is not like the flow control done in the lower layers
> > of transports.
> >
> > But nonetheless it is a control of flow, because when you implement
> > it, it affects the exchange of messages between RMS and RMD. And
> > when it goes wrong, you compromise reliable exchange, which is the
> > purpose of this spec.
> >
> > Vikas
> >
> > 
> > ------------------------------------------------------------------------
> >
> > *From:* Bob Freund [mailto:bob.freund@hitachisoftware.com]
> > *Sent:* Thursday, October 20, 2005 12:13 PM
> > *To:* vikas@sonoasystems.com; ws-rx@lists.oasis-open.org
> > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> >
> > Vikas,
> >
> > I don't think that these parameters have much to do with traditional
> > flow control, just the re-try behaviors of each end.
> >
> > The delay times and intervals are not interoperability concerns as 
> > much as they are path and performance optimizations.
> >
> > Flow control, if I understand you correctly, is something like the
> > SDLC mechanism of RR/RNR whereby the sender could ask if the
> > receiver was ready to receive a message or not. This tends to be
> > part of much lower level protocols involving physical transport. The
> > spec assumes that there is some sort of transport underneath that
> > has the responsibility of managing what one might call (to coin a
> > term) the link layer or transport layer.
> >
> > This protocol depends on retry to achieve reliability, but the
> > timing characteristics only have the impact of swamping the channel
> > if too short and reducing error recovery and thus performance if too
> > long.
> >
> > Even with negotiated algorithms, it is not predictable what the
> > error rates on the channels might be. Oftentimes, errors are bursty
> > and what works very well under normal circumstances will fail during
> > a burst. This is what we applied in the discussion leading to
> > RFC 1323.
> >
> > No, I believe that the parameters BaseRetransmission,
> > ExponentialBackoff and AcknowledgementInterval all need to be
> > removed, but a discussion of retransmission should be added to the
> > base spec, otherwise, we don't have a reliability protocol.
> >
> > Thanks
> >
> > -bob
> >
> > 
> > ------------------------------------------------------------------------
> >
> > *From:* Vikas Deolaliker [mailto:vikas@sonoasystems.com]
> > *Sent:* Thursday, October 20, 2005 1:58 PM
> > *To:* Bob Freund; ws-rx@lists.oasis-open.org
> > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> >
> > Bob,
> >
> > I agree with you in so far as implementers will implement flow
> > control in ways suitable for them. Ideally, what is needed is a
> > mechanism for the RMS and RMD to negotiate and agree upon a flow
> > control algorithm. Part of this negotiation would entail exchange of
> > schema related to parameters necessary to follow the algorithm.
> > Should such a mechanism be created by this WG, it should then be
> > part of the core reliable messaging protocol and not in the
> > assertions model.
> >
> > Vikas
> >
> > 
> > ------------------------------------------------------------------------
> >
> > *From:* Bob Freund [mailto:bob.freund@hitachisoftware.com]
> > *Sent:* Thursday, September 22, 2005 7:26 AM
> > *To:* vikas@sonoasystems.com; ws-rx@lists.oasis-open.org
> > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> >
> > Retransmission parameters as well as algorithms are problematic for 
> > the following reasons:
> >
> > 1) The characteristics of the path from source to destination are 
> > often unknown and often are time-variant.
> >
> > 2) Retransmissions, if too frequent, cause flooding and potential
> > catastrophic degradation if the path is near saturation
> >
> > 3) The path may consist of not only transmission means, but also
> > intermediaries with attendant processing delays
> >
> > 4) Exponential backoff may be implemented many ways; there is more
> > than one algorithm and they have different parameters
> >
> > 5) Backoff algorithm selection may be implementation specific; what
> > is good for cell phones may not be good for cluster interconnected
> > nodes
> >
> > 6) I have found no theoretical modeling available of the case of web
> > services cum intermediaries
> >
> > 7) Most published data concerning the behavior of backoff algorithms
> > examine fairly simple network segment related saturation and do not
> > address client, server, let alone intermediary saturation.
> >
> > 8) Exponential backoff algorithms need a recovery mechanism for
> > those situations where there is a high standard deviation of delay.
> >
> > 9) TCP/IP experience has shown that efficiencies are improved with
> > an adaptive mechanism as described in TCP Extensions for High
> > Performance (see RFC 1323 RTTM)
> >
> > Proposal:
> >
> > Clearly a backoff mechanism is required; however, implementation
> > specific needs are not served well by the selection of any specific
> > algorithm for all potential implementations of this specification.
> > It is recommended that implementers utilizing IP based transmission
> > media consider the mechanism described in RFC 1323. Delete all
> > re-transmission parameters as described in the specification since
> > they are unnecessary and unhelpful should the implementer use an
> > algorithm with a different set of controls.
> >
> > Thanks
> >
> > -bob
> >
> > 
> > ------------------------------------------------------------------------
> >
> > *From:* Vikas Deolaliker [mailto:vikas@sonoasystems.com]
> > *Sent:* Thursday, August 04, 2005 9:53 AM
> > *To:* ws-rx@lists.oasis-open.org
> > *Subject:* [ws-rx] Issue i022, RM Assertions
> >
> > Description:
> >
> > (revised)
> >
> > The RM policy assertions, specifically the InActivityTimeout,
> > BaseRetransmissionInterval and ExponentialBackoff parameters, need
> > to be more finely specified.
> >
> > The following are the areas which need finer specification:
> >
> > a) Default Value for InActivityTimeout, BaseRetransmissionInterval
> > and ExponentialBackoff:
> >
> > There needs to be a default set for these parameters. Currently the
> > specification says "If omitted, there is no implied value." Since
> > these parameters dictate the delivery of the message, an
> > implementation is going to assume a default anyway. Not specifying
> > this will make implementations assume different default values and
> > cause unwanted timeouts.
> >
> > b) Definition of InActivity:
> >
> > There needs to be a discussion of the definition of inactivity. If
> > RMS sends a sequence to RMD and is waiting for the response which is
> > delayed for whatever reason, is that inactivity on the link between
> > RMS and RMD counted towards InActivityTimeout? If yes, then it is
> > entirely possible that while waiting for a sequence response, RMS
> > could time out due to InActivity.
> >
> > c) Applicability of InActivityTimeout:
> >
> > It needs to be specified to which end this parameter is applicable.
> > It seems like the sequence creator starts the timer for
> > InActivityTimeout. If the intention is that this timer exists on
> > both ends of a sender and receiver engaged in an RM sequence, we
> > need to define a method for synchronization of the timer value of
> > this parameter between them. For example, a KeepAlive message would
> > need to be defined for keeping the sequence alive.
> >
> > d) Corner Case Handling:
> >
> > There needs to be a discussion of the corner case when the
> > BaseRetransmissionInterval exceeds InActivityTimeout. This can
> > happen when the RMD is indisposed and ExponentialBackoff drives up
> > the value of BaseRetransmissionInterval. In this case my
> > retransmission is scheduled later than the timeout that I need to
> > abide by. What state does the RMS enter in this situation?
> >
> > e) BaseRetransmissionInterval Needs an Upper Bound:
> >
> > If an RMD is offline for an extended period of time, one can expect
> > the BaseRetransmissionInterval to be exponentially backed off, i.e.
> > become large enough to be not meaningful anymore. Having an upper
> > bound on this parameter will enable the RMS to stop retransmitting
> > and report a fault.
> >
> > Proposal:
> >
> > (revised)
> >
> > 1) InActivityTimeout and BaseRetransmissionInterval can be merged
> > into one, i.e. BaseRetransmissionTimeout. Having just one counter on
> > the RMS and RMD will reduce the run-time resources (much simpler
> > state machine) required to implement RM-Assertions and avoid
> > confusion (unknown states in state machine) caused by two timeouts.
> > Having a separate timeout for sequence and retransmission may not be
> > necessary as activity on the RM link is
> > transmission/retransmission. I believe one timeout, i.e.
> > BaseRetransmissionTimeout, does not change the behavior of the
> > system. Once this timeout occurs the sequence has to time out, as
> > the implication of the timeout is that the destination is either
> > congested or offline.
> >
> > 2) If InActivityTimeout has to be there as a parameter, we need to
> > fully specify it with mechanisms for synchronization and keepalive.
> > In addition, we need to discuss how the corner cases and other
> > conflicts that occur when one has two timeouts (as discussed in a-e
> > above) are handled.
> >
> > Vikas
> >
> > Sonoa Systems, Inc.
> >
> > 3900 Freedom Circle, Suite #101
> >
> > Santa Clara, CA 95054
> >
> > (408) 748-1730 x100
> >
> 
> 
> --
> ----------------------------------------------------
> Tom Rutt	email: tom@coastin.com; trutt@us.fujitsu.com
> Tel: +1 732 801 5744          Fax: +1 732 774 5133
> 
> 
>