ws-rx message

Subject: RE: [ws-rx] Issue i022, RM Assertions
From: "Patil, Sanjay" <sanjay.patil@sap.com>
To: "Bob Freund-Hitachi" <bob.freund@hitachisoftware.com>, "Gilbert Pilz" <Gilbert.Pilz@bea.com>, <tom@coastin.com>, <vikas@sonoasystems.com>
Date: Wed, 26 Oct 2005 21:33:29 -0700
 

Hi Bob,

My comments are inline below ...

> -----Original Message-----
> From: Bob Freund-Hitachi [mailto:bob.freund@hitachisoftware.com] 
> Sent: Wednesday, Oct 26, 2005 19:00 PM
> To: Patil, Sanjay; Gilbert Pilz; tom@coastin.com; 
> vikas@sonoasystems.com
> Cc: Bob Freund-Hitachi; ws-rx@lists.oasis-open.org
> Subject: RE: [ws-rx] Issue i022, RM Assertions
> 
> 
> 
> > -----Original Message-----
> Sanjay,
> All good questions, but it remains a problem to define these 
> parameters
> for an RMS or RMD that service different applications.
> Say, for example, that one application is a logging application that
> receives messages from time to time based on say user activity (an atm
> audit service may be a good example here)  In this case, 
> there are times
> of day when activity is brisk, and times of day when it is 
> slow.  During
> the brisk times, no activity for even a few seconds is 
> symptomatic of a
> failure, however during the night when activity is low, it may be
> minutes between transactions.  That rmd would need an 
> inactivity timeout
> to accommodate the longest period expected to avoid throwing spurious
> errors, or maybe it might be appropriate that the rmd use a 
> time of day
> based algorithm to determine inactivity timeout.  Through a fluke of
> deployment, that same rmd collects data from some of the atms that
> communicate via satellite links such that they have the 
> additional delay
> on the order of three seconds longer round trip, which is typical for
> geosynchronous satellite.  That same rmd may serve other
> applications
> as well that have very different inactivity timeouts.
> The question in all of these cases is what is the purpose of 
> inactivity
> timeout?  It is NOT to alert the application that an
> unusually long time
> has passed since the last message was received; instead the
> inactivity timeout has value only with regard to triggering resource
> reclamation and forcing a sequence termination.  Although it is true
> that there are potentially other ways to trigger resource reclamation,
> such as when resources have been depleted to a critically low 
> threshold,
> an inactivity timeout is one additional mechanism.  I might 
> be convinced
> to argue that since inactivity timeout can be set to some point such
> that it is possible that resources become depleted before any 
> inactivity
> timeout occurs, it is necessary for all competent implementations to be
> able to deal with resource depletion without wedging 
> independent of any
> particular setting of inactivity timeout.  If I were to make this
> argument, then I would further state that, since resource reclamation
> algorithms that function independently of inactivity timeout
> settings are necessary, the value of inactivity timeout for this
> purpose is low,
> unhelpful, and complicating.  Should a sequence be terminated simply
> because inactivity timeout occurs even though resources are plentiful?
> I am not so sure.  
> Gosh, I am beginning to talk myself into believing that
> inactivitytimeout should be chucked as well

Agree. If resources become depleted before any inactivity timeouts
occur, it is necessary for all competent implementations to be able to
deal with this situation. Now how do you deal with this situation? A>
Start closing open sequences randomly! That doesn't sound very good,
right? Or B> Send a ServerBusy exception on the links, and wait for the
open sequences to either complete normally or time out by exceeding
InactivityTimeout. Sure there are other possible solutions, but
InactivityTimeout seems to be reasonably useful here, doesn't it?

Given a choice, implementations would like to avoid resource depletion
situations, and one solution in this regard would again be to use
InactivityTimeout on sequences so that resources are used sparingly.
Sure, there is the case where some sequences might benefit if they were
held open beyond InactivityTimeout since there is no dearth of
resources, but there will always be such cases for which the terms are
not the most favorable!
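
To make the idea concrete, here is a minimal sketch (Python, purely
illustrative) of an RMD reclaiming sequences that have been idle for
longer than InactivityTimeout. The Sequence record, its fields, and the
function names are assumptions made for this example; none of them come
from the spec.

```python
from dataclasses import dataclass


@dataclass
class Sequence:
    """Hypothetical record an RMD keeps for each open sequence."""
    identifier: str
    last_activity: float  # seconds, taken from any monotonic clock


def expired_sequences(sequences, now, inactivity_timeout):
    """Return the sequences idle for longer than inactivity_timeout seconds."""
    return [s for s in sequences if now - s.last_activity > inactivity_timeout]


def reclaim(sequences, now, inactivity_timeout):
    """Split open sequences into (kept, terminated) based on idle time."""
    terminated = expired_sequences(sequences, now, inactivity_timeout)
    terminated_ids = {s.identifier for s in terminated}
    kept = [s for s in sequences if s.identifier not in terminated_ids]
    return kept, terminated
```

An implementation might run something like reclaim() periodically, or
only when resources run low; the sketch takes no position on that.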

Now as far as the static vs dynamic nature of InactivityTimeout is
concerned, I agree that there are convincing cases for both. Perhaps we
should discuss a solution for this problem. But this argument does not
seem to be a reason for chucking out the InactivityTimeout parameter
completely, IMHO.

> 
> As for acknowledgementinterval, it is clearly an early 
> optimization that
> is probably not useful for most cases.  Let's say I use the 
> old rule of
> thumb that queues do not build until a bottleneck approaches 70%
> utilization.  In that case, it appears off-hand, that most of 
> the time,
> there is low value in even coding logic that implements
> acknowledgementinterval because most well tuned systems will 
> not operate
> at 70% utilization for any significant part of the day.  The 
> useful case
> for acknowledgement interval is precisely when the rmd nears
> the point
> of growing queues, because an interval set longer than this time
> provides no relief
> and one set shorter than this time makes no difference.  In other
> words, when the
> utilization is high and the acknowledgement queues are growing, then
> there exists an optimization algorithm that consolidates
> acknowledgements into fewer messages. This is somewhat like the disk
> driver optimization algorithm for consolidating adjacent reads or for
> ordering seeks.  They do no good unless the queue is non 
> empty.  Gosh!,
> I think that we can handle that just fine without a parameter too.
> I think that these optimizations ought to not be normative 
> and that they
> might be part of each vendor's secret sauce.

This reasoning seems to be different from what is stated in the spec for
having an AcknowledgementInterval parameter. As per the spec,
AcknowledgementInterval is used to set a time limit on the RMD as to how
long it should wait for an outgoing application message to show up so
that the Acknowledgement could be piggybacked upon it.
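
As a rough illustration of that reading (a sketch only; the function and
parameter names below are assumptions for this example, not spec
terminology), the RMD's decision could look like this in Python:

```python
def schedule_ack(ack_pending_since, acknowledgement_interval,
                 next_app_message_at=None):
    """Decide when a pending acknowledgement goes out.

    All times are in seconds on the same clock. Returns a tuple
    (send_time, piggybacked).
    """
    # Latest moment the RMD is willing to hold the acknowledgement.
    deadline = ack_pending_since + acknowledgement_interval
    # An application message leaving before the deadline can carry the ack.
    if next_app_message_at is not None and next_app_message_at <= deadline:
        return next_app_message_at, True
    # Otherwise the ack is sent on its own once the interval expires.
    return deadline, False
```

For example, with an acknowledgement pending since t=10.0 and an interval
of 0.5 seconds, an application message leaving at t=10.25 carries the
acknowledgement; if none shows up, a standalone acknowledgement goes out
at t=10.5.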

Thanks,
Sanjay

> 
> Thanks
> -bob
> 
> > From: Patil, Sanjay [mailto:sanjay.patil@sap.com]
> > Sent: Wednesday, October 26, 2005 12:22 PM
> > To: Gilbert Pilz; tom@coastin.com; vikas@sonoasystems.com
> > Cc: Bob Freund-Hitachi; ws-rx@lists.oasis-open.org
> > Subject: RE: [ws-rx] Issue i022, RM Assertions
> > 
> > 
> > If you would allow me to tease this issue a little further ...
> > 
> > In the scenario described below, I am not sure if it is 
> correct to say
> > that the protocol has not worked. It seems to have worked 
> in the sense
> > that it did not enter into an indeterminate state. Sure it could have
> > worked better by optimizing the parameter values, but that 
> is not the
> > same as the failure of the protocol.
> > 
> > One might argue that the RMD and RMS can learn and adapt dynamically
> > their parameter values to optimize the protocol behavior. For
> instance,
> > the RMS in the following scenario might get a sequence terminated
> > message from the RMD and assuming the reason for termination
> (expiration
> > of inactivity timeout) is also conveyed, the RMS can
> deduce a better
> > value for its retransmission interval parameter for the next
> sequence
> > and this cycle may continue until things get evenly settled. I don't
> > dispute that this is feasible, but the question I have is -
> whether
> > such dynamic adjustment is a good idea for a high level reliable
> > messaging protocol. Could we not arrive at a middle ground
> > solution that would meet a large number of use cases? In that
> regard,
> > I believe that it is sufficient to have RMD specify its
> > InactivityTimeout and AcknowledgementInterval parameters. With this,
> RMS
> > can easily infer the appropriate values for its internal parameters
> > (which are not needed to be conveyed to the other side, so we don't
> have
> > to spec them). Just my 2 cents ...
> > 
> > Thanks,
> > Sanjay
> > 
> > > -----Original Message-----
> > > From: Gilbert Pilz [mailto:Gilbert.Pilz@bea.com]
> > > Sent: Tuesday, Oct 25, 2005 20:44 PM
> > > To: tom@coastin.com; vikas@sonoasystems.com
> > > Cc: Bob Freund; ws-rx@lists.oasis-open.org
> > > Subject: RE: [ws-rx] Issue i022, RM Assertions
> > >
> > > It's easy to pick some values for which the protocol won't work:
> > >
> > > InactivityTimeout = {no value}
> > > BaseRetransmissionInterval = {no value}
> > >
> > > Since "no value" does not imply any default, it's possible that the
> RMS
> > > will settle on a retransmission interval of 5 seconds 
> while the RMD
> > > decides to use an inactivity timeout of 3 seconds. The
> > > protocol will now
> > > work only if there are no lost messages. On the first lost message
> the
> > > RMS will wait 5 seconds before retransmitting by which time
> > > the RMD will
> > > have terminated the sequence due to inactivity.
> > >
> > > - g
> > >
> > > > -----Original Message-----
> > > > From: Tom Rutt [mailto:tom@coastin.com]
> > > > Sent: Thursday, October 20, 2005 3:23 PM
> > > > To: vikas@sonoasystems.com
> > > > Cc: 'Bob Freund'; ws-rx@lists.oasis-open.org
> > > > Subject: Re: [ws-rx] Issue i022, RM Assertions
> > > >
> > > > Vikas Deolaliker wrote:
> > > >
> > > > My comments are inline:
> > > >
> > > > > Bob,
> > > > >
> > > > > We are all pessimistic that is why we are trying to add a
> > > layer of
> > > > > reliability on top of TCP.
> > > > >
> > > > TCP is under an http request response. However, if the tcp
> > > > connection goes down before the response is received, http
> > > > has no way to recover. This is where ws Reliable messaging
> > > > comes into play.
> > > >
> > > > > BTW, we are in agreement but looks like it is a violent one.
> > > > >
> > > > > I agree with you that these parameters are not static and
> > > > so assertion
> > > > > mechanism is the wrong way to introduce them into a 
> any reliable
> > > > > exchange system. Where they should be introduced is in the
> > > > mechanisms
> > > > > of the system which are dynamic. So ideally this should be
> > > > part of the
> > > > > protocol.
> > > > >
> > > > The protocol works regardless of the parameter values.
> > > >
> > > > As long as the rms re-transmits until it gets an ack response
> > > > for a message, the protocol will work.
> > > >
> > > > Tom Rutt
> > > >
> > > > > If we agree to the above, again I agree with you that two
> > > > ends cannot
> > > > > declare these parameters statically but continuously adjust
> > > > them based
> > > > > on traffic pattern. RFC 1323 is a good starting point, but the
> > > > > mechanisms that we borrow from it should be part of the
> > > > core protocol.
> > > > >
> > > > > So I guess I am agreeing with you on everything but I get
> > > > the feeling
> > > > > people think our views are divergent.
> > > > >
> > > > > Vikas
> > > > >
> > > > >
> > > >
> > >
> ----------------------------------------------------------------------
> > > > > --
> > > > >
> > > > > *From:* Bob Freund [mailto:bob.freund@hitachisoftware.com]
> > > > > *Sent:* Thursday, October 20, 2005 1:42 PM
> > > > > *To:* vikas@sonoasystems.com; ws-rx@lists.oasis-open.org
> > > > > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> > > > >
> > > > > Vikas,
> > > > >
> > > > > Are you optimistic or pessimistic?
> > > > >
> > > > > If you are optimistic, and you expect that messages will
> > > usually be
> > > > > received and acknowledgements not lost, then you might
> > > > consider that
> > > > > re-transmissions are part of error recovery.
> > > > >
> > > > > Examples of optimistic systems might include non-blocking
> > > crossbar
> > > > > connected multiprocessors, or even what we have come to
> > > expect via
> > > > > normal high speed wired network connectivity, examples of
> > > > pessimistic
> > > > > systems included most radio based communications.
> > > > >
> > > > > A pessimistic system might operate with a high re-try ratio.
> > > > >
> > > > > It is also true that the optimal parameters may change during
> the
> > > > > duration of a connection.
> > > > >
> > > > > The problem is that nobody knows, in the general case,
> > > what is the
> > > > > nature of the stuff between sender and receiver. This
> > > implies that
> > > > > what is needed is a mechanism that can cover the ranges of
> > > > possibilities.
> > > > >
> > > > > As far as I know, no static parameter set can cover these
> > > > > eventualities. That is why I propose that a mechanism
> > > > similar to that
> > > > > which is used in RFC 1323 is indicated. In that mechanism, the
> > > > > parameters are learned through a moving average mechanism
> > > based on
> > > > > > actual measured response times.
> > > > >
> > > > > Other systems, such as my cross-bar interconnected
> > > multiprocessors,
> > > > > would use possibly a hardware assisted mechanism.
> > > > >
> > > > > Thanks
> > > > >
> > > > > -bob
> > > > >
> > > > >
> > > >
> > >
> ----------------------------------------------------------------------
> > > > > --
> > > > >
> > > > > *From:* Vikas Deolaliker [mailto:vikas@sonoasystems.com]
> > > > > *Sent:* Thursday, October 20, 2005 4:13 PM
> > > > > *To:* Bob Freund; ws-rx@lists.oasis-open.org
> > > > > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> > > > >
> > > > > Bob,
> > > > >
> > > > > > I realize it is not like the flow control done in lower
> > > layers of
> > > > > transports.
> > > > >
> > > > > But none the less it is a control of flow because when you
> > > > implement
> > > > > it, it affects exchange of messages between RMS and RMD.
> > > > And when it
> > > > > goes wrong, you compromise reliable exchange which is the
> > > > purpose of
> > > > > this spec.
> > > > >
> > > > > Vikas
> > > > >
> > > > >
> > > >
> > >
> ----------------------------------------------------------------------
> > > > > --
> > > > >
> > > > > *From:* Bob Freund [mailto:bob.freund@hitachisoftware.com]
> > > > > *Sent:* Thursday, October 20, 2005 12:13 PM
> > > > > *To:* vikas@sonoasystems.com; ws-rx@lists.oasis-open.org
> > > > > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> > > > >
> > > > > Vikas,
> > > > >
> > > > > I don't think that these parameters have much to do with
> > > > traditional
> > > > > flow control, just the re-try behaviors of each end.
> > > > >
> > > > > The delay times and intervals are not interoperability
> > > concerns as
> > > > > much as they are path and performance optimizations.
> > > > >
> > > > > Flow control, if I understand you correctly, is something
> > > like the
> > > > > sdlc mechanism of rr/rnr whereby the sender could ask if
> > > > the receiver
> > > > > was ready to receive a message or not. This tends to be
> > > > part of much
> > > > > lower level protocols involving physical transport. The
> > > > spec assumes
> > > > > that there is some sort of transport underneath that has the
> > > > > responsibility of managing what one might call (to coin a
> > > term) the
> > > > > link layer or transport layer.
> > > > >
> > > > > This protocol depends on retry to achieve reliability, but
> > > > the timing
> > > > > characteristics have the only impact of swamping the
> > > channel if too
> > > > > > short and reducing error recovery and thus performance if
> > > too long.
> > > > >
> > > > > Even with negotiated algorithms, it is not predictable what
> > > > the error
> > > > > rates on the channels might be. Oftentimes, errors are
> > > > bursty and what
> > > > > works very well under normal circumstances will fail
> > > during a burst.
> > > > > > This is what we applied in the discussion leading to RFC 1323
> > > > >
> > > > > No, I believe that the parameters BaseRetransmission,
> > > > > ExponentialBackoff and AcknowledgementInterval all need to
> > > > be removed,
> > > > > but a discussion of retransmission should be added to the
> > > > base spec,
> > > > > otherwise, we don't have a reliability protocol.
> > > > >
> > > > > Thanks
> > > > >
> > > > > -bob
> > > > >
> > > > >
> > > >
> > >
> ----------------------------------------------------------------------
> > > > > --
> > > > >
> > > > > *From:* Vikas Deolaliker [mailto:vikas@sonoasystems.com]
> > > > > *Sent:* Thursday, October 20, 2005 1:58 PM
> > > > > *To:* Bob Freund; ws-rx@lists.oasis-open.org
> > > > > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> > > > >
> > > > > Bob,
> > > > >
> > > > > I agree with you in so far as implementers will implement
> > > > flow control
> > > > > in ways suitable for them. Ideally, what is needed is a
> > > > mechanism for
> > > > > the RMS and RMD to negotiate and agree upon a flow control
> > > > algorithm.
> > > > > Part of this negotiation would entail exchange of schema
> > > related to
> > > > > parameters necessary to follow the algorithm. Should such a
> > > > mechanism
> > > > > be created by this WG, it should then be part of the core
> > > reliable
> > > > > messaging protocol and not in the assertions model.
> > > > >
> > > > > Vikas
> > > > >
> > > > >
> > > >
> > >
> ----------------------------------------------------------------------
> > > > > --
> > > > >
> > > > > *From:* Bob Freund [mailto:bob.freund@hitachisoftware.com]
> > > > > *Sent:* Thursday, September 22, 2005 7:26 AM
> > > > > *To:* vikas@sonoasystems.com; ws-rx@lists.oasis-open.org
> > > > > *Subject:* RE: [ws-rx] Issue i022, RM Assertions
> > > > >
> > > > > Retransmission parameters as well as algorithms are
> > > problematic for
> > > > > the following reasons:
> > > > >
> > > > > 1) The characteristics of the path from source to destination
> are
> > > > > often unknown and often are time-variant.
> > > > >
> > > > > 2) 2) Retransmissions if too frequent cause flooding and
> > > potential
> > > > > catastrophic degradation if the path is near saturation
> > > > >
> > > > > 3) The Path may consist of not only transmission 
> means, but also
> > > > > intermediaries with attendant processing delays
> > > > >
> > > > > 4) Exponential backoff may be implemented many ways,
> > > there is more
> > > > > > than one algorithm and they have different parameters
> > > > >
> > > > > 5) Backoff algorithm selection may be implementation
> > > > specific, what is
> > > > > good for cell phones may not be good for cluster
> > > > interconnected nodes
> > > > >
> > > > > 6) I have found no theoretical modeling available of the
> > > > case of web
> > > > > services cum intermediaries
> > > > >
> > > > > 7) Most published data concerning the behavior of backoff
> > > > algorithms
> > > > > examine fairly simple network segment related saturation
> > > and do not
> > > > > address client, server, let alone intermediary saturation.
> > > > >
> > > > > 8) Exponential backoff algorithms need a recovery mechanism
> > > > for those
> > > > > situations where there is a high standard deviation of delay.
> > > > >
> > > > > 9) TCP/IP experience has shown that efficiencies are
> > > > improved with an
> > > > > adaptive mechanism as described in TCP Extensions for High
> > > > Performance
> > > > > (see RFC 1323 RTTM)
> > > > >
> > > > > Proposal:
> > > > >
> > > > > Clearly a backoff mechanism is required; however 
> implementation
> > > > > specific needs are not served well by the selection of
> > > any specific
> > > > > algorithm for all potential implementations of this
> > > > specification. It
> > > > > is recommended that implementers utilizing IP based
> > > > transmission media
> > > > > consider the mechanism described in RFC 1323. Delete all
> > > > > re-transmission parameters as described in the
> > > specification since
> > > > > they are unnecessary and unhelpful should the 
> implementer use an
> > > > > algorithm with a different set of controls.
> > > > >
> > > > > Thanks
> > > > >
> > > > > -bob
> > > > >
> > > > >
> > > >
> > >
> ----------------------------------------------------------------------
> > > > > --
> > > > >
> > > > > *From:* Vikas Deolaliker [mailto:vikas@sonoasystems.com]
> > > > > *Sent:* Thursday, August 04, 2005 9:53 AM
> > > > > *To:* ws-rx@lists.oasis-open.org
> > > > > *Subject:* [ws-rx] Issue i022, RM Assertions
> > > > >
> > > > > Description:
> > > > >
> > > > > (revised)
> > > > >
> > > > > The RM policy assertions, specifically, InActivityTimeout,
> > > > > BaseRetransmissionInterval and ExponentialBackoff
> > > > parameters need to
> > > > > be more finely specified.
> > > > >
> > > > > The following are the areas which need finer specification
> > > > >
> > > > > a) Default Value for InActivityTimeout,
> > > > BaseRetransmissionInterval and
> > > > > ExponentialBackoff:
> > > > >
> > > > > There needs to be a default set for these parameters.
> > > Currently the
> > > > > specification says "If omitted, there is no implied value."
> Since
> > > > > these parameters dictate the delivery of the message, an
> > > > > implementation is going to assume a default anyways. Not
> > > specifying
> > > > > this will make implementations assume a different default
> > > value and
> > > > > cause unwanted timeouts.
> > > > >
> > > > > b) Definition of InActivity
> > > > >
> > > > > There needs to be a discussion of definition of
> > > inactivity. If RMS
> > > > > sends a sequence to RMD and is waiting for the 
> response which is
> > > > > delayed for whatever reason, is that inactivity on the
> > > link between
> > > > > RMS and RMD counted towards InActivityTimeout? If yes, then it
> is
> > > > > entirely possible that while waiting for a sequence response,
> RMS
> > > > > could timeout due to InActivity.
> > > > >
> > > > > c) Applicability of InActivityTimeout:
> > > > >
> > > > > It needs to be specified to which end this parameter is
> > > > applicable. It
> > > > > seems like sequence creator starts the timer for
> > > > InActivityTimeout. If
> > > > > the intention is that this timer exists on both ends of a
> > > > sender and
> > > > > receiver engaged in a RM sequence, we need to define a method
> for
> > > > > synchronization of the timer value of this parameter
> > > > between them. For
> > > > > > example a KeepAlive message would need to be defined for
> keeping
> > > > > sequence alive.
> > > > >
> > > > > d) Corner Case Handling:
> > > > >
> > > > > There needs to be a discussion of the corner case when the
> > > > > BaseRetransmissionInterval exceeds InActivityTimeout. This
> > > > can happen
> > > > > when the RMD is indisposed and ExponentialBackoff drives up
> > > > the value
> > > > > of BaseRetransmissionInterval. In this case my 
> retransmission is
> > > > > > scheduled later than the timeout that I need to abide by.
> > > What state
> > > > > does the RMS enter in this situation?
> > > > >
> > > > > e) BaseRetransmissionInterval Needs an Upper Bound:
> > > > >
> > > > > If an RMD is offline for extended period of time, one can
> > > > expect the
> > > > > BaseRetransmissionInterval to be exponentially backed off
> > > > i.e. become
> > > > > large enough to be not meaningful anymore. Having an
> > > upper bound on
> > > > > this parameter will enable the RMS to stop retransmitting
> > > > and report a
> > > > > fault.
> > > > >
> > > > > Proposal:
> > > > >
> > > > > (revised)
> > > > >
> > > > > 1) InActivityTimeout and BaseRetransmissionInterval can be
> > > > merged into
> > > > > one i.e. BaseRetransmissionTimeout. Having just one counter
> > > > on the RMS
> > > > > and RMD will reduce the run-time resources (much simpler state
> > > > > machine) required to implement RM-Assertions and 
> avoid confusion
> > > > > (unknown states in state machine) caused by two timeouts.
> > > Having a
> > > > > separate timeout for sequence and retransmission may not be
> > > > necessary
> > > > > as activity on the RM link is transmission/retransmission.
> > > > I believe
> > > > > one timeout i.e. BaseRetransmissionTimeout does not change the
> > > > > behavior of the system. Once this timeout occurs the
> > > > sequence has to
> > > > > timeout as the implication of the timeout is the
> > > > destination is either
> > > > > congested or offline.
> > > > >
> > > > > 2) If InActivityTimeout has to be there as a parameter,
> > > we need to
> > > > > fully specify it with mechanisms for synchronization and
> > > > keepalive. In
> > > > > addition, we need to discuss how the corner cases and other
> > > > conflicts
> > > > > > that occur when one has two timeouts (as discussed in a-e
> > > above) are
> > > > > handled.
> > > > >
> > > > > Vikas
> > > > >
> > > > > Sonoa Systems, Inc.
> > > > >
> > > > > 3900 Freedom Circle, Suite #101
> > > > >
> > > > > Santa Clara, CA 95054
> > > > >
> > > > > (408) 748-1730 x100
> > > > >
> > > >
> > > >
> > > > --
> > > > ----------------------------------------------------
> > > > Tom Rutt	email: tom@coastin.com; trutt@us.fujitsu.com
> > > > Tel: +1 732 801 5744          Fax: +1 732 774 5133
> > > >
> > > >
> > > >
> > >
>