wsrm message

Subject: RE: [wsrm] about ExpiryTime as "transport QoS"

From: "Bob Freund" <Bob.Freund@hitachisoftware.com>
To: "Jacques Durand" <JDurand@fsw.fujitsu.com>,"Alan Weissberger" <ajwdct@technologist.com>,<wsrm@lists.oasis-open.org>
Date: Mon, 17 Nov 2003 17:30:59 -0500

Are we assuming that WS-RM operates correctly over a non-reliable transport protocol such as UDP that has no retry mechanism?

I find this written in the Requirements doc version 0.9, R4.2:

"All that is expected of the transport layer is that it will not deliver a corrupted message to the reliability layer"

I guess that this means yes.

Attempting to define efficient timeouts at the transport level is complicated unless an effort is made similar to the RTTM (Round Trip Time Measurement) described in the TCP extensions (RFC 1323). Experience with TCP taught us that retry intervals that were too short led to network collapse and intervals that were too long were grossly inefficient (throughput dropped dramatically with only a modest error rate). In addition, the nature of the routing fabric was that transit delays varied over time due to a number of factors not knowable by sender or receiver. Thus the need arose for a dynamically sampled round trip timing mechanism. The retry interval is a dynamic value computed from samples taken at the transport layer. Indeed, it was necessary to sample more often than once per window since the Nyquist criterion would otherwise be violated.
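
To make the idea of a dynamically computed retry interval concrete, here is a minimal sketch. It assumes the familiar smoothed-RTT-plus-variance scheme used for TCP retransmission timers (gains of 1/8 and 1/4, interval = SRTT + 4 * RTTVAR); that choice, and the names RttEstimator, add_sample and retry_interval, are illustrative assumptions and not anything defined by WS-RM or by the RTTM option itself.

```python
class RttEstimator:
    """Tracks a smoothed round-trip time and derives a retry interval from it.

    Hypothetical helper: the smoothing constants follow the scheme commonly
    used for TCP retransmission timers, which is an assumption here.
    """

    def __init__(self, alpha=0.125, beta=0.25, min_interval=1.0):
        self.alpha = alpha                # gain applied to the smoothed RTT
        self.beta = beta                  # gain applied to the RTT variation
        self.min_interval = min_interval  # floor for the retry interval (seconds)
        self.srtt = None                  # smoothed round-trip time
        self.rttvar = None                # smoothed mean deviation of the RTT

    def add_sample(self, rtt):
        """Fold one measured round-trip time (seconds) into the estimate."""
        if self.srtt is None:
            # First sample: seed the estimate directly.
            self.srtt = rtt
            self.rttvar = rtt / 2.0
        else:
            # Update the variation first, using the previous smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt

    def retry_interval(self):
        """Return the current retry interval in seconds."""
        if self.srtt is None:
            return self.min_interval
        return max(self.min_interval, self.srtt + 4.0 * self.rttvar)
```

A sender would call add_sample() each time an acknowledgment arrives for a message that was not retransmitted, and read retry_interval() before arming its next retry timer. Stable samples shrink the interval toward the measured round trip, while jittery samples widen it.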

There has been a lot of good work and experience behind the development of "reliable" transports such as TCP, which I shudder at replicating.

R4.2 puts few demands on the underlying transport and makes the job of WS-RM more difficult to achieve in an efficient manner.

As for the other time parameters, we need to make a clear decision. Are they transport parameters or are they application parameters?

Certainly, if they are application parameters, it is possible to design algorithms to improve implementation efficiency with the understanding of the exact application semantics. If they are transport parameters, then their existence should be carefully justified.

Drifting off into a somewhat different topic,

There has been a draft by Jacques of a message state diagram which might need revisiting. Due to persistence, there may be the need to extend the message state diagram to comprehend application delivery.

Once we have that complete, then each state transition should have clear implications as to what part of state storage may be released and what response may be made to the other end of the channel. We ought to see on that state diagram what state transitions occur upon the expiration of each time parameter.

thanks

-bob

-----Original Message-----
From: Jacques Durand [mailto:JDurand@fsw.fujitsu.com]
Sent: Monday, November 17, 2003 2:53 AM
To: 'Alan Weissberger'; Jacques Durand; 'wsrm@lists.oasis-open.org'
Subject: RE: [wsrm] about ExpiryTime as "transport QoS"

I think that sooner or later we'll have to face the question of the precise semantics of our RM parameters.

Some have transport QoS semantics.

Clearly, parameters like "retry interval" and "retry count" are for managing transport problems: they say:

"our RMP is willing to make an effort up to this extent - but no more - in order to transmit the message successfully.

If that effort is not sufficient, our RMP will raise a red flag (a Fault), and conclude that something is wrong with

the transport. Making a greater effort is likely to be useless, and to degrade the performance of our RMP."

So I think these parameters should be adjusted based on a reasonable expectation of the performance of the

transport layer, and also what the RMP is able to handle. It's RMP tuning... no application semantics.

The only sure guarantee we provide to the application, is a notification in case of failure, while trying

to reduce these cases of failure.
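
As a rough illustration of "an effort up to this extent, but no more", the sender side could be sketched as below. This is only a sketch under assumptions: send_reliably, DeliveryFault, and the injectable clock and sleep arguments are hypothetical names, and the expiry cut-off simply applies the rule, stated later in this message, that a Sender gives up retries once ExpiryTime is reached.

```python
import time


class DeliveryFault(Exception):
    """Hypothetical signal that the RMP abandoned delivery of a message."""


def send_reliably(send_once, retry_count, retry_interval, expiry_time,
                  clock=time.time, sleep=time.sleep):
    """Attempt delivery at most 1 + retry_count times.

    send_once      -- callable returning True once the message is acknowledged
    retry_count    -- number of retries allowed after the first attempt
    retry_interval -- seconds to wait between attempts
    expiry_time    -- absolute time (same clock as clock()) after which
                      no further attempt is made
    Returns the number of attempts used, or raises DeliveryFault.
    """
    attempts = 0
    while attempts <= retry_count:
        # Stop as soon as the message has expired, even if retries remain.
        if clock() >= expiry_time:
            raise DeliveryFault("message expired after %d attempt(s)" % attempts)
        attempts += 1
        if send_once():
            return attempts
        # Wait only when another attempt is still allowed.
        if attempts <= retry_count:
            sleep(retry_interval)
    # Retries exhausted: raise the "red flag" instead of trying forever.
    raise DeliveryFault("no acknowledgment after %d attempt(s)" % attempts)
```

The application sees either a normal return or a DeliveryFault, which matches the single guarantee described above: a notification in case of failure.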

Some other parameters have clear application semantics, like GroupExpiryTime and MaxGroupIdleTime,

as the expected duration of a group, or the time after which you can consider an inactive group is terminated,

obviously depend on the type of application that generates these messages.

So I think of these parameters as under control of the apps (e.g. via RMP API).

Now we have ExpiryTime, which could very well have either transport QoS semantics, or application semantics...

(but not both)

1. transport QoS semantics: (latest time to reach the receiver RMP)

- most appropriate for expressing the expected level of transport QoS, as Alan suggests (though it should normally not clash

with retries, as a Sender should give up retries as soon as ExpiryTime is reached)

- if we do that we'll have to explain better the capping with GroupExpiryTime (app-level)

2. app semantics: (latest time to deliver to app)

- makes it easier to deal with how long to keep an out-of-order sequence.

- but if it has app semantics, then it has to be set by the application (how? is there always a business time limit?

is a Sender app expected to translate "business time" into clock time?

if a message has expired business-wise, is that a good enough reason to drop it transport-wise?)

Even though that does not have to reflect in the spec, we need to make sure these parameters will

be meaningful to users in practice (how are they expected to set them?).

Tomorrow I'll send another set of Rel50, 52, 57 adjusted for the app semantics of ExpiryTime,

to contrast with the previous set adjusted for transport semantics.

Jacques

-----Original Message-----
From: Alan Weissberger [mailto:ajwdct@technologist.com]
Sent: Sunday, November 16, 2003 7:41 PM
To: Jacques Durand; 'wsrm@lists.oasis-open.org'
Subject: Re: [wsrm] about ExpiryTime as "transport QoS"

All,

Not only do I agree with what Jacques says, but I want to add that Expiry time detected at the RMP receiver is an IMMEDIATE indication of some anomaly/problem with the communications facility. The retry counter at the transmitter may not have reached terminal count (down count=0) when the receiver detects that the message has expired. Hence, Expiry time could be a much faster indication of a network failure/fault. I also think a Fault message should be sent back to the sender when the receiver detects that Expiry time has elapsed.
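
A minimal sketch of this receiver-side check follows. The function accept_message, the dictionary keys message_id and expiry_time, and the fault code "MessageExpired" are all hypothetical names chosen for illustration; the spec under discussion does not define them here.

```python
def accept_message(message, now, deliver, send_fault):
    """Receiver-side expiry check, run as soon as a message arrives.

    message    -- dict with 'message_id' and 'expiry_time' (absolute time)
    now        -- current time on the receiver's clock
    deliver    -- callable that hands the message onward
    send_fault -- callable(message_id, code) that reports back to the sender
    Returns True if the message was accepted, False if it had expired.
    """
    if now > message["expiry_time"]:
        # The sender may still have retries left, so tell it immediately
        # rather than letting its retry counter run down.
        send_fault(message["message_id"], "MessageExpired")
        return False
    deliver(message)
    return True
```

Because the check runs on arrival, the sender can learn of the problem after one late message instead of waiting for its retry counter to reach zero. The sketch assumes the two clocks are close enough for the comparison to be meaningful.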

alan

----- Original Message -----
From: Jacques Durand <JDURAND@FSW.FUJITSU.COM>
Date: Tue, 11 Nov 2003 22:53:58 -0800
To: "'wsrm@lists.oasis-open.org'"
Subject: [wsrm] about ExpiryTime as "transport QoS"

I think a more precise definition of the semantics of the RM quantitative parameters
(not just ExpiryTime, but also retry-interval, retrycount...)
is overdue, and Bob had a good point on this.
(I'll address Sunil's issue in another mail) Let me try:

I'll start with an opening remark: if we don't use ExpiryTime,
why not also get rid of "retryCount" and just retry forever a failed message,
instead of deciding on an arbitrary number of times?

We all know the implicit answer, I believe, though it was never explicit in our spec:

We do expect in fact some minimal level of quality from the transport layer
(which includes intermediate nodes, etc). Why? because we simply consider
this level necessary for our RMP to do a good job.

This level is precisely measured by:
- decent number of retries (and retryintervals) that should be necessary
to get a message through (i.e. to the destination RMP).
- decent transport time necessary to get a message through (to other RMP)

If the transport can meet these reqs, then our RMPs can do their job in an optimal way.
If the transport cannot fulfill this contract, then we decide that having our RMP
keep trying in spite of such failure would be counterproductive: instead, we prefer the RMP
to give up and notify its application(s) of the failure, which would then decide what to do.

So yes, the choice of these parameters is arbitrary - i.e. our choice.
But they make sense with regard to good operating conditions for our RMP, which
in turn is necessary to serve the application layer well.

Resending forever a failed message would threaten the good operating conditions
our RMP needs (e.g. it could spend all its cycles just doing this if many messages failed).
Just as accepting overly delayed messages would ignore unacceptable transport problems,
and also put undue burden on persistence and duplicate elimination.
So, detecting "bad transport conditions" and notifying the app layer, is also one form
of reliability.

(Note that ExpiryTime could be the same for all messages; that simplification is not a problem
if we consider that the expected transport quality should be the same for all messages.)

Jacques
Alan Weissberger
2013 Acacia Ct
Santa Clara, CA 95050-3482
1 408 863 6042 voice
1 408 863 6099 fax