wsrm message

Subject: RE: [wsrm] R: [wsrm] Rel YY

From: Jacques Durand <JDurand@fsw.fujitsu.com>
To: "'Doug Bunting'" <Doug.Bunting@sun.com>, Paolo Romano <paolo.romano@dis.uniroma1.it>
Date: Wed, 24 Sep 2003 15:45:28 -0700

Title: RE: [wsrm] R: [wsrm] Rel YY

I think there are two points in Doug's message:
(1) the cost of the solution (is it really less "simple" than what we have today?)
(2) the relevance of the use case I put forward.

I believe (1) is not hard to answer: nothing more than the presence of seq number
in all messages, is required. A sender can still decide to use a different groupID for
each message. A special element (MessageOrder) will signal the ordering feature,
like we do for duplicate elimination and ack requested.
I don't see more complexity here than what was before, even if it is a tiny bit
more verbose on the protocol. (I personally would think it is simpler,
as we have only deal with one pattern of message identity (groupID+SequenceNo),
instead of one for ordered, and one for unordered without seq number elt.)

(2) is more delicate as it is based on assumed requirements.
It is something that is difficult to debate about:
I guess everyone here has to evaluate the feature based on their own experience,
and on use cases they see relevant to their company's customers, etc.
About my use case: it would not show benefit in a situation where most incoming messages
come from sources that are all different (so we could not assume a grouping in sequences).
Yet, I had in mind more a supply-chain model, where a pre-defined set of customers
and suppliers are interacting at every tier (B2B). Typically, the largest supply-chains
(e.g. automotive) involve no more than 50000 potential partners for a particular tier.
E.g. GM has 14000 dealers. In such cases, the steady stream of messages between these
partners (e.g. dealer to manufacturer) can be largely sequenced.
Reasoning on the size estimate, the groupID + duplicate info for 50000 sequences
is of the order of 50Mb, still much better than storing/searching all message IDs.
As the storage capacity needed is proportional to the number of sequences
(rather than total number of messages), it is also more predictable and controllable as well.

My intent in presenting my use case, was not to show that generalized
sequence numbers is beneficial in all cases, but rather that there are some significant
use cases for which we can lower the barrier to deployment, and that it really "costs" nothing
to address these: no spec trade-offs or user constaints:
users can still decide to not do sequencing.
Also, showing that WS-R scales-up well to such use cases, is enough to deflect criticism
of inefficiency from those users/vendors who might otherwise question the way we
address performance or ignore implementation aspects.

As Paolo suggests, there is a variety of algorithms that can leverage
sequence numbers -even when there are holes in the sequence.
Relying on sequence numbers may also affect how we deal with other issues (e.g. multiple-Ack
would be treated differently if we can assume generalized seq numbers, as Iwasa suggested).
That is why I think "Rel YY" is important to deal with early on, one way or the other...

Regards

Jacques

-----Original Message-----
From: Doug Bunting [mailto:Doug.Bunting@sun.com]
Sent: Monday, September 22, 2003 5:28 PM
To: Paolo Romano
Cc: 'Jacques Durand'; wsrm@lists.oasis-open.org
Subject: Re: [wsrm] R: [wsrm] Rel YY

Paulo and Jacques,

This appears to be a standard performance / storage trade off. The
direction chosen at our last face to face meeting emphasizes simplicity
of the specification and implementations. The suggestions below
optimize on the other side, toward reduced storage requirements. The
included reasoning depends upon the number of messages received overall
(in some time period) and how many of those messages come from the same
origin. Without clear numbers applicable to all WS-Reliability use
cases (something we are not going to get), I find it hard to choose.

Absent concrete numbers, I would probably lean toward the simpler
solution already chosen by the group.

thanx,
doug

On 22-Sep-03 16:49, Paolo Romano wrote:

> I like Jacques proposal. We should address both the problems of (1)
> minimizing the state information that needs to be persistently stored
> and of (2) how to efficiently acknowledge batches of messages. The key
> idea of Jacques suggestion is to have a sequential, strictly
> increasing identifier. Such a property could then be exploited to
> record/acknowledge continous sequences of messages (i.e. messages with
> sequential MessageIDs) by referincing only the lower and upper bound of
> the sequence.
> With this approach, it is possible to reduce the amount of memory
> required at the purpose of duplicates elimination even furtherly:in fact
> if, for a given GroupID, a set of messages with
> MessageID={0,1,..i..,n-1,n} has been received, then it is only necessary
> to store the MessageID "n", to indicate that messages up to n have
> already been received.
> As an example, consider a receiver RMP which has caught the messages
> carrying GroupID:X and MessageID={1,2,3,4,5,6,10,15,16}. It should only
> record that "messageIDs up to 6" and MessageID={10,15,16} have
> been received. In general, if a sequence of MessageIDs={j,j+1,..,k-1,k}
> with k>j is received, it is sufficient to store just j and k. Note that
> this property is independent on having enabled the ordering
> functionality at the RMP level.
> On the other hand, GroupID is essential (and sufficient) to allow
> univoque identification of the "conversation".
>
> Paolo
>
>
>
>
>
>     Here is a more precise wording of the issue I see with limiting
>     the use of sequence numbers to ordering only.
>
>     Jacques
>
>
>     A new issue, (related to Rel-36, Rel-88)
>     ------------
>
>     Precluding the use of sequence numbers, when message ordering is NOT
>     required,
>     will pose serious scalability issues for duplicate elimination
>     algorithms.
>     Indeed, in doing so the GroupID would actually behave as a Message ID,
>     and will require to be stored for each past message for the
>     duplicate look up.
>     Consider the two following deployment cases of WS-R:
>
>     Case 1: Assume a messaging Hub, that must guarantee exactly once
>     delivery under
>     the following conditions:
>     Throughput: 1000 messages/sec
>     Size of GroupID values (approximately): 30 bytes
>     Scope of duplicate checks: messages over last 5 days
>     Using a messageID-based duplicate check, this Hub must keep a
>     database of GroupIDs able to store:
>     1000 mesg * 432,000 (Number of seconds in 5 days) = 432,000,000
>     GroupIDs to store.
>     Database size: 12.9 GB.
>     Besides the significant resource investment needed, (which may
>     please database vendors!)
>     this solution may simply not be feasible or at least cause quite an
>     additional headache and cost:
>     the database may not keep up with the speed required for duplicate
>     checks:
>     fast retrieval would require indexing. But the high rate of updates
>     of such an index
>     (2000/sec counting additions and removals) offsets the performance
>     gain in indexed search,
>     as we know indexes are costly to update, especially on a large table.
>     One way or the other, the overhead of duplicate checking may simply
>     be overwhelming.
>
>     Case 2: Assume a messaging end-point, receiving 10 messages/sec
>     average.
>     Duplicate search is required over the last 30 days. We would still need
>     a full-fledged database (or B-tree) engine, with data size close to
>     1 GB.
>
>
>     Proposal: (P1 + P2 + P3)
>     ---------
>
>     P1: Make it possible for Senders to use GroupID + SequenceNo to do
>     any grouping they want,
>     even when ordering is not required. The only requirement is, these
>     elements
>     should be used as they are for ordering, i.e.:
>     - (GroupID + SequenceNo) must be globally unique,
>     - the sender must generate contiguous sequence numbers within a group.
>
>     P2:SequenceNo element is mandatory in the header, even when NOT
>     requiring ordering.
>     In both cases, a Sender can at discretion:
>     - (a) send messages with a different GroupID each time (1 message
>     per group, with smallest
>     SequenceNo)
>     - (b) send longer sequences of messages within a group.
>
>     P3: To signal the Receiver to do ordered delivery on a sequence, the
>     Sender will add
>     messageOrder element in the header (close to option (3) in Rel 88),
>     and is required to
>     do so at least in the first message(s) of a group.
>
>     NOTE1: On the receiver side, duplicate message elimination would use
>     SequenceNo
>     for fast duplicate elimination within a group, and use an indexed
>     search over a store
>     of GroupIDs for all groups currently active. Clearly, in the extreme
>     case (a) above,
>     duplicate elimination would be as costly as over conventional
>     message IDs.
>     But if Case 1 above actually has only about 1000 "active groups" at
>     any time
>     (e.g. 1000 concurrent senders generating each a single long sequence
>     at rate of 1 mesg/sec),
>     then the GroupID store and seq number info associated with it, can
>     hold in the memory
>     of a small device (estimate: 1 Mb) and does not need a DBMS.
>
>     NOTE2: The requirement (Rel 26) to allow for "multiple-Ack"
>     messages, can be fulfilled
>     in a simple way when using sequences of messages, by the use of an
>     interval notation to signal
>     groups of messages that are acknowledged. Or, use an upper bound:
>     One Ack message could state
>     that all messages below SequenceNo N are acknowledged.