wsrm message

Subject: Re: [wsrm] R: [wsrm] Rel YY
From: Doug Bunting <Doug.Bunting@Sun.COM>
To: Paolo Romano <paolo.romano@dis.uniroma1.it>
Date: Mon, 22 Sep 2003 17:28:24 -0700
Paulo and Jacques,

This appears to be a standard performance / storage trade off.  The 
direction chosen at our last face to face meeting emphasizes simplicity 
of the specification and implementations.  The suggestions below 
optimize on the other side, toward reduced storage requirements.  The 
included reasoning depends upon the number of messages received overall 
(in some time period) and how many of those messages come from the same 
origin.  Without clear numbers applicable to all WS-Reliability use 
cases (something we are not going to get), I find it hard to choose.

Absent concrete numbers, I would probably lean toward the simpler 
solution already chosen by the group.

thanx,
	doug

On 22-Sep-03 16:49, Paolo Romano wrote:

> I like Jacques proposal. We should address both the problems of (1) 
> minimizing the state information that needs to be persistently stored 
> and of (2) how to efficiently acknowledge batches of messages. The key 
> idea of Jacques suggestion is to have a sequential, strictly 
> increasing identifier. Such a property could then be exploited to 
> record/acknowledge continous sequences of messages (i.e. messages with 
> sequential MessageIDs) by referincing only the lower and upper bound of 
> the sequence.
> With this approach, it is possible to reduce the amount of memory 
> required at the purpose of duplicates elimination even furtherly:in fact 
> if, for a given GroupID, a set of messages with 
> MessageID={0,1,..i..,n-1,n} has been received, then it is only necessary 
> to store the MessageID "n", to indicate that messages up to n have 
> already been received.
> As an example, consider a receiver RMP which has caught the messages 
> carrying GroupID:X and MessageID={1,2,3,4,5,6,10,15,16}. It should only 
> record that "messageIDs up to 6" and MessageID={10,15,16} have 
> been received. In general, if a sequence of MessageIDs={j,j+1,..,k-1,k} 
> with k>j is received, it is sufficient to store just j and k. Note that 
> this property is independent on having enabled the ordering 
> functionality at the RMP level.
> On the other hand, GroupID is essential (and sufficient) to allow 
> univoque identification of the "conversation".
>  
> Paolo
>  
> 
>  
>  
> 
>     Here is a more precise wording of the issue I see with limiting
>     the use of sequence numbers to ordering only.
> 
>     Jacques
> 
> 
>     A new issue, (related to Rel-36, Rel-88)
>     ------------
> 
>     Precluding the use of sequence numbers, when message ordering is NOT
>     required,
>     will pose serious scalability issues for duplicate elimination
>     algorithms.
>     Indeed, in doing so the GroupID would actually behave as a Message ID,
>     and will require to be stored for each past message for the
>     duplicate look up.
>     Consider the two following deployment cases of WS-R:
> 
>     Case 1: Assume a messaging Hub, that must guarantee exactly once
>     delivery under
>     the following conditions:
>     Throughput: 1000 messages/sec
>     Size of GroupID values (approximately): 30 bytes
>     Scope of duplicate checks: messages over last 5 days
>     Using a messageID-based duplicate check, this Hub must keep a
>     database of GroupIDs able to store:
>     1000 mesg * 432,000 (Number of seconds in 5 days) = 432,000,000
>     GroupIDs to store.
>     Database size: 12.9 GB.
>     Besides the significant resource investment needed, (which may
>     please database vendors!)
>     this solution may simply not be feasible or at least cause quite an
>     additional headache and cost:
>     the database may not keep up with the speed required for duplicate
>     checks:
>     fast retrieval would require indexing. But the high rate of updates
>     of such an index
>     (2000/sec counting additions and removals) offsets the performance
>     gain in indexed search,
>     as we know indexes are costly to update, especially on a large table.
>     One way or the other, the overhead of duplicate checking may simply
>     be overwhelming.
> 
>     Case 2: Assume a messaging end-point, receiving 10 messages/sec
>     average.
>     Duplicate search is required over the last 30 days. We would still need
>     a full-fledged database (or B-tree) engine, with data size close to
>     1 GB.
> 
> 
>     Proposal: (P1 + P2 + P3)
>     ---------
> 
>     P1: Make it possible for Senders to use GroupID + SequenceNo to do
>     any grouping they want,
>     even when ordering is not required. The only requirement is, these
>     elements
>     should be used as they are for ordering, i.e.:
>     - (GroupID + SequenceNo) must be globally unique,
>     - the sender must generate contiguous sequence numbers within a group.
> 
>     P2:SequenceNo element is mandatory in the header, even when NOT
>     requiring ordering.
>     In both cases, a Sender can at discretion:
>     - (a) send messages with a different GroupID each time (1 message
>     per group, with smallest
>     SequenceNo)
>     - (b) send longer sequences of messages within a group.
> 
>     P3: To signal the Receiver to do ordered delivery on a sequence, the
>     Sender will add
>     messageOrder element in the header (close to option (3) in Rel 88),
>     and is required to
>     do so at least in the first message(s) of a group.
> 
>     NOTE1: On the receiver side, duplicate message elimination would use
>     SequenceNo
>     for fast duplicate elimination within a group, and use an indexed
>     search over a store
>     of GroupIDs for all groups currently active. Clearly, in the extreme
>     case (a) above,
>     duplicate elimination would be as costly as over conventional
>     message IDs.
>     But if Case 1 above actually has only about 1000 "active groups" at
>     any time
>     (e.g. 1000 concurrent senders generating each a single long sequence
>     at rate of 1 mesg/sec),
>     then the GroupID store and seq number info associated with it, can
>     hold in the memory
>     of a small device (estimate: 1 Mb) and does not need a DBMS.
> 
>     NOTE2: The requirement (Rel 26) to allow for "multiple-Ack"
>     messages, can be fulfilled
>     in a simple way when using sequences of messages, by the use of an
>     interval notation to signal
>     groups of messages that are acknowledged. Or, use an upper bound:
>     One Ack message could state
>     that all messages below SequenceNo N are acknowledged.
Follow-Ups:
- Re: [wsrm] R: [wsrm] Rel YY
  - From: "iwasa" <kiwasa@jp.fujitsu.com>
- Re: [wsrm] R: [wsrm] Rel YY
  - From: Tom Rutt <tom@coastin.com>
References:
- R: [wsrm] Rel YY
  - From: "Paolo Romano" <paolo.romano@dis.uniroma1.it>