wsrm message

Subject: R: [wsrm] Rel YY

From: "Paolo Romano" <paolo.romano@dis.uniroma1.it>
To: "'Jacques Durand'" <JDurand@fsw.fujitsu.com>,<wsrm@lists.oasis-open.org>
Date: Tue, 23 Sep 2003 01:49:00 +0200

Title: Messaggio

I like Jacques proposal. We should address both the problems of (1) minimizing the state information that needs to be persistently stored and of (2) how to efficiently acknowledge batches of messages. The key idea of Jacques suggestion is to have a sequential, strictly increasing identifier. Such a property could then be exploited to record/acknowledge continous sequences of messages (i.e. messages with sequential MessageIDs) by referincing only the lower and upper bound of the sequence.

With this approach, it is possible to reduce the amount of memory required at the purpose of duplicates elimination even furtherly:in fact if, for a given GroupID, a set of messages with MessageID={0,1,..i..,n-1,n} has been received, then it is only necessary to store the MessageID "n", to indicate that messages up to n have already been received.

As an example, consider a receiver RMP which has caught the messages carrying GroupID:X and MessageID={1,2,3,4,5,6,10,15,16}. It should only record that "messageIDs up to 6" and MessageID={10,15,16} have been received. In general, if a sequence of MessageIDs={j,j+1,..,k-1,k} with k>j is received, it is sufficient to store just j and k. Note that this property is independent on having enabled the ordering functionality at the RMP level.

On the other hand, GroupID is essential (and sufficient) to allow univoque identification of the "conversation".

Paolo

Here is a more precise wording of the issue I see with limiting
the use of sequence numbers to ordering only.

Jacques

A new issue, (related to Rel-36, Rel-88)
------------

Precluding the use of sequence numbers, when message ordering is NOT required,
will pose serious scalability issues for duplicate elimination algorithms.
Indeed, in doing so the GroupID would actually behave as a Message ID,
and will require to be stored for each past message for the duplicate look up.
Consider the two following deployment cases of WS-R:

Case 1: Assume a messaging Hub, that must guarantee exactly once delivery under
the following conditions:
Throughput: 1000 messages/sec
Size of GroupID values (approximately): 30 bytes
Scope of duplicate checks: messages over last 5 days
Using a messageID-based duplicate check, this Hub must keep a database of GroupIDs able to store:
1000 mesg * 432,000 (Number of seconds in 5 days) = 432,000,000 GroupIDs to store.
Database size: 12.9 GB.
Besides the significant resource investment needed, (which may please database vendors!)
this solution may simply not be feasible or at least cause quite an additional headache and cost:
the database may not keep up with the speed required for duplicate checks:
fast retrieval would require indexing. But the high rate of updates of such an index
(2000/sec counting additions and removals) offsets the performance gain in indexed search,
as we know indexes are costly to update, especially on a large table.
One way or the other, the overhead of duplicate checking may simply be overwhelming.

Case 2: Assume a messaging end-point, receiving 10 messages/sec average.
Duplicate search is required over the last 30 days. We would still need
a full-fledged database (or B-tree) engine, with data size close to 1 GB.

Proposal: (P1 + P2 + P3)
---------

P1: Make it possible for Senders to use GroupID + SequenceNo to do any grouping they want,
even when ordering is not required. The only requirement is, these elements
should be used as they are for ordering, i.e.:
- (GroupID + SequenceNo) must be globally unique,
- the sender must generate contiguous sequence numbers within a group.

P2:SequenceNo element is mandatory in the header, even when NOT requiring ordering.
In both cases, a Sender can at discretion:
- (a) send messages with a different GroupID each time (1 message per group, with smallest
SequenceNo)
- (b) send longer sequences of messages within a group.

P3: To signal the Receiver to do ordered delivery on a sequence, the Sender will add
messageOrder element in the header (close to option (3) in Rel 88), and is required to
do so at least in the first message(s) of a group.

NOTE1: On the receiver side, duplicate message elimination would use SequenceNo
for fast duplicate elimination within a group, and use an indexed search over a store
of GroupIDs for all groups currently active. Clearly, in the extreme case (a) above,
duplicate elimination would be as costly as over conventional message IDs.
But if Case 1 above actually has only about 1000 "active groups" at any time
(e.g. 1000 concurrent senders generating each a single long sequence at rate of 1 mesg/sec),
then the GroupID store and seq number info associated with it, can hold in the memory
of a small device (estimate: 1 Mb) and does not need a DBMS.

NOTE2: The requirement (Rel 26) to allow for "multiple-Ack" messages, can be fulfilled
in a simple way when using sequences of messages, by the use of an interval notation to signal
groups of messages that are acknowledged. Or, use an upper bound: One Ack message could state
that all messages below SequenceNo N are acknowledged.

Follow-Ups:
- Re: [wsrm] R: [wsrm] Rel YY
  - From: Doug Bunting <Doug.Bunting@Sun.COM>

References:
- Rel YY
  - From: Jacques Durand <JDurand@fsw.fujitsu.com>