Subject: Re: [wsrm] Rel YY

From: Sunil Kunisetty <sunil.kunisetty@oracle.com>
To: Jacques Durand <JDurand@fsw.fujitsu.com>
Date: Fri, 26 Sep 2003 13:09:37 -0700

Jacques,

Oracle will be supporting this proposal. However, I would prefer that SequenceNumber
be optional rather than mandatory, as you proposed in (P2). I understand that this makes
schema validation more difficult, but I believe it will be much simpler and more efficient
for implementations.

So essentially we should categorize all RM messages into 3 categories based on
the elements used in the RM headers:

1) Grouped and Ordered Messages: Group Id + Seq No. + Message Order
Same Group Id, Different Seq No.

2) Grouped and Un-Ordered Messages: Group Id + Seq No.
Same Group Id, Different Seq No.

3) Discrete & Independent RM Messages: Group Id

We could then use the SequenceNumber sub-element as the toggle switch to
distinguish Grouped Un-ordered messages from Discrete & Independent messages. An
implementation could then use 3 different hash tables to store the IDs, thus
making duplicate elimination (DE) much more efficient.
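A minimal sketch of the three-table idea, assuming an RM header is represented as a Python dict with hypothetical keys `group_id`, `seq_no` and `message_order` (the real element names in the spec draft may differ). The presence of `seq_no` is the toggle: without it the message is treated as discrete and its GroupID alone is the duplicate key. It also assumes every message of an ordered group carries the ordering element. The grouped tables here still hold one key per message; the more compact per-group sequence state suggested in NOTE1 further down would be a natural refinement.

```python
from enum import Enum


class Category(Enum):
    GROUPED_ORDERED = 1    # Group Id + Seq No. + Message Order
    GROUPED_UNORDERED = 2  # Group Id + Seq No.
    DISCRETE = 3           # Group Id only


def classify(header: dict) -> Category:
    """Map the (hypothetical) header keys present to one of the three categories."""
    if "seq_no" not in header:
        # No SequenceNumber: discrete and independent message.
        return Category.DISCRETE
    if header.get("message_order"):
        return Category.GROUPED_ORDERED
    return Category.GROUPED_UNORDERED


class DuplicateEliminator:
    """Hypothetical DE component keeping one hash table (set) per category."""

    def __init__(self) -> None:
        self.tables = {cat: set() for cat in Category}

    def is_duplicate(self, header: dict) -> bool:
        cat = classify(header)
        if cat is Category.DISCRETE:
            key = header["group_id"]  # GroupID acts as a message ID
        else:
            key = (header["group_id"], header["seq_no"])
        table = self.tables[cat]
        if key in table:
            return True
        table.add(key)
        return False
```

Keeping the categories in separate tables means each lookup only touches the table for its own kind of message.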

This will be better than having this element and requiring the value to be '0' for
un-ordered grouped messages.

We should strongly recommend in our Spec. that applications use
grouped (ordered or un-ordered) messages as much as possible.

Comments?

-Sunil

Jacques Durand wrote:

Here is a more precise wording of the issue I see with limiting
the use of sequence numbers to ordering only.
Jacques
A new issue (related to Rel-36 and Rel-88)
------------
Precluding the use of sequence numbers when message ordering is NOT required
will pose serious scalability issues for duplicate elimination algorithms.
In that case the GroupID would effectively behave as a message ID,
and would have to be stored for every past message for duplicate lookup.
Consider the following two deployment cases of WS-R:
Case 1: Assume a messaging Hub that must guarantee exactly-once delivery under
the following conditions:
Throughput: 1000 messages/sec
Size of GroupID values (approximately): 30 bytes
Scope of duplicate checks: messages over last 5 days
Using a message-ID-based duplicate check, this Hub must keep a database of GroupIDs able to store:
1000 msg/sec * 432,000 (number of seconds in 5 days) = 432,000,000 GroupIDs.
Database size: approx. 13 GB (432,000,000 * 30 bytes = 12.96 GB of raw GroupID data).
Besides the significant resource investment needed (which may please database vendors!),
this solution may simply not be feasible, or at least may add considerable complexity and cost:
the database may not keep up with the speed required for duplicate checks.
Fast retrieval would require indexing, but the high rate of updates to such an index
(2000/sec, counting additions and removals) offsets the performance gain of indexed search,
since indexes are costly to update, especially on a large table.
One way or the other, the overhead of duplicate checking may simply be overwhelming.
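To make the index-churn argument concrete, here is a minimal sketch of the conventional approach being criticized: a time-windowed store of message IDs (here, GroupIDs acting as message IDs). The class name and the in-memory `OrderedDict` standing in for a database index are assumptions for illustration only. It also assumes arrival timestamps are non-decreasing, so insertion order equals age order.

```python
from collections import OrderedDict


class WindowedIdStore:
    """Hypothetical duplicate check over IDs seen within the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        # id -> arrival time, oldest first (assumes non-decreasing timestamps)
        self.seen: OrderedDict = OrderedDict()
        self.updates = 0  # index updates performed: insertions + removals

    def _expire(self, now: float) -> None:
        # Remove every ID whose age has reached the window size.
        while self.seen:
            oldest_time = next(iter(self.seen.values()))
            if now - oldest_time < self.window:
                break
            self.seen.popitem(last=False)
            self.updates += 1

    def check_and_add(self, msg_id: str, now: float) -> bool:
        """Return True if msg_id was already seen within the window, else record it."""
        self._expire(now)
        if msg_id in self.seen:
            return True
        self.seen[msg_id] = now
        self.updates += 1
        return False


store = WindowedIdStore(window=10)
for t in range(100):  # 1 msg/sec for 100 sec, 10-sec window
    store.check_and_add(f"id-{t}", float(t))
print(len(store.seen), store.updates)  # 10 190
```

Once the window is full, every accepted message costs one insertion and one removal, so a steady 1000 msg/sec implies roughly 2000 index updates/sec, matching the figure above.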
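Case 2: Assume a messaging end-point receiving 10 messages/sec on average.
Duplicate search is required over the last 30 days. We would still need
a full-fledged database (or B-tree) engine, with a data size close to 1 GB
(10 msg/sec * 2,592,000 sec = 25,920,000 GroupIDs; assuming the same 30-byte
GroupIDs as in Case 1, about 0.78 GB).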
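The storage figures for both cases follow from one formula: rate * window * ID size. A small calculator, assuming 30-byte GroupIDs in both cases (Case 2 does not state an ID size) and counting raw data only, with no index or storage overhead:

```python
SECONDS_PER_DAY = 86_400


def id_store_size(rate_per_sec: int, window_days: int, id_bytes: int) -> tuple[int, int]:
    """Return (number of IDs retained, raw bytes) for a time-windowed ID store."""
    count = rate_per_sec * window_days * SECONDS_PER_DAY
    return count, count * id_bytes


# Case 1: Hub at 1000 msg/sec, 5-day window, ~30-byte GroupIDs
case1 = id_store_size(1000, 5, 30)
# Case 2: end-point at 10 msg/sec, 30-day window (30-byte IDs assumed)
case2 = id_store_size(10, 30, 30)

for name, (count, size) in (("Case 1", case1), ("Case 2", case2)):
    print(f"{name}: {count:,} IDs, {size / 1e9:.2f} GB raw")
# Case 1: 432,000,000 IDs, 12.96 GB raw
# Case 2: 25,920,000 IDs, 0.78 GB raw
```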
Proposal: (P1 + P2 + P3)
---------
P1: Make it possible for Senders to use GroupID + SequenceNo to do any grouping they want,
even when ordering is not required. The only requirement is that these elements
be used the same way as for ordering, i.e.:
- (GroupID + SequenceNo) must be globally unique,
- the sender must generate contiguous sequence numbers within a group.
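A minimal sender-side sketch of the two P1 rules. The class and method names are hypothetical, UUID4-based GroupIDs are one assumed way to get global uniqueness, and starting each group at SequenceNo 0 is also an assumption.

```python
import itertools
import uuid


class Sender:
    """Hypothetical allocator of (GroupID, SequenceNo) pairs."""

    def __init__(self) -> None:
        self._counters: dict = {}  # GroupID -> itertools.count

    def new_group(self) -> str:
        # UUID4 makes the GroupID (and hence GroupID + SequenceNo) globally unique.
        group_id = f"urn:uuid:{uuid.uuid4()}"
        self._counters[group_id] = itertools.count(0)
        return group_id

    def next_header(self, group_id: str) -> dict:
        """Return the next pair; numbers are contiguous within the group."""
        return {"group_id": group_id, "seq_no": next(self._counters[group_id])}


sender = Sender()
g = sender.new_group()
print([sender.next_header(g)["seq_no"] for _ in range(3)])  # [0, 1, 2]
```

Strategy (a) in P2 below corresponds to calling `new_group()` before every message; strategy (b) to reusing one group for a long run.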
P2: The SequenceNo element is mandatory in the header, even when ordering is NOT required.
In both cases (ordered or not), a Sender can at its discretion:
- (a) send messages with a different GroupID each time (1 message per group, with the smallest
SequenceNo), or
- (b) send longer sequences of messages within a group.
P3: To signal the Receiver to do ordered delivery on a sequence, the Sender will add a
messageOrder element to the header (close to option (3) in Rel 88), and is required to
do so at least in the first message(s) of a group.
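A sketch of how P2 and P3 might interact, using a dict in place of the real header and hypothetical names throughout (`build_header`, `flag_first`, `OrderingTracker`). The sender always includes SequenceNo and flags the first `flag_first` messages of an ordered group; since later messages may omit the flag, the receiver latches it per group. In this sketch, a message that reaches the receiver before any flagged message of its group is simply treated as unordered.

```python
def build_header(group_id: str, seq_no: int, ordered: bool, flag_first: int = 1) -> dict:
    """P2: SequenceNo always present. P3: messageOrder on the first messages of an ordered group."""
    header = {"group_id": group_id, "seq_no": seq_no}
    if ordered and seq_no < flag_first:  # assumes sequences start at 0
        header["message_order"] = True
    return header


class OrderingTracker:
    """Receiver side: remember which groups have requested ordered delivery."""

    def __init__(self) -> None:
        self._ordered: set = set()

    def observe(self, header: dict) -> bool:
        """Record the header; return True if its group requires ordered delivery."""
        gid = header["group_id"]
        if header.get("message_order"):
            self._ordered.add(gid)
        return gid in self._ordered
```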
NOTE1: On the receiver side, duplicate elimination would use SequenceNo
for fast duplicate elimination within a group, and an indexed search over a store
of GroupIDs for all currently active groups. Clearly, in the extreme case (a) above,
duplicate elimination would be as costly as with conventional message IDs.
But if Case 1 above actually has only about 1000 "active groups" at any time
(e.g. 1000 concurrent senders, each generating a single long sequence at a rate of 1 msg/sec),
then the GroupID store and its associated sequence number info can fit in the memory
of a small device (estimate: 1 Mb) and do not need a DBMS.
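A minimal receiver-side sketch of NOTE1, assuming contiguous sequence numbers starting at 0 (P1). The names are hypothetical, and a plain dict stands in for the indexed store of active GroupIDs. Each active group keeps one low-water mark plus any out-of-order numbers, so state grows with the number of active groups rather than with the number of past messages. When a group stops being active is a policy question this sketch leaves to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class GroupState:
    next_expected: int = 0                    # every SequenceNo below this was received
    pending: set = field(default_factory=set)  # received numbers above a gap


class SequenceDeduplicator:
    """Hypothetical DE keyed by GroupID, using SequenceNo within each group."""

    def __init__(self) -> None:
        self.groups: dict = {}  # GroupID -> GroupState, for active groups only

    def is_duplicate(self, group_id: str, seq_no: int) -> bool:
        state = self.groups.setdefault(group_id, GroupState())
        if seq_no < state.next_expected or seq_no in state.pending:
            return True
        state.pending.add(seq_no)
        # Advance the low-water mark over any now-contiguous run.
        while state.next_expected in state.pending:
            state.pending.remove(state.next_expected)
            state.next_expected += 1
        return False

    def close_group(self, group_id: str) -> None:
        """Drop state for a group that is no longer active."""
        self.groups.pop(group_id, None)
```

With in-order arrival, `pending` stays empty and each group costs a single integer plus its GroupID key.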
NOTE2: The requirement (Rel 26) to allow for "multiple-Ack" messages can be fulfilled
simply when using sequences of messages, by using an interval notation to signal the
groups of messages that are acknowledged. Alternatively, use an upper bound: one Ack message could state
that all messages below SequenceNo N are acknowledged.
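Both multiple-Ack notations can be derived from the set of received sequence numbers of one group. A minimal sketch with hypothetical function names; intervals are inclusive, and the upper bound N is exclusive, matching "all messages below SequenceNo N".

```python
def ack_ranges(received: set) -> list:
    """Collapse received sequence numbers into inclusive (low, high) intervals."""
    ranges = []
    for n in sorted(received):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current interval
        else:
            ranges.append((n, n))            # start a new interval after a gap
    return ranges


def ack_upper_bound(received: set, start: int = 0) -> int:
    """Return N such that every SequenceNo in [start, N) has been received."""
    n = start
    while n in received:
        n += 1
    return n


print(ack_ranges({0, 1, 2, 5, 6, 9}))       # [(0, 2), (5, 6), (9, 9)]
print(ack_upper_bound({0, 1, 2, 5, 6, 9}))  # 3
```

The interval form acknowledges everything received; the upper-bound form is shorter but cannot acknowledge messages that arrived after a gap.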
