ebxml-msg message
Subject: a more in-depth look at the relayed multi-hop
From: "Durand, Jacques R." <JDurand@us.fujitsu.com>
To: <ebxml-msg@lists.oasis-open.org>
Date: Mon, 3 Mar 2008 18:22:07 -0800
Here is a more detailed analysis of the "ack relay" multi-hop technique, highlighting some "challenges" it presents. Two major points are developed here:
(1) the AtMostOnce delivery assurance
(2) a (critical) analysis of the Ack relay algorithms.
1. AtMostOnce delivery assurance
In the relayed-acks technique, AtMostOnce (duplicate elimination) cannot be supported end-to-end based on RM sequence numbers; it cannot be delegated to the chain of RM modules along the path. For example, while the message is transiting through the intermediary, between the removal of one RM header and the insertion of a new one, it could be duplicated during a crash recovery, and the duplicate could be assigned a new RM sequence number, making it undetectable afterward.
The only way to support AtMostOnce end-to-end is to implement it based on eb:MessageId, in the ebMS processor. Not being able to rely on sequence numbers reduces the efficiency and scalability of the duplicate check, and forces its implementation at the ebMS processing level. This is at odds with existing endpoint MSHs that conform to Core V3, which will naturally implement it in the RM module using an efficient sequence-number-based duplicate check. These endpoint MSHs would not be able to enforce reliable AtMostOnce in a multi-hop configuration.
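To make the contrast concrete, here is a minimal sketch (names and storage are hypothetical, not from any spec) of what an eb:MessageId-based duplicate check forces on the ebMS processor: since sequence numbers are rewritten at each hop, the only stable key is the message ID, so the check must consult a store of every ID ever seen rather than a compact sequence-number window.

```python
# Hypothetical end-to-end AtMostOnce check at the ebMS level.
# In a real MSH this would be a persistent, indexed (and ever-growing)
# store of eb:MessageId values, not an in-memory set.
seen_message_ids = set()

def deliver_at_most_once(eb_message_id: str, payload: bytes) -> bool:
    """Deliver payload unless this eb:MessageId was already seen.

    Returns True if delivered, False if suppressed as a duplicate.
    """
    if eb_message_id in seen_message_ids:
        return False  # duplicate: silently dropped
    seen_message_ids.add(eb_message_id)
    # ... hand payload to the application here ...
    return True
```

A sequence-number-based check, by contrast, only needs the highest contiguous number plus a small gap set per sequence, which is why pushing the check up to the ebMS layer costs efficiency and scalability.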
2. The relay algorithms.
The "mapping" table shown in Pim's illustration makes the relay technique look deceptively simple and easy. But it is only one side of the picture.
- First, it should be noted that, unlike a routing table, which is in general a read-only structure, this is a data structure with a very high rate of updates. That makes scalability much harder to guarantee (e.g. database indexes are costly to maintain when the update rate is high, and simply ineffective when it is too high).
- Second, it appears that one of the two major options for implementing the relayed acks will not work. These two options were:
(Op1) When forwarding an RM message from sequence S1 to sequence S2, the intermediary keeps the same sequence number for the message (so only the sequence ID changes), so that Acks received for S2 only need a sequence-ID substitution before being forwarded back for S1.
(Op2) When forwarding an RM message from sequence S1 to sequence S2, the intermediary assigns a new sequence number to the message. Acks received for S2 then need to be mapped to the sequence numbers of S1.
- The reason why Op1 won't work is that by splitting the message flow on S1 into possibly several new outgoing RM sequences (S2, S3, ...) through an intermediary, it would be impossible to keep contiguous sequence numbers in each of these outgoing sequences. This contiguity is fundamental to the notion of "sequence numbers": RM modules are hardwired to enforce the by-1 increment required by the RM spec (Section 3.7: "RM Source MUST assign each message within a Sequence a MessageNumber element that increments by 1 from an initial value of 1.").
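The contiguity failure is easy to see with a toy example (the per-message routing below is illustrative, not from the thread): if A's messages keep their numbers while being split across two outgoing sequences, neither outgoing sequence carries the 1, 2, 3, ... numbering the spec requires.

```python
# Op1 illustrated: incoming sequence A is split across outgoing
# sequences C and D while each message KEEPS its A-sequence number.
incoming_A = [1, 2, 3, 4, 5, 6]
route = {1: "C", 2: "D", 3: "C", 4: "C", 5: "D", 6: "D"}  # hypothetical routing

outgoing = {"C": [], "D": []}
for n in incoming_A:
    outgoing[route[n]].append(n)  # Op1: the number is preserved as-is

def is_contiguous_from_one(nums):
    """True iff nums is exactly 1, 2, ..., len(nums) (RM spec 3.7)."""
    return nums == list(range(1, len(nums) + 1))

# C receives message numbers [1, 3, 4] and D receives [2, 5, 6]:
# both violate the by-1 increment an RM module is hardwired to enforce.
```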
- So we are left with Op2, which requires significant work to convert an Ack into another Ack when relaying. To understand the kind of mapping that needs to be done for each Ack, here is a snapshot of the situation the intermediary will face at any time (reusing the sequence IDs in Pim's table: sequence A is split into sequences C and D by the intermediary; the number "3" in A:3 is the sequence number of the message).
At some point in time, the "mapping table" in the MSH intermediary will look like this (the arrow --> means the intermediary has forwarded the message on the next sequence):
A:1 --> C:1 (acked)
A:2 --> D:1 (no Ack yet)
(A:3 not received yet)
A:5 --> C:2 (acked)
A:4 (received late...) --> C:3 (no Ack yet)
A:6 --> D:2 (acked)
A:3 (received late...) --> C:4 (acked)
A:7 --> D:3 (no Ack yet)
A:8 missing
A:9 --> D:4 (acked)
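As a sketch of what the intermediary must actually store, the snapshot above can be expressed as a lookup table keyed by (incoming sequence, incoming number); relaying an Ack then requires the reverse lookup, from outgoing coordinates back to sequence-A numbers. The dictionary layout here is an illustrative assumption, not a prescribed data structure.

```python
# Hypothetical in-memory mapping table for the snapshot above:
# (incoming_seq, incoming_num) -> (outgoing_seq, outgoing_num, acked?)
mapping = {
    ("A", 1): ("C", 1, True),
    ("A", 2): ("D", 1, False),
    ("A", 5): ("C", 2, True),
    ("A", 4): ("C", 3, False),
    ("A", 6): ("D", 2, True),
    ("A", 3): ("C", 4, True),
    ("A", 7): ("D", 3, False),
    # A:8 missing -- no entry yet
    ("A", 9): ("D", 4, True),
}

# Relaying an Ack for, say, D:2 requires the reverse direction:
reverse = {(s, n): (a, m) for (a, m), (s, n, _) in mapping.items()}
```

Note that every forwarded message and every relayed Ack touches this table, which is the high update rate mentioned above.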
The RM Acks for sequence A will then look like the following.
At the time message A:6 is received by the intermediary, and its "relay ack" received for sequence D:
<wsrm:SequenceAcknowledgement>
  <wsrm:Identifier>A</wsrm:Identifier>
  <wsrm:AcknowledgementRange Upper="1" Lower="1"/>
  <wsrm:AcknowledgementRange Upper="6" Lower="5"/>
</wsrm:SequenceAcknowledgement>
At the time message A:9 is received by the intermediary, and its "relay ack" received for sequence D:
<wsrm:SequenceAcknowledgement>
  <wsrm:Identifier>A</wsrm:Identifier>
  <wsrm:AcknowledgementRange Upper="1" Lower="1"/>
  <wsrm:AcknowledgementRange Upper="3" Lower="3"/>
  <wsrm:AcknowledgementRange Upper="6" Lower="5"/>
  <wsrm:AcknowledgementRange Upper="9" Lower="9"/>
</wsrm:SequenceAcknowledgement>
The RM Ack is in fact a set of "acked intervals" that covers the entire sequence (meaning every past message is in fact re-acknowledged in every Ack). And although over time we can expect the emergence of a single "solid" interval for the oldest messages (meaning the laggards have been received and have all been acknowledged at last), the reality is that the status of the most recent messages will always be a bit messy, as illustrated here: some sequence gaps and several ack ranges, until these recent messages settle in turn into a stable, solid acked interval covering all past messages. Some refer to such a data structure as "foamy".
The bottom line is that each Ack creation for sequence A requires the mapping and merging of a similar set of ack ranges from sequence C and sequence D. And the "foam" from the Acks of C and D adds up in the Acks for A (meaning, if sequence A splits into 100 smaller sequences, and each of these has an average of 3 ack ranges in its Acks at any time, then sequence A will have an average of 300 ack ranges in every one of its Acks). This also means that if the RM destination of sequence D is down for some time, its unacknowledged message numbers will show up in all consolidated Acks for sequence A, and will fragment these Acks.
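The consolidation step itself can be sketched as follows, reusing the snapshot's mapping table: collect the downstream coordinates that have been acked, translate them back to sequence-A numbers, and collapse the result into AcknowledgementRange intervals. This is an illustrative reconstruction of the Op2 merge, not an implementation from the thread.

```python
def to_ranges(nums):
    """Collapse a set of message numbers into (Lower, Upper) ack ranges."""
    ranges = []
    for n in sorted(nums):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current range
        else:
            ranges.append((n, n))            # start a new range
    return ranges

# Mapping-table state at the time A:9's relay ack arrives (see snapshot):
mapping = {("A", 1): ("C", 1), ("A", 2): ("D", 1), ("A", 5): ("C", 2),
           ("A", 4): ("C", 3), ("A", 6): ("D", 2), ("A", 3): ("C", 4),
           ("A", 7): ("D", 3), ("A", 9): ("D", 4)}
acked_downstream = {("C", 1), ("C", 2), ("C", 4), ("D", 2), ("D", 4)}

# Translate acked C/D coordinates back into sequence-A message numbers:
acked_on_A = {a_num for (seq, a_num), out in mapping.items()
              if seq == "A" and out in acked_downstream}
ack_ranges_for_A = to_ranges(acked_on_A)
```

The computed ranges match the second Ack shown above (1-1, 3-3, 5-6, 9-9), and every one of D's unacknowledged numbers fragments the output further.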
E.g. in our example, for 9 messages sent over sequence A, the intermediary will not be able to do with fewer than 3 ack ranges as long as the D server is down. Imagine what the Acks will look like after 9999 messages have been sent... (It may very well be that several MSHs sending messages on sequence D altogether generate a few thousand messages before realizing that the D server is down and stopping their sending; it would take just a few minutes. As soon as that happens, intermediaries may find themselves having to deal with Acks each containing thousands of ack ranges!)
The general point is that creating such RM Acks requires a fairly complex mapping algorithm in the intermediary, one that is not very robust in the face of server or connection downtime, and whose performance can degrade quite quickly. The cost of this algorithm is incurred for each Ack at each intermediary. To the complexity of the algorithm, add its reliance on transient state in the intermediary (the ever-changing mapping table), and you get a solution with a high price to pay in development effort and testing/validation effort, for run-time robustness and scalability that remain brittle. Compare this with a solution where the intermediary has... nothing to do at all in terms of RM processing, once a single end-to-end RM sequence has been established between the two parties.
I think we need to be aware of all this before even considering this solution as a general candidate. I believe there might be some cases where it has merits, but for several other multi-hop configurations these are clearly serious challenges.
-Jacques