wsrm message

Subject: Re: [wsrm] [REL-XX]Proposal for POLL RM-Reply Pattern
From: Paolo Romano <Paolo.Romano@dis.uniroma1.it>
To: Patrick Yee <kcyee@cecid.hku.hk>, tom@coastin.com
Date: Thu, 18 Sep 2003 09:30:16 -0000

I agree with Patrick. One of the issues I think this TC should investigate
further is how long an RMP should persistently store the protocol state
information.
Determining when a receiver can safely remove its state information is
not a trivial problem. According to the current specification, a
receiver cannot know whether the sender has received the acknowledgment
for a given message.
One might rely on the time-to-live mechanism to reduce
the amount of memory needed: both senders and receivers could be allowed to erase
their state information for a given message once its expiration time has been
reached. However, this approach is not safe, because of the nature of the
underlying transport protocols and, specifically, because no assumptions can be made
about the timeliness of message delivery.

I'll describe a possible situation which could give rise to a failure to
reliably deliver a message:
*) time x: Sender A sends message ID; the expiration time for ID is set to x+TTL

*) time y, such that y<x+TTL: Destination B receives message ID, sends back
an acknowledgment and processes the message (e.g. passes it to the
application level).

Unfortunately the ACK message for ID gets lost, for example because of a
network partition between A and B. Hence, Sender A keeps retransmitting or
querying B to resolve its uncertainty about message ID.
Consider the unlucky situation in which the network partition lasts until
time z>x+TTL.
Since the expiration time has been reached, A must report an error
stating that it was impossible to determine whether ID was actually
received by B. In our scenario, ID was actually received and processed.
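
The timeline above can be replayed in a few lines of Python. This is only a
sketch under simplifying assumptions that are mine, not the spec's: the
partition starts right after B receives ID, and the function name
run_scenario and the sample times are hypothetical.

```python
def run_scenario(x, ttl, y, z):
    """Replay the scenario: send at x, delivery at y, partition heals at z.

    Simplifying assumption: the partition starts right after B receives
    ID, so the first ACK and every retry or query from A are lost until z.
    """
    expires_at = x + ttl
    # B accepts and processes ID only if it arrives before expiration.
    delivered_at_b = y < expires_at
    # A can learn the outcome only if the partition heals before A's state expires.
    a_learns_outcome = delivered_at_b and z < expires_at
    status_at_a = "acknowledged" if a_learns_outcome else "error: outcome unknown"
    return delivered_at_b, status_at_a


# Partition outlasts the TTL: B processed ID, yet A reports an error.
print(run_scenario(x=0, ttl=60, y=10, z=90))
# Partition heals in time: a retry gets through and A sees the ACK.
print(run_scenario(x=0, ttl=60, y=10, z=30))
```

The first call returns (True, 'error: outcome unknown'), which is exactly the
mismatch between A and B described above.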

Although no messaging protocol can do miracles, I think it should be one of our
design goals to minimize the probability of such situations. This
protocol failure follows from the choice to remove state information as
soon as the TTL for a message expires. In ebMS, an additional parameter
indicates how long the message will be persisted (I may be wrong, but
I think it is called persistenceTime), and it is set greater than (maximum
number of retransmissions * delay between retransmissions). This approach does
not solve the above situation either, as the network partition
may last longer than the persistenceTime as well.
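
To make the arithmetic concrete, here is a small sketch. The function names
and the numbers (in seconds) are hypothetical and are not taken from ebMS;
the only thing carried over from the text is the bound itself.

```python
def min_persistence_time(max_retransmissions, retry_delay):
    """Lower bound described above: the whole retry window must fit."""
    return max_retransmissions * retry_delay


def state_survives_partition(persistence_time, partition_duration):
    """True if the receiver still holds its state when the partition heals."""
    return partition_duration < persistence_time


bound = min_persistence_time(max_retransmissions=5, retry_delay=30)  # 150 s
persistence_time = bound + 30                                        # 180 s
print(state_survives_partition(persistence_time, partition_duration=120))
print(state_survives_partition(persistence_time, partition_duration=600))
```

A 120 s partition is survived, a 600 s one is not: whatever finite value is
chosen, a longer partition defeats it.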

A different approach, which would resolve these situations, is to allow a
receiver to erase its state information for a given message only upon receipt of
an explicit indication from the sender. In other words, a dedicated
message/header field could be defined to allow the sender to notify the receiver
that a given ACK has been received, so that the receiver can forget about that message.
The sender would keep retransmitting this "synchronize" message until an
"OK" message is received from the receiver. An "OK" message indicates that the
message has been forgotten by the receiver (i.e. removed from persistent
storage). If the receiver receives the "synchronize" message multiple
times, it should always respond OK, even if it no longer has that message in its
persistent storage. Once the sender receives an "OK" message, it can erase the state
information concerning the message.
In other words:
A--- MSG:ID-->B
A<---ACK:ID---B
A---SYNC:ID-->B //B removes state info for MSG:ID
A<---OK:ID----B //A removes state info for MSG:ID
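
The exchange can be sketched in Python as follows. The class and method
names are hypothetical, and the lossy channel is modelled crudely by a helper
that drops the first few replies; the point is only to show that answering
SYNC with OK every time makes retransmission safe.

```python
class Receiver:
    def __init__(self):
        self.state = {}                      # msg_id -> delivery record

    def on_message(self, msg_id):
        self.state[msg_id] = "delivered"     # kept for duplicate detection
        return ("ACK", msg_id)

    def on_sync(self, msg_id):
        # Idempotent: reply OK even if the entry was already removed.
        self.state.pop(msg_id, None)
        return ("OK", msg_id)


class Sender:
    def __init__(self):
        self.state = {}                      # msg_id -> "sent" or "acked"

    def send(self, msg_id):
        self.state[msg_id] = "sent"
        return ("MSG", msg_id)

    def on_ack(self, msg_id):
        self.state[msg_id] = "acked"         # now safe to start SYNC

    def on_ok(self, msg_id):
        self.state.pop(msg_id, None)         # sender forgets the message


def retry_until_reply(handler, msg_id, drops):
    """Resend until a reply gets through; the first `drops` replies are lost."""
    attempts = 0
    while True:
        attempts += 1
        reply = handler(msg_id)              # the request always reaches B here
        if attempts > drops:
            return reply, attempts


a, b = Sender(), Receiver()
a.send("ID")
b.on_message("ID")
a.on_ack("ID")
reply, attempts = retry_until_reply(b.on_sync, "ID", drops=2)
a.on_ok("ID")
print(reply, attempts, a.state, b.state)
```

Here B sees the SYNC three times, answers OK each time, and both state tables
end up empty.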

Of course this approach adds a new pair of messages to determine whether it is
safe to erase state information for a given message. However, this
overhead can be effectively reduced if the sender sends a "sync" message to
request the removal of the state info for a BATCH of messages, e.g. at the end
of a conversation. As an example, A is the sender, B the receiver.

for (i=1; i<=10; i++)
 {A sends message i;
  B receives i and acks;
  A receives the ack for i;}
// A sends 10 messages to B. Normal behavior. Messages and ACKs are delivered.
A sends a SYNCH message to B.
B sends an OK message to A.
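
A runnable rendering of the same conversation, as a sketch only: the
function names are hypothetical, and message loss is ignored because the
example covers the normal case.

```python
def control_messages(n, batched):
    """Count SYNC and OK messages needed to clear state for n messages."""
    handshakes = 1 if batched else n
    return 2 * handshakes   # each handshake is one SYNC plus one OK


def run_conversation(n):
    sender_state, receiver_state = set(), set()
    for i in range(1, n + 1):
        sender_state.add(i)     # A sends message i and records it
        receiver_state.add(i)   # B receives i, records it and acks
        # A receives the ack for i but keeps its entry until the OK arrives
    batch = set(sender_state)
    receiver_state -= batch     # B handles one SYNC covering the whole batch
    sender_state -= batch       # A handles the single OK reply
    return sender_state, receiver_state


print(run_conversation(10))
print(control_messages(10, batched=False), control_messages(10, batched=True))
```

For ten messages, per-message synchronization costs 20 extra messages while
the batched variant costs 2, and both sides still end with empty state.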

Since we already have a STATUS enquiry message, my proposal is to extend this
message so that the persistent stores on both sides can be correctly synchronized.

Note that with this approach the above network partition scenario is
handled correctly, assuming that the network failure is eventually repaired.
With the previous approach, by contrast, the protocol succeeds in reliably
delivering messages only if every network failure is repaired within the
TTL/PersistenceTime.
Also note that with the approach I am proposing, the state information
for a given message/batch of messages may be erased from persistent storage
well before the TTL/PersistenceTime expires. As we expect normal behavior to be the
common case, messages and ACKs will be delivered quickly, and persistent storage
on the receiver side can be cleared as soon as a SYNCH message arrives. With the
other approach, on the other hand, we have to wait until
TTL/PersistenceTime expires to remove such information (which is actually
useless, as the sender already knows that the message has been reliably delivered
to the receiver).

Any comments?

Paolo

>
> I see. That's fine for tolerating protocol failure. Now, the persistence
> is not only for the purpose of reliable protocol only, but also for the
> purpose of status query. If I said I support status query, it will be
> unreasonable for me to delete the persistence, since the spec doesn't
> indicate a deadline which I can delete the persistence after that..
>
> Am I on the right track?
>
> Regards, -Patrick
>




--
Paolo Romano