ebxml-msg message

Subject: Re: RM - Definition needed!
From: Martin W Sachs <mwsachs@us.ibm.com>
To: Dan Weinreb <dlw@exceloncorp.com>
Date: Mon, 20 Aug 2001 21:45:23 -0400

See my comments embedded below, headed by MWS:

Regards,
Marty

*************************************************************************************

Martin W. Sachs
IBM T. J. Watson Research Center
P. O. B. 704
Yorktown Hts, NY 10598
914-784-7287;  IBM tie line 863-7287
Notes address:  Martin W Sachs/Watson/IBM
Internet address:  mwsachs @ us.ibm.com
*************************************************************************************



Dan Weinreb <dlw@exceloncorp.com> on 08/16/2001 11:56:53 PM

Please respond to Dan Weinreb <dlw@exceloncorp.com>

To:   ian.c.jones@bt.com
cc:   ebxml-msg@lists.oasis-open.org
Subject:  Re: RM - Definition needed!



   Date: Thu, 16 Aug 2001 17:20:40 +0100
   From: ian.c.jones@bt.com

   It has been suggested that a useful step in the RM discussion is a simple
   BUT succinct definition of what RM means.  I agree that this would be useful
   to "scope" the discussion as many other issues tend to drift in and out of
   this topic.

   As a start to the discussion to define in 1 paragraph or less - I would like
   2 or at most 3 sentences, the start is:
   "RM is a protocol between the sending and target MSHs to get an indication
   that a message was delivered to the far end... pointedly, it has nothing to
   do with the quality, existence or number of intermediate nodes in the path."

I agree that reliable messaging should be defined in terms of what it
asserts about the communication between the From Party MSH and the To
Party MSH, and not say anything about intermediaries.

First, we need to be more precise about what "the far end" means.  At
least, we need to say something like that the message has been stored
persistently in the target MSH's persistent message repository.  We
have to be clear that we do *not* mean that the application has
processed, or even seen, the message.

Why is the fact that the message reached the target MSH's persistent
store particularly interesting?  Because now the message is available
to the application even in face of a certain set of hazards, namely
the classical hazards associated with distributed messaging systems:
network partitions, lost messages, and so on.

In my opinion, it's pretty imprecise to just say that something is
"reliable".  "Reliable" is entirely relative.  What we really mean is
that the system will continue to function correctly in the face of a
specific set of possible hazards (failure modes).  Without specifying
which hazards we are protecting against, "reliable" is pretty vacuous.

Second, don't we want the phrase "reliable messaging" to mean more
than just that the sender gets an indication that a message was
delivered (if it was, in fact, delivered)?  Don't we at least further
want it to imply once-and-only-once semantics (in the face of a
specific set of hazards)?
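
To make the once-and-only-once idea concrete, here is a minimal sketch of a receiving MSH that persists each message under its ID and ignores retransmissions. It is an illustration only, not anything taken from the ebXML specification: the class name, the in-memory dict standing in for the persistent repository, and the ("ack", id) return value are all assumptions.

```python
class ReceivingMSH:
    """Hypothetical To Party MSH; names and structure are assumptions."""

    def __init__(self):
        # Stands in for the persistent message repository (in memory here).
        self.store = {}
        # Payloads made available to the application, in arrival order.
        self.available = []

    def receive(self, message_id, payload):
        # A retransmission of an already persisted message is acknowledged
        # again, but it is not stored or made available a second time.
        if message_id not in self.store:
            self.store[message_id] = payload
            self.available.append(payload)
        return ("ack", message_id)


msh = ReceivingMSH()
first = msh.receive("m1", "order")
second = msh.receive("m1", "order")  # duplicate caused by a sender retry
```

Both calls return an acknowledgment, yet the application sees the payload once. The guarantee holds only against the hazards the store survives, which is the point about naming the hazards explicitly.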

   Date: Thu, 16 Aug 2001 13:34:29 -0400
   From: Martin W Sachs <mwsachs@us.ibm.com>

   The definition also needs a statement about non-delivery. A guaranteed
   delivery-failure notification shall be delivered (in some unspecified
   manner) by the From MSH to the From application if the permitted number of
   retries are exhausted without receiving an acknowledgment by the To MSH
   that it has received and persisted the message for processing by the
   application.  The guaranteed delivery-failure notification is an essential
   part of reliable messaging because it removes any doubt about whether the
   To party did or did not receive the message in the case where the From
   application has not received a response from the To application.

Even if you didn't get an ack after n retries, it is still quite
possible that the message is already successfully sitting in the
target MSH's persistent repository.  Maybe the acks all got lost in
the network.  It is very hard to prove any assertion merely from the
evidence that you have *not* heard anything.  (I'm assuming here that
one of the specific hazards we want to be reliable in the face of is
that a message can always vanish in the network.)

MWS:  I agree that this is a problem.  Generally any protocol with
retries makes an assumption that a retry will succeed unless the
network is completely broken, in which case the original message did
not arrive.
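
The lost-ack case discussed above can be shown with a small sketch. Everything in it is hypothetical: `send_with_retries`, `lossy_ack_channel`, and the `persisted` dict are illustrative names, and the "network" is a plain function that stores the message at the target and then drops every acknowledgment.

```python
def send_with_retries(send_once, message_id, payload, max_retries):
    """Attempt delivery 1 + max_retries times.

    Returns "acked" on the first acknowledgment, otherwise
    "delivery-failure" once the retries are exhausted.
    """
    for _ in range(1 + max_retries):
        if send_once(message_id, payload) is not None:
            return "acked"
    return "delivery-failure"


# Stands in for the target MSH's persistent repository.
persisted = {}


def lossy_ack_channel(message_id, payload):
    # The message reaches the target store, but the ack vanishes
    # on the way back, so the sender sees nothing.
    persisted[message_id] = payload
    return None


outcome = send_with_retries(lossy_ack_channel, "m1", "order", max_retries=3)
message_arrived = "m1" in persisted
```

Here the sender reports a delivery failure even though the message is sitting in the target store, so a failure notification based only on missing acks means "delivery unknown", not "not delivered".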

If the From party wants to know without doubt that the To party did
not receive a particular message and *will* never receive that message
in the future, I think the From party needs to receive a
negative-acknowledgement (nack) message from the To party, and the
From party has to know that the nack was sent after the expiration
time (absolute time-to-live) of the message.  (This may be problematic
in the face of the possibility of clock skew.)

MWS:  In other words, the To party sends a negative acknowledgment if
it does not receive a message.  But if it did not receive the message,
how does it know that it should send the negative acknowledgment, and
to whom it should send it?

Stepping back a bit, I think it would be nice if we could define
"reliable messaging" in terms of what benefit it provides (i.e. what
it promises the application, what its "contract" is), irrespective of
how it is actually implemented.  So it would be good if we could
define "reliable messaging" without saying anything about, say,
retries. I admit that I'm not certain this is possible, but it's worth
a try.

MWS:  This I completely agree with.  We have to define what it promises
the application and what it does if it cannot fulfil the promise. This
has to include a set of assumptions on what hazards it protects against
and a set of assumptions on recovery (e.g. if N retries fail, what can
be assumed?)

-- Dan
