ebxml-msg message

Subject: Re: RM - Definition needed!
From: Dan Weinreb <dlw@exceloncorp.com>
To: ian.c.jones@bt.com
Date: Thu, 16 Aug 2001 23:56:53 -0400 (EDT)
   Date: Thu, 16 Aug 2001 17:20:40 +0100
   From: ian.c.jones@bt.com

   It has been suggested that a useful step in the RM discussion is a simple
   BUT succinct definition of what RM means.  I agree that this would be useful
   to "scope" the discussion as many other issues tend to drift in and out of
   this topic.

   As a start to the discussion to define in 1 paragraph or less - I would like
   2 or at most 3 sentences, the start is:
   "RM is a protocol between the sending and target MSHs to get an indication
   that a message was delivered to the far end... pointedly, it has nothing to
   do with the quality, existence or number of intermediate nodes in the path."

I agree that reliable messaging should be defined in terms of what it
asserts about the communication between the From Party MSH and the To
Party MSH, and not say anything about intermediaries.

First, we need to be more precise about what "the far end" means.  At
least, we need to say something like that the message has been stored
persistently in the target MSH's persistent message repository.  We
have to be clear that we do *not* mean that the application has
processed, or even seen, the message.

Why is the fact that the message reached the target MSH's persistent
store particularly interesting?  Because now the message is available
to the application even in face of a certain set of hazards, namely
the classical hazards associated with distributed messaging systems:
network partitions, lost messages, and so on.

In my opinion, it's pretty imprecise to just say that something is
"reliable".  "Reliable" is entirely relative.  What we really mean is
that the system will continue to function correctly in the face of a
specific set of possible hazards (failure modes).  Without specifying
which hazards we are protecting against, "reliable" is pretty vacuous.

Second, don't we want the phrase "reliable messaging" to mean more
than just that the sender gets an indication that a message was
delivered (if it was, in fact, delivered)?  Don't we at least further
want it to imply once-and-only-once semantics (in the face of a
specific set of hazards)?

   Date: Thu, 16 Aug 2001 13:34:29 -0400
   From: Martin W Sachs <mwsachs@us.ibm.com>

   The definition also needs a statement about non-delivery. A guaranteed
   delivery-failure notification shall be delivered (in some unspecified
   manner) by the From MSH to the From application if the permitted number of
   retries are exhausted without receiving an acknowledgment by the To MSH
   that it has received and persisted the message for processing by the
   application.  The guaranteed delivery-failure notification is an essential
   part of reliable messaging because it removes any doubt about whether the
   To party did or did not receive the message in the case where the From
   application has not received a response from the To application.

Even if you didn't get an ack after n retries, it is still quite
possible that the message is already successfully sitting in the
target MSH's persistent repository.  Maybe the acks all got lost in
the network.  It is very hard to prove any assertion merely from the
evidence that you have *not* heard anything.  (I'm assuming here that
one of the specific hazards we want to be reliable in the face of is
that a message can always vanish in the network.)

If the From party wants to know without doubt that the To party did
not receive a particular message and *will* never receive that message
in the future, I think the From party needs to receive a
negative-acknowledgement (nack) message from the To party, and the
From party has to know that the nack was sent after the expiration
time (absolute time-to-live) of the message.  (This may be problematic
in the face of the possibility of clock skew.)

Stepping back a bit, I think it would be nice if we could define
"reliable messaging" in terms of what benefit it provides (i.e. what
it promises the application. what its "contract" is), irrespective of
how it is actually implemented.  So it would be good if we could
define "reliable messaging" without saying anything about, say,
retries. I admit that I'm not certain this is possible, but it's worth
a try.

-- Dan
References:
- RM - Definition needed!
  - From: ian.c.jones@bt.com