ebxml-msg message

Subject: Re: Need volunteer to draft definition of reliable messaging,wasRE:reliable messaging - hop by hop
From: Dan Weinreb <dlw@exceloncorp.com>
To: david@drummondgroup.com
Date: Fri, 31 Aug 2001 21:04:48 -0400 (EDT)
   Date: Thu, 30 Aug 2001 08:17:21 -0500
   From: David Fischer <david@drummondgroup.com>

   I will volunteer to draft the words (it may be next week) and it will include
   boundaries/layers, signatures, NRR, and RM as it relates to this layer -- unless
   anyone else has time this week.

I don't think I can write up a draft, but here are some key points that I
think need to be taken into account; I hope this might be helpful.

(1) I strongly agree with Marty Sachs's earlier mail that started:
"With or without an API definition, the reliable-messaging must
include the sending application, receiving application, and both MSHs.
A contract that is just between the MSHs is worthless because the
beneficiaries of the contract are the From and To applications."  The
definition of reliable messaging must be able to talk about
communication between the application and the MSH.  We can keep it
very broad (i.e. not try to pin down whether it's synchronous like
subroutines or asynchronous like IPC messages), but we do have to talk
about the application requesting X and the MSH letting the application
know that Y.

(1a) When using the phrase "delivery failure notification", be careful
to distinguish between (a) the MSH telling the application that the
message did not get through, and (b) a kind of message that can be
sent between MSHs.

(2) It is tempting to say that when an application asks an MSH to
deliver a message reliably, the MSH SHALL always notify the
application either that the message certainly did get delivered, or
that the message certainly did not get delivered.

There are two problems here.  First, when we say that it SHALL, do we
mean "eventually, someday, it SHALL" or "it SHALL within a bounded
amount of real time"? Second, does "it certainly did not get
delivered" mean "it certainly did not get delivered YET" or "it
certainly did not get delivered AND IT NEVER WILL be delivered in the
future, ever"?  In both cases, the latter, more stringent requirement
is more valuable to the application, and it is even more tempting to
say that we intend to fulfill these requirements.

(3) We need an abstract model of "the network" so that we can talk
about what failure modes we intend to be reliable in the face of.
It's really vacuous to use the word "reliable" without defining what
hazards your reliability mechanism claims to overcome.  Of course, the
wider your set of hazards, the harder it is to provide nice properties
to the application, so there's a tradeoff.

In particular, we might want to say that the network is a connection
between a pair of MSHs such that if the sending MSH transmits a
message, the receiving MSH might receive the message after an
unbounded delay (see my earlier mail about store-and-forward mailers
with disk head crashes), or it might never receive the message.

(4) I think it is *inherently* impossible to meet the more-stringent
requirements of (2) above, in the face of the proposed failure model in
(3) above.  It's always possible that the sender sends a message, the
message arrives at the receiver, and then all communication between
the sender and receiver is suspended for an unbounded time.  As long
as "radio silence" prevails, there's no way that the sender can know
whether the message arrived at the receiver or not.  The sender just
has to tolerate being uncertain, until communications are restored.

The sender can never conclude anything for certain simply because it
hasn't heard anything from the receiver for T seconds.  Marty has
said: "Exhaustion of retries with no Acknowledgment is certain
except for the unlikely case that the message was delivered but the
From party is continuously unable to receive ACKs. We do need to think
more about handling that case."  It's not necessarily that unlikely.
Network partitions can happen at any time, including right after
transmission of a message and before any acknowledgement message is
sent back, and the partitions can last for significant periods of
time.  (Did you all read about that train in the tunnel that tore into
a major bundle of cables, a couple of weeks ago?)
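
To make the argument in (4) concrete, here is a minimal sketch of what
the sender can observe.  Everything in it is an assumption made for
illustration (the Network class, its two flags, and the
send_with_retries function are hypothetical, not anything from the MS
spec).  It compares a run where the message is lost with a run where
only the acknowledgements are lost:

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Hypothetical link between two MSHs, fixed per run for the demo."""
    deliver_forward: bool   # does the message reach the receiving MSH?
    deliver_backward: bool  # does the ACK reach the sending MSH?


def send_with_retries(net: Network, retries: int):
    """Return (what the sender observes, whether delivery really happened)."""
    delivered = False
    for _ in range(retries):
        if net.deliver_forward:
            delivered = True            # receiver got at least one copy
            if net.deliver_backward:
                return "acked", delivered
    return "retries exhausted", delivered


# Case A: every copy of the message is lost.
lost_message = send_with_retries(Network(False, False), retries=3)
# Case B: the message arrives, but every ACK is lost (a partition
# that begins right after transmission).
lost_acks = send_with_retries(Network(True, False), retries=3)

print(lost_message)  # ('retries exhausted', False)
print(lost_acks)     # ('retries exhausted', True)
```

The sender's observation is the same string in both cases, while the
actual delivery state differs, so "retries exhausted" alone cannot be
reported to the application as "certainly not delivered".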

(5) I strongly agree with Marty Sachs's statement "I suggest prescribing
that reliable messaging not be used for the delivery-failure
notification."  That way lies madness.  There's no point in having the
receiving end do periodic retransmissions of the delivery-failure
notification message: how would it ever know when to stop?  Because
the sender would send some kind of acknowledgement?  But what if
*that* got lost?

If the sender wants to get confirmation that a message did *not* get
delivered, the sender should keep sending some kind of "message status
request" until a reply to one of them gets back to him.  The sender
can keep this up until it gets exhausted.
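
The sender-driven polling described in (5) might look roughly like the
sketch below.  The names are hypothetical: send_status_request stands
in for whatever "message status request" exchange gets defined, and it
is assumed to return None when no reply comes back.  The point is only
that the party who wants the answer does the retrying, and that it
ends in "unknown" rather than a guess if no reply ever arrives:

```python
import time


def poll_status(send_status_request, max_attempts, delay_seconds=0.0):
    """Ask for message status until a reply arrives or attempts run out.

    send_status_request is a hypothetical callable that returns
    'delivered', 'not delivered', or None if no reply was received.
    Returns (status, number of attempts made).
    """
    for attempt in range(1, max_attempts + 1):
        reply = send_status_request()
        if reply is not None:
            return reply, attempt
        time.sleep(delay_seconds)       # wait before asking again
    return "unknown", max_attempts      # sender stays uncertain


# Demo: the first two requests get no reply, the third is answered.
replies = iter([None, None, "not delivered"])
answered = poll_status(lambda: next(replies), max_attempts=5)

# Demo: communication never comes back within the attempt budget.
silent = poll_status(lambda: None, max_attempts=4)

print(answered)  # ('not delivered', 3)
print(silent)    # ('unknown', 4)
```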

(6) We must keep distinct the "non-repudiation of receipt" concept of
the BPSS, and the signed DeliveryReceipts of the MS.  Confusion
between these concepts has caused some "talking past each other" in
recent mail.  I agree with Arvola: "One thing I'd like to see clarified
in the MSG spec is that signed DeliveryReceipts are not intended to
satisfy the NRR requirement for business signals in the UMM model that
underlies the BPSS spec."

(6a) We need to decide whether it is a requirement of the MS that the
application be able to obtain signed "proof" that a message was
delivered to the To Party MSH.  You (David Fischer) asserted that
there's a significant legal reason to resolve the question of whether
the message was "delivered" in the sense of the receiver taking some
kind of responsibility for the message, even though "delivered"
doesn't prove that the message was "processed" in any
application-level sense.  We should resolve the question of whether
this feature is something that the MS is responsible for providing.

(7) To the greatest extent possible, it would be good to try to define
"reliable messaging" as a contract between the application and the
MSH, i.e. "the MSH undertakes to provide the following services to the
application", saying as little as possible about *how* the MSH goes
about doing it.  Although we do need to figure out all about IM's, and
whether we trust them to stay up, and whether hop-to-hop RM can be
trusted to produce end-to-end RM, it would be good to have a
definition of the *goal* of RM that didn't go into those issues.
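
One way to picture the contract in (7) is as an abstract interface
that states what the MSH promises and nothing about how it does it.
This is only a sketch under assumptions of my own: the names Outcome,
ReliableMessaging, and LoopbackMSH are invented here, and the
inclusion of an UNKNOWN outcome reflects the argument in (4), not any
agreed definition.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Callable, List, Tuple


class Outcome(Enum):
    DELIVERED = "delivered"
    NOT_DELIVERED = "not delivered"
    UNKNOWN = "unknown"  # assumption: the contract admits uncertainty


class ReliableMessaging(ABC):
    """What the MSH undertakes to do for the application (hypothetical)."""

    @abstractmethod
    def send(self, payload: bytes,
             notify: Callable[[str, Outcome], None]) -> str:
        """Accept a message, return its id, and later report an Outcome
        for that id through notify.  Retries, hops, and intermediaries
        are deliberately not part of this interface."""


class LoopbackMSH(ReliableMessaging):
    """Trivial stand-in that 'delivers' immediately, for demonstration."""

    def __init__(self) -> None:
        self._count = 0

    def send(self, payload, notify):
        self._count += 1
        message_id = f"msg-{self._count}"
        notify(message_id, Outcome.DELIVERED)
        return message_id


events: List[Tuple[str, Outcome]] = []
msh = LoopbackMSH()
message_id = msh.send(b"order", lambda mid, outcome: events.append((mid, outcome)))
print(message_id, events[0][1].value)  # msg-1 delivered
```

The application sees only send and the notification; whether the MSH
behind the interface uses hop-to-hop or end-to-end RM is invisible at
this level.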

-- Dan
References:
- RE: Need volunteer to draft definition of reliable messaging,wasRE:reliable messaging - hop by hop
  - From: David Fischer <david@drummondgroup.com>