ebxml-cppa message

Subject: Re: reliable messaging
From: christopher ferris <chris.ferris@east.sun.com>
To: ebXML Msg <ebxml-msg@lists.oasis-open.org>, ebxml-cppa@lists.oasis-open.org
Date: Tue, 28 Aug 2001 08:51:56 -0400

David,

I'm pretty much in agreement with your thoughts below.

I think that I also understand where Marty stands on
this, and I also agree that the distance separating us
is nearing insignificance;-)

I think that if we used the term "From Party" instead of
"From Application", then the statement that an MSH SHALL
notify... is less problematic. The ebMS spec should not
specify how this notification is to be handled (callback,
writing to a log, email, pager, whatever), just that an
implementation SHALL provide some manner of notification
facility.

I also agree with David that the spec needs to provide for
the case of an intermediary notifying the "From Party"
via a DFN message that despite the fact that it (the intermediary MSH)
applied the full RM protocol (attempted N retries at the
specified retryInterval) it was unable to fulfil its 
mission (delivering the message to the next MSH node
and receiving an Ack confirming receipt).

As I believe David has previously pointed out, there are
two possible cases that any MSH will encounter.

1) it is unable to establish a connection to the MSH node
to which it has been instructed to deliver the message. This is
a *known* delivery failure (again, after suffering through
the specified retries, etc.) which would be reported as
an Error.

2) it was able to establish a connection and seemed to have
successfully dispatched the message but has received no
Ack confirming receipt. Despite all of its attempts
(again, all of the specified retries) it has failed to
receive an Ack. The sending MSH *cannot* know what
the disposition of the message might be. It MAY have
been successfully received and processed, but the Ack
was either not sent, lost in the network, or possibly misdirected
due to some configuration error. OR, the message may never
have reached its destination (software problems at the
receiving MSH node which make it appear that the message
has been delivered, but in fact the receiving software 
crashes repeatedly upon receipt...) 

In the latter case, the sending MSH can only report the
DFN as a Warning because it CANNOT know for certain the
disposition of the message. This is a simple fact of
life in the world of distributed computing.
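
A minimal sketch of the two cases above, in Python. The names
(SendResult, DeliveryFailure, send_reliably) are hypothetical and do not
come from the ebMS spec. It assumes one initial attempt plus "retries"
further attempts, and that any attempt which was dispatched without an Ack
leaves the disposition of the message unknown.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class SendResult(Enum):
    ACKED = "acked"                  # Ack confirming receipt came back
    NO_CONNECTION = "no_connection"  # could not connect to the next MSH
    NO_ACK = "no_ack"                # message dispatched, but no Ack received


class Severity(Enum):
    ERROR = "Error"      # known delivery failure (case 1)
    WARNING = "Warning"  # disposition unknown (case 2)


@dataclass
class DeliveryFailure:
    severity: Severity
    reason: str


def send_reliably(
    attempt: Callable[[], SendResult],
    retries: int,
    retry_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[DeliveryFailure]:
    """Try to deliver; return None on Ack, else a classified failure."""
    dispatched = False
    for n in range(retries + 1):  # assumption: first attempt plus N retries
        result = attempt()
        if result is SendResult.ACKED:
            return None
        if result is SendResult.NO_ACK:
            # The message may have arrived even though no Ack was seen.
            dispatched = True
        if n < retries:
            sleep(retry_interval)
    if dispatched:
        return DeliveryFailure(
            Severity.WARNING, "sent, but no Ack after all retries")
    return DeliveryFailure(
        Severity.ERROR, "could not connect after all retries")
```

How the resulting DeliveryFailure reaches the From Party (callback, log,
email, pager) is deliberately left out, since that is the part the spec
should not prescribe.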

Cheers,

Chris



"Burdett, David" wrote:
> 
> I don't think that we are that far apart; the critical difference in view is
> around the "requirement" that the From Application is informed of the
> delivery failure.
> 
> For example, what should an implementation do if the application has no
> existing reasonable method of being notified of the delivery failure? In
> this case, one reasonable approach might be for the MSH to log the failure
> and then provide a GUI which allows a user to browse the log and decide what
> to do. I think that if we use a "SHALL" we would preclude the second option.
> 
> I think that if we take your view literally it would mean that if an
> application could not accept this type of notification then the application
> MUST be changed before ebXML reliable messaging could be used? I don't think
> this is reasonable. Thoughts?
> 
> I also think that there are two types of delivery notification:
> 1. The From MSH reporting delivery failure to the From Application, and
> 2. Another MSH (not the From MSH) reporting a delivery failure to the From
> MSH.
> I think you are focusing on the first. I'm thinking of both.
> 
> More detail below marked with <db></db>
> 
> Best wishes.
> 
> David
> -----Original Message-----
> From: Martin W Sachs [mailto:mwsachs@us.ibm.com]
> Sent: Monday, August 27, 2001 10:31 AM
> To: Burdett, David
> Cc: jacques durand; 'christopher ferris';
> ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
> Subject: RE: reliable messaging
> 
> My replies below.
> 
> Regards,
> Marty
> 
> *************************************************************************************
> 
> Martin W. Sachs
> IBM T. J. Watson Research Center
> P. O. B. 704
> Yorktown Hts, NY 10598
> 914-784-7287;  IBM tie line 863-7287
> Notes address:  Martin W Sachs/Watson/IBM
> Internet address:  mwsachs @ us.ibm.com
> *************************************************************************************
> 
> "Burdett, David" <david.burdett@commerceone.com> on 08/27/2001 12:55:22 PM
> 
> To:   Martin W Sachs/Watson/IBM@IBMUS, jacques durand <jacques@savvion.com>
> cc:   "'christopher ferris'" <chris.ferris@east.sun.com>,
>       ebxml-msg@lists.oasis-open.org, ebxml-cppa@lists.oasis-open.org
> Subject:  RE: reliable messaging
> 
> Marty/Jacques
> 
> I think I agree with both of you.
> 1. We need to make much stronger statements about the From Party MSH
> notifying the From Party Application that a message was not delivered.
> 2. We can't make it a "SHALL" or "MUST" as we have not specified the API
> and we can't check compliance.
> 
> MWS:  We can make it a SHALL and worry about defining the API later.  This
> is no different than every other function in the MS spec.
> <db>See discussion above.</db>
> 
> So let's try and agree some words. How about the following for the
> replacement first and last paragraphs in section 10.4 ...
> 
> Current first para (lines 1874-6) ...
> >>If a message sent with deliverySemantics set to OnceAndOnlyOnce cannot be
> delivered, the MSH or process SHOULD send a delivery failure notification
> to the From Party. The delivery failure notification message contains: ...<<
> 
> Revised first para ...
> >>A MSH that is not the From Party MSH might receive a message sent with
> deliverySemantics set to OnceAndOnlyOnce that it determines cannot be
> delivered to the To Party application or other process that is the final
> destination of the message. In this case, the MSH MUST send a delivery
> failure notification to the From Party that contains: ...<<
> 
> MWS:  This is not sufficient.  A delivery failure notification sent over
> the network is inherently unreliable.  The From MSH will learn, from failure
> to receive an ACK after the specified number of retries, that the message was
> not delivered. The From MSH can reliably send a delivery failure
> notification to the From application.
> <db>This paragraph is saying what an MSH which is *NOT* the From MSH should
> do. I think your answer is saying what the From MSH should do, which is
> covered later.</db>
> 
> We should also add to the bulleted list ...
> >>. a deliverySemantics attribute set to OnceAndOnlyOnce<<
> ... as the message should be sent reliably.
> 
> Current last para (lines 1886-9) ...
> >>It is possible that an error message with an Error element with an
> ErrorCode set to DeliveryFailure cannot be delivered successfully for some
> reason. If this occurs, then the From party that is the ultimate
> destination for the error message SHOULD be informed of the problem by other
> means. How this is done is outside the scope of this specification.<<
> 
> Revised last para, it's now two and has subheadings ...
> >>10.4.1 From Party MSH Behavior
> The From Party MSH that sent a message with deliverySemantics set to
> OnceAndOnlyOnce might determine that the message could not be delivered. In
> this case it is strongly RECOMMENDED that the From Party MSH notify the
> application or other process that requested the message be sent of the
> delivery failure. This should indicate whether the failure was certain, for
> example, there was a communications failure that meant the message could
> not be sent, or probable, for example, although the message was sent, no
> acknowledgement or delivery receipt was received.
> 
> MWS:  As indicated above, I do not agree to anything weaker than SHALL. The
> word "probable" in the next to last line is too weak.  Exhaustion of retries
> with no Acknowledgment is certain except for the unlikely case that the
> message was delivered but the From party is continuously unable to receive
> ACKs. We do need to think more about handling that case.
> <db>I was "thinking more about handling that case" which is why I came up
> with the wording. We also need to agree about the feasibility of "requiring"
> a From MSH to notify the "application". One way around this is to formally
> define in the spec what is meant by the "application" and allow this to
> include real software applications, notifying appropriate users via log (as
> described earlier). But that still leaves the use case where the implementer
> wants the message sent reliably, as it increases the chances of success,
> but doesn't want to know if it doesn't work. I know that this is not
> recommended, but sometimes implementers want to do things they really
> shouldn't and we *can't* stop them.</db>
> 
> 10.4.2 Failure to deliver a DeliveryFailure message
> It is also possible that an MSH that sent an error message with an Error
> element with an ErrorCode set to DeliveryFailure determines that the
> message was not delivered successfully even though it was sent with
> deliverySemantics set to OnceAndOnlyOnce. If this occurs, then it is
> strongly RECOMMENDED that the party that is operating the MSH notifies the
> From party that is the ultimate destination for the error message by other
> means. How this is done is outside the scope of this specification.<<
> 
> MWS:  Same comments as above (10.4.1) apply. In addition, as others pointed
> out, a message which is part of the RM protocol for another message should
> not itself be sent reliably because that can lead to a never-ending series
> of messages.
> <db>I think your concern is covered by the text above which says that if the
> sending of a delivery failure fails then solve the problem by other
> means.</db>
> As noted above, I do not agree to sending DeliveryFailure
> over the network.  DeliveryFailure is an indication from the From MSH to the
> From application as a result of exhaustion of retries without receiving an
> acknowledgment.
> <db>I think there is possibly a misunderstanding here. Consider the
> following use case:
> 1. An intermediate MSH receives a message and sends an acknowledgment back
> to the "From MSH".
> 2. The intermediate MSH then determines that it cannot forward the message
> to the "To MSH" as it is down.
> 
> What should the intermediate MSH do? The options as I see them are:
> 1. Do nothing. The From MSH will then deduce that the delivery failed as no
> Delivery Receipt was received. Note that there is an edge case, as previously
> discussed, that the From MSH must assume that there is a small probability
> that it was delivered.
> 2. Send a "Delivery Failure" message to the From MSH. This means that the
> From MSH is then positively informed that the message was not delivered. I
> prefer this option and it is what the current spec says.
> 
> Neither of these options precludes the From MSH informing the From
> Application of the results of sending the message.
> </db>
> 
> We also need to change lines 1849-53 in section 10.3.4 as this is now
> covered in section 10.4 (see above). We also could extend it to describe
> the idea of checking that MSHs are up and running. Currently these lines
> contain ...
> >>. If the Sending MSH does not receive an Acknowledgment Message after the
> maximum number of retries, the Sending MSH SHOULD notify the application
> and/or system administrator function of the failure to receive an
> acknowledgement.
> 
> MWS:  SHALL
> <db> Marty!! This is what the current spec says!! I know you object to it ;)
> </db>
> . If the Sending MSH detects an unrecoverable communications protocol error
> at the transport protocol level, the Sending MSH SHOULD resend the
> message.<<
> 
> We could replace it with ...
> >>. If the Sending MSH does not receive an Acknowledgment Message after the
> maximum number of retries then the Sending MSH SHOULD:
>    a) Send a Message Service Handler Ping Message to the same MSH one or
> more times as the Sending MSH determines.
>    b) If no Message Service Handler Pong Messages are received then the
> Sending MSH MUST carry out the Failed Message Delivery behavior as
> described in section 10.4<<
> 
> MWS:  This ping/pong is new to me.
> <db>It's in version 1.0 of the spec.</db>
> If it is useful in resolving doubt,
> the SHOULD probably should be changed to SHALL.  However, given that the
> maximum number of retries has failed, it isn't obvious what value there is
> to performing ping/pong.  If there is value to it, then (b) also needs to
> state what to do if ping/pong succeeds.
> <db>Good point. If the ping succeeds then they should retry sending the
> message again. Agreed?</db>
> Thoughts?
> 
> David
> -----Original Message-----
> From: Martin W Sachs [mailto:mwsachs@us.ibm.com]
> Sent: Sunday, August 26, 2001 2:44 PM
> To: jacques durand
> Cc: Burdett, David; 'christopher ferris';
> ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
> Subject: Re: reliable messaging
> 
> Jacques,
> 
> I have to disagree.  With or without an API definition, the
> reliable-messaging contract must include the sending application, receiving
> application, and both MSHs.  A contract that is just between the MSHs is
> worthless because the beneficiaries of the contract are the From and To
> applications.  We can state the essentials of the contract and the
> assumptions on the implementations with or without an API definition. Once
> we have the API definition, we can go back and improve the description of
> the contract and the assumptions on the implementations to take the API
> into account but the assumptions and contract do not change.
> 
> Regards,
> Marty
> 
> jacques durand <jacques@savvion.com> on 08/24/2001 07:59:31 PM
> 
> To:   "Burdett, David" <david.burdett@commerceone.com>
> cc:   "'christopher ferris'" <chris.ferris@east.sun.com>,
>       ebxml-msg@lists.oasis-open.org, ebxml-cppa@lists.oasis-open.org
> Subject:  Re: reliable messaging
> 
> "Burdett, David" wrote:
> 
> > Chris said ...
> >
> > >>>If no acknowledgment has been received, the sender continues to retry
> > delivery, using the Retries and RetryInterval to govern processing. When
> > the number of retries identified by Retries is exceeded, the sending MSH
> > SHOULD notify the sending "party" by some means that is unspecified
> > (e.g. notify the application through some API that it provides, log
> > something useful in an error log, etc.)<<<
> >
> > Note that there is an edge case where all the acknowledgements that were
> > sent failed to be delivered, e.g. maybe a MSH can receive messages but
> > not send them. This means that even though no acknowledgement was received,
> > the message was actually delivered.
> 
> That is indeed a point we have demonstrated in past POC.
> Clearly, RM cannot be substituted for a message-based transaction service,
> which is the right level to guarantee consistency across parties' apps. But
> it can be the basis for such a service.
> By NOT receiving an ack, the sender should not infer that the receiver has
> not received the message: only that the reception has not been confirmed,
> and that it is OK to resend it (the duplication check doing the cleanup job
> on receiver side).
> 
> Regardless of what RM can or can't do, the question raised by Martin W
> Sachs (the requirement to notify sending party) is interesting in that it
> depends on the definition of RM:
> (1) if RM is a contract between sending party, receiving party, and MSH
> transport layer, then these sender notifications (as well as elimination of
> duplicates for receiver) are part of the contract.
> (2) if RM is a contract between two end-point MSHs, then these
> notifications have no normative value.
> 
> My understanding is that (2) currently applies (so SHOULD should remain
> SHOULD...)
> However, once a formal MS API is specified, the MS spec will have to
> address the "contract" value of such API, with regard to sender and
> receiver...
> 
> My two cents...
> 
> Jacques Durand
> Savvion
> 
> >
> >
> > David
> > PS Catching up on emails and logging them into the change request
> > database ;)
> >
> > -----Original Message-----
> > From: christopher ferris [mailto:chris.ferris@east.sun.com]
> > Sent: Friday, August 03, 2001 7:36 AM
> > To: ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
> > Subject: Re: reliable messaging
> >
> > Marty,
> >
> > Please see below.
> >
> > Chris
> >
> > Martin W Sachs wrote:
> > >
> > > Chris,
> > >
> > > I think I may have been unclear.  I specifically am not after an
> > > application-level response for this purpose.
> > >
> > > The question is:  when can a sending party conclude that his message
> > > either was or wasn't delivered?  That time is not relevant to the
> > > performance of the application function.  If the service provider site
> > > goes down before processing the message, but the message has been
> > > persisted (a key requirement of reliable messaging), knowing that the
> > > message was persisted at the application is important information because
> > > it tells the sending party not to resend.
> >
> > Then receipt of the reliable messaging acknowledgment is the answer to
> > your question. That is the point at which the sender knows that the
> > message has been received and persisted.
> >
> > >
> > > Yes, receipt of the RM acknowledgment tells the party that the message
> > > got there but how long does the sending party wait to decide that it
> > > won't be receiving a guaranteed delivery failure notification?  The
> > > answer, in my
> >
> > If the sender receives an acknowledgment, it won't be receiving a
> > guaranteed delivery failure notification because the message HAS been
> > received. Once this acknowledgment has been received it SHOULD cease all
> > reliable messaging retries, etc. as any subsequent retries would place an
> > unnecessary burden on both parties' MSHs.
> >
> > If no acknowledgment has been received, the sender continues to retry
> > delivery, using the Retries and RetryInterval to govern processing. When
> > the number of retries identified by Retries is exceeded, the sending MSH
> > SHOULD notify the sending "party" by some means that is unspecified
> > (e.g. notify the application through some API that it provides, log
> > something useful in an error log, etc.)
> >
> > It isn't at all clear to me that the sender needs anything more than
> > Retries and RetryInterval to achieve its mission. Again, persistDuration
> > is NOT a sending MSH parameter, it is a receiving MSH parameter.
> >
> > > mind, is long enough for all the allowable reliable-messaging retries
> > > to be completed.  I believe that persistDuration is the right answer as
> > > long as it is prescribed that it be set long enough to cover the time to
> > > complete the allowed number of retries plus a little for propagating the
> > > delivery failure notification back to where the sending application can
> > > find it. Alternately, a worst case time to recognize a delivery failure
> > > could be defined.
> > >
> > > The sending application cannot determine if the message is relevant
> > > unless it knows that delivery did or did not succeed.  Receiving or not
> > > receiving a delivery failure notification within a defined time is
> > > crucial.
> > >
> > > Yes, what I described covers several layers in the stack and maybe
> > > several middleware "modules".  However, unless all the reliable messaging
> > > rules are set down in one place, they will never be understood.
> > >
> > > ...and let me reiterate again:  The messaging service must guarantee
> > > that a delivery failure notification will be sent by the sending MSH to
> > > the sending application in all cases where delivery could not be made.
> > > Without this, reliable messaging is utterly broken because the key
> > > requirement of reliable messaging is that the state of the business
> > > transaction not be in doubt if the application-level acknowledgment is not
> > > received.  If the message sender is not notified of delivery failure,
> > > reliable messaging fails because the sending application does not know if
> > > the message got to the other party and therefore doesn't know how to
> > > recover.  People outside of the ebXML teams are starting to notice this
> > > failure and conclude that reliable messaging is no good.  Changing those
> > > SHOULDs to SHALLs is essential to the business future of the ebXML
> > > specifications because reliable messaging is a major component of the
> > > value of the ebXML message service.
> >
> > The MS specification cannot dictate to implementation vendors anything
> > of this nature. How they notify the sending "party" (application or
> > person) is strictly within their prerogative. The MS spec deals
> > exclusively with the details of the wire protocol, not the implementation
> > details of how an MSH is integrated with some application.
> >
> > I don't see how this can be perceived as a failure of the specification
> > when it is clearly (IMO) outside the scope of our work.
> >
> > If we change all of these SHOULDs to SHALLs then everyone would be
> > asking "how?" to which there is no possible answer that covers all
> > possible cases.
> >
> > >
> > > Regards,
> > > Marty
> > >
> > >
> 