ebxml-cppa message

Subject: RE: reliable messaging
From: Martin W Sachs <mwsachs@us.ibm.com>
To: "Burdett, David" <david.burdett@commerceone.com>
Date: Tue, 28 Aug 2001 10:03:46 -0400

My rejoinders below.

Regards,
Marty

*************************************************************************************

Martin W. Sachs
IBM T. J. Watson Research Center
P. O. B. 704
Yorktown Hts, NY 10598
914-784-7287;  IBM tie line 863-7287
Notes address:  Martin W Sachs/Watson/IBM
Internet address:  mwsachs @ us.ibm.com
*************************************************************************************



"Burdett, David" <david.burdett@commerceone.com> on 08/27/2001 09:52:00 PM

To:   Martin W Sachs/Watson/IBM@IBMUS
cc:   jacques durand <jacques@savvion.com>, "'christopher ferris'"
      <chris.ferris@east.sun.com>, ebxml-msg@lists.oasis-open.org,
      ebxml-cppa@lists.oasis-open.org
Subject:  RE: reliable messaging



I don't think that we are that far apart; the critical difference in view
is around the "requirement" that the From Application is informed of the
delivery failure.

For example, what should an implementation do if the application has no
existing reasonable method of being notified of the delivery failure? In
this case, one reasonable approach might be for the MSH to log the failure
and then provide a GUI which allows a user to browse the log and decide
what to do. I think that if we use a "SHALL" we would preclude the second
option.

MWS:  I would argue that the second option satisfies the SHALL.  Face it, if
an application has no reasonable way of being notified of delivery failure,
it cannot benefit from reliable messaging.  One more time: Removal of
uncertainty as to whether the message got there or not is the key element
of reliable messaging.

I think that if we take your view literally it would mean that if an
application could not accept this type of notification then the application
MUST be changed before ebXML reliable messaging could be used? I don't
think this is reasonable. Thoughts?

MWS:  As I said above; if an application cannot get the benefit of reliable
messaging, it probably doesn't need it.  The existence of applications that
either don't need or can't get the benefit of reliable messaging should not
get in the way of doing it right.

I also think that there are two types of delivery notification:
1. The From MSH reporting delivery failure to the From Application, and
2. Another MSH (not the From MSH) reporting a delivery failure to the From
MSH.
I think you are focusing on the first. I'm thinking of both.

MWS:  Fine, let's do BOTH.  Just remember that delivery failure notification
across the network is not reliable.

More detail below marked with <db></db>

Best wishes.

David
-----Original Message-----
From: Martin W Sachs [mailto:mwsachs@us.ibm.com]
Sent: Monday, August 27, 2001 10:31 AM
To: Burdett, David
Cc: jacques durand; 'christopher ferris';
ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
Subject: RE: reliable messaging



My replies below.

Regards,
Marty




"Burdett, David" <david.burdett@commerceone.com> on 08/27/2001 12:55:22 PM

To:   Martin W Sachs/Watson/IBM@IBMUS, jacques durand <jacques@savvion.com>
cc:   "'christopher ferris'" <chris.ferris@east.sun.com>,
      ebxml-msg@lists.oasis-open.org, ebxml-cppa@lists.oasis-open.org
Subject:  RE: reliable messaging



Marty/Jacques

I think I agree with both of you.
1. We need to make much stronger statements about the From Party MSH
notifying the From Party Application that a message was not delivered.
2. We can't make it a "SHALL" or "MUST" as we have not specified the API
and we can't check compliance.

MWS:  We can make it a SHALL and worry about defining the API later.  This
is no different than every other function in the MS spec.
<db>See discussion above.</db>

So let's try and agree some words. How about the following for the
replacement first and last paragraphs in section 10.4 ...

Current first para (lines 1874-6) ...
>>If a message sent with deliverySemantics set to OnceAndOnlyOnce cannot be
delivered, the MSH or process SHOULD send a delivery failure notification
to the From Party. The delivery failure notification message contains: ...<<



Revised first para ...
>>A MSH that is not the From Party MSH might receive a message sent with
deliverySemantics set to OnceAndOnlyOnce that it determines cannot be
delivered to the To Party application or other process that is the final
destination of the message. In this case, the MSH MUST send a delivery
failure notification to the From Party that contains: ...<<

MWS:  This is not sufficient.  A delivery failure notification sent over
the network is inherently unreliable.  The From MSH will learn, from
failure to receive an ACK after the specified number of retries, that the
message was not delivered. The From MSH can reliably send a delivery
failure notification to the From application.
<db>This paragraph is saying what an MSH which is *NOT* the From MSH should
do. I think your answer is saying what the From MSH should do, which is
covered later.</db>

We should also add to the bulleted list ...
>>. a deliverySemantics attribute set to OnceAndOnlyOnce<<
... as the message should be sent reliably.

Current last para (lines 1886-9) ...
>>It is possible that an error message with an Error element with an
ErrorCode set to DeliveryFailure cannot be delivered successfully for some
reason. If this occurs, then the From party that is the ultimate
destination for the error message SHOULD be informed of the problem by
other means. How this is done is outside the scope of this specification.<<

Revised last para, it's now two and has subheadings ...
>>10.4.1 From Party MSH Behavior
The From Party MSH that sent a message with deliverySemantics set to
OnceAndOnlyOnce might determine that the message could not be delivered. In
this case it is strongly RECOMMENDED that the From Party MSH notify the
application or other process that requested the message be sent of the
delivery failure. This should indicate whether the failure was certain, for
example, there was a communications failure that meant the message could
not be sent, or probable, for example, although the message was sent, no
acknowledgement or delivery receipt was received.

MWS:  As indicated above, I do not agree to anything weaker than SHALL. The
word "probable" in the next to last line is too weak.  Exhaustion of
retries with no Acknowledgment is certain except for the unlikely case that
the message was delivered but the From party is continuously unable to
receive ACKs. We do need to think more about handling that case.
<db>I was "thinking more about handling that case" which is why I came up
with the wording. We also need to agree about the feasibility of
"requiring" a From MSH to notify the "application". One way around this is
to formally define in the spec what is meant by the "application" and allow
this to include real software applications and notifying appropriate users
via a log (as described earlier). But that still leaves the use case where
the implementer wants the message sent reliably, as it increases the
chances of success, but doesn't want to know if it doesn't work. I know
that this is not recommended, but sometimes implementers want to do things
they really shouldn't and we *can't* stop them.</db>

MWS:  That's easy - an application that doesn't care if the message was
delivered may ignore the delivery failure notification. That does not
give us carte blanche to decide that delivery failure notification is
not a system-level requirement.
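
The point above (the MSH always raises the notification, and the
application decides whether to act on it) can be sketched roughly as
follows. This is only an illustration, not anything from the MS spec: the
function name, the logger name "msh" and the handler signature are all
hypothetical, and the log entry stands in for the "browse the log" option
discussed earlier in the thread.

```python
import logging
from typing import Callable, Optional

# Hypothetical logger name; a real MSH would choose its own.
log = logging.getLogger("msh")


def report_delivery_failure(
    message_id: str,
    handler: Optional[Callable[[str], None]] = None,
) -> None:
    """Always record the failure, then pass it to the application if it
    registered a handler (hypothetical hook, not part of any spec)."""
    # The log entry is written unconditionally, so the failure is
    # discoverable even when no application handler exists.
    log.error("delivery failure for message %s", message_id)
    if handler is not None:
        handler(message_id)


def ignore(message_id: str) -> None:
    """An application that does not care simply does nothing."""
```

An application that wants to act on failures passes its own handler; one
that does not can pass `ignore` or nothing at all, and the notification is
still produced in both cases.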

10.4.2 Failure to deliver a DeliveryFailure message
It is also possible that an MSH that sent an error message with an Error
element with an ErrorCode set to DeliveryFailure determines that the
message was not delivered successfully even though it was sent with
deliverySemantics set to OnceAndOnlyOnce. If this occurs, then it is
strongly RECOMMENDED that the party that is operating the MSH notifies the
From party that is the ultimate destination for the error message by other
means. How this is done is outside the scope of this specification.<<

MWS:  Same comments as above (10.4.1) apply. In addition, as others pointed
out, a message which is part of the RM protocol for another message should
not itself be sent reliably because that can lead to a never-ending series
of messages.
<db>I think your concern is covered by the text above which says that if
the sending of a delivery failure fails then solve the problem by other
means.</db>

MWS:  Perhaps

MWS: As noted above, I do not agree to sending DeliveryFailure
over the network.  DeliveryFailure is an indication from the From MSH to
the From application as a result of exhaustion of retries without receiving
an acknowledgment.
<db>I think there is possibly a misunderstanding here. Consider the
following use case:
1. An intermediate MSH receives a message and sends an acknowledgment back
to the "From MSH".
2. The intermediate MSH then determines that it cannot forward the message
to the "To MSH" as it is down.

What should the intermediate MSH do? The options as I see them are:
1. Do nothing. The From MSH will then deduce that the delivery failed as no
Delivery Receipt was received. Note that there is an edge case, as
previously discussed, that the From MSH must assume that there is a small
probability that it was delivered.
2. Send a "Delivery Failure" message to the From MSH. This means that the
From MSH is then positively informed that the message was not delivered. I
prefer this option and it is what the current spec says.

MWS:  Again, that delivery receipt may fail to arrive.  We still need to
consider the end to end approach even with intermediaries.

Neither of these options precludes the From MSH informing the From
Application of the results of sending the message.
</db>

We also need to change lines 1849-53 in section 10.3.4 as this is now
covered in section 10.4 (see above). We also could extend it to describe
the idea of checking that MSHs are up and running. Currently these lines
contain ...
>>. If the Sending MSH does not receive an Acknowledgment Message after the
maximum number of retries, the Sending MSH SHOULD notify the application
and/or system administrator function of the failure to receive an
acknowledgement.

MWS:  SHALL
<db> Marty!! This is what the current spec says!! I know you object to it
;)</db>

MWS:  Sorry, it's a bit confusing to distinguish between what is now and
what we need.
. If the Sending MSH detects an unrecoverable communications protocol error
at the transport protocol level, the Sending MSH SHOULD resend the
message.<<

We could replace it with ...
>>. If the Sending MSH does not receive an Acknowledgment Message after the
maximum number of retries then the Sending MSH SHOULD:
   a) Send a Message Service Handler Ping Message to the same MSH one or
more times as the Sending MSH determines.
   b) If no Message Service Handler Pong Messages are received then the
Sending MSH MUST carry out the Failed Message Delivery behavior as
described in section 10.4<<

MWS:  This ping/pong is new for me.
<db>It's in version 1.0 of the spec.</db>
If it is useful in resolving doubt,
the SHOULD probably should be changed to SHALL.  However, given that the
maximum number of retries has failed, it isn't obvious what value there is
to performing ping/pong.  If there is value to it, then (b) also needs to
state what to do if ping/pong succeeds.
<db>Good point. If the ping succeeds then they should retry sending the
message again. Agreed?</db>

MWS:  This could get into a never-ending loop of retries and pings. It is
much simpler to pick a maximum number of retries which has a high
probability of success and declare delivery failure when the maximum number
of retries is exhausted.
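
A minimal sketch of that bounded approach, for illustration only. The
parameters `retries` and `retry_interval` play the role of Retries and
RetryInterval; `send_once`, `notify_failure` and `SendResult` are
hypothetical names, and treating Retries as the number of resends after one
initial attempt is an assumption, not something this thread settles.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class SendResult:
    delivered: bool  # True only if an acknowledgment was received
    attempts: int    # total number of send attempts made


def send_reliably(
    send_once: Callable[[], bool],          # hypothetical: True if an ack arrived
    retries: int,                           # stands in for Retries
    retry_interval: float,                  # stands in for RetryInterval (seconds)
    notify_failure: Callable[[int], None],  # hypothetical From-application hook
    sleep: Callable[[float], None] = time.sleep,
) -> SendResult:
    attempts = 0
    # Assumption: one initial attempt plus at most `retries` resends,
    # so the loop is bounded and cannot cycle forever.
    for _ in range(retries + 1):
        attempts += 1
        if send_once():
            return SendResult(True, attempts)
        if attempts <= retries:
            sleep(retry_interval)
    # Retries exhausted with no acknowledgment: declare delivery failure
    # and tell the sending application.
    notify_failure(attempts)
    return SendResult(False, attempts)
```

The failure branch is reached only after the fixed number of attempts, so
the sending application gets a definite answer within roughly
`retries * retry_interval` seconds plus the time taken by the sends.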

Thoughts?

David
-----Original Message-----
From: Martin W Sachs [mailto:mwsachs@us.ibm.com]
Sent: Sunday, August 26, 2001 2:44 PM
To: jacques durand
Cc: Burdett, David; 'christopher ferris';
ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
Subject: Re: reliable messaging



Jacques,

I have to disagree.  With or without an API definition, the
reliable-messaging contract must include the sending application, receiving
application, and both MSHs.  A contract that is just between the MSHs is
worthless because the beneficiaries of the contract are the From and To
applications.  We can state the essentials of the contract and the
assumptions on the implementations with or without an API definition. Once
we have the API definition, we can go back and improve the description of
the contract and the assumptions on the implementations to take the API
into account but the assumptions and contract do not change.

Regards,
Marty




jacques durand <jacques@savvion.com> on 08/24/2001 07:59:31 PM

To:   "Burdett, David" <david.burdett@commerceone.com>
cc:   "'christopher ferris'" <chris.ferris@east.sun.com>,
      ebxml-msg@lists.oasis-open.org, ebxml-cppa@lists.oasis-open.org
Subject:  Re: reliable messaging





"Burdett, David" wrote:

> Chris said ...
>
> >>>If no acknowledgment has been received, the sender continues to retry
> delivery, using the Retries and RetryInterval to govern processing. When
> the number of retries identified by Retries is exceeded, the sending MSH
> SHOULD notify the sending "party" by some means that is unspecified
> (e.g. notify the application through some API that it provides, log
> something useful in an error log, etc.)<<<
>
> Note that there is an edge case where all the acknowledgements that were
> sent failed to be delivered, e.g. maybe a MSH can receive messages but
> not send them. This means that even though no acknowledgement was
> received, the message was actually delivered.

That is indeed a point we have demonstrated in past POC.
Clearly, RM cannot substitute for a message-based transaction service,
which is the right level to guarantee consistency across parties' apps.
But it can be the basis for such a service.
By NOT receiving an ack, the sender should not infer that the receiver has
not received the message: only that the reception has not been confirmed,
and that it is OK to resend it (the duplication check doing the cleanup
job on receiver side).
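
The receiver-side duplicate check mentioned above could look roughly like
this. It is a sketch under stated assumptions: the `Receiver` class, its
method names and the in-memory set are hypothetical stand-ins (a real MSH
would persist message ids), and nothing here is taken from the MS spec.

```python
class Receiver:
    """Toy receiving MSH that drops duplicates but still acknowledges them."""

    def __init__(self) -> None:
        self.seen = set()      # ids already accepted (in-memory stand-in for a persistent store)
        self.delivered = []    # payloads handed to the To application

    def receive(self, message_id: str, payload: str) -> str:
        if message_id not in self.seen:
            # First copy: record the id and pass the payload on exactly once.
            self.seen.add(message_id)
            self.delivered.append(payload)
        # Acknowledge every copy, because an earlier ack may have been lost
        # and the sender may be resending for that reason alone.
        return "ack:" + message_id
```

Because a resend is acknowledged but not redelivered, the sender can safely
retry whenever an ack is missing without the To application seeing the
message twice.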

Regardless of what RM can or can't do, the question raised by Martin W
Sachs (the requirement to notify sending party) is interesting in that it
depends on the definition of RM:
(1) if RM is a contract between sending party, receiving party, and MSH
transport layer, then these sender notifications (as well as elimination
of duplicates for receiver) are part of the contract.
(2) if RM is a contract between two end-point MSHs, then these
notifications have no normative value.

My understanding is that (2) currently applies (so SHOULD should remain
SHOULD...)
However, once a formal MS API is specified, the MS spec will have to
address the "contract" value of such API, with regard to sender and
receiver...

My two cents...

Jacques Durand
Savvion

>
>
> David
> PS Catching up on emails and logging them into the change request
> database ;)
>
> -----Original Message-----
> From: christopher ferris [mailto:chris.ferris@east.sun.com]
> Sent: Friday, August 03, 2001 7:36 AM
> To: ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
> Subject: Re: reliable messaging
>
> Marty,
>
> Please see below.
>
> Chris
>
> Martin W Sachs wrote:
> >
> > Chris,
> >
> > I think I may have been unclear.  I specifically am not after an
> > application-level response for this purpose.
> >
> > The question is:  when can a sending party conclude that his message
> > either was or wasn't delivered?  That time is not relevant to the
> > performance of the application function.  If the service provider site
> > goes down before processing the message, but the message has been
> > persisted (a key requirement of reliable messaging), knowing that the
> > message was persisted at the application is important information
> > because it tells the sending party not to resend.
>
> Then receipt of the reliable messaging acknowledgment is the answer to
> your question. That is the point at which the sender knows that the
> message has been received and persisted.
>
> >
> > Yes, receipt of the RM acknowledgment tells the party that the message
> > got there but how long does the sending party wait to decide that it
> > won't be receiving a guaranteed delivery failure notification?  The
> > answer, in my
>
> If the sender receives an acknowledgment, it won't be receiving a
> guaranteed delivery failure notification because the message HAS been
> received. Once this acknowledgment has been received it SHOULD cease all
> reliable messaging retries, etc. as any subsequent retries would place an
> unnecessary burden on both parties' MSHs.
>
> If no acknowledgment has been received, the sender continues to retry
> delivery, using the Retries and RetryInterval to govern processing. When
> the number of retries identified by Retries is exceeded, the sending MSH
> SHOULD notify the sending "party" by some means that is unspecified
> (e.g. notify the application through some API that it provides, log
> something useful in an error log, etc.)
>
> It isn't at all clear to me that the sender needs anything more than
> Retries and RetryInterval to achieve its mission. Again, persistDuration
> is NOT a sending MSH parameter, it is a receiving MSH parameter.
>
> > mind, is long enough for all the allowable reliable-messaging retries
> > to be completed.  I believe that persistDuration is the right answer as
> > long as it is prescribed that it be set long enough to cover the time
> > to complete the allowed number of retries plus a little for propagating
> > the delivery failure notification back to where the sending application
> > can find it.  Alternately, a worst case time to recognize a delivery
> > failure could be defined.
> >
> > The sending application cannot determine if the message is relevant
> > unless it knows that delivery did or did not succeed.  Receiving or not
> > receiving a delivery failure notification within a defined time is
> > crucial.
> >
> > Yes, what I described covers several layers in the stack and maybe
> > several middleware "modules".  However, unless all the reliable
> > messaging rules are set down in one place, they will never be
> > understood.
> >
> > ...and let me reiterate again:  The messaging service must guarantee
> > that a delivery failure notification will be sent by the sending MSH to
> > the sending application in all cases where delivery could not be made.
> > Without this, reliable messaging is utterly broken because the key
> > requirement of reliable messaging is that the state of the business
> > transaction not be in doubt if the application-level acknowledgment is
> > not received.  If the message sender is not notified of delivery
> > failure, reliable messaging fails because the sending application does
> > not know if the message got to the other party and therefore doesn't
> > know how to recover.  People outside of the ebXML teams are starting to
> > notice this failure and conclude that reliable messaging is no good.
> > Changing those SHOULDs to SHALLs is essential to the business future of
> > the ebXML specifications because reliable messaging is a major
> > component of the value of the ebXML message service.
>
> The MS specification cannot dictate to implementation vendors anything
> of this nature. How they notify the sending "party" (application or
> person) is strictly within their prerogative. The MS spec deals
> exclusively with the details of the wire protocol, not the
> implementation details of how an MSH is integrated with some
> application.
>
> I don't see how this can be perceived as a failure of the specification
> when it is clearly (IMO) outside the scope of our work.
>
> If we change all of these SHOULDs to SHALLs then everyone would be
> asking "how?" to which there is no possible answer that covers all
> possible cases.
>
> >
> > Regards,
> > Marty
> >
> >

