Subject: RE: reliable messaging
My rejoinders below.

Regards,
Marty

*************************************************************************************
Martin W. Sachs
IBM T. J. Watson Research Center
P. O. B. 704
Yorktown Hts, NY 10598
914-784-7287; IBM tie line 863-7287
Notes address: Martin W Sachs/Watson/IBM
Internet address: mwsachs @ us.ibm.com
*************************************************************************************

"Burdett, David" <david.burdett@commerceone.com> on 08/27/2001 09:52:00 PM

To: Martin W Sachs/Watson/IBM@IBMUS
cc: jacques durand <jacques@savvion.com>, "'christopher ferris'" <chris.ferris@east.sun.com>, ebxml-msg@lists.oasis-open.org, ebxml-cppa@lists.oasis-open.org
Subject: RE: reliable messaging

I don't think that we are that far apart; the critical difference in view is around the "requirement" that the From Application is informed of the delivery failure.

For example, what should an implementation do if the application has no existing reasonable method of being notified of the delivery failure? In this case, one reasonable approach might be for the MSH to log the failure and then provide a GUI which allows a user to browse the log and decide what to do. I think that if we use a "SHALL" we would preclude the second option.

MWS: I would argue that the second option satisfies the SHALL. Face it, if an application has no reasonable way of being notified of delivery failure, it cannot benefit from reliable messaging. One more time: Removal of uncertainty as to whether the message got there or not is the key element of reliable messaging.

I think that if we take your view literally it would mean that if an application could not accept this type of notification then the application MUST be changed before ebXML reliable messaging could be used. I don't think this is reasonable. Thoughts?

MWS: As I said above, if an application cannot get the benefit of reliable messaging, it probably doesn't need it.
The existence of applications that either don't need or can't get the benefit of reliable messaging should not get in the way of doing it right.

I also think that there are two types of delivery notification:
1. The From MSH reporting delivery failure to the From Application, and
2. Another MSH (not the From MSH) reporting a delivery failure to the From MSH.

I think you are focusing on the first. I'm thinking of both.

MWS: Fine let's do BOTH. Just remember that delivery failure notification across the network is not reliable.

More detail below marked with <db></db>

Best wishes.

David

-----Original Message-----
From: Martin W Sachs [mailto:mwsachs@us.ibm.com]
Sent: Monday, August 27, 2001 10:31 AM
To: Burdett, David
Cc: jacques durand; 'christopher ferris'; ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
Subject: RE: reliable messaging

My replies below.

Regards,
Marty

*************************************************************************************
Martin W. Sachs
IBM T. J. Watson Research Center
P. O. B. 704
Yorktown Hts, NY 10598
914-784-7287; IBM tie line 863-7287
Notes address: Martin W Sachs/Watson/IBM
Internet address: mwsachs @ us.ibm.com
*************************************************************************************

"Burdett, David" <david.burdett@commerceone.com> on 08/27/2001 12:55:22 PM

To: Martin W Sachs/Watson/IBM@IBMUS, jacques durand <jacques@savvion.com>
cc: "'christopher ferris'" <chris.ferris@east.sun.com>, ebxml-msg@lists.oasis-open.org, ebxml-cppa@lists.oasis-open.org
Subject: RE: reliable messaging

Marty/Jacques

I think I agree with both of you.
1. We need to make much stronger statements about the From Party MSH notifying the From Party Application that a message was not delivered.
2. We can't make it a "SHALL" or "MUST" as we have not specified the API and we can't check compliance.

MWS: We can make it a SHALL and worry about defining the API later. This is no different than every other function in the MS spec.
<db>See discussion above.</db>

So let's try and agree some words. How about the following for the replacement first and last paragraphs in section 10.4 ...

Current first para (lines 1874-6) ...

>>If a message sent with deliverySemantics set to OnceAndOnlyOnce cannot be delivered, the MSH or process SHOULD send a delivery failure notification to the From Party. The delivery failure notification message contains: ...<<

Revised first para ...

>>A MSH that is not the From Party MSH might receive a message sent with deliverySemantics set to OnceAndOnlyOnce that it determines cannot be delivered to the To Party application or other process that is the final destination of the message. In this case, the MSH MUST send a delivery failure notification to the From Party that contains: ...<<

MWS: This is not sufficient. A delivery failure notification sent over the network is inherently unreliable. The From MSH will learn, from failure to receive an ACK after the specified number of retries, that the message was not delivered. The From MSH can reliably send a delivery failure notification to the From application.

<db>This paragraph is saying what a MSH which is *NOT* the From MSH should do. I think your answer is saying what the From MSH should do, which is covered later.</db>

We should also add to the bulleted list ...

>>. a deliverySemantics attribute set to OnceAndOnlyOnce<<

... as the message should be sent reliably.

Current last para (lines 1886-9) ...

>>It is possible that an error message with an Error element with an ErrorCode set to DeliveryFailure cannot be delivered successfully for some reason. If this occurs, then the From party that is the ultimate destination for the error message SHOULD be informed of the problem by other means. How this is done is outside the scope of this specification.<<

Revised last para, it's now two and has subheadings ...
>>10.4.1 From Party MSH Behavior

The From Party MSH that sent a message with deliverySemantics set to OnceAndOnlyOnce might determine that the message could not be delivered. In this case it is strongly RECOMMENDED that the From Party MSH notify the application or other process that requested the message be sent of the delivery failure. This should indicate whether the failure was certain, for example, there was a communications failure that meant the message could not be sent, or probable, for example, although the message was sent, no acknowledgement or delivery receipt was received.

MWS: As indicated above, I do not agree to anything weaker than SHALL. The word "probable" in the next to last line is too weak. Exhaustion of retries with no Acknowledgment is certain except for the unlikely case that the message was delivered but the From party is continuously unable to receive ACKs. We do need to think more about handling that case.

<db>I was "thinking more about handling that case", which is why I came up with the wording. We also need to agree about the feasibility of "requiring" a From MSH to notify the "application". One way around this is to formally define in the spec what is meant by the "application" and allow this to include real software applications as well as notifying appropriate users via a log (as described earlier). But that still leaves the use case where the implementer wants the message sent reliably because it increases the chances of success, but doesn't want to know if it doesn't work. I know that this is not recommended, but sometimes implementers want to do things they really shouldn't and we *can't* stop them.</db>

MWS: That's easy - an application that doesn't care if the message was delivered may ignore the delivery failure notification. That does not give us carte blanche to decide that delivery failure notification is not a system-level requirement.
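
For readers following the 10.4.1 discussion, here is a minimal sketch of the From Party MSH behavior being argued over: resend up to a retry limit, then hand a failure notification to the application, marked "certain" or "probable" as in the draft wording. It is an illustration only, not anything the spec defines. The names send_reliably, DeliveryFailure, send_and_wait_for_ack and notify_application are hypothetical; only the Retries and RetryInterval parameters come from the thread.

```python
from dataclasses import dataclass
from typing import Callable
import time


@dataclass
class DeliveryFailure:
    """Hypothetical notification handed to the From application."""
    message_id: str
    certainty: str   # "certain" (never sent) or "probable" (sent, no ack)
    attempts: int


def send_reliably(
    message_id: str,
    send_and_wait_for_ack: Callable[[str], bool],
    notify_application: Callable[[DeliveryFailure], None],
    retries: int,
    retry_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send once plus up to `retries` resends; notify the application on failure.

    `send_and_wait_for_ack` is an assumed transport hook: it returns True when
    an acknowledgment arrives, False when none arrives in time, and raises
    ConnectionError when the message could not be sent at all.
    """
    sent_at_least_once = False
    attempts = 0
    for attempt in range(retries + 1):
        attempts += 1
        try:
            if send_and_wait_for_ack(message_id):
                return True              # acknowledged: stop retrying
            sent_at_least_once = True    # it left this MSH, but no ack came back
        except ConnectionError:
            pass                         # transport failure: nothing was sent
        if attempt < retries:
            sleep(retry_interval)
    # Retries exhausted: the notification is unconditional in this sketch.
    certainty = "probable" if sent_at_least_once else "certain"
    notify_application(DeliveryFailure(message_id, certainty, attempts))
    return False
```

An application that does not care about the outcome could pass a no-op such as `lambda failure: None` for notify_application, which matches the point that the notification may be ignored by the application without being optional for the MSH.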
10.4.2 Failure to deliver a DeliveryFailure message

It is also possible that an MSH that sent an error message with an Error element with an ErrorCode set to DeliveryFailure determines that the message was not delivered successfully even though it was sent with deliverySemantics set to OnceAndOnlyOnce. If this occurs, then it is strongly RECOMMENDED that the party that is operating the MSH notifies the From party that is the ultimate destination for the error message by other means. How this is done is outside the scope of this specification.<<

MWS: Same comments as above (10.4.1) apply. In addition, as others pointed out, a message which is part of the RM protocol for another message should not itself be sent reliably because that can lead to a never-ending series of messages.

<db>I think your concern is covered by the text above which says that if the sending of a delivery failure fails then solve the problem by other means.</db>

MWS: Perhaps

MWS: As noted above, I do not agree to sending DeliveryFailure over the network. DeliveryFailure is an indication from the From MSH to the From application as a result of exhaustion of retries without receiving an acknowledgment.

<db>I think there is possibly a misunderstanding here. Consider the following use case:
1. An intermediate MSH receives a message and sends an acknowledgment back to the "From MSH".
2. The intermediate MSH then determines that it cannot forward the message to the "To MSH" as it is down.

What should the intermediate MSH do? The options as I see it are:
1. Do nothing. The From MSH will then deduce that the delivery failed as no Delivery Receipt was received. Note that there is an edge case, as previously discussed, that the From MSH must assume that there is a small probability that it was delivered.
2. Send a "Delivery Failure" message to the From MSH. This means that the From MSH is then positively informed that the message was not delivered. I prefer this option and it is what the current spec says.
MWS: Again, that delivery receipt may fail to arrive. We still need to consider the end to end approach even with intermediaries.

Neither of these options precludes the From MSH informing the From Application of the results of sending the message.</db>

We also need to change lines 1849-53 in section 10.3.4 as this is now covered in section 10.4 (see above). We also could extend it to describe the idea of checking that MSHs are up and running. Currently these lines contain ...

>>. If the Sending MSH does not receive an Acknowledgment Message after the maximum number of retries, the Sending MSH SHOULD notify the application and/or system administrator function of the failure to receive an acknowledgement.

MWS: SHALL

<db>Marty!! This is what the current spec says!! I know you object to it ;)</db>

MWS: Sorry, it's a bit confusing to distinguish between what is now and what we need.

. If the Sending MSH detects an unrecoverable communications protocol error at the transport protocol level, the Sending MSH SHOULD resend the message.<<

We could replace it with ...

>>. If the Sending MSH does not receive an Acknowledgment Message after the maximum number of retries then the Sending MSH SHOULD:
a) Send a Message Service Handler Ping Message to the same MSH one or more times as the Sending MSH determines.
b) If no Message Service Handler Pong Messages are received then the Sending MSH MUST carry out the Failed Message Delivery behavior as described in section 10.4<<

MWS: This ping/pong is new for me.

<db>It's in version 1.0 of the spec.</db>

If it is useful in resolving doubt, the SHOULD probably should be changed to SHALL. However, given that the maximum number of retries has failed, it isn't obvious what value there is in performing ping/pong. If there is value to it, then (b) also needs to state what to do if ping/pong succeeds.

<db>Good point. If the ping succeeds then they should retry sending the message again.
Agreed?</db>

MWS: This could get into a never-ending loop of retries and pings. It is much simpler to pick a maximum number of retries which has a high probability of success and declare delivery failure when the maximum number of retries is exhausted.

Thoughts?

David

-----Original Message-----
From: Martin W Sachs [mailto:mwsachs@us.ibm.com]
Sent: Sunday, August 26, 2001 2:44 PM
To: jacques durand
Cc: Burdett, David; 'christopher ferris'; ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
Subject: Re: reliable messaging

Jacques,

I have to disagree. With or without an API definition, the reliable-messaging contract must include the sending application, receiving application, and both MSHs. A contract that is just between the MSHs is worthless because the beneficiaries of the contract are the From and To applications. We can state the essentials of the contract and the assumptions on the implementations with or without an API definition. Once we have the API definition, we can go back and improve the description of the contract and the assumptions on the implementations to take the API into account, but the assumptions and contract do not change.

Regards,
Marty

*************************************************************************************
Martin W. Sachs
IBM T. J. Watson Research Center
P. O. B. 704
Yorktown Hts, NY 10598
914-784-7287; IBM tie line 863-7287
Notes address: Martin W Sachs/Watson/IBM
Internet address: mwsachs @ us.ibm.com
*************************************************************************************

jacques durand <jacques@savvion.com> on 08/24/2001 07:59:31 PM

To: "Burdett, David" <david.burdett@commerceone.com>
cc: "'christopher ferris'" <chris.ferris@east.sun.com>, ebxml-msg@lists.oasis-open.org, ebxml-cppa@lists.oasis-open.org
Subject: Re: reliable messaging

"Burdett, David" wrote:

> Chris said ...
>
> >>>If no acknowledgment has been received, the sender continues to retry
> delivery, using the Retries and RetryInterval to govern processing. When the
> number of retries identified by Retries is exceeded, the sending MSH
> SHOULD notify the sending "party" by some means that is unspecified
> (e.g. notify the application through some API that it provides, log
> something useful in an error log, etc.)<<<
>
> Note that there is an edge case where all the acknowledgements that were
> sent failed to be delivered, e.g. maybe a MSH can receive messages but not
> send them. This means that even though no acknowledgement was received, the
> message was actually delivered.

That is indeed a point we have demonstrated in past POC. Clearly, RM cannot substitute for a message-based transaction service, which is the right level to guarantee consistency across parties' apps. But it can be the basis for such a service. By NOT receiving an ack, the sender should not infer that the receiver has not received the message: only that the reception has not been confirmed, and that it is OK to resend it (the duplication check doing the cleanup job on receiver side).

Regardless of what RM can or can't do, the question raised by Martin W Sachs (the requirement to notify sending party) is interesting in that it depends on the definition of RM:

(1) if RM is a contract between sending party, receiving party, and MSH transport layer, then these sender notifications (as well as elimination of duplicates for the receiver) are part of the contract.
(2) if RM is a contract between two end-point MSHs, then these notifications have no normative value.

My understanding is that (2) currently applies (so SHOULD should remain SHOULD...). However, once a formal MS API is specified, the MS spec will have to address the "contract" value of such API, with regard to sender and receiver...

My two cents...
Jacques Durand
Savvion

>
> David
> PS Catching up on emails and logging them into the change request database
> ;)
>
> -----Original Message-----
> From: christopher ferris [mailto:chris.ferris@east.sun.com]
> Sent: Friday, August 03, 2001 7:36 AM
> To: ebxml-msg@lists.oasis-open.org; ebxml-cppa@lists.oasis-open.org
> Subject: Re: reliable messaging
>
> Marty,
>
> Please see below.
>
> Chris
>
> Martin W Sachs wrote:
> >
> > Chris,
> >
> > I think I may have been unclear. I specifically am not after an
> > application-level response for this purpose.
> >
> > The question is: when can a sending party conclude that his message
> > either was or wasn't delivered? That time is not relevant to the
> > performance of the application function. If the service provider site
> > goes down before processing the message, but the message has been
> > persisted (a key requirement of reliable messaging), knowing that the
> > message was persisted at the application is important information
> > because it tells the sending party not to resend.
>
> Then receipt of the reliable messaging acknowledgment is the answer to
> your question. That is the point at which the sender knows that the message
> has been received and persisted.
>
> >
> > Yes, receipt of the RM acknowledgment tells the party that the message got
> > there but how long does the sending party wait to decide that it won't be
> > receiving a guaranteed delivery failure notification? The answer, in my
>
> If the sender receives an acknowledgment, it won't be receiving a guaranteed
> delivery failure notification because the message HAS been received. Once
> this acknowledgment has been received it SHOULD cease all reliable messaging
> retries, etc. as any subsequent retries would place an unnecessary burden
> on both parties' MSHs.
>
> If no acknowledgment has been received, the sender continues to retry
> delivery, using the Retries and RetryInterval to govern processing.
> When the number of retries identified by Retries is exceeded, the sending
> MSH SHOULD notify the sending "party" by some means that is unspecified
> (e.g. notify the application through some API that it provides, log
> something useful in an error log, etc.)
>
> It isn't at all clear to me that the sender needs anything more than
> Retries and RetryInterval to achieve its mission. Again, persistDuration
> is NOT a sending MSH parameter, it is a receiving MSH parameter.
>
> > mind, is long enough for all the allowable reliable-messaging retries to
> > be completed. I believe that persistDuration is the right answer as long
> > as it is prescribed that it be set long enough to cover the time to
> > complete the allowed number of retries plus a little for propagating the
> > delivery failure notification back to where the sending application can
> > find it. Alternately, a worst case time to recognize a delivery failure
> > could be defined.
> >
> > The sending application cannot determine if the message is relevant unless
> > it knows that delivery did or did not succeed. Receiving or not receiving
> > a delivery failure notification within a defined time is crucial.
> >
> > Yes, what I described covers several layers in the stack and maybe several
> > middleware "modules". However, unless all the reliable messaging rules
> > are set down in one place, they will never be understood.
> >
> > ...and let me reiterate again: The messaging service must guarantee that
> > a delivery failure notification will be sent by the sending MSH to the
> > sending application in all cases where delivery could not be made.
> > Without this, reliable messaging is utterly broken because the key
> > requirement of reliable messaging is that the state of the business
> > transaction not be in doubt if the application-level acknowledgment is
> > not received.
> > If the message sender is not notified of delivery failure, reliable
> > messaging fails because the sending application does not know if the
> > message got to the other party and therefore doesn't know how to recover.
> > People outside of the ebXML teams are starting to notice this failure and
> > conclude that reliable messaging is no good. Changing those SHOULDs to
> > SHALLs is essential to the business future of the ebXML specifications
> > because reliable messaging is a major component of the value of the ebXML
> > message service.
>
> The MS specification cannot dictate to implementation vendors anything
> of this nature. How they notify the sending "party" (application or
> person) is strictly within their prerogative. The MS spec deals exclusively
> with the details of the wire protocol, not the implementation details
> of how an MSH is integrated with some application.
>
> I don't see how this can be perceived as a failure of the specification
> when it is clearly (IMO) outside the scope of our work.
>
> If we change all of these SHOULDs to SHALLs then everyone would be
> asking "how?" to which there is no possible answer that covers all possible
> cases.
>
> >
> > Regards,
> > Marty
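
As an aside on the receiver side of this thread: the duplicate check that makes resending safe, and the reason persistDuration has to outlast the sender's retry window, can be sketched roughly as below. This is only an illustration under stated assumptions. The ReceivingMSH class, its receive method, the in-memory dictionary standing in for persistent storage, and the expiry rule are all hypothetical; the thread only establishes that persistDuration is a receiving MSH parameter and that duplicates are eliminated at the receiver.

```python
from typing import Callable, Dict


class ReceivingMSH:
    """Hypothetical receiver that delivers each remembered message ID once."""

    def __init__(self, deliver: Callable[[str, str], None], persist_duration: float):
        self._deliver = deliver
        self._persist_duration = persist_duration
        self._seen: Dict[str, float] = {}   # message ID -> time first received

    def receive(self, message_id: str, payload: str, now: float) -> str:
        """Return "ack" in every case; deliver only when the ID is not remembered."""
        # Forget IDs older than persist_duration (assumed expiry rule).
        self._seen = {
            mid: t for mid, t in self._seen.items()
            if now - t < self._persist_duration
        }
        if message_id not in self._seen:
            self._seen[message_id] = now     # record the ID before acknowledging
            self._deliver(message_id, payload)
        return "ack"                         # duplicates are acknowledged again
```

A resend that arrives while the ID is still remembered is acknowledged but not delivered a second time. A resend that arrives after persist_duration has elapsed is treated as new, which is why that value would need to cover the full Retries times RetryInterval window on the sending side.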