ebxml-msg message

Subject: Re: T2 Retry with Delivery Receipt
From: christopher ferris <chris.ferris@Sun.COM>
To: ebXML Msg <ebxml-msg@lists.oasis-open.org>
Date: Tue, 18 Sep 2001 13:31:59 -0400

David,

Please see below.

Cheers,

Chris

David Fischer wrote:
> 
> Chris, your discussion is about RM and DR/NRR for RM.  We agreed to take DR out
> of the RM discussion.  In your discussion you asked if this satisfied the
> end-to-end retry need but you didn't discuss multi-hop at all?  All your
> examples were single hop (SMTP does not count as an IM and let's stay away from
> translating gateways for now).

Not at all. I gave 3 use cases, the third had a formal ebXML MSH intermediary
node. I should point out that it was you who raised the EDI/INT gateway
use case as a reason why you felt that end-to-end retries were a requirement.
I was only building on that.

> 
> My question about allowing the Sending Party to retry (manually or
> automatically) has nothing to do with RM.  The problem impacts RM
> (deliverySemantics=OnceAndOnlyOnce) only in that the Receiving Party MUST
> perform Idempotency.  My contention is that the ability to retry is required any
> time there is a delivery failure.  I am not broaching the issue of automatic vs.
> manual nor do I think we need to put that in the spec (in this we agree).

Okay, this is clearer.

If a DFN is returned, there are two possible meanings implied:
	- the message could not possibly have made it to the To Party
	- the message may have made it to the To Party, there is simply no proof (Ack)

I think that the specification already covers the DFN adequately with the exception
of Marty's suggested change (SHOULD to SHALL to ensure that a DFN is ALWAYS generated).
The DFN has an Error severity if it is known that the message could not possibly
have reached its destination. The DFN has a severity of Warning if it MAY have
reached its destination but was simply unacknowledged after exhausting all retries.

A retry/resend of the identical message (same MessageId) above and beyond the
RM-related retries can be accommodated when the DFN was generated locally
(by the original sending MSH node) with a severity of either Error or Warning.
It isn't clear to me that we have defined clearly enough the means by which
the source of a DFN can be determined. It is also unclear as to whether
an origin MSH that cannot communicate with the endpoint actually constructs
an ebXML SOAP message, or whether it can simply throw an exception or notify
the "application" layer in some manner other than the creation of an ebXML
SOAP message that has an ErrorList with an Error element of DFN. This needs
some clarification in any event.
 
A resend of a new message (different MessageId) with the same payload can always
be safely accommodated if the severity of the DFN (generated anywhere
along the message path) was Error. We can, and possibly should, state this
clearly in the specification, without going to the extreme of actually
specifying any required MSH behaviour w/r/t a resend/retry, thus leaving
it to the layer of software "above" the level of the MSH.

A DFN with a severity of Warning needs further investigation IMO. Clearly,
we should not encourage that a new message (same payload) be sent if
the DFN severity is Warning. We could, possibly in a non-normative section,
describe how the Status inquiry service can be used to determine the
status of the message w/r/t the To Party. If the message has indeed
not been received, then it would seem to me to be a relatively safe course of action
to send a new message with the same payload, assuming that this course
of action is suitable for the "application"/message (e.g. its business TTL
is still viable). It should also be noted that the likelihood of this is fairly
remote.
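
As a rough illustration only, the resend rules described above could be
sketched as follows. This is a minimal sketch, not specification text: the
names Severity, Action and permitted_actions are hypothetical, and the
confirmed_not_received flag stands in for whatever answer the Status inquiry
service gives about the message at the To Party.

```python
from enum import Enum
from typing import Optional, Set


class Severity(Enum):
    ERROR = "Error"      # message could not possibly have reached the To Party
    WARNING = "Warning"  # message may have arrived; no Acknowledgment was seen


class Action(Enum):
    RESEND_SAME_MESSAGE_ID = "resend identical message (same MessageId)"
    SEND_NEW_MESSAGE_ID = "send new message (different MessageId, same payload)"
    QUERY_STATUS_FIRST = "use the Status inquiry service before deciding"


def permitted_actions(severity: Severity,
                      generated_locally: bool,
                      confirmed_not_received: Optional[bool] = None) -> Set[Action]:
    """Return the follow-up actions that look safe for a given DFN.

    confirmed_not_received is None when no status inquiry has been made,
    True when the To Party reports it never received the message, and
    False when the To Party reports that it did receive it.
    """
    actions: Set[Action] = set()
    # A DFN raised by the original sending MSH allows a resend of the
    # identical message, whatever the severity.
    if generated_locally:
        actions.add(Action.RESEND_SAME_MESSAGE_ID)
    if severity is Severity.ERROR:
        # The message cannot have arrived, so a new MessageId is safe.
        actions.add(Action.SEND_NEW_MESSAGE_ID)
    elif confirmed_not_received is None:
        # Warning: the message may have arrived, so investigate first.
        actions.add(Action.QUERY_STATUS_FIRST)
    elif confirmed_not_received:
        # Warning, but the To Party confirms it never saw the message.
        actions.add(Action.SEND_NEW_MESSAGE_ID)
    return actions
```

Whether a permitted action is actually taken (for instance, whether the
business TTL is still viable) stays with the layer above the MSH; the sketch
only captures which options do not risk a duplicate.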

> 
> I am not concerned with end-to-end RM here, except where duplicates are
> concerned.  This issue does not revolve around IMs being reliable or not or even
> if they are true MSHs or not.  This is end-to-end with a black box in the
> middle.
> 
> I define TRP failure as:
> 
>      1.  a DFN sent to the From Party MSH (error or warning)
>      2. an Error Message sent to the From Party MSH
>      3. the lack of a properly constructed Acknowledgement
>         Message (Ack/DR/NRR) upon request.
> 
> There's probably something else but I can't think what right now.  Let's take a
> few example use cases.
> 
>    - Lack of DR (when requested) (3)

If the DR is sent reliably, then its absence is significant cause for concern.

>    - If there is a network outage (1 or 3)

I assume that you mean 3 (DR) if the DR is sent unreliably. If sent reliably, then 
a network partition would result in a DFN (1) with a severity of Error which,
as I stated before, can be safely accommodated by resending the original, identical
message.

>    - DFN from IM to From Party MSH (1 or 3)

See above, if severity is Error, message can be safely sent as a new
message with the same payload. We can say this, but it must be clearly
stated that this functionality is outside the scope of the MSH proper,
but of course can be implemented as an add-on.

>    - NRR validation failure (3)

Seems to me that this use case needs further decomposition. Do you
mean that the receiving MSH failed to validate the signature of
the original received message, and is therefore reporting that
it will not process the message? This seems to be a case of a (2)
above. In that case, sending a new message with the same payload
is safe because the To Party has indicated that it will not
process the message. Of course, this case also requires further
investigation/intervention. If the signature is based on a certificate
that has expired, or which the To party doesn't recognize as valid,
then more than a simple retry is in order.

If the NRR validation failure is at the sending node, then
it isn't clear to me that resending the message is in order at all.

If the message was mangled in transit, then clearly, something
needs to be done to ensure that it never happens again! A retry
gets you nowhere when there is some manner of security violation.

>    - Lack of initial Ack (3)

Already accommodated in the spec with the RM retry protocol.
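
For readers who want the mechanics spelled out, here is a minimal sketch of
that behaviour as it is described in this thread: the sender resends the same
MessageId until an Acknowledgment arrives or the retries are exhausted, and
the receiver detects duplicates by MessageId and returns the same
Acknowledgment each time. The class and function names, and the string form
of the Acknowledgment, are hypothetical and are not taken from the spec.

```python
from typing import Callable, Dict, List, Optional


class ReceivingMSH:
    """Receiver that delivers each MessageId to the application only once."""

    def __init__(self) -> None:
        self._acks: Dict[str, str] = {}   # MessageId -> Acknowledgment already sent
        self.delivered: List[str] = []    # payloads handed to the application

    def receive(self, message_id: str, payload: str) -> str:
        if message_id not in self._acks:
            # First sighting: deliver and remember the Acknowledgment.
            self.delivered.append(payload)
            self._acks[message_id] = f"Ack:{message_id}"
        # Duplicates get the *same* Acknowledgment and are not redelivered.
        return self._acks[message_id]


Transport = Callable[[str, str], Optional[str]]


def send_reliably(transport: Transport, message_id: str, payload: str,
                  retries: int) -> Optional[str]:
    """Send once, then resend up to `retries` times until an Ack is seen.

    Returns None when every attempt went unacknowledged, which is the
    point at which a DFN would be raised.
    """
    for _ in range(1 + retries):
        ack = transport(message_id, payload)
        if ack is not None:
            return ack
        # A real MSH would wait for retryInterval here before resending.
    return None


def lossy_ack_transport(receiver: ReceivingMSH, drop_first_n_acks: int) -> Transport:
    """Test transport that delivers every message but loses the first N Acks."""
    dropped = 0

    def transport(message_id: str, payload: str) -> Optional[str]:
        nonlocal dropped
        ack = receiver.receive(message_id, payload)
        if dropped < drop_first_n_acks:
            dropped += 1
            return None
        return ack

    return transport
```

With the first two Acknowledgments lost and retries set to 3, the sender
still ends up with an Acknowledgment while the receiving application sees
the payload exactly once.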

>    - Security Failure (error on Signature or Encryption) (2)

Send a new message with the same payload. See above regarding
the fact that there are more than likely bigger problems involved.

>    - XML text corruption in transit (2)

Unless you can verify that the message wasn't mangled to begin with,
a retry does little to resolve the problem. In any event, sending
a new message with the same payload is always safe in this circumstance
because it is known that the To Party cannot and will not process the
original message.

> 
> Some of these might be automatic and some will require a fix prior to retry.
> Lack of a DR is only one possible cause for a retry.  In any of these cases,
> there will be a retry of the same message (same MessageId) to prevent duplicates
> which means Idempotency must be performed by the Receiving Party.  If even one
> of these is valid, then end-to-end retries need to be allowed.

See above, I don't think that there is need to do anything to support
resending a message beyond what is already accommodated by the spec.

> 
> Regards,
> 
> David Fischer
> Drummond Group.
> -----Original Message-----
> From: Chris.Ferris@Sun.COM [mailto:Chris.Ferris@Sun.COM]
> Sent: Tuesday, September 18, 2001 9:44 AM
> To: ebXML Msg
> Subject: Re: T2 Retry with Delivery Receipt
> 
> David,
> 
> Please see below.
> 
> Cheers,
> 
> Chris
> 
> David Fischer wrote:
> >
> > I haven't seen any discussion on the list from this question.  Does this mean
> > everyone agrees there are valid use cases supporting end-to-end retries?
> 
> I don't agree that there has been a valid use case presented.
> 
> Specifically, what we are concerned with are intermediary nodes
> that are ebXML MSH nodes, not transport intermediary nodes such as SMTP
> nodes along an SMTP message path between a From: and a To: email address.
> 
> e.g.
> 
>         MSHA->SMTP(local)->SMTP(x)->SMTP(y)->SMTP(To:<host>)->MSHB
> 
> The above is a *single hop* from the ebXML RM perspective. If the message
> is either delayed, or "lost" somewhere between MSHA and MSHB, then the
> ebXML RM protocol would kick in and there would be an automated retry
> based on retryInterval/retries as detailed in the spec because MSHA
> would not receive an Acknowledgment from MSHB. The same holds true for
> the case where the Acknowledgment is lost on the return message path.
> MSHA would automatically resend the message. MSHB would detect any
> duplicates, based on MessageId and respond with the *same* Acknowledgment
> that it sent in response to the original message it received.
> 
> If MSHA wants a DR for NRR from MSHB, it asks for this and would receive
> it. I believe that this makes a strong case for the DR to be delivered
> reliably, so that MSHA can be certain that it will be received.
> 
> In the use case where there is not an MSH at the To Party, as might be
> the case where some manner of gateway is employed, then the MSH which
> acts as that gateway has a responsibility to ensure that the message is
> reliably delivered. I don't believe that it is our responsibility to
> provide specification language for how that is to be achieved.
> 
> e.g.
> 
>         MSHA->HTTP->MSHB||EDI/INT->EDI/INT(To Party)
> 
> In the above case, MSHB is the endpoint from the perspective of the
> sending MSHA. MSHB would acknowledge the message from MSHA after it
> had persisted the message, thus having assumed full responsibility for
> delivering the message. How this is effected is outside the scope of
> our specification. MSHB (or more correctly, the gateway "application"
> at MSHB) can do whatever is necessary to ensure that the message is
> delivered. It can resend the message all it likes as far as I'm concerned.
> MSHA should not have to concern itself with these details since it
> successfully transitioned the responsibility to reliably deliver the
> message to the node at MSHB.
> 
> In the case where there is a reverse gateway at the To Party, then
> again, the ebXML RM protocol could be used to ensure that the To Party
> receives one and only one message just as was the case for the transfer
> between MSHA and MSHB. How you get a DR for NRR from the To Party
> in the above case is also beyond our scope. It must be assumed that
> somehow, the To Party generates some manner of EDI/INT equivalent to
> the DR which the MSHB||EDI/INT gateway translates into an ebXML DR.
> Again, we don't need to go there for the spec because it is outside
> our scope.
> 
> e.g.
> 
>         MSHA->HTTP->MSHB||EDI/INT->EDI/INT||MSHC
> 
> In the above case, if the message were lost between MSHB and MSHC,
> then the retries kick in, etc. thus ensuring that the message
> is safely delivered. In this use case, EDI/INT is equivalent
> to a transport such as HTTP (even if EDI/INT uses HTTP for its
> transport).
> 
> We do not assume that MSHB||EDI/INT is unreliable. We assume that
> it is reliable. I see no reason why we need to pursue the case where
> it is irresponsible and may lose messages or not bother to make
> any effort at ensuring that the message is reliably delivered
> safely to the next MSH node.
> 
> In the case where there IS no subsequent MSH node, then all bets
> are off as far as I'm concerned. We need not concern ourselves
> with this because it is outside our scope. We are and should be
> solely concerned with ensuring that we have a reliable messaging
> protocol that works effectively between MSH nodes that exchange
> messages over an unreliable transport protocol such as HTTP, SMTP
> or orange-juice cans and strings.
> 
> If you want to cover the case where some disaster at MSHB||EDI/INT
> GW node results in loss of data, then MSHB needs to rollback to some
> earlier state (one in which it has not seen the messages that it may have lost).
> MSHA can resend messages which MSHB will treat as new, forwarding
> them onto MSHC which would discard any duplicates as per the current
> spec. Any messages which are resent in this manner, which MSHC has not
> previously received would be processed accordingly.
> 
> Does this satisfy your end-to-end retry requirement? Note that
> it doesn't involve any changes to the spec (IMO). If you or anyone
> else wants to build in an automated retry on non-receipt of a DR,
> you are free to do so. I disagree that it is something that the
> specification needs to comment on. The retries could be manual
> with equal effect (and that would also provide that MSHB had
> restored itself to some stable state if one assumes that a phone
> call or some other OOB communication is made to ensure that everything
> is ready to roll). For that matter, MSHA may have a similar rollback
> capability that it could invoke after consulting the To Party
> (possibly using the Status inquiry MSH service) to determine
> at which point the two parties need to resynchronize, etc.
> 
> Again, I have also repeatedly stated that a DR is not a requirement
> for all messages in all cases. It may be that it is frequently used,
> but in fact it may be mere window dressing to some. Some parties
> might consider the expected "response" or follow-on message as
> the proof they require to "know" that the message they sent
> was received. They may be satisfied with "if I get no business
> response, then the business transaction is null and void". This
> is likely to be something that the Business Server layer of software
> would concern itself with, not the MSH.
> 
> >
> > The way the spec is written now, single-hop, end-to-end retries work.  Multi-hop
> > end-to-end retries do not work when RM is turned on (idempotence).  Can we now
> > discuss what that will entail?
> >
> > Regards,
> >
> > David Fischer
> > Drummond Group.
> >
> > -----Original Message-----
> > From: David Fischer [mailto:david@drummondgroup.com]
> > Sent: Friday, September 14, 2001 8:46 AM
> > To: Martin W Sachs; Christopher Ferris
> > Cc: Dan Weinreb; ebxml-msg@lists.oasis-open.org
> > Subject: RE: T2 Retry with Delivery Receipt
> >
> > This all comes down to "Are end-to-end Retries REQUIRED"?  All the other things
> > like automated retries, end-to-end RM, retry on DR, are secondary issues.
> >
> > Under any delivery failure scenario, the ability to retry the original send is
> > REQUIRED.  This might be automated or it might be manual.  It might come from
> > the MSH or from the Application.  It might be now or after a fix.  No matter
> > where or how, we MUST allow end-to-end Retries.
> >
> > Can anyone disagree with this?
> >
> > Regards,
> >
> > David Fischer
> > Drummond Group.
> >
> > -----Original Message-----
> > From: Martin W Sachs [mailto:mwsachs@us.ibm.com]
> > Sent: Friday, September 14, 2001 8:25 AM
> > To: Christopher Ferris
> > Cc: Dan Weinreb; david@drummondgroup.com; ebxml-msg@lists.oasis-open.org
> > Subject: Re: T2 Retry with Delivery Receipt
> >
> > Sure but it is an example of how ebxml end to end RM can work through
> > unreliable IMs.
> >
> >
> > *************************************************************************************
> >
> > Martin W. Sachs
> > IBM T. J. Watson Research Center
> > P. O. B. 704
> > Yorktown Hts, NY 10598
> > 914-784-7287;  IBM tie line 863-7287
> > Notes address:  Martin W Sachs/Watson/IBM
> > Internet address:  mwsachs @ us.ibm.com
> >
> > *************************************************************************************
> >
> > Christopher Ferris <chris.ferris@sun.com> on 09/13/2001 01:40:58 PM
> >
> > To:   Martin W Sachs/Watson/IBM@IBMUS
> > cc:   Dan Weinreb <dlw@exceloncorp.com>, david@drummondgroup.com,
> >       ebxml-msg@lists.oasis-open.org
> > Subject:  Re: T2 Retry with Delivery Receipt
> >
> > Marty,
> >
> > An SMTP node is NOT an MSH node. It is not part of the equation.
> > The MSH nodes that are communicating via SMTP are the ones that
> > adopt the RM protocol of retries in the absence of an Acknowledgment.
> > The SMTP nodes are incidental.
> >
> > Cheers,
> >
> > Chris
> >
> > Martin W Sachs wrote:
> > >
> > > > Re:  "I think David's position is that we can't do that, because there are
> > > hosts/entities out there that (a) must participate as ebXML MS IM's,
> > > and (b) that are unreliable.  The question is whether there's a use
> > > case demonstrating this."
> > >
> > > There is one major use case, which is SMTP.  SMTP intermediate nodes are
> > > notoriously unreliable and only acknowledge to the previous node so a
> > > sender has no idea whether the message got to its destination.  ebXML on
> > > > top of SMTP is one of the major reasons for having ebXML reliable messaging
> > > > and only end to end reliable messaging helps with SMTP.  I don't know if
> > > there is a use case for ebXML unreliable intermediaries but if there is,
> > > end to end RM is the answer.
> > >
> > > Regards,
> > > Marty
> > >
> > > > *************************************************************************************
> > > >
> > > Martin W. Sachs
> > > IBM T. J. Watson Research Center
> > > P. O. B. 704
> > > Yorktown Hts, NY 10598
> > > 914-784-7287;  IBM tie line 863-7287
> > > Notes address:  Martin W Sachs/Watson/IBM
> > > Internet address:  mwsachs @ us.ibm.com
> > >
> > > > *************************************************************************************
> > > >
> > > Dan Weinreb <dlw@exceloncorp.com> on 09/13/2001 12:55:02 PM
> > >
> > > Please respond to Dan Weinreb <dlw@exceloncorp.com>
> > >
> > > To:   chris.ferris@sun.com
> > > cc:   david@drummondgroup.com, ebxml-msg@lists.oasis-open.org
> > > Subject:  Re: T2 Retry with  Delivery Receipt
> > >
> > >    Date: Thu, 13 Sep 2001 11:48:33 -0400
> > >    From: Christopher Ferris <chris.ferris@sun.com>
> > >
> > > >    > The only problem is that the addition of multi-hop interferes with end-to-end
> > > >    > retries (duplicates) which, as we have seen, is a fundamental functional
> > > >    > requirement under all circumstances when a Delivery Receipt is requested but not
> > > >    > received.
> > >
> > > >    You're asking for retries on top of retries. What happens when the end-to-end
> > > >    retries are exhausted and there is still no delivery receipt? Do we add retries
> > > >    of retries of retries? What happens when they fail? Do we add yet another layer?
> > >
> > > > What David is asking for is perfectly sensible *if* your failure
> > > model states that IM's are unreliable, e.g. that an IM might accept a
> > > message, and then silently forget it.  In that case, the end-to-end
> > > retries exist for a specific purpose: to harden the system against the
> > > possibility of flaky IM's.  There would be no need to add another
> > > layer unless there is some additional, distinct failure mode to be
> > > taken care of.
> > >
> > > >    Why not focus on what you perceive as an omission in the spec, that an intermediary
> > > >    has certain obligations w/r/t reliable delivery. Let's address that by adding
> > > >    text that fully sets out what the responsibilities of an intermediary are
> > > >    not only w/r/t RM but w/r/t routing and any other oddities of an intermediary's
> > > >    role that is clearly distinct from that of an endpoint.
> > >
> > > I think David's position is that we can't do that, because there are
> > > hosts/entities out there that (a) must participate as ebXML MS IM's,
> > > and (b) that are unreliable.  The question is whether there's a use
> > > case demonstrating this.
> > >
> > > >    I'd like to focus on the specific use case that you cited in the call, where
> > > >    an MSH uses an EDI/INT gateway. Is there an ebXML MSH at the To Party or do they
> > > >    simply have an EDI/INT server?
> > >
> > >         MSHA -> IMSHGW -> EDI/INTGW -> EDI/INTB
> > >
> > > >    In this case, how does the ebXML delivery receipt get generated? IMO, the
> > > >    EDI/INT Gateway has a responsibility to ensure that the message is safely
> > > >    delivered. How it does this is not the purview of our specification. However,
> > > >    that doesn't obviate the responsibility that the gateway intermediary node
> > > >    assumes.
> > >
> > > I'd call this a protocol-translating gateway, not an ebXML MS IM at
> > > all.  I agree that the gateway has to make sure that the message is
> > > truly delivered, and then the gateway generates the DR.  It's the
> > > > job of the protocol-translating gateway to create the illusion that
> > > the far end is really running ebXML MS.
> > >
> > > ----------------------------------------------------------------
> > > To subscribe or unsubscribe from this elist use the subscription
> > > manager: <http://lists.oasis-open.org/ob/adm.pl>
> > >
> > > ----------------------------------------------------------------
> > > To subscribe or unsubscribe from this elist use the subscription
> > > manager: <http://lists.oasis-open.org/ob/adm.pl>
> >
> > ----------------------------------------------------------------
> > To subscribe or unsubscribe from this elist use the subscription
> > manager: <http://lists.oasis-open.org/ob/adm.pl>
> >
> > ----------------------------------------------------------------
> > To subscribe or unsubscribe from this elist use the subscription
> > manager: <http://lists.oasis-open.org/ob/adm.pl>
> 
> ----------------------------------------------------------------
> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.oasis-open.org/ob/adm.pl>
> 
> ----------------------------------------------------------------
> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.oasis-open.org/ob/adm.pl>
Follow-Ups:
- RE: T2 Retry with Delivery Receipt
  - From: David Fischer <david@drummondgroup.com>
References:
- RE: T2 Retry with Delivery Receipt
  - From: David Fischer <david@drummondgroup.com>