Subject: Re: T2 Retry with Delivery Receipt
I have another idea about how to clarify the hop-to-hop/end-to-end debate.

First of all, whichever of the two points of view you take, there is such a thing as "MSH-level failure" to deliver a message. For example, suppose From Party F talks to To Party T directly, with no IM's at all. F tries sending the message and it fails, or F doesn't get an Ack from T. Therefore F does all of its retries; sadly, all of them fail or don't result in an Ack. There's nothing more to try, so the From MSH returns an "MSH-level failure" error code (or exception, whatever) to the From Application.

This is what some people have been calling "DFN", but "DFN" has also been used as the name of a message, so I want to avoid that terminological confusion. The "MSH-level failure" means "I, the MSH, have tried to get your message through, but after trying everything within my power, I cannot confirm that your message got through." (There are also cases where the MSH can promise that the message did not get through, but those are too easy to be of interest here so I'll ignore them.)

At this point, any further retrying has to happen above the MS layer altogether, i.e. at the application layer. For example, maybe the human administrators reconfigure things, pick a new HTTP server because they decide the old one is too broken for words, negotiate a new CPA, and then try again. But I think that if this kind of "application-level retry" is done, the new message is considered "new" by the MSH, which doesn't know that the application level thinks of it as a retry.

So what about the possibility that the first message actually did get through, and you might be causing a duplication? The only answer here is that you must resolve the doubt before doing the application-level retry. This clearly (IMHO) is what MessageStatus is for: to get a positive or negative resolution from the To party. There's nothing wrong with MessageStatus. Distributed systems have long had things like this.
For example, in two-phase commit protocols, there is a message where one of the resource managers can ask the transaction manager "please tell me what the outcome of transaction 1324 turned out to be". It's just like that. Anyway, there is no way for the From MSH to resolve its doubt about whether the message was delivered until it (the From MSH) can actually communicate with the To MSH.

OK, now let's introduce IM's into the picture, assuming the overall model of "reliable IM's" advocated by Chris and Colleen. Suppose F talks to T through a chain of IM's: F <-> IM1 <-> IM2 <-> T. IM1 tries to send the message to IM2, and IM1 exhausts all its retries without success. What happens now?

In the "reliable IM" model, if communication between any two adjacent MSH's fails, that constitutes MSH failure for the original request, and the F MSH should return a failure code (or exception) to the F application. It's just like the simple case of MSH failure that I started out with. The only recourse at this point is application-level retry as discussed above.

There is no point in doing an "end-to-end retry", because retrying has already been attempted and has proven inadequate. You'd just be beating a dead horse. There is no failure mode that would be repaired by an end-to-end retry that would not have already been taken care of by the hop-to-hop retry. That's because the "reliable IM" axiom says that there aren't any such failure modes in the failure model. The only failure modes are in the network between the MSH's, and the hop-to-hop retries take care of that. (Of course you have to tune the retry parameters based on just how flaky the network is.)

Again, I am not taking sides here. I just want to point out that the model Chris and Colleen are advocating is entirely consistent and works just fine, as long as you accept the "reliable IM" axiom.

On the other hand, suppose you assume the "unreliable IM" model advocated by David F and Marty.
In this case, end-to-end retry is useful, because it recovers from "IM failure". "IM failure" is different from the network failure that the hop-to-hop retries take care of. F would do an end-to-end retry either because some kind of "delivery failed" message was sent back to it, or else because it times out. (Of course end-to-end retry implies that there must be something like a "retry count" field in the message, and therefore two different kinds of message identity, as we've discussed.)

Chris, it seems to me that you'd have to agree that hop-to-hop isn't adequate if someone were to provide a use case that met all of the following criteria:

- It's clearly and compellingly something that we must support.
- The node in question must, for some reason, be treated as an IM at the ebXML MS level; it cannot, for whatever reason, be treated as if it were an SMTP store-and-forward mailer at the underlying transport/communication layer.
- The node is unreliable, e.g. it can drop (or duplicate) messages for reasons other than the network being its usual flaky message-dropping self.

To put it another way, I think your (Chris) position is that there isn't ever going to be a compelling reason to take an unreliable node and treat it as an IM.

I would like to add that protocol-translating gateways are very much *not* the same thing as IM's. Providing a use case in which we need to use an unreliable protocol-translating gateway does *not* contradict the hop-to-hop approach for dealing with IM's. To knock down hop-to-hop, someone must come up with a genuine IM use case, corresponding to Figure 8-2 (in section 8.5.4, page 28, of the MS Spec).
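
For anyone who finds code easier to follow than prose, here is a rough Python sketch of the "reliable IM" hop-to-hop model described above. It is only an illustration under simplifying assumptions, not anything taken from the MS Spec: the names `send_hop`, `send_via_chain`, `application_retry_allowed` and `scripted_link` are hypothetical, each hop is modelled as a function that returns True when an Ack comes back, and the hops are tried one after another rather than with real store-and-forward timing.

```python
from enum import Enum
from typing import Callable, List, Optional


class Outcome(Enum):
    DELIVERED = "delivered"                  # an Ack came back on every hop
    MSH_LEVEL_FAILURE = "msh-level failure"  # retries exhausted, delivery in doubt


def send_hop(send_once: Callable[[], bool], retries: int) -> Outcome:
    """One hop: the first attempt plus up to `retries` retries."""
    for _ in range(1 + retries):
        if send_once():  # True means the adjacent MSH sent an Ack
            return Outcome.DELIVERED
    return Outcome.MSH_LEVEL_FAILURE


def send_via_chain(hops: List[Callable[[], bool]], retries: int) -> Outcome:
    """Reliable-IM assumption: if any adjacent pair of MSH's exhausts its
    retries, the whole request fails and F reports MSH-level failure."""
    for send_once in hops:
        if send_hop(send_once, retries) is Outcome.MSH_LEVEL_FAILURE:
            return Outcome.MSH_LEVEL_FAILURE
    return Outcome.DELIVERED


def application_retry_allowed(status_from_to_party: Optional[bool]) -> bool:
    """Resolve the doubt before an application-level retry.

    True  = T confirms it received the original message (do not resend),
    False = T confirms it did not (safe to send a new message),
    None  = T still cannot be reached, so the doubt is unresolved.
    """
    return status_from_to_party is False


def scripted_link(acks: List[bool]) -> Callable[[], bool]:
    """Hypothetical test link: yields the scripted Ack results in order,
    then behaves as a dead link."""
    remaining = iter(acks)
    return lambda: next(remaining, False)


# F <-> IM1 <-> IM2 <-> T, where the IM2 <-> T link never answers.
chain = [scripted_link([False, True]), scripted_link([True]), scripted_link([])]
result = send_via_chain(chain, retries=2)
```

With the chain above, the first two hops succeed and the third exhausts its retries, so `result` is `Outcome.MSH_LEVEL_FAILURE`. Note that nothing in the sketch retries from F again: under the "reliable IM" axiom the only remaining step is the status check followed, if the answer is negative, by an application-level retry with a new message.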