Subject: Re: [ebxml-msg] an assessment of the reliability of pulled messages
From: Doug Bunting <Doug.Bunting@Sun.COM>
To: Jacques Durand <JDurand@us.fujitsu.com>
Date: Fri, 07 Jan 2005 11:41:53 -0800

Jacques,

Sorry for the length of this.  I hope the text is made clearer through the 
longer explanations rather than the reverse.  Please let me know what 
remains muddled.  Also, the diagrams below will only work when viewed with 
a fixed-width font.

I believe some aspects of both scenarios as described so far are unrealistic.

For #1, it would be more realistic to imagine a queue[1] maintained at the 
MSH level.  The RMP would not learn any new semantics as a result of adding 
the Pull to the overall situation.  It is the MSH, and not the RMP, which 
would implement semantics making the Pull request idempotent.  This does 
not require a new flag or MSH-level acknowledgement of any sort; it would 
be much simpler for messages to remain in the MSH queue until the 
underlying RMP indicates that the reliable message has been received 
successfully.  (The RMP-level acknowledgement could certainly be bundled 
with a later Pull request as an optimization but that is a minor detail.) 
Other ways to make the Pull request idempotent are pretty easy to imagine 
as well.

For #2, the problem is a confusion between reliability of the waiting 
(to-be-pulled) message and reliably getting the next message.  If the Pull 
request / response pair is "reliable" (including caching of the response) 
at the RMP level, additional acknowledgements of the pulled messages become 
success signals for the receiver MSH rather than RMP signals.  Put another 
way, the receiver RMP has to do something special with the pulled messages 
and their acknowledgements because it cannot resend the messages: An 
acknowledgement in this scenario may end caching of the response but does 
not relate to any "retry loop" in that RMP.  (Note: Ending caching might 
result in problems for later duplicate Pull requests.)

I am not sure the following is that different from the scenarios Jeff and 
Jacques have drawn but wanted to come at it from another angle.  (I suspect 
the main difference is making the R MSH actions a bit more explicit.)  When 
laying out the various systems and drawing the message flow between them, 
the details for the two scenarios look something like:

1.	app	S MSH	S RMP		R RMP	R MSH	app (consumer)
						   |<-- Reliable msg(s)
	? -->	Pull -->Pull -->	Pull --> peek queue
	Rel. msg <-- Pull resp. / Rel. msg <--  <--|
			|--> RMP acknowledgement --> pop queue
						(MSH removes ack'd msg)
						 --> signal success -->

Note that until the RM Reply is received the MSH-maintained queue will have 
the same top message and the Pull request will therefore be idempotent.
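
To make the peek / pop distinction concrete, here is a minimal sketch in 
Python.  The class and method names (MshPullQueue, on_pull, on_rmp_ack) 
are hypothetical and come from neither the ebMS nor the WS-Reliability 
specifications; the sketch only assumes the MSH is told when the RMP 
receives an acknowledgement.

```python
from collections import deque


class MshPullQueue:
    """Hypothetical MSH-level queue for scenario #1 (illustrative names only)."""

    def __init__(self):
        self._queue = deque()  # (message_id, payload) pairs awaiting a Pull

    def submit(self, message_id, payload):
        # The consumer application pushes a new outbound reliable message.
        self._queue.append((message_id, payload))

    def on_pull(self):
        # Peek, do not pop: repeated Pulls return the same top message.
        return self._queue[0] if self._queue else None

    def on_rmp_ack(self, message_id):
        # Pop only when the RMP reports the top message as acknowledged.
        if self._queue and self._queue[0][0] == message_id:
            self._queue.popleft()
            return True   # caller can now signal success to the application
        return False      # stale or unknown acknowledgement; nothing removed


q = MshPullQueue()
q.submit("m1", "first")
q.submit("m2", "second")
first = q.on_pull()
again = q.on_pull()          # a retried Pull sees the same message
acked = q.on_rmp_ack("m1")   # acknowledgement arrives; queue advances
after_ack = q.on_pull()
```

A lost Pull response costs nothing here: the sender simply pulls again 
and gets the same message, because only the acknowledgement advances the 
queue.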

While it may not be clear from the cramped diagram above, the requested 
reliable message (business payload) is taken out of the Pull response by 
the Sending MSH.  The application that requests the next reliable message 
(indicated by a "?" above) never sees Pull requests or responses.  If that 
were not true (for either scenario), it would not be clear what the RMP or 
MSH is providing the two applications.

The only change in behaviour for the RMP is that the acknowledgements serve 
as signals for the MSH rather than ending a retry loop at its level.  That 
is, the Pull response passes through the RMP without invoking its reliable 
machinery.  All that it does is remember the message identifier so the 
later acknowledgements are not treated as errors.  Scenario #2 requires an 
almost identical change in receiver RMP behaviour, plus more.
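
As a rough illustration of that change, assuming a hypothetical 
PassThroughRmp class and a callback into the MSH (neither is defined by 
WS-Reliability), the RMP side could be as small as:

```python
class PassThroughRmp:
    """Hypothetical receiver RMP behaviour for scenario #1 (names are assumed)."""

    def __init__(self, notify_msh):
        self._expected = set()         # ids of messages sent in Pull responses
        self._notify_msh = notify_msh  # callback into the MSH (assumption)

    def forward_pull_response(self, message_id, payload):
        # No retry timer and no resend loop: just remember the identifier.
        self._expected.add(message_id)
        return (message_id, payload)

    def on_ack(self, message_id):
        # An acknowledgement for a remembered id is a signal for the MSH,
        # not the end of a retry loop at this level.
        if message_id not in self._expected:
            return False               # unknown id; a real RMP might fault
        self._expected.discard(message_id)
        self._notify_msh(message_id)
        return True


signals = []
rmp = PassThroughRmp(signals.append)
rmp.forward_pull_response("m1", "payload")
ok = rmp.on_ack("m1")
unknown = rmp.on_ack("m9")
```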

2.	app	S MSH	S RMP		R RMP	R MSH	app (consumer)
						   |<-- Reliable msg(s)
	? -->	Pull -->Pull -->	Pull --> pop queue
	Rel. msg <-- Pull resp. / Rel msg <--	<--|
			|--> RMP acknowledgement --> signal success -->

Failures in this case result in the Pull request being treated as a 
duplicate and the Receiver RMP must return the previous response.  That is, 
the Pull request is not idempotent and the Receiver RMP must ensure the 
Receiver MSH does not see duplicates.  The Receiver RMP is also responsible 
for caching the Pull response.

As mentioned above, the Receiver RMP must also remember the message 
identifier of the contained reliable message.  The first (response caching) 
is a requirement for something currently optional in the WS-Reliability 
specification.  The second (special casing the content of Pull responses 
and their acknowledgements) is entirely new behaviour.
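
The two receiver RMP responsibilities in scenario #2 can be sketched as 
follows.  All names (CachingReceiverRmp, on_pull, on_ack, the string 
return values) are hypothetical, and the sketch assumes each Pull request 
carries its own reliable message identifier so duplicates can be 
recognised.

```python
from collections import deque


class CachingReceiverRmp:
    """Hypothetical receiver RMP for scenario #2 (illustrative names only)."""

    def __init__(self, msh_queue):
        self._msh_queue = msh_queue   # deque of (message_id, payload)
        self._responses = {}          # Pull request id -> cached response
        self._pulled_ids = set()      # ids of messages sent in Pull responses

    def on_pull(self, pull_request_id):
        # A duplicate Pull gets the cached response; the MSH never sees it.
        if pull_request_id in self._responses:
            return self._responses[pull_request_id]
        # First delivery: the MSH pops its queue, so the Pull is not idempotent.
        response = self._msh_queue.popleft() if self._msh_queue else None
        self._responses[pull_request_id] = response
        if response is not None:
            self._pulled_ids.add(response[0])
        return response

    def on_ack(self, message_id):
        # The ack ends no retry loop here; it is a success signal for the MSH.
        if message_id in self._pulled_ids:
            self._pulled_ids.discard(message_id)
            return "signal-success"
        return "unknown-ack"


rmp = CachingReceiverRmp(deque([("m1", "first"), ("m2", "second")]))
r1 = rmp.on_pull("pull-1")
r1_dup = rmp.on_pull("pull-1")   # retransmitted Pull: served from the cache
r2 = rmp.on_pull("pull-2")       # a new Pull: next message is popped
ack = rmp.on_ack("m1")
```

The sketch deliberately keeps the cached response after the 
acknowledgement, since ending the caching at that point might cause 
problems for a later duplicate Pull request.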

  ---

Overall, the two scenarios look somewhat similar on the wire as I 
diagrammed them above.  This is primarily because I left out the WS-R 
headers in the scenario #2 Pull request and ignored the RM Reply to that 
reliable request.  In most cases, however, this information will result in 
a relatively minor expansion of the messages rather than additional 
messages (the Pull response is sent using the RM Response reply pattern).

The primary differences between the two scenarios seem to be: (a) Where 
the Pull responses are cached. (b) How much an off-the-shelf RMP must be 
changed to implement the new protocol.

On (a), I have imagined a consistent Receiver MSH / application interface 
that includes a mechanism for that application to push new outbound 
reliable messages into some queue, waiting for the next Pull request.  This 
is somewhat artificial but certainly makes it seem like the MSH queue is 
necessary and could be used as demonstrated in scenario #1.  In any case, 
scenario #1 makes the MSH consistently responsible for these Pull responses 
and scenario #2 turns responsibility over to the RMP after the first 
relevant Pull request.

On (b), I believe the RMP implementation must be changed more for scenario 
#2.  This is due to that scenario's non-standard response caching requirement.

Just a bit more below.

thanx,
	doug

[1] I could not care less how either the RMP or the MSH is implemented but 
queues have well-understood and simple external semantics that are visible 
in their external interface.  For this scenario, I suggest we use this 
metaphor regardless of the perceived constraint on implementation.  That 
constraint is no more binding than the "caching" metaphor we imagine is 
involved in persisting reliable responses in the receiver RMP.

On 06-Jan-05 15:53, Jacques Durand wrote:
> Jeff:
>  
> Your Scenario #1  (Non-reliable Pull Request)
> is not realistic as you draft it, because you assume an RMP behavior 
> that is quite "new" w/r to WS-Reliability spec:
> namely, that the RMP will be able to resend the cached pulled message as 
> a sync response of an entirely new Pull request.
> That would assume that  the RMP has some knowledge of the Pull 
> semantics, and knows somehow that all "Pulls" for this party ID  are 
> equivalent, so that it can do the resending as a response of any of them.
>  
> The other radical alternative for Non-reliable Pull Request that could 
> be taken (but which has many problems)  can be summarized as:
>  
> "Because it is a Pull signal, we can redo it (MSH level) as many times 
> as we want, when we fail to get
> a pulled message. So no need to involve any RMP-level reliability."
>  
> Problems are:
>  
> - that only works if the Pull is idempotent, e.g. you need a flag saying 
> "I want to re-pull the same message as previous pull"
> that complicates things.

Complications are in the eye of the beholder.  If, as I do, you view a MSH 
level queue as necessary in any case (because the application connected to 
the receiver of the Pull request submits new outbound messages according to 
its own timing) and do not see the need for a new flag, the major 
complications arise when making larger modifications to your RMP 
implementation (for scenario #2).

> - some caching is assumed on MSH side so that pulled messages are still 
> available for resending until they are acknowledged somehow. That seems 
> to require a new persistence mechanism at MSH level that would not be 
> necessary if we rely just on RMP persistence.

As above, I disagree with this assertion.

> - that needs be governed by an MSH-level Ack (which we precisely wanted 
> to avoid lately).

I am not sure what we have been trying to avoid or why this would be at the 
MSH level.

> Trying to use the "next" Pull as an Ack is very tricky: that assumes a 
> complete serialization of Pull+response pairs, which I think is very 
> restrictive. We should allow for implementations sending batches of Pull 
> without waiting for the response between each.  That is quite possible 
> when using concurrently several HTTP connections, and even over a single 
> HTTP connection (pipelining).

What is the requirement for multiple Pull request / response pairs at once? 
I agree that scenario #2 might support it but have not heard a request 
for this feature.

> - unless we try to avoid situations where the same ebMS message can be 
> successfully pulled twice (which seems hard to guarantee) we'll need to 
> do duplicate elimination at MSH level (based on ebMS ID).

This is necessary only in scenario #2.
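
For concreteness, MSH-level duplicate elimination keyed on the ebMS 
message identifier could look like the sketch below.  The DuplicateFilter 
name and its accept method are hypothetical, and a real MSH would need 
persistent, bounded storage rather than an in-memory set.

```python
class DuplicateFilter:
    """Hypothetical MSH-level duplicate elimination keyed on ebMS message id."""

    def __init__(self):
        self._seen = set()  # in-memory only; an assumption for the sketch

    def accept(self, ebms_message_id):
        # Deliver a message to the application only the first time its id appears.
        if ebms_message_id in self._seen:
            return False
        self._seen.add(ebms_message_id)
        return True


dedup = DuplicateFilter()
results = [dedup.accept(i) for i in ["m1", "m2", "m1"]]
```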

> Jacques

...