

Subject: RE: [ebxml-msg] an assessment of the reliability of pulled messages



Doug:

It took me a while, but here are some comments on your scenario comparisons, attached.
They discount some perceived problems with the "reliable Pull" and bring up some issues with the "MSH idempotent method"...

Hamid has suggested to me a third alternative that I have not yet looked at much,
but it is about introducing a status request that returns a list of ebMS IDs to be pulled,
after which only Pull(ID) would be done.
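
Roughly, and only as a sketch (Python-style pseudocode; status_request, pull and deliver
are illustrative names, not operations defined by ebMS or WS-Reliability), the interaction
would look something like this:

    # Hypothetical flow for the status-request alternative.
    def pull_all(pulling_msh, responding_msh):
        # First ask which ebMS messages are currently waiting to be pulled.
        waiting_ids = responding_msh.status_request()   # e.g. ["msg-001", "msg-002"]
        for ebms_id in waiting_ids:
            # Each Pull now names exactly one message, so a repeated Pull(ID)
            # is unambiguous and duplicates are easy to detect.
            message = responding_msh.pull(ebms_id)
            pulling_msh.deliver(message)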

Jacques


 

From: Doug Bunting [Doug.Bunting@Sun.COM]
Sent: Friday, January 07, 2005 11:42 AM
To: Jacques Durand
Cc: 'Jeff Turpin'; 'ebxml-msg@lists.oasis-open.org'
Subject: Re: [ebxml-msg] an assessment of the reliability of pulled
messages

Jacques,

Sorry for the length of this.  I hope the text is made clearer through the 
longer explanations rather than the reverse.  Please let me know what 
remains muddled.  Also, the diagrams below will only work when viewed with 
a fixed-width font.

I believe some aspects of both scenarios as described so far are unrealistic.

For #1, it would be more realistic to imagine a queue[1] maintained at the 
MSH level.  The RMP would not learn any new semantics as a result of adding 
the Pull to the overall situation.  It is the MSH, and not the RMP, which 
would implement semantics making the Pull request idempotent.  This does 
not require a new flag or MSH-level acknowledgement of any sort; it would 
be much simpler for messages to remain in the MSH queue until the 
underlying RMP indicates that the reliable message has been received 
successfully.  (The RMP-level acknowledgement could certainly be bundled 
with a later Pull request as an optimization but that is a minor detail.) 
Other ways to make the Pull request idempotent are pretty easy to imagine 
as well.
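
As a minimal sketch of that queue behaviour (Python-style pseudocode; the class and
method names are mine, not anything defined by ebMS or WS-Reliability), the Receiver
MSH could look something like:

    from collections import deque

    class ReceiverMSH:
        """Hypothetical Receiver MSH: holds outbound messages submitted by the
        consumer application until the RMP reports successful delivery."""

        def __init__(self):
            self._queue = deque()              # (ebms_message_id, payload) pairs

        def submit(self, ebms_message_id, payload):
            """The consumer application pushes a new outbound reliable message."""
            self._queue.append((ebms_message_id, payload))

        def handle_pull(self):
            """Serve a Pull request: peek, do not remove.  Until an RMP
            acknowledgement arrives, every Pull sees the same head message,
            which is what makes the Pull idempotent."""
            return self._queue[0] if self._queue else None

        def handle_rmp_ack(self, acked_message_id):
            """The RMP signals that the pulled message was received reliably;
            only then is it removed (popped) from the queue."""
            if self._queue and self._queue[0][0] == acked_message_id:
                self._queue.popleft()

Bundling the RMP-level acknowledgement with a later Pull, as mentioned above, would
simply mean handle_rmp_ack runs just before handle_pull.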

<JD> I can see that working, but not without serious complications or restrictions elsewhere.
By keeping a pulled message in the Receiver MSH until it gets acknowledged (at MSH level) and enforcing idempotence
of the MSH pull operation, you assume that this message will be returned as the response to any subsequent Pull signal
for this partyID, until it is finally acknowledged.
This causes the following issues:
- buggy scenario: the Receiver MSH has not yet received an Ack for a Pull, so it resends the same message as the response
to the next Pull, even though the first response was received correctly and the Ack was simply "on the way" (as a separate Callback,
it could be sent before the 2nd Pull but arrive after it...).
So if duplicate elimination is required we must check it at MSH level on the sending side (we can't rely on the RMP); a sketch of that check follows this comment block.
- scalability issue: this does not allow for a "pipelined" mode where several Pull messages are sent to the receiver without
waiting for their responses (which can be done by pooling several HTTP connections, or even using just one, since
HTTP allows this kind of pipelining, where all that is required is that HTTP responses are sent back in the same order
as the HTTP requests). Such pipelining is not possible because the MSH would always serve the same pulled message until an Ack
comes back much later. I think that can be a serious scalability issue: an HTTP round trip that includes the
MSH layer may take a while, relatively speaking, and serializing the Pulls may slow things considerably in some MSHs.
- Also, an HTTP Pull request may time out for some reason (bad message), blocking all subsequent Pulls
(these would pull the same message). If the failure to send a pulled message is due to some aspect of
the message itself (size...), we should not keep it in the MSH queue and try to re-serve it at each Pull.
At some point a delivery failure should be decided by the MSH on the receiver side, based on the absence of Acks.
That sounds like replicating an RMP feature.
</JD>
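
To make the duplicate-elimination point above concrete, here is a rough sketch
(Python-style pseudocode, illustrative names only) of the check the pulling side's
MSH would need before handing messages to its application:

    class PullingMSH:
        """Hypothetical pulling-side MSH filter: because a not-yet-acknowledged
        message may be returned again by the next Pull, duplicates must be
        dropped here, by ebMS MessageId, before delivery to the application."""

        def __init__(self, deliver_to_app):
            self._seen_ids = set()               # ebMS MessageIds already delivered
            self._deliver_to_app = deliver_to_app

        def on_pulled_message(self, message_id, payload):
            if message_id in self._seen_ids:
                return                           # re-pulled copy: drop it silently
            self._seen_ids.add(message_id)
            self._deliver_to_app(payload)        # first copy: hand to the application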


For #2, the problem is a confusion between reliability of the waiting 
(to-be-pulled) message and reliably getting the next message.  If the Pull 
request / response pair is "reliable" (including caching of the response) 
at the RMP level, additional acknowledgements of the pulled messages become 
success signals for the receiver MSH rather than RMP signals. 

<JD> Agreed. These Acks would not really control the resending of the pulled message (at RMP level).
As I see it, these Acks have the sole function of preventing the receiving RMP from notifying the receiving MSH
of a delivery failure (of the pulled message). But more on this later. </JD>

 Put another 
way, the receiver RMP has to do something special with the pulled messages 
and their acknowledgements because it cannot resend the messages: An 
acknowledgement in this scenario may end caching of the response but does 
not relate to any "retry loop" in that RMP.  
(Note: Ending caching might  result in problems for later duplicate Pull requests.)

<JD> We can't really end caching on criteria other than the expiration time, because we still have to accommodate
the current behavior required by WS-Reliability 1.1:
- Caching is currently required for any SOAP Response when duplicate elimination
is used for the Request, regardless of whether the SOAP Request is a Pull or a regular message (the RMP can't know!); a sketch of this caching follows this comment block.
- So we already have an (RMP-level) resending mechanism for responses that is driven by duplicates of the Request
(either a Pull or a business message), though it is not in any way controlled by an Ack for the responses...
- But with duplicate elimination on the sender RMP, all of this would remain invisible to the MSH layer.
So I do not see "Acks having a cache-ending semantics" as a necessity, even with your scenario.
</JD>
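
For illustration only (my own naming, not the spec's), the duplicate-driven response
caching described above could be sketched as:

    import time

    class ResponseCache:
        """Hypothetical receiver-RMP response cache: a duplicate of a request
        (a Pull or an ordinary business request, the RMP cannot tell which) gets
        the previously cached response replayed; entries disappear only when
        their expiration time passes, not when an Ack arrives."""

        def __init__(self):
            self._entries = {}   # request MessageId -> (response, expiry timestamp)

        def store(self, request_message_id, response, expires_at):
            self._entries[request_message_id] = (response, expires_at)

        def lookup(self, request_message_id):
            entry = self._entries.get(request_message_id)
            if entry is None:
                return None                          # first time this request is seen
            response, expires_at = entry
            if time.time() > expires_at:
                del self._entries[request_message_id]
                return None                          # expired: treat as a new request
            return response                          # duplicate: replay cached response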

I am not sure the following is that different from the scenarios Jeff and 
Jacques have drawn but wanted to come at it from another angle.  (I suspect 
the main difference is making the R MSH actions a bit more explicit.)  When 
laying out the various systems and drawing the message flow between them, 
the details for the two scenarios look something like:

1.      app     S MSH   S RMP           R RMP   R MSH   app (consumer)
                                                   |<-- Reliable msg(s)
        ? -->   Pull -->Pull -->        Pull --> peek queue
        Rel. msg <-- Pull resp. / Rel. msg <--  <--|
                        |--> RMP acknowledgement --> pop queue
                                                (MSH removes ack'd msg)
                                                 --> signal success -->

Note that until the RM Reply is received the MSH-maintained queue will have 
the same top message and the Pull request will therefore be idempotent.

<JD> I believe this idempotent behavior comes with a cost, as I mentioned in my first insert. </JD>


While it may not be clear from the cramped diagram above, the requested 
reliable message (business payload) is taken out of the Pull response by 
the Sending MSH.  The application that requests the next reliable message 
(indicated by a "?" above) never sees Pull requests or responses.  If that 
were not true (for either scenario), it would not be clear what the RMP or 
MSH is providing the two applications.

<JD> Understood, the MSH makes the use of Pull transparent to its consumer/producer apps. </JD>
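
A small sketch of that transparency (made-up names, Python-style pseudocode): the
application asks its MSH for the next message and only ever sees the business
payload, while the Pull exchange stays entirely inside the MSH.

    class PullingSideMSH:
        """Hypothetical API between the pulling application and its MSH."""

        def __init__(self, rmp):
            self._rmp = rmp                        # underlying RMP that carries the Pull

        def receive_next(self, party_id):
            """Called by the application (the '?' in the diagrams)."""
            pull_response = self._rmp.send_pull(party_id)   # Pull out, response back
            if pull_response is None:
                return None                        # nothing waiting to be pulled
            return pull_response.business_payload  # payload only; the Pull never surfaces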

The only change in behaviour for the RMP is that the acknowledgements serve 
as signals for the MSH rather than ending a retry loop at its level.  

<JD> Two things in what you said above:
(1) "acknowledgements serve as signals for the MSH": that is indeed an addition your scenario requires,
although it could be considered an implementation matter (nothing prevents an RMP from notifying more than just delivery failures).
But note that the reliability method I favor (RMP-reliant) does not require this change: delivery-failure notifications to the MSH
are sufficient.
(2) "remove the retry-ending semantics".
I think we don't actually need to do anything about this in the present case.
(Let us note that the spec never explicitly requires that Acks have the semantics of stopping resending,
so we could treat this as an implementation decision, like other details of the resending mechanism.)
But I believe we don't even have to worry about it: the right behavior (automatically stopping the resending of the pulled message)
is obtained today if we use guaranteed delivery for the Pull:
the same agent (the sender RMP) that sends this Ack also controls the resending of the response, since that resending is driven
by the resending of the (reliable) Pull (see the sketch after this comment block).
In other words, I do not see that we face a situation where the sending RMP sends an Ack
for a pulled message and this pulled message keeps being resent.
A conforming RMP today that sends the Pull reliably will normally stop resending the Pull
when the pulled message is obtained successfully. Some time after that is done, it will send the Ack for the pulled message,
and the response will stop being resent not because of the Ack but because of what caused this Ack to be sent (i.e. the
end of the Pull resending).
There could still be some freak cases of a Pull duplicate coming from nowhere, which would indeed cause
the receiver RMP to resend the response (unless it has expired). But that is not such an issue - the sending RMP is
eliminating duplicates. The same issue exists for regular business request-responses. </JD>
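
As a sketch of that ordering (illustrative only, not a conformance statement about any
particular RMP), the sender RMP's retry loop for a reliable Pull might look like the
following; once the loop ends, no further Pulls go out, so nothing provokes another
copy of the pulled message, and the later Ack plays no part in stopping the resending.

    import time

    def reliable_pull(send_pull, max_retries=3, retry_interval=5.0):
        """Hypothetical sender-RMP retry loop for a reliable Pull.  Each retry
        of the Pull is what can cause the receiver RMP to replay the cached
        pulled message; when a response arrives the loop stops, so no more
        copies of the pulled message are provoked."""
        for _ in range(max_retries):
            response = send_pull()           # one Pull on the wire
            if response is not None:
                return response              # pulled message obtained: stop retrying
            time.sleep(retry_interval)       # no response yet: resend the same Pull
        return None                          # give up; report delivery failure upward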


That 
is, the Pull response passes through the RMP without invoking its reliable 
machinery.  All that it does is remember the message identifier so the 
later acknowledgements are not treated as errors.  Scenario #2 requires an 
almost identical change in receiver RMP behaviour, plus more.

2.      app     S MSH   S RMP           R RMP   R MSH   app (consumer)
                                                   |<-- Reliable msg(s)
        ? -->   Pull -->Pull -->        Pull --> pop queue
        Rel. msg <-- Pull resp. / Rel msg <--   <--|
                        |--> RMP acknowledgement --> signal success -->

Failures in this case result in the Pull request being treated as a 
duplicate and the Receiver RMP must return the previous response.  That is, 
the Pull request is not idempotent and the Receiver RMP must ensure the 
Receiver MSH does not see duplicates.  The Receiver RMP is also responsible 
for caching the Pull response.

As mentioned above, the Receiver RMP must also remember the message 
identifier of the contained reliable message.  The first (response caching) 
is a requirement for something currently optional in the WS-Reliability 
specification.  The second (special casing the content of Pull responses 
and their acknowledgements) is entirely new behaviour.

<JD> I hope my previous explanation shows no new behavior is needed... </JD>


  ---

Overall, the two scenarios look somewhat similar on the wire as I 
diagrammed them above.  This is primarily because I left out the WS-R 
headers in the scenario #2 Pull request and ignored the RM Reply to that 
reliable request.  In most cases, however, this information will result in 
a relatively minor expansion of the messages rather than additional 
messages (the Pull response is sent using the RM Response reply pattern).

The primary differences between the two scenarios seem to be: (a) Where 
the Pull responses are cached. (b) How much an off-the-shelf RMP must be 
changed to implement the new protocol.

On (a), I have imagined a consistent Receiver MSH / application interface 
that includes a mechanism for that application to push new outbound 
reliable messages into some queue, waiting for the next Pull request.  This 
is somewhat artificial but certainly makes it seem like the MSH queue is 
necessary and could be used as demonstrated in scenario #1.  In any case, 
scenario #1 makes the MSH consistently responsible for these Pull responses 
and scenario #2 turns responsibility over to the RMP after the first 
relevant Pull request.

On (b), I believe the RMP implementation must be changed more for scenario 
#2.  This is due to that scenario's non-standard response caching requirement.

Just a bit more below.

thanx,
	doug

[1] I couldn't care less how either the RMP or the MSH is implemented, but 
queues have well-understood and simple external semantics that are visible 
in their external interface. 

<JD> Understood. </JD>

 For this scenario, I suggest we use this 
metaphor regardless of the perceived constraint on implementation.  That 
constraint is no more binding than the "caching" metaphor we imagine is 
involved in persisting reliable responses in the receiver RMP.

On 06-Jan-05 15:53, Jacques Durand wrote:
> Jeff:
>  
> Your Scenario #1  (Non-reliable Pull Request)
> is not realistic as you draft it, because you assume an RMP behavior 
> that is quite "new" with respect to the WS-Reliability spec:
> namely, that the RMP will be able to resend the cached pulled message as 
> a sync response of an entirely new Pull request.
> That would assume that  the RMP has some knowledge of the Pull 
> semantics, and knows somehow that all "Pulls" for this party ID  are 
> equivalent, so that it can do the resending as a response of any of them.
>  
> The other radical alternative for Non-reliable Pull Request that could 
> be taken (but which has many problems)  can be summarized as:
>  
> "Because it is a Pull signal, we can redo it (MSH level) as many times 
> as we want, when we fail to get
> a pulled message. So no need to involve any RMP-level reliability."
>  
> Problems are:
>  
> - that only works if the Pull is idempotent, e.g. you need a flag saying 
> "I want to re-pull the same message as previous pull"
> that complicates things.

Complications are in the eye of the beholder.  If, as I do, you view an MSH-level
queue as necessary in any case (because the application connected to 
the receiver of the Pull request submits new outbound messages according to 
its own timing) and do not see the need for a new flag, the major 
complications arise when making larger modifications to your RMP 
implementation (for scenario #2).

> - some caching is assumed on MSH side so that pulled messages are still 
> available for resending until they are acknowledged somehow. That seems 
> to require a new persistence mechanism at MSH level that would not be 
> necessary if we rely just on RMP persistence.

As above, I disagree with this assertion.

> - that needs to be governed by an MSH-level Ack (which we precisely wanted 
> to avoid lately).

I am not sure what we have been trying to avoid or why this would be at the 
MSH level.

> Trying to use the "next" Pull as an Ack is very tricky: that assumes a 
> complete serialization of Pull+response pairs, which I think is very 
> restrictive. We should allow for implementations sending batches of Pull 
> without waiting for the response between each.  That is quite possible 
> when using concurrently several HTTP connections, and even over a single 
> HTTP connection (pipelining).

What is the requirement for multiple Pull request / response pairs at once?
I agree that scenario #2 might support it but have not heard a request 
for this feature.

> - unless we try to avoid situations where the same ebMS message can be 
> successfully pulled twice (which seems hard to guarantee) we'll need to 
> do duplicate elimination at MSH level (based on ebMS ID).

This is necessary only in scenario #2.

> Jacques

...

