Subject: RE: [ebxml-msg] an assessment of the reliability of pulled messages
Doug:
It took me a while, but here are some comments on your scenario comparisons, attached.
They discount some perceived drawbacks of the "reliable Pull", and unveil some issues with the "MSH idempotent method"...
Hamid has suggested to me a third alternative that I have not yet looked at much,
but it is about introducing a status request that returns a list of ebMS IDs to be pulled,
after which only Pull(ID) would be done.
Jacques
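The status-request-then-Pull(ID) exchange mentioned above could be sketched as follows. This is a hypothetical illustration only (the class and method names are mine, not from any spec), assuming the receiver keeps waiting messages keyed by their ebMS IDs; Pull(ID) is naturally idempotent because the ID pins down exactly one message:

```python
class PullByIdServer:
    """Hypothetical sketch of the third alternative: a status request
    lists the waiting ebMS message IDs, then each message is fetched
    with an explicit Pull(ID)."""

    def __init__(self):
        self._waiting = {}              # ebMS message ID -> payload

    def submit(self, msg_id, payload):
        """Consumer application makes a message available for pulling."""
        self._waiting[msg_id] = payload

    def status_request(self):
        """Return the list of ebMS IDs currently waiting to be pulled."""
        return sorted(self._waiting)

    def pull(self, msg_id):
        """Pull one specific message; retrying the same ID returns the
        same message until it is acknowledged."""
        return self._waiting.get(msg_id)

    def ack(self, msg_id):
        """Acknowledgement removes the message from the waiting set."""
        self._waiting.pop(msg_id, None)
```

A puller would call `status_request()` once, then issue `pull(msg_id)` for each listed ID, retrying any individual Pull(ID) freely without risking duplicates or skipped messages.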
-----Original Message-----
From: Doug Bunting [mailto:Doug.Bunting@Sun.COM]
Sent: Friday, January 07, 2005 11:42 AM
To: Jacques Durand
Cc: 'Jeff Turpin'; 'ebxml-msg@lists.oasis-open.org'
Subject: Re: [ebxml-msg] an assessment of the reliability of pulled
messages
Jacques,
Sorry for the length of this. I hope the text is made clearer through the
longer explanations rather than the reverse. Please let me know what
remains muddled. Also, the diagrams below will only work when viewed with
a fixed-width font.
I believe some aspects of both scenarios as described so far are unrealistic.
For #1, it would be more realistic to imagine a queue[1] maintained at the
MSH level. The RMP would not learn any new semantics as a result of adding
the Pull to the overall situation. It is the MSH, and not the RMP, which
would implement semantics making the Pull request idempotent. This does
not require a new flag or MSH-level acknowledgement of any sort; it would
be much simpler for messages to remain in the MSH queue until the
underlying RMP indicates that the reliable message has been received
successfully. (The RMP-level acknowledgement could certainly be bundled
with a later Pull request as an optimization but that is a minor detail.)
Other ways to make the Pull request idempotent are pretty easy to imagine
as well.
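As a hypothetical sketch of this MSH-level queue (the names are mine, not from any specification): a Pull request peeks at the head of the queue, and only the RMP-level acknowledgement pops it, which makes a retried Pull idempotent by construction:

```python
from collections import deque

class MshOutboundQueue:
    """Sketch of the MSH-maintained queue described in scenario #1:
    the to-be-pulled message stays at the head of the queue until the
    underlying RMP reports a successful acknowledgement, so repeated
    Pull requests see the same message."""

    def __init__(self):
        self._queue = deque()           # messages submitted by the consumer app

    def submit(self, message):
        """Application pushes a new outbound reliable message."""
        self._queue.append(message)

    def on_pull(self):
        """Serve a Pull request: peek, do not pop. A retried Pull gets
        the same head message until an RMP-level ack arrives."""
        return self._queue[0] if self._queue else None

    def on_rmp_ack(self):
        """RMP-level acknowledgement received: only now remove the
        head message from the queue."""
        return self._queue.popleft() if self._queue else None
```

No new flag or MSH-level acknowledgement appears anywhere in this sketch; the RMP acknowledgement alone advances the queue.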
For #2, the problem is a confusion between reliability of the waiting
(to-be-pulled) message and reliably getting the next message. If the Pull
request / response pair is "reliable" (including caching of the response)
at the RMP level, additional acknowledgements of the pulled messages become
success signals for the receiver MSH rather than RMP signals. Put another
way, the receiver RMP has to do something special with the pulled messages
and their acknowledgements because it cannot resend the messages: An
acknowledgement in this scenario may end caching of the response but does
not relate to any "retry loop" in that RMP. (Note: Ending caching might
result in problems for later duplicate Pull requests.)
I am not sure the following is that different from the scenarios Jeff and
Jacques have drawn but wanted to come at it from another angle. (I suspect
the main difference is making the R MSH actions a bit more explicit.) When
laying out the various systems and drawing the message flow between them,
the details for the two scenarios look something like:
1. app S MSH S RMP R RMP R MSH app (consumer)
|<-- Reliable msg(s)
? --> Pull -->Pull --> Pull --> peek queue
Rel. msg <-- Pull resp. / Rel. msg <-- <--|
|--> RMP acknowledgement --> pop queue
(MSH removes ack'd msg)
--> signal success -->
Note that until the RM Reply is received the MSH-maintained queue will have
the same top message and the Pull request will therefore be idempotent.
While it may not be clear from the cramped diagram above, the requested
reliable message (business payload) is taken out of the Pull response by
the Sending MSH. The application that requests the next reliable message
(indicated by a "?" above) never sees Pull requests or responses. If that
were not true (for either scenario), it would not be clear what the RMP or
MSH is providing the two applications.
The only change in behaviour for the RMP is that the acknowledgements serve
as signals for the MSH rather than ending a retry loop at its level. That
is, the Pull response passes through the RMP without invoking its reliable
machinery. All that it does is remember the message identifier so the
later acknowledgements are not treated as errors. Scenario #2 requires an
almost identical change in receiver RMP behaviour, plus more.
2. app S MSH S RMP R RMP R MSH app (consumer)
|<-- Reliable msg(s)
? --> Pull -->Pull --> Pull --> pop queue
Rel. msg <-- Pull resp. / Rel msg <-- <--|
|--> RMP acknowledgement --> signal success -->
Failures in this case result in the Pull request being treated as a
duplicate and the Receiver RMP must return the previous response. That is,
the Pull request is not idempotent and the Receiver RMP must ensure the
Receiver MSH does not see duplicates. The Receiver RMP is also responsible
for caching the Pull response.
As mentioned above, the Receiver RMP must also remember the message
identifier of the contained reliable message. The first (response caching)
is a requirement for something currently optional in the WS-Reliability
specification. The second (special casing the content of Pull responses
and their acknowledgements) is entirely new behaviour.
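Under the assumptions above, the extra receiver-RMP behaviour scenario #2 requires might be sketched like this. This is a hypothetical illustration (the class and callable names are mine): the RMP caches each Pull response so a duplicate Pull replays the identical response, and remembers the contained message identifier so its later acknowledgement is recognised rather than treated as an error:

```python
class ReceiverRmpScenario2:
    """Sketch of scenario #2 receiver-RMP additions: response caching
    for duplicate Pulls, plus tracking of pulled message IDs so their
    acknowledgements act as MSH success signals, not retry-loop events."""

    def __init__(self, msh_pop_queue):
        self._pop = msh_pop_queue       # callable popping the next MSH message
        self._response_cache = {}       # Pull request ID -> cached response
        self._pending_ids = set()       # pulled message IDs awaiting an ack

    def on_pull(self, pull_request_id):
        if pull_request_id in self._response_cache:
            # Duplicate Pull: the MSH must not see it; replay the cached
            # response instead of popping a new message.
            return self._response_cache[pull_request_id]
        message_id, payload = self._pop()
        response = (message_id, payload)
        self._response_cache[pull_request_id] = response
        self._pending_ids.add(message_id)
        return response

    def on_ack(self, message_id):
        """Acknowledgement of a pulled message: a success signal for the
        MSH rather than the end of any RMP retry loop."""
        if message_id not in self._pending_ids:
            raise ValueError("unexpected acknowledgement")
        self._pending_ids.discard(message_id)
        return "signal success to MSH"
```

The sketch makes both additions visible: the response cache (currently optional in WS-Reliability) and the special-cased handling of pulled-message acknowledgements (entirely new).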
---
Overall, the two scenarios look somewhat similar on the wire as I
diagrammed them above. This is primarily because I left out the WS-R
headers in the scenario #2 Pull request and ignored the RM Reply to that
reliable request. In most cases, however, this information will result in
a relatively minor expansion of the messages rather than additional
messages (the Pull response is sent using the RM Response reply pattern).
The primary differences between the two scenarios seem to be: (a) Where
the Pull responses are cached. (b) How much an off-the-shelf RMP must be
changed to implement the new protocol.
On (a), I have imagined a consistent Receiver MSH / application interface
that includes a mechanism for that application to push new outbound
reliable messages into some queue, waiting for the next Pull request. This
is somewhat artificial but certainly makes it seem like the MSH queue is
necessary and could be used as demonstrated in scenario #1. In any case,
scenario #1 makes the MSH consistently responsible for these Pull responses
and scenario #2 turns responsibility over to the RMP after the first
relevant Pull request.
On (b), I believe the RMP implementation must be changed more for scenario
#2. This is due to that scenario's non-standard response caching requirement.
Just a bit more below.
thanx,
doug
[1] I couldn't care less how either the RMP or the MSH are implemented but
queues have well-understood and simple external semantics that are visible
in their external interface. For this scenario, I suggest we use this
metaphor regardless of the perceived constraint on implementation. That
constraint is no more binding than the "caching" metaphor we imagine is
involved in persisting reliable responses in the receiver RMP.
On 06-Jan-05 15:53, Jacques Durand wrote:
> Jeff:
>
> Your Scenario #1 (Non-reliable Pull Request)
> is not realistic as you draft it, because you assume an RMP behavior
> that is quite "new" with respect to the WS-Reliability spec:
> namely, that the RMP will be able to resend the cached pulled message as
> a sync response of an entirely new Pull request.
> That would assume that the RMP has some knowledge of the Pull
> semantics, and knows somehow that all "Pulls" for this party ID are
> equivalent, so that it can do the resending as a response of any of them.
>
> The other radical alternative for Non-reliable Pull Request that could
> be taken (but which has many problems) can be summarized as:
>
> "Because it is a Pull signal, we can redo it (MSH level) as many times
> as we want, when we fail to get
> a pulled message. So no need to involve any RMP-level reliability."
>
> Problems are:
>
> - that only works if the Pull is idempotent, e.g. you need a flag saying
> "I want to re-pull the same message as previous pull"
> that complicates things.
Complications are in the eye of the beholder. If, as I do, you view an
MSH-level queue as necessary in any case (because the application connected to
the receiver of the Pull request submits new outbound messages according to
its own timing) and do not see the need for a new flag, the major
complications arise when making larger modifications to your RMP
implementation (for scenario #2).
> - some caching is assumed on MSH side so that pulled messages are still
> available for resending until they are acknowledged somehow. That seems
> to require a new persistence mechanism at MSH level that would not be
> necessary if we rely just on RMP persistence.
As above, I disagree with this assertion.
> - that needs to be governed by an MSH-level Ack (which we precisely wanted
> to avoid lately).
I am not sure what we have been trying to avoid or why this would be at the
MSH level.
> Trying to use the "next" Pull as an Ack is very tricky: that assumes a
> complete serialization of Pull+response pairs, which I think is very
> restrictive. We should allow for implementations sending batches of Pull
> without waiting for the response between each. That is quite possible
> when using concurrently several HTTP connections, and even over a single
> HTTP connection (pipelining).
What is the requirement for multiple Pull request / response pairs at once?
I agree that scenario #2 might support it but have not heard a request
for this feature.
> - unless we try to avoid situations where the same ebMS message can be
> successfully pulled twice (which seems hard to guarantee) we'll need to
> do duplicate elimination at MSH level (based on ebMS ID).
This is necessary only in scenario #2.
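Such MSH-level duplicate elimination keyed on the ebMS message ID could be as simple as the following hypothetical sketch (the function names are mine, not from any spec):

```python
def make_deliverer(deliver_to_app):
    """Return a handler that delivers each pulled message to the
    application at most once, keyed on its ebMS message ID."""
    seen = set()                        # ebMS IDs already delivered

    def on_pulled_message(ebms_id, payload):
        if ebms_id in seen:
            return False                # duplicate pull: drop silently
        seen.add(ebms_id)
        deliver_to_app(payload)
        return True

    return on_pulled_message
```

In scenario #1 the stable queue head makes this check unnecessary; in scenario #2 the MSH (or the RMP on its behalf) needs something along these lines.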
> Jacques
...
From: Doug Bunting [Doug.Bunting@Sun.COM]
Sent: Friday, January 07, 2005 11:42 AM
To: Jacques Durand
Cc: 'Jeff Turpin'; 'ebxml-msg@lists.oasis-open.org'
Subject: Re: [ebxml-msg] an assessment of the reliability of pulled messages

Jacques,

Sorry for the length of this. I hope the text is made clearer through the
longer explanations rather than the reverse. Please let me know what
remains muddled. Also, the diagrams below will only work when viewed with
a fixed-width font.

I believe some aspects of both scenarios as described so far are unrealistic.

For #1, it would be more realistic to imagine a queue[1] maintained at the
MSH level. The RMP would not learn any new semantics as a result of adding
the Pull to the overall situation. It is the MSH, and not the RMP, which
would implement semantics making the Pull request idempotent. This does
not require a new flag or MSH-level acknowledgement of any sort; it would
be much simpler for messages to remain in the MSH queue until the
underlying RMP indicates that the reliable message has been received
successfully. (The RMP-level acknowledgement could certainly be bundled
with a later Pull request as an optimization but that is a minor detail.)
Other ways to make the Pull request idempotent are pretty easy to imagine
as well.

<JD> I can see that working, but not without serious complications or
restrictions elsewhere. By keeping a pulled message in the Receiver MSH
until it gets acknowledged (at MSH level) and enforcing idempotence of the
MSH pull operation, you assume that this message will be returned as the
response to any subsequent Pull signal for this partyID, until it is
finally acknowledged. This causes the following issues:

- Buggy scenario: the Receiver MSH has not yet received an Ack for a Pull,
so it resends the same message as the response to the next Pull, yet the
first response was well received; just the Ack was "on the way" (as a
separate Callback, it could be sent before the 2nd Pull but arrive
after...). So if duplicate elimination is required we must check it at MSH
level on the sending side (we can't rely on the RMP).

- Scalability issue: that does not allow for a "pipelined" mode where
several Pull messages are sent to the receiver without waiting for their
responses (which can be done by pooling several HTTP connections, or even
using just one, since HTTP allows for this kind of pipelining, where all
that is required is that HTTP responses are sent back in the same order as
the HTTP requests). This pipelining is not possible because the MSH would
always serve the same pulled message until an Ack comes back much later. I
think that can be a serious scalability issue: an HTTP roundtrip that
includes the MSH layer may take a while, relatively speaking, and a
serialization may slow things considerably in some MSHs.

- Also, an HTTP Pull request may time out for some reason (bad message),
blocking all subsequent Pulls (these would pull the same message). If the
failure to send a pulled message is due to some aspect of the message
itself (size...) we should not keep it in the MSH queue and try to
re-serve it at each Pull. At some point a delivery failure should be
decided by the MSH on the receiver side, based on the absence of Acks.
That sounds like replicating some RMP feature. </JD>

For #2, the problem is a confusion between reliability of the waiting
(to-be-pulled) message and reliably getting the next message. If the Pull
request / response pair is "reliable" (including caching of the response)
at the RMP level, additional acknowledgements of the pulled messages become
success signals for the receiver MSH rather than RMP signals.

<JD> Agreed. These Acks would not really control the resending of the
pulled message (RMP-level). As I see it, these Acks have the unique
function of preventing the receiving RMP from notifying the receiving MSH
of a delivery failure (of the pulled message). But more on this later. </JD>

Put another way, the receiver RMP has to do something special with the
pulled messages and their acknowledgements because it cannot resend the
messages: An acknowledgement in this scenario may end caching of the
response but does not relate to any "retry loop" in that RMP. (Note:
Ending caching might result in problems for later duplicate Pull requests.)

<JD> We can't really do cache-ending on criteria other than the expiration
time, because we still have to accommodate the current behavior as
required by WS-Reliability 1.1:

- Caching is currently required for any SOAP Response, when duplicate
elimination is used for the Request, regardless of whether the SOAP
Request is a Pull or a regular message (the RMP can't know!).

- So we already have an (RMP-level) resending mechanism for responses that
is driven by duplicates of the Request (either a Pull or a business
message), though that is not in any way controlled by an Ack for
responses...

- But with duplicate elimination on the sender RMP, all this would remain
invisible to the MSH layer.

So I do not see "Acks having a cache-ending semantics" as a necessity even
with your scenario. </JD>

I am not sure the following is that different from the scenarios Jeff and
Jacques have drawn but wanted to come at it from another angle. (I suspect
the main difference is making the R MSH actions a bit more explicit.) When
laying out the various systems and drawing the message flow between them,
the details for the two scenarios look something like:

1. app S MSH S RMP R RMP R MSH app (consumer)
|<-- Reliable msg(s)
? --> Pull -->Pull --> Pull --> peek queue
Rel. msg <-- Pull resp. / Rel. msg <-- <--|
|--> RMP acknowledgement --> pop queue
(MSH removes ack'd msg)
--> signal success -->

Note that until the RM Reply is received the MSH-maintained queue will have
the same top message and the Pull request will therefore be idempotent.

<JD> I believe this idempotent behavior comes with a cost, as I mentioned
in my first insert. </JD>

While it may not be clear from the cramped diagram above, the requested
reliable message (business payload) is taken out of the Pull response by
the Sending MSH. The application that requests the next reliable message
(indicated by a "?" above) never sees Pull requests or responses. If that
were not true (for either scenario), it would not be clear what the RMP or
MSH is providing the two applications.

<JD> Understood, the MSH makes the use of Pull transparent to its
consumer/producer apps. </JD>

The only change in behaviour for the RMP is that the acknowledgements serve
as signals for the MSH rather than ending a retry loop at its level.

<JD> Two things in what you said above:

(1) "Acknowledgements serve as signals for the MSH": that is indeed an
addition that your scenario requires, although it could be considered as
relevant to implementation (nothing prevents an RMP from notifying more
than just delivery failure). But note that the reliability method I favor
(RMP-reliant) does not require this change: failure deliveries to the MSH
are sufficient.

(2) "Remove the retry-ending semantics": I think we don't actually need to
do anything about this in the present case. (Let us note that the spec
never explicitly requires that Acks have the semantics of stopping
resending, so we could take this as an implementation decision, like other
details of the resending mechanism.) But I believe we don't even have to
worry about this: the right behavior (automatic stopping of resending the
pulled message) will be obtained today if we use guaranteed delivery for
Pull: the same agent (the sender RMP) that sends this Ack also controls
the resending of the response, since this is driven by the resending of
the (reliable) Pull. In other words, I do not see that we face a situation
where the sending RMP sends an Ack for a pulled message and this pulled
message keeps being resent. A conforming RMP today that is sending Pull
reliably will normally stop the resending of the Pull when the pulled
message is obtained successfully. Some time after that is done, it will
send the Ack for the pulled message, and the response will stop being
resent not because of the Ack but because of what caused the sending of
this Ack (i.e. the stop of Pull resending). There could still be some
freak cases of a Pull duplicate coming from nowhere, and that indeed would
cause resending of the response by the receiver RMP (unless it expires).
But that is not such an issue - the sending RMP is eliminating duplicates.
The same applies to regular business request-responses. </JD>

That is, the Pull response passes through the RMP without invoking its
reliable machinery. All that it does is remember the message identifier so
the later acknowledgements are not treated as errors. Scenario #2 requires
an almost identical change in receiver RMP behaviour, plus more.

2. app S MSH S RMP R RMP R MSH app (consumer)
|<-- Reliable msg(s)
? --> Pull -->Pull --> Pull --> pop queue
Rel. msg <-- Pull resp. / Rel msg <-- <--|
|--> RMP acknowledgement --> signal success -->

Failures in this case result in the Pull request being treated as a
duplicate and the Receiver RMP must return the previous response. That is,
the Pull request is not idempotent and the Receiver RMP must ensure the
Receiver MSH does not see duplicates. The Receiver RMP is also responsible
for caching the Pull response.

As mentioned above, the Receiver RMP must also remember the message
identifier of the contained reliable message. The first (response caching)
is a requirement for something currently optional in the WS-Reliability
specification. The second (special casing the content of Pull responses
and their acknowledgements) is entirely new behaviour.

<JD> I hope my previous explanation shows no new behavior is needed... </JD>

---

Overall, the two scenarios look somewhat similar on the wire as I
diagrammed them above. This is primarily because I left out the WS-R
headers in the scenario #2 Pull request and ignored the RM Reply to that
reliable request. In most cases, however, this information will result in
a relatively minor expansion of the messages rather than additional
messages (the Pull response is sent using the RM Response reply pattern).

The primary differences between the two scenarios seem to be: (a) Where
the Pull responses are cached. (b) How much an off-the-shelf RMP must be
changed to implement the new protocol.

On (a), I have imagined a consistent Receiver MSH / application interface
that includes a mechanism for that application to push new outbound
reliable messages into some queue, waiting for the next Pull request. This
is somewhat artificial but certainly makes it seem like the MSH queue is
necessary and could be used as demonstrated in scenario #1. In any case,
scenario #1 makes the MSH consistently responsible for these Pull responses
and scenario #2 turns responsibility over to the RMP after the first
relevant Pull request.

On (b), I believe the RMP implementation must be changed more for scenario
#2. This is due to that scenario's non-standard response caching
requirement. Just a bit more below.

thanx,
doug

[1] I couldn't care less how either the RMP or the MSH are implemented but
queues have well-understood and simple external semantics that are visible
in their external interface.

<JD> Understood. </JD>

For this scenario, I suggest we use this metaphor regardless of the
perceived constraint on implementation. That constraint is no more binding
than the "caching" metaphor we imagine is involved in persisting reliable
responses in the receiver RMP.

On 06-Jan-05 15:53, Jacques Durand wrote:
> Jeff:
>
> Your Scenario #1 (Non-reliable Pull Request)
> is not realistic as you draft it, because you assume an RMP behavior
> that is quite "new" with respect to the WS-Reliability spec:
> namely, that the RMP will be able to resend the cached pulled message as
> a sync response of an entirely new Pull request.
> That would assume that the RMP has some knowledge of the Pull
> semantics, and knows somehow that all "Pulls" for this party ID are
> equivalent, so that it can do the resending as a response of any of them.
>
> The other radical alternative for Non-reliable Pull Request that could
> be taken (but which has many problems) can be summarized as:
>
> "Because it is a Pull signal, we can redo it (MSH level) as many times
> as we want, when we fail to get
> a pulled message. So no need to involve any RMP-level reliability."
>
> Problems are:
>
> - that only works if the Pull is idempotent, e.g. you need a flag saying
> "I want to re-pull the same message as previous pull"
> that complicates things.

Complications are in the eye of the beholder. If, as I do, you view an
MSH-level queue as necessary in any case (because the application
connected to the receiver of the Pull request submits new outbound
messages according to its own timing) and do not see the need for a new
flag, the major complications arise when making larger modifications to
your RMP implementation (for scenario #2).

> - some caching is assumed on MSH side so that pulled messages are still
> available for resending until they are acknowledged somehow. That seems
> to require a new persistence mechanism at MSH level that would not be
> necessary if we rely just on RMP persistence.

As above, I disagree with this assertion.

> - that needs to be governed by an MSH-level Ack (which we precisely
> wanted to avoid lately).

I am not sure what we have been trying to avoid or why this would be at the
MSH level.

> Trying to use the "next" Pull as an Ack is very tricky: that assumes a
> complete serialization of Pull+response pairs, which I think is very
> restrictive. We should allow for implementations sending batches of Pull
> without waiting for the response between each. That is quite possible
> when using concurrently several HTTP connections, and even over a single
> HTTP connection (pipelining).

What is the requirement for multiple Pull request / response pairs at once?
I agree that scenario #2 might support it but have not heard a request
for this feature.

> - unless we try to avoid situations where the same ebMS message can be
> successfully pulled twice (which seems hard to guarantee) we'll need to
> do duplicate elimination at MSH level (based on ebMS ID).

This is necessary only in scenario #2.

> Jacques

...