Subject: RE: More on message bundling

From: "Jacques R. Durand" <JDurand@us.fujitsu.com>
To: <ebxml-msg@lists.oasis-open.org>
Date: Wed, 2 Sep 2009 17:10:53 -0700

Comments inline:

-jacques

From: Pim van der Eijk [mailto:pvde@sonnenglanz.net]
Sent: Wednesday, July 01, 2009 10:58 AM
To: Jacques R. Durand; ebxml-msg@lists.oasis-open.org
Subject: More on message bundling

The latest Part 2 draft already has a good start on bundling.

Here are some more ideas, building on that.

A relatively simple way of controlling bundling is to view bundles as combinations of a single "primary" user message (the driver) and other "secondary" user messages that piggy back onto that message. The "primary" user message determines the P-mode settings applied by the MSH to the entire bundle.

<JD> So I guess some implicit compatibility rules are needed, when bundling from different PModes, e.g:

- it is OK to put a message from a PMode without RM, into a bundle with primary u.message under RM.

- it is OK to sign the entire SOAP header according to the PMode of a primary u.message, even if secondary u.messages did not require security.

- however, Receipts are only on a per-u.message basis and should remain so. So if NRR is required for u.m. 1,2,3 but not for u.m. 4,5,6 in the same bundle, then a single signature may cover it all (e.g. all payloads are in the SOAP body), but there should be only 3 Receipts in response (for 1,2,3). Each of them will, however, use the same digest(s) in its nonRepudiationInformation element (the digest may cover all u.m. 1,2,3,4,5,6).
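
A minimal sketch of the Receipt rule in the last bullet, under stated assumptions: the names `UserMessage`, `nrr_required` and `receipts_for_bundle` are hypothetical and appear nowhere in the spec, and a Receipt is reduced to a plain dictionary. It only shows the shape of the rule: one Receipt per u.message whose PMode asks for NRR, each carrying the same bundle-wide digest list.

```python
from dataclasses import dataclass


@dataclass
class UserMessage:
    message_id: str
    nrr_required: bool  # hypothetical flag: this unit's PMode asks for a non-repudiation Receipt


def receipts_for_bundle(units, bundle_digests):
    """Return one Receipt per unit that requires NRR.

    Every Receipt reuses the same digests, since a single signature
    may cover the payloads of all units in the bundle.
    """
    return [
        {"ref_to_message_id": u.message_id, "digests": list(bundle_digests)}
        for u in units
        if u.nrr_required
    ]


# u.m. 1,2,3 require NRR; u.m. 4,5,6 do not.
units = [UserMessage(f"um{i}", nrr_required=(i <= 3)) for i in range(1, 7)]
receipts = receipts_for_bundle(units, ["digest-of-soap-body"])
```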

Assumption: in a bundle, the primary user message is the first message unit and the first message unit can be assumed to be the primary user message.

Since user messages are submitted separately and usually not at the exact same time, the MSH should temporarily store submitted user messages to allow combining them later into bundles. The MSH may already compute the eb3:UserMessage XML header including the eb3:PayloadInfo references to message parts. It may also already perform some CPU intensive ebMS3/AS4 functions such as compressing payloads and perhaps already compute secure hashes, as an optimization to speed up security operations in the security module. The MSH cannot yet assemble the ebMS3 SOAP header and does not perform any security operations such as encryption, since that depends on the Pmode of the primary message that a user message is going to be bundled with, nor does it apply any reliability headers for the same reason.

<JD> By definition, the message (the bundle) must be complete on the sender side before applying any RM and Security. Even if the primary u.m. is known from the start, I would assume that no signing is done until the message is complete. A signature is likely to cover all of the SOAP Body (and contained payloads), and several SOAP headers. All u.m. headers will be signed together, very likely.

Bundling therefore would be controlled by Pmodes. Suggested new Pmode parameters:

Pmode[].bundling.standalone: values "always", "optional", "never"

Indicates whether the user message is to be sent as a standalone message without piggy-backed user messages ("always"),

<JD> the name of this parameter is confusing (standalone is conflicting with "bundling"). We could have:

PMode[].bundling.policy = {always, optional, never}

with meanings opposite to yours.

may be sent separately but may also be bundled with other user messages ("optional") or is never sent as a standalone message ("never"). An ebMS 3 implementation that strictly conforms to part 1 of the ebMS 3.0 Core Specification and does not support any form of bundling will only support the option "always" for this parameter. A parameter value "never" means that the message needs to be bundled with some other message to be sent at all. It cannot act as a primary user message.

Pmode[].bundling.bundleswith

<JD> we could have two ways to specify bundling compatibility:

(1) by explicit listing (same as your "bundleswith" above):

PMode[].bundling.compatibility.pmodelist (e.g. = "PMode['abc'][1],PMode['def'][2],PMode['ghi'][1]" where 'abc', 'def', 'ghi' are IDs of PModes)

(2) by correlation rule (same technique used as for PMode[1].Reliability.Correlation ):

PMode[].bundling.compatibility.class (e.g. = "eb:UserMessage/eb:CollaborationInfo/eb:Service, eb:UserMessage/eb:CollaborationInfo/eb:Action")

would mean that only message units with the same Service/Action as the primary unit can be bundled.

NOTE: A more complete expression would make explicit reference to the "primary" unit using the reserved variable $primary:

PMode[].bundling.compatibility.class = "eb:UserMessage/eb:CollaborationInfo/eb:Service = $primary/eb:CollaborationInfo/eb:Service and eb:UserMessage/eb:CollaborationInfo/eb:Action = $primary/eb:CollaborationInfo/eb:Action",

meaning every unit for which this condition is true belongs to the same bundling compatibility class as the primary.

If both (1) and (2) are present, the bundling must satisfy (1) AND (2).
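
A rough sketch of how (1) and (2) could combine, assuming a simplified model rather than real XPath evaluation: `Unit`, `compatible`, `pmode_list` and `class_keys` are illustrative names only, and the correlation rule is reduced to "these fields must equal those of the primary". A rule that is absent (`None`) imposes no constraint; when both are present, both must hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    pmode_id: str
    service: str
    action: str


def compatible(primary, candidate, pmode_list=None, class_keys=None):
    """Return True if candidate may be bundled with primary.

    pmode_list: rule (1), IDs of PModes allowed in the bundle (None = rule absent).
    class_keys: rule (2), field names that must match the primary (None = rule absent).
    """
    if pmode_list is not None and candidate.pmode_id not in pmode_list:
        return False
    if class_keys is not None and any(
        getattr(candidate, key) != getattr(primary, key) for key in class_keys
    ):
        return False
    return True


primary = Unit("abc", "Ordering", "NewOrder")
same_class = Unit("def", "Ordering", "NewOrder")
other_action = Unit("def", "Ordering", "Cancel")
unlisted = Unit("xyz", "Ordering", "NewOrder")

rules = {"pmode_list": {"abc", "def", "ghi"}, "class_keys": ("service", "action")}
ok = compatible(primary, same_class, **rules)
wrong_action = compatible(primary, other_action, **rules)
wrong_pmode = compatible(primary, unlisted, **rules)
```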

This parameter is to be specified in case the "standalone" parameter has a value other than "never". Its value is a list of Pmodes. The semantics of the parameter is that a user message may (if standalone is "optional") or must (if standalone is "never") be bundled with other user messages associated with the Pmodes identified in the list. The parameter may list other types of Pmodes that describe messages that the user message can be bundled with. In general these Pmodes SHOULD be destined to the same target MSH (even in a multihop network) and SHOULD satisfy the same or stricter QoS requirements, but the responsibility for making sure this is true is with the Pmode configuration administrator.

<JD> Not sure there is any value in "SHOULD be destined to the same target MSH ". In a multihop context these PModes may not know the actual destination endpoint address, and all we can ask for, is for PMode designers to know whether the logical destination of 2 PModes is the same or not. In a borderline case, a PMode could be designed leaving blank the To/PartyId, e.g. a PMode governing messages sent for a particular Service/Action regardless of the actual destination party, which could vary and be served by a different MSH from another. The sender would pass the To/PartyId when submitting the message, which would determine the routing. So in that case, we can't even bundle any message from the same PMode: we need in addition a compatibility rule (added as bundling parameter in the above generic Service/Action PMode):

PMode[].bundling.compatibility.class = "fn:contains('party100;party102;party104;party105;party106', xsd:string( eb:UserMessage/eb:PartyInfo/eb:To/eb:PartyId)) and fn:contains('party100;party102;party104;party105;party106', xsd:string( $primary/eb:PartyInfo/eb:To/eb:PartyId))"

Where it is known that all the above party IDs are served by the same MSH.

In case there is another set of parties that can be bundled together (for a different MSH endpoint), we would have an additional instance of PMode[].bundling.compatibility.class parameter specified in this PMode.
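
One thing to watch with the expression above: fn:contains is a substring test, so a PartyId such as "party10" would also match the delimited string. The sketch below contrasts that with exact set membership. It is only an illustration; `same_msh` and `contains_style` are hypothetical helper names, and the party IDs are the ones from the example.

```python
PARTY_LIST = "party100;party102;party104;party105;party106"
SAME_MSH_PARTIES = frozenset(PARTY_LIST.split(";"))


def contains_style(primary_to, candidate_to, joined=PARTY_LIST):
    """Mimics the fn:contains rule: a substring test on the delimited string."""
    return primary_to in joined and candidate_to in joined


def same_msh(primary_to, candidate_to, parties=SAME_MSH_PARTIES):
    """Exact membership: both PartyIds must be in the set served by one MSH."""
    return primary_to in parties and candidate_to in parties


exact_match = same_msh("party100", "party104")
partial_id_exact = same_msh("party100", "party10")
partial_id_substring = contains_style("party100", "party10")
```

If exact matching is wanted in the XPath form, the rule would need to compare against the individual IDs rather than against one concatenated string.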

Typically, the list value of the "bundleswith" parameter for Pmode P1 will also include that same Pmode P1, indicating that more than one user message of this type may be sent. But if a particular user message is known to be very large or to be sent rarely, the parameter could only list some Pmodes of messages that are known to be small and sent frequently.

The MSH should support configuration of the maximum duration of the time it waits for more user messages to be submitted to be added to bundles, the maximum number of user messages it bundles and/or the maximum size of the user message content (including business documents and attachments, after AS4 compression) of the bundle.

Pmode[].bundling.maxdelay
Pmode[].bundling.maxmessages
Pmode[].bundling.maxsize

<JD> instead of restricting this to the bundling case: PMode.Message.maxSize: overall, maximum size of any message in this PMode (bundled or not).

Note that the MSH is not required to wait until the maxdelay interval has expired before sending the message. It may send the message before the interval has finished. The maxdelay is a contract with the submitting business application, which may have some time-to-perform to be respected.
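
As an illustration of how the three limits might interact, here is a small sketch. The class and function names are hypothetical, and the units (seconds, bytes) are an assumption, since the proposal does not fix them. maxdelay is treated as an upper bound only, so sending earlier is always allowed.

```python
from dataclasses import dataclass


@dataclass
class BundlingLimits:
    maxdelay: float    # assumed unit: seconds the oldest queued message may wait
    maxmessages: int   # user messages per bundle
    maxsize: int       # assumed unit: bytes of content, after compression


def fits(limits, count, size, next_size):
    """Can one more user message of next_size bytes join a bundle
    that already holds count messages totalling size bytes?"""
    return count + 1 <= limits.maxmessages and size + next_size <= limits.maxsize


def deadline_reached(limits, waited):
    """True once the bundle must be sent; the MSH may still send earlier."""
    return waited >= limits.maxdelay


limits = BundlingLimits(maxdelay=30.0, maxmessages=3, maxsize=1000)
room_left = fits(limits, count=2, size=500, next_size=400)
too_many = fits(limits, count=3, size=0, next_size=1)
too_big = fits(limits, count=1, size=900, next_size=200)
```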

<JD> wondering if we should have the same approach for the timing as suggested for the size: independently from bundling, this could be seen as a general guideline for any message of this PMode (a QoS contract): what is the maximum delay that an MSH should not exceed, from submission time to sending time: PMode.Message.maxDelay. (not sure Ian will like it better, though...)

If the "bundleswith" list includes multiple Pmodes, and more than one candidate waiting "primary" user message is available that a user message can be bundled with, the allocation is implementation-dependent.

If the value of the "standalone" parameter is "optional", the decision by the MSH to send the message as a standalone message or to bundle it with another user message is implementation-dependent.

If the value of the "standalone" parameter is "never", and no primary message is available to bundle the message with within a "maxdelay" duration, the MSH MUST generate an ebms:xxxx BundlingError sender error.

<JD> So, if we have: PMode[].bundling.policy = {always, optional, never}, then no error should be sent back: the "always" (your "never") only means that the MSH must do its best effort to bundle. If it cannot bundle, then it just sends the message alone.
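
The best-effort reading can be sketched as follows. This is an assumption-laden illustration, not proposed spec text: the function name and the returned strings are made up, and bundling an "optional" message whenever a primary is available is just one possible implementation choice. The mapping shows how the "standalone" values translate into the "policy" values with opposite meanings.

```python
# "standalone" values mapped to the equivalent "policy" values (opposite meanings).
STANDALONE_TO_POLICY = {"always": "never", "optional": "optional", "never": "always"}


def action_at_deadline(policy, primary_available):
    """Decide what to do with a queued user message once maxdelay expires.

    Under the best-effort reading no BundlingError is raised: a message
    that cannot be bundled is simply sent alone.
    """
    if policy not in ("always", "optional", "never"):
        raise ValueError(f"unknown bundling policy: {policy!r}")
    if policy == "never" or not primary_available:
        return "send_alone"
    return "bundle"


no_primary = action_at_deadline(STANDALONE_TO_POLICY["never"], primary_available=False)
with_primary = action_at_deadline("always", primary_available=True)
never_bundled = action_at_deadline("never", primary_available=True)
```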

Bundling and Pull

The description above assumes that submitted user messages are queued temporarily between ebMS processing and further security and reliability processing. This means that a submitted user message may not yet be available for pulling, leading to unintuitive results. This could be optimized in advanced implementations as some kind of on-demand bundling, or on-demand MSH completion, triggered by an incoming Pull Request. Left to implementations.

<JD> Intuitively, as soon as you send a PullRequest, you want to get an immediate response, bundling or not... So waiting is not an option for sure. The PullRequest could take the initiative of requesting the bundling. I see three ways to do this:

(a) on-the-wire-indication: several PullRequest units could be bundled together: as many as the maximum number of u.m. you may expect bundled in response. That's easy but not very elegant.

(b) some additional attribute in PullRequest indicates the max bundling you can handle. Problem is, the schema needs to be extended for this.

Bundling and Sync

The above is really about requests and responses for both Push and Pull binding, but not for response user messages to Two Way / Sync MEP bindings, which must be transmitted on the HTTP back channel without delay. The synchronous responses to each of the user messages in the bundle should be sent without waiting "maxdelay" for more messages to be bundled with them, to prevent HTTP transport timeouts.

<JD> But a Two-way / Sync still depends on the Responding application for not causing HTTP timeout, bundling or not. A policy could be that when a timeout is near and not all responses have been obtained yet, an implementation has the right to send back a bundle with whatever responses it has, along with a (new) error like "timeout constraint" to excuse the missing responses.
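
A sketch of that timeout policy, with hypothetical names throughout: `build_sync_reply`, the dictionary layout and the "TimeoutConstraint" error label are illustrations only, since no such error is defined yet. When the timeout is near, the reply carries whatever responses exist plus one error per request that is still unanswered.

```python
def build_sync_reply(request_ids, responses, error_code="TimeoutConstraint"):
    """Assemble the back-channel bundle when the HTTP timeout is near.

    request_ids: message IDs of the user messages in the request bundle.
    responses: dict mapping a request message ID to the response obtained so far.
    """
    ready = [responses[rid] for rid in request_ids if rid in responses]
    errors = [
        {"ref_to_message_id": rid, "code": error_code}
        for rid in request_ids
        if rid not in responses
    ]
    return {"user_messages": ready, "errors": errors}


reply = build_sync_reply(
    ["req1", "req2", "req3"],
    {"req1": "response-to-req1", "req3": "response-to-req3"},
)
```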

Minor comment: the current bundling chapter writes:

"However, when InOrder delivery assurance is required for User message units, bundling SHOULD NOT be in effect for these messages as there is no trace of the submission order on the sending side, and the order of message units under eb:Messaging is not significant."

I think we could assume that message units are to be processed in document order. There is still no guarantee that messages are assigned to the same bundle and/or that different bundles would arrive in-order though, so the warning against bundling is valid. Perhaps an advanced MSH API could allow bundling or ordering to be controlled. Out of scope for spec.

Bundling and multihop

Assumption: intermediaries (such as the "transparent" intermediaries) may not bundle or unbundle messages. The MSH may not know which Pmodes target the same (logical) destination (as this requires knowledge of the routing function). So what can and cannot be bundled cannot be predicted based on Pmode parameters other than the proposed "bundleswith".

<JD> But we could still have a default: any two messages from the same PMode leg can be bundled together, except when the above compatibility parameters are present, in which case they take over.

It seems acceptable to assume that in a multi-hop context any routing function should be able to look at the first user message in a bundle to determine the destination. So the "first unit determines .." principle would apply to both bundling and routing.

<JD> The "primary" unit sets the tone for bundling and routing. Now we may want to control which units can be "primary" and which ones cannot. E.g. the ICloud is configured with routing functions that only know about some "lead PartyId" for each MSH endpoint. There is always the resort of adding the routingInput ref parameter in that case. But if a sending gateway always gets enough messages for this lead PartyId, it should always use some of these as primary.

<JD> we may need these additional parameters:

- PMode[1].Bundling.Receipts: yes = bundling of Receipts requested for messages bundled together.

- PMode[1].Bundling.Errors: yes = bundling of Errors requested for messages bundled together.

Pim