RE: [ebxml-msg] Groups - Transparent MEP routing proposal V0.1 (ebMS-t

I am hoping that my comments can be picked out in green.

From: Pim van der Eijk [mailto:pvde@sonnenglanz.net]
Sent: Tuesday, April 08, 2008 1:10 PM
To: 'Durand, Jacques R.'; ebxml-msg@lists.oasis-open.org
Subject: RE: [ebxml-msg] Groups - Transparent MEP routing proposal V0.1 (ebMS-transparent-Multihop-MEPs-Routng.doc) uploaded

Some additional comments, see below in red. I would be very interested in other people's fresh opinions on this; I am beginning to think I have spent too much time in multi-hop projects and may be brainwashed ...

From: Durand, Jacques R. [mailto:JDurand@us.fujitsu.com]
Sent: 03 April 2008 07:49
To: Pim van der Eijk; ebxml-msg@lists.oasis-open.org
Subject: RE: [ebxml-msg] Groups - Transparent MEP routing proposal V0.1 (ebMS-transparent-Multihop-MEPs-Routng.doc) uploaded

Pim: inline

From: Pim van der Eijk [mailto:pvde@sonnenglanz.net]
Sent: Wednesday, April 02, 2008 1:35 PM
To: Durand, Jacques R.; ebxml-msg@lists.oasis-open.org
Subject: RE: [ebxml-msg] Groups - Transparent MEP routing proposal V0.1 (ebMS-transparent-Multihop-MEPs-Routng.doc) uploaded

Hello Jacques,

Here are some quick comments on this proposal (and some earlier documents):

1) What does "no additional capability besides Core ebMS V3 is required from the endpoint MSHs involved at the boundaries of the I-Cloud" mean? The proposal describes a requirement that "The PMode parameter PMode.Protocol.Address contains the Hub URL, the value of which must be extended with an HTTP Query of the form: ?pmode=<ID>". In this proposal, any support for multihop would mean the ebMS 3.0 processor must append this query to the hub URL. I don't know if any of the few existing ebMS 3.0 processors support this ability to update the hub URL today, and it may not be easy to add this behaviour to an existing SOAP stack. Or are we assuming this is configured statically by users, e.g. in the CPA? In a CPA with a dozen CanSends, each would be connected with a DeliveryChannel that references a Transport element that contains a URL that includes a substring naming that same DeliveryChannel. We would be missing a generalization, which users and implementers will object to.
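To make the requirement under discussion concrete, here is a minimal sketch of what "extending the Hub URL with ?pmode=<ID>" could look like on the sending side. The function name, the example hub URL and the p-mode ID are assumptions made for illustration; the proposal only specifies the form of the query.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_pmode_query(hub_url: str, pmode_id: str) -> str:
    """Return hub_url extended with a pmode=<ID> query parameter.

    Hypothetical helper: the proposal only states that the Hub URL is
    extended with an HTTP query of the form ?pmode=<ID>.
    """
    parts = urlsplit(hub_url)
    # Keep query parameters the hub URL already carries, but drop a stale pmode.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "pmode"
    ]
    params.append(("pmode", pmode_id))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Assumed example values, not taken from the proposal.
print(with_pmode_query("http://hub.example.com/msh", "pm-order-01"))
# prints http://hub.example.com/msh?pmode=pm-order-01
```

Whether this is computed at send time, as here, or written out statically per p-mode in a CPA-like configuration is exactly the open question raised above.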

<JD> Our assumption is that the URL of the Hub (or Intermediary) is configuration data: e.g. it has to be represented in the PMode that describes an end-to-end exchange. We do not see any problem with our own implementation on top of Axis 2, in adding an HTTP query. Now we know that the CPA is a possible representation for the PMode. The CPA has already been profiled / extended for V3.0. It may have to be updated again for multi-hop, and with all due respect for the CPA, I believe a multihop solution(s) should be designed independently from the current CPA (ebms specs are supposed to work independently), then we can adapt/extend the CPA as needed.

This being said, I don't believe that a different DeliveryChannel has to be defined for each party or each CanSend when the same Intermediary is used as TransportReceiver/Endpoint. An identifier must exist in the CPA (or PMode) that defines a set of messages sent to the same destination with the same level of reliability. (question: how do you know which message must be associated with which RM sequence, in your solution?) This identifier is what needs to be communicated to the first intermediary (or Hub) in our solution, along with CreateSequence, whether it is already appended to the destination URL or intended to be appended when the CPA is interpreted. I see this as a CPA binding question - it probably needs to be defined in a conformance profile.

OK, but the point I tried to make is that if an MSH can send messages with 20 different p-modes, it has to reference 20 different URLs, which would need to be enumerated in a CPA or equivalent configuration format. Not very elegant?

CPAs and CPPs are organized with “reusability” in mind, and therefore make use of certain presumptions about what things are likely to be reused (that is, tend to be shared for different actions within a service). Reusable “modules” are then referenced (using IDREFs) so that they do not have to be repeated. The initial TC members and ebXML participants believed that Endpoints and Transport details would form a reusable module because actions within a service would tend to use the same Endpoint (URL). Maintaining a distinct URL for each Action seemed to be a management and governance mess to many participants and TC members. De-multiplexing multiple kinds of Actions going to one URL is clearly possible because of the variety of metadata (such as the values for Action and PartyId). I think Pim shares a very widely held sentiment that using distinct URLs for distinct Endpoints for every PMode would be quite an imposition on the management of the community. Would a large community of 50,000 community partners then have 50,000 distinct URLs issued? In addition to their distinct PartyIds? Clearly this fixation on one part of the configuration information to vary is possible. It is clearly not necessary, and I believe many customers will find it undesirable and an imposition on their environment.

Technically speaking, Pim is correct in saying that a different Delivery Channel will be needed for every partner because a DeliveryChannel incorporates by reference a DocExchange module and a Transport Module. Since the distinct Endpoints cause the Transport Module to be distinct for each partner, the ID value must differ, and so must the DeliveryChannel element that has an attribute whose value (an IDREF) refers to the Transport Module. Thus handling configurations for a Community becomes much more awkward because they cannot be conveniently assembled from basic reusable parts. It would still be possible in a CPA Template to substitute for the Endpoint value. But for different Actions, a different Endpoint value would be needed. So the template would need to make multiple distinct substitutions into Transport elements, and repeat those Transport elements in building up the CPA.

A regular v3.0 Core conformant specification does not have to do this and (the few ebMS 3.0 implementations that have been developed so far) no doubt don't. And if they don't, they don't support your proposal. Also, if this solution requires WS-Addressing, WS-Addressing is never mentioned in v3.0 Core and the Core spec includes example messages that don't have WS-Addressing anywhere. If it requires an eb:Routing header, that is an element that does not even exist in the v3.0 Core XSD. So I don't see how you can say that it does not require more than v3.0 Conformance.

By the way, I am assuming the p-mode ID is used as a key to retrieve information that is used in routing rules, e.g. to find out Service/Action or To/PartyId, i.e. the routing rules would not reference p-mode IDs but derived information. Appendix D of v3.0 Core mentions several dozen p-mode parameters. Are these parameter values all available for routing, or do you intend to propose a subset of "routable" parameters? If the latter, would you agree that those would be parameters like To/PartyId, Service/Action that are available in the ebMS business document header in ebMS user messages?

My concern is why these metadata values need to be repeated as parts of the URL.

If routing is based on p-mode parameters, are you saying that routing cannot reference dynamic properties (like ConversationId or Message Properties)?

That limitation would be an unfortunate restriction. I think Pim is correct about this limitation.

2) This proposal is based on features of HTTP as an underlying transport protocol. Even if most ebMS implementations will use HTTP, the idea that an ebMS multihop functionality ("level 3" in my earlier terminology) has dependencies on transport protocols (two levels lower) does not seem right architecturally. If someone defined another transport protocol binding than HTTP for ebMS 3.0 (e.g. SMTP or JMS), he should be able to use our multihop solution with those other lower level transport protocols.

<JD> then let us just say that the handshake part of our solution (initiating a new RM sequence) is transport-binding specific (at least, specific to the transport used between Sending endpoint and first Intermediary). That wouldn't be the first ebMS feature the usability of which is tied to a particular transport layer (e.g. Pull mode so far is not usable with an underlying protocol that is not request-response, e.g. SMTP). It could be that we don't have a solution for all transports.

Pull mode should be possible using SMTP message stores (mailbox) as found in protocols such as IMAP or POP3, but of course it would work quite differently, and the lower-level POP3 and IMAP protocols could be used instead of fiddling with the HTTP request entity (body part). The ebMS 3 Pull mode is a kind of message store solution for a protocol that does not have an already defined message store extension. FTP would have worked for an FTP client out of the box because the client could have both sent and received data using FTP methods STOR and RETR using the FTP directory structure as the store.

3) Page 5 "Step 4: The Hub receives the message, and closes the HTTP connection (asynchronous case). The Hub determines where this User Message must be sent, using routing function based on ebMS header data". It would be better if the Hub does a routability check (do I have a configuration rule that tells me where to forward this message to?) before closing the connection. If it does not know how to forward the message, the Hub could then return an ebMS error directly. This means that a sender in a multihop context could receive errors from both the ultimate recipient (as in the peer to peer case) and from intermediaries.

<JD> good point. A new type of error needs to be created for Intermediaries (e.g. "RoutingFailure")
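The routability check suggested in point 3 could be sketched roughly as follows: the Hub looks up a forwarding rule before it answers on the open connection, and reports an error directly when no rule matches. The routing table, the party and service identifiers, the URLs and the shape of the returned result are all assumptions for illustration; "RoutingFailure" is only the example name floated above, not an error defined in v3.0 Core.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Header:
    """The subset of ebMS header data this sketch routes on."""
    to_party: str
    service: str
    action: str


# Hypothetical routing table: (To/PartyId, Service) -> next hop URL.
# None acts as a wildcard for that position.
ROUTES = {
    ("urn:example:party:b", None): "http://hub2.example.com/msh",
    (None, "urn:example:service:orders"): "http://orders.example.com/msh",
}


def next_hop(header: Header) -> Optional[str]:
    """Return the next hop URL, or None if no rule matches."""
    for key in ((header.to_party, None), (None, header.service)):
        if key in ROUTES:
            return ROUTES[key]
    return None


def receive(header: Header) -> dict:
    """Decide what to answer on the still-open connection."""
    hop = next_hop(header)
    if hop is None:
        # No rule: report the error to the sender before closing the connection.
        return {"status": "error", "error": "RoutingFailure"}
    # Routable: accept the message, then forward it asynchronously.
    return {"status": "accepted", "forward_to": hop}


print(receive(Header("urn:example:party:b", "urn:example:service:quotes", "Request")))
print(receive(Header("urn:example:party:z", "urn:example:service:unknown", "Request")))
```

With a check like this, a sender has to be prepared for errors coming from an intermediary as well as from the ultimate recipient.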

4) I have been thinking of an alternative way to do what you want using this appended query. An alternative that to me seems more in the spirit of the SOAP processing model, would be to allow messages to contain more than one eb:Messaging SOAP header element. One eb:Messaging block would be targeted at ebMS intermediary nodes, identified using a separate "target" attribute. The other would be targeted at the true recipient, as in v3.0 Core. The idea is analogous to having multiple WS-Security header blocks, targeted at multiple SOAP nodes. Given our goal to converge with WS-* specifications, it seems best to leverage their type of solutions to similar problems.

The only required (compatible) update to the ebMS v3.0 schema is to add an optional "target" attribute to eb:Messaging.
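As a rough illustration of the alternative in point 4, the sketch below builds a SOAP header carrying two eb:Messaging blocks, one of them marked with the proposed "target" attribute, and shows how an intermediary might pick out its own block. The "target" attribute is the proposed extension and does not exist in the v3.0 Core schema; the party and service values are made up, and the blocks are deliberately incomplete (no MessageInfo, From, Action and so on), so this is not a schema-valid message.

```python
import xml.etree.ElementTree as ET

SOAP = "http://www.w3.org/2003/05/soap-envelope"
EB = "http://docs.oasis-open.org/ebxml-msg/ebms/v3.0/ns/core/200704/"
ET.register_namespace("soap", SOAP)
ET.register_namespace("eb", EB)


def add_messaging(header, to_party, service, target=None):
    """Append a simplified eb:Messaging block to the SOAP header."""
    block = ET.SubElement(header, f"{{{EB}}}Messaging")
    if target is not None:
        # Proposed extension attribute, not part of the v3.0 Core schema.
        block.set("target", target)
    user = ET.SubElement(block, f"{{{EB}}}UserMessage")
    party_info = ET.SubElement(user, f"{{{EB}}}PartyInfo")
    to = ET.SubElement(party_info, f"{{{EB}}}To")
    ET.SubElement(to, f"{{{EB}}}PartyId").text = to_party
    collaboration = ET.SubElement(user, f"{{{EB}}}CollaborationInfo")
    ET.SubElement(collaboration, f"{{{EB}}}Service").text = service
    return block


envelope = ET.Element(f"{{{SOAP}}}Envelope")
header = ET.SubElement(envelope, f"{{{SOAP}}}Header")
ET.SubElement(envelope, f"{{{SOAP}}}Body")

# End-to-end block for the true recipient, as in v3.0 Core.
add_messaging(header, "urn:example:party:b", "urn:example:service:orders")
# Second block aimed at intermediaries, possibly with generalized content.
add_messaging(header, "urn:example:hub:eu", "urn:example:service:orders",
              target="intermediary")

blocks = header.findall(f"{{{EB}}}Messaging")
for_intermediary = [b for b in blocks if b.get("target") == "intermediary"]
routing_party = for_intermediary[0].findtext(f".//{{{EB}}}PartyId")
print(len(blocks), routing_party)
```

An intermediary that routes only on the targeted block never needs to read the end-to-end block, which is what would allow the latter to be encrypted.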

The eb:Messaging block targeted at the intermediary could be used in various situations:

- When sending a wsrm:CreateSequence message, it would serve to provide rich business document routing header content (To/PartyId, Service) to enable the sequence to be established with the right recipient MSH. Like the appended "?pmode=<ID>", this second header is not explicitly mentioned in v3.0 Core. But v3.0 Core is assumed to be composable with other WS-* specifications not discussed in v3.0 Core (say, WS-Addressing or WS-SecureConversation) like any well-designed WS spec without this having to be described in the v3.0 spec. It may be an acceptable price to pay, even for Endpoints. In a Web Browser you have to configure something in your client to use an HTTP proxy too, after all.

- But the Sending endpoint could use the appended "?pmode=<ID>" trick to pass information to the first intermediary that would allow to create the second ebMS header block as in your proposal, if it somehow cannot be modified to create this second block itself.

<JD> When HTTP is used, the HTTP query trick has the advantage of not requiring more than Core V3 Conformance - i.e. no ebms-level additional capability, like this piggybacking of an eb:Messaging header on a CreateSequence message. And even if we do so, we could probably reuse regular eb:Messaging headers (same Core V3 schema) without creating a "target" attribute, if we rely instead on SOAP processing features - i.e. use instead the standard "role" attribute that SOAP 1.2 headers support.

- Intermediaries could have same default logic as ebMS 2.0 to reverse route any ebMS (user or signal) message by reversing eb:From and eb:To, copying eb:Service, eb:Action, eb:ConversationId, eb:AgreementRef and setting eb:RefToMessageId based on incoming eb:MessageId. We only need to think of a value for eb:Action. This model could be used to return a standalone wsrm:CreateSequenceResponse to the sending MSH.

<JD> we thought of that. But that seems contrived. We believe again that within an I-Cloud, routing should always be possible based on the URL of the destination node, in case this URL is known (in many cases, a single hop directly to this URL will be possible!) And that is usually the case for a response: we know where it should go. More generally, we know the URL of the Intermediary to which the response destination MSH is connected (the last hop could be a Pull). Header data that tells where the Response should go then needs to be added to the Request message by the first intermediary. That could be a wsa header (ReplyTo) or another ebMS-level header data.

Here I think you have very different deployment scenarios and user communities in mind than I. The existing ebMS 2.0 multihop users I know (and whose interest I have in mind) operate in environments where intermediaries are used to bridge private networks. Message handlers are not supposed to directly connect beyond one intermediary. They're not supposed to know the URL of intermediaries beyond the one they're connected to immediately, but if they knew, that URL wouldn't be in their DNS. If they knew the IP address, it would be an address in a different VPN that they can't even ping to.

- When sending an eb:Messaging/eb:SignalMessage, adding an eb:Messaging[@target='intermediary'] structure could provide rich header data to allow the signal to be routed across hops, even though the Signal itself lacks business semantics. This would even work for PullRequest, which unlike eb:Receipt and errors is not a response message, so the trick of rerouting based on retrieving data from the preceding UserMessage using MessageId/RefToMessageId would not work for it.

<JD> right that PullRequest was never considered for "routing" so far. But we believe that is a low priority: the case for pulling across the entire I-Cloud has not been made yet... So yes some "routable" header could be added to a response Signal. But if doing so, it is better if the first intermediary on its way does this, rather than the endpoint MSH because that would require more than core V3 conformance.

- When sending an eb:Messaging/eb:UserMessage, adding an eb:Messaging[@target='intermediary'] structure would not be necessary, if the ebMS header is not encrypted, so that its header elements can be used for routing.

- When sending an eb:Messaging/eb:UserMessage, adding an eb:Messaging[@target='intermediary'] structure would allow the end-to-end ebMS header to be encrypted by the first intermediary and decrypted by the last one or by the recipient.

<JD> Possibly. Although instead of adding a second eb:Messaging header, we could consider instead an "eb:Routing" header that contains the subset of header values that are set as "constants" in the PMode, if we admit that all messages related to same PMode (or same CPA CanSend?) must be routed the same way. The first Intermediary to which the endpoint MSH is connected would add this header, based on PMode info (assuming here that an endpoint must "register" in some way to an Intermediary that will serve as its gateway to the I-Cloud).

This has some advantages in cases where (some of the) header data is sensitive too. The eb:Messaging structure targeted at the intermediary could be copied from the end-to-end headers, or it could have some derived generalized content (e.g. mapping many From/PartyIds to a more general PartyId, e.g. the one of the first intermediary). This would greatly simplify the maintenance of routing tables at the intermediary, which would just have to know how to get from one intermediary to another, not from any endpoint to any other endpoint. In realistic use cases (e.g. hubs linking geographies, or various sectors) there could be in the order of dozens of intermediaries serving thousands of endpoints.

- The last intermediary could remove this header block before forwarding the message to the Endpoint, as in your proposal.

- The second ebMS header would never be needed in peer-to-peer ebMS messaging.
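The default reverse-routing logic mentioned in the list above (reverse eb:From and eb:To, copy eb:Service, eb:ConversationId and eb:AgreementRef, set eb:RefToMessageId from the incoming eb:MessageId) can be sketched as below. Headers are modelled as plain dictionaries for brevity; the field values, the way a new MessageId is generated and the "Acknowledgment" action value are assumptions, the last one because the value for eb:Action is explicitly left open above.

```python
import uuid


def reverse_route(incoming: dict, action: str) -> dict:
    """Derive routing header data for a message travelling back to the sender."""
    return {
        # Swap the parties so the message is routed back to the original sender.
        "From": incoming["To"],
        "To": incoming["From"],
        # Copied unchanged from the incoming message.
        "Service": incoming["Service"],
        "ConversationId": incoming["ConversationId"],
        "AgreementRef": incoming["AgreementRef"],
        # The Action value is an open question in the thread; the caller supplies it.
        "Action": action,
        # Fresh identifier for the new message (the generation scheme is an assumption).
        "MessageId": f"{uuid.uuid4()}@intermediary.example.com",
        # Correlate the new message with the one it answers.
        "RefToMessageId": incoming["MessageId"],
    }


request = {
    "From": "urn:example:party:a",
    "To": "urn:example:party:b",
    "Service": "urn:example:service:orders",
    "Action": "Submit",
    "ConversationId": "conv-42",
    "AgreementRef": "urn:example:agreement:1",
    "MessageId": "msg-001@a.example.com",
}

reply = reverse_route(request, action="Acknowledgment")
print(reply["From"], "->", reply["To"], "ref", reply["RefToMessageId"])
```

Note that this only swaps the parties; it assumes the intermediaries on the way back can route on the To party alone.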

Note: the above attempts to use routing based on ebMS header data for all ebMS traffic and WSRM lifecycle messages. This does not preclude the use of optimized routing information, as in your proposal. If the endpoint somehow pre-computes a way to express the HTTP URL of the ultimate recipient in a WS-Addressing header, as a custom header, as an HTTP URL or as an IP address, this could be encoded in the SOAP structure too. But before we look at optimizations, I wanted to make sure the ebMS 2.0 style of header-based routing can be used to cover end-to-end routing of user messages, signals and reliability messages.

<JD> But the main problem with ebms header-based routing of signals, is that Core V3 does not support adding such headers... note that even if you relax this restriction, a "reverse-routing" based on exactly the same headers with just a swap between FromParty and ToParty, assumes that the routing always works with only ToParty. What if in a first phase the routing uses "Service", then only for the last Intermediary, uses "ToParty"? How can an intermediary tell if it must do reverse-routing or just regular routing? I am not convinced this reverse routing system is viable - need to see it described more completely.

Yes, perhaps at an upcoming call or F2F

-Jacques

Pim

Dale, who had to leave for another meeting before completing comments.
