Subject: Re: [ebxml-msg] Proposal for handling large messages




Hi,

A couple of comments:

On the "external payload reference":  

I don't see why the Producer could ever play a role in serving the payload to the Receiving MSH or the Consumer application. The referenced payload will be processed by the Sending MSH and is likely to differ from the payload produced by the Producer: depending on the PMode, for example, the Sending MSH may compress and encrypt it. So the external payload referenced by the ebMS message has to have been submitted by the Producer along with other metadata. The implementation of the submission interface is out of scope for ebMS.

The difference in processing would be that,  instead of packaging the payload in a MIME part,  it is stored at some location accessible via an HTTP(S) URL,  which the MSH constructs.   This is indeed most efficiently done if the MSH and the Web Server are in the same infrastructure, so this is a feature mostly (but not exclusively) of interest to servers and cloud infrastructures. 

There would only be an additional transfer (and thus lower efficiency) if the Sending MSH is a client and needs to upload the payload to some external server.  The external reference makes most sense for servers. For them, there is no additional data exchange:   With an internal reference,  the payload is transferred three times:  from Producer to S-MSH (Submit()),  from S-MSH to R-MSH (Send()) and then from R-MSH to Consumer (Deliver()). With an external reference,  the payload is also transferred three times:  from Producer to S-MSH (Submit()),  from S-MSH to R-MSH (HTTPS GET) and then from R-MSH to Consumer (Deliver()).  The advantage would be that the Receiver controls the timing of the exchange of the payload.

To secure access to the payload,  we could include additional authentication information (e.g. a per-payload unique username/password) in the PartProperties for the payload, so that only the recipient of the message can download the attachment. 
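As a rough illustration of how a Receiving MSH might turn such per-payload credentials into a download request header, here is a minimal sketch. It assumes HTTP Basic authentication, which the proposal does not prescribe, and the property names PayloadUsername and PayloadPassword are hypothetical.

```python
import base64


def authorization_header(part_properties):
    """Build a Basic auth header from per-payload credentials.

    part_properties is assumed to be a dict of the eb:PartProperties
    of one payload; both property names below are hypothetical.
    """
    user = part_properties["PayloadUsername"]
    password = part_properties["PayloadPassword"]
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}
```

Since the credentials travel in the message header, this only protects the payload if the header itself is encrypted or sent over a secured connection.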

There could be an agreement on the domain name or URL prefix the Sending MSH would use, e.g. via some extension PMode parameters. If the Receiving MSH knows that the URL for an external payload will always start with e.g. "https://sender.example.com/ebms3/externalpayloads", and if that URL somehow needs to be rewritten for the request to be processed (for DNS or other networking reasons), this could be a feature that products could offer, and it would address the issue of the URL being part of the signed eb:Messaging header. The Receiving MSH could reject messages with payload references that do not have the agreed value as a prefix.
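The prefix check and the optional rewrite could be sketched roughly as follows. This is only an illustration: the function name, the internal gateway URL and the idea of passing the prefixes as plain strings are assumptions, not part of any PMode definition.

```python
from urllib.parse import urlsplit

# Hypothetical value; in practice it would come from an extension PMode parameter.
AGREED_PREFIX = "https://sender.example.com/ebms3/externalpayloads"


def resolve_payload_url(href, agreed=AGREED_PREFIX, internal=None):
    """Return the URL to download from, or raise ValueError to reject."""
    base = agreed.rstrip("/")
    # Require a path boundary after the prefix, so that
    # ".../externalpayloads-evil/x" does not pass the check.
    if not href.startswith(base + "/"):
        raise ValueError("payload reference lacks the agreed prefix")
    # Refuse references that try to climb out of the agreed location.
    if ".." in urlsplit(href).path.split("/"):
        raise ValueError("payload reference contains a '..' segment")
    if internal is None:
        return href
    # Rewrite for the download only; the signed header is not modified.
    return internal.rstrip("/") + href[len(base):]
```

For example, resolve_payload_url(href, internal="https://gateway.internal.example/sender-payloads") would map a reference under the agreed prefix to the same relative path under the hypothetical internal gateway.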

Another consideration is that we need to define Reception Awareness for messages with external payloads. I guess we would want the Receipt to be sent only after the Receiving MSH has successfully downloaded the payload. As AS4 Receipts also control resends, this may cause a few resends of the original message, but since that original message would be small, it should not be problematic. In summary, I still think the external reference feature is useful and probably easy to specify. The main tricky bits in an implementation would be in the security module.
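The ordering of download and Receipt could look roughly like the sketch below. The class and callback names are hypothetical, and a failed download is assumed to surface as an OSError. A resend that arrives after the download has completed only triggers a new Receipt, not a second download.

```python
class ExternalPayloadReceiver:
    """Sends a Receipt only after all external payloads are downloaded."""

    def __init__(self, download, send_receipt):
        self._download = download          # callable(url); raises OSError on failure
        self._send_receipt = send_receipt  # callable(message_id)
        self._completed = set()            # MessageIds whose payloads are stored

    def on_user_message(self, message_id, payload_urls):
        if message_id in self._completed:
            # Resend of an already processed message: repeat the Receipt only.
            self._send_receipt(message_id)
            return True
        try:
            for url in payload_urls:
                self._download(url)
        except OSError:
            # No Receipt, so the sender will resend the (small) message.
            return False
        self._completed.add(message_id)
        self._send_receipt(message_id)
        return True
```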

On the use of Range Requests with Pull:

I'm interested in the details of the proposal,  because I'm afraid it will be difficult to get this to comply with both the RFCs and ebMS3.  E.g. RFC 7233 states that "A server MUST ignore a Range header field received with a request method other than GET" but pull requests use POST.   

Also, would you just be re-posting the same PullRequest, with the same MessageId, or new PullRequests? What is the relation between the ebMS3 MessageId and the ETag? Is the PullRequest still processed by the MSH, or is knowledge of the ETag sufficient? An alternative could be to encode the range in the PullRequest, as an additional parameter, but then we have different mechanisms for Push and Pull.

<eb3:PullRequest mpc="http://msh.example.com/mpc123">
    <extensionnamespace:RangeRequest>bytes 65982464-307502442/307502443</extensionnamespace:RangeRequest>
</eb3:PullRequest> 

On Split-Join:

Another consideration is that we already have the split-join feature, which is mentioned in Superannuation and is implemented in at least two commercial products. It supports push and pull and non-repudiation. It also works very well with multihop. Is there really a need for another mechanism to address the same requirement?
 
Kind Regards,

Pim


On 06/04/2015 01:22 PM, Sander Fieten wrote:
Hi all,

as agreed at our April meeting I would look into the options for handling large messages with a focus on using the external payload reference and alternatively HTTP restart.

When creating a profile for handling large messages with external payload references, I think the target is that the MSH will completely handle the processing of the message. On the sending side this includes making the external payload available to the receiver, and on the receiving side downloading the payload(s).
Strictly speaking, it is not necessary for the MSH to make the payload available to the receiver, because the sending MSH could also use an already uploaded payload for its processing. That however has two drawbacks: first, the producer application has to arrange for making the payload available to the receiver, and second, the MSH has to retrieve the payload from this location.
If the producer has to arrange for the payload being available for download by the receiver, it gets involved in the message transport itself, if only to a limited extent, and I think the case for using ebMS is that the business application should not be concerned about the transport of the message.
That the sending MSH must also retrieve the payload from the location where it is made available by the producer application is not very efficient especially when the payload is hosted in the cloud. In that case the payload is first uploaded by the producer and then downloaded again by the MSH for processing (signing). It may also cause issues with network security because both the producer application and MSH must have access to external networks. Of course this is all solvable but it gets complicated and our target should be to create a profile that is simple.

As already noticed during the call, an exchange that uses external payloads is always a kind of pull exchange, as the payload must be retrieved by the receiver of the message. This will limit the usability of the external payload if the sending MSH can only push messages and cannot operate as a server. A possible solution is to upload the payload to the cloud. I think this should be included in a profile, although it will make it more complex (because an MSH must be able to upload the payload to the cloud and we need to determine which upload protocols must be supported).
However, a bigger issue that limits the usability of the external payload is that the URL included in the user message may not be accessible by the receiver when operating in a multi-hop context, for example because endpoints are in different networks and cannot access resources outside their own network. Because the URL is included in a possibly signed message, the intermediary cannot change it.

I therefore think that a possibility to restart a transmission on the http level is better to meet the objective of a reliable transfer of large messages that works in a multi-hop environment and that does not create dependencies between endpoints. Because the restart function applies to the transport level it can be used only for the hops that need it. 
Because the restart takes place at the HTTP level, it is transparent to the ebMS processing and therefore does not require changes to already implemented ebMS processing. Only the correct configuration for HTTP transmission needs to be set.


Part 2 of the ebMS spec already mentions the possibility of using AS2 Restart. The problem with AS2 Restart, however, is that it only applies to push exchanges, as it only defines restart for the entity of a POST method request and not for the response entity. So for restarting a pull, additional specification is needed. This could be based on the HTTP range request as defined in RFC 7233. Although this RFC only defines the range request for the GET method, it could also be implemented for the response entity of a POST method. The restart of a pull request would then look like the following diagram.
[Diagram: restart of a pull request (image not preserved in the archive)]


The restart request in this diagram uses the POST method but it could also use GET to make it a more regular range request. This is possible because the restart is on the http level only so there is no need to resend the PullRequest and therefore there is no entity needed in restart request.
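On the requesting side, the header handling for such a restart could be sketched as below. This assumes plain RFC 7233 syntax; the function names are hypothetical, and the use of If-Range with an ETag is an assumption about how the restart would be tied to the interrupted response, not something the proposal specifies.

```python
import re

# Content-Range for a satisfied byte range, e.g. "bytes 0-499/1234" or "bytes 0-499/*".
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def restart_headers(bytes_received, etag=None):
    """Headers asking the server to continue after bytes_received bytes."""
    if bytes_received < 0:
        raise ValueError("bytes_received must not be negative")
    headers = {"Range": f"bytes={bytes_received}-"}
    if etag is not None:
        # Only continue if the entity is still the one we started receiving.
        headers["If-Range"] = etag
    return headers


def parse_content_range(value):
    """Return (first, last, total); total is None when given as '*'."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: " + value)
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    if last < first or (total is not None and last >= total):
        raise ValueError("inconsistent Content-Range: " + value)
    return first, last, total


def resume_completes(bytes_received, content_range):
    """Check a 206 response continues where we stopped; True if it runs to the end."""
    first, last, total = parse_content_range(content_range)
    if first != bytes_received:
        raise ValueError("server resumed at a different offset")
    return total is not None and last + 1 == total
```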
Enabling such a restart opens the possibility for an attacker to send a restart request asking for a restart from the beginning, i.e. with a "Range: bytes=0-" HTTP header. Possible countermeasures are only allowing restarts from a certain byte offset onward or securing the HTTP connection with SSL/TLS. I think this is no greater issue than normal, since an attacker that can already read the communication between the MSHs can also replay complete PullRequests.
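One possible reading of the first countermeasure is that the server refuses restarts that go back too far from the point it had already reached. The sketch below illustrates that reading only; the function name, the max_rewind parameter and its default are hypothetical.

```python
import re

# Only the simple open-ended form "bytes=N-" is accepted for a restart.
_OPEN_RANGE = re.compile(r"bytes=(\d+)-")


def restart_offset(range_header, bytes_sent, max_rewind=1_048_576):
    """Return the offset to restart from, or None if the request is refused.

    bytes_sent is how much of the response the server had already written
    when the connection broke; max_rewind limits how far back a restart
    may reach from that point.
    """
    match = _OPEN_RANGE.fullmatch(range_header.strip())
    if match is None:
        return None
    offset = int(match.group(1))
    if offset > bytes_sent:
        return None  # the client cannot have received more than was sent
    if bytes_sent - offset > max_rewind:
        return None  # e.g. "bytes=0-" on a transfer that was nearly complete
    return offset
```

With these defaults, a "bytes=0-" request against a transfer that had already sent 300 MB would be refused, while a restart a few hundred kilobytes before the break would be served.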

An advantage I see with the HTTP restart is that it can be implemented using proxies as well so support can be implemented without modifying the current implementation.

Because this solution fits very well with requirements I have heard from several people, I would like to create a specification for this HTTP restart function rather than for the external payload option.

Looking forward to comments.

Regards
Sander


