Hi,
A couple of comments:
On the "external payload reference":
I don't see why the Producer could ever play a role in serving the
payload to the receiver MSH or the Consumer application. The
referenced payload will be processed by the Sending MSH and is
likely to differ from the payload produced by the Producer. For
example, depending on the PMode, the Sending MSH may compress the
payload and encrypt it. So the external payload referenced by the
ebMS message has to have been submitted by the Producer along with
other metadata. The implementation of the interface for
submitting is out-of-scope for ebMS.
The difference in processing would be that, instead of packaging
the payload in a MIME part, it is stored at some location
accessible via an HTTP(S) URL, which the MSH constructs. This
is indeed most efficiently done if the MSH and the Web Server are
in the same infrastructure, so this is a feature mostly (but not
exclusively) of interest to servers and cloud infrastructures.
There would only be an additional transfer (and thus lower
efficiency) if the Sending MSH is a client and needs to upload the
payload to some external server. The external reference makes
most sense for servers. For them, there is no additional data
exchange. With an internal reference, the payload is transferred
three times: from Producer to S-MSH (Submit()), from S-MSH to
R-MSH (Send()) and then from R-MSH to Consumer (Deliver()). With
an external reference, the payload is also transferred three
times: from Producer to S-MSH (Submit()), from S-MSH to R-MSH
(HTTPS GET) and then from R-MSH to Consumer (Deliver()). The
advantage would be that the Receiver controls the timing of the
exchange of the payload.
To secure access to the payload, we could include additional
authentication information (e.g. a per-payload unique
username/password) in the PartProperties for the payload, so that
only the recipient of the message can download the attachment.
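As a rough sketch of the per-payload credential idea (in Python, only for illustration): the property names "PayloadUsername" and "PayloadPassword" are hypothetical and not defined by ebMS3 or AS4, and HTTP Basic authentication over HTTPS is just one assumed way the Receiving MSH could present the credentials.

```python
import base64
import secrets


def make_payload_credentials(payload_id: str) -> dict:
    """Generate a one-off username/password pair for a single payload.

    The keys are hypothetical PartProperties names, used for illustration only.
    """
    return {
        "PayloadUsername": f"payload-{payload_id}",
        "PayloadPassword": secrets.token_urlsafe(24),
    }


def basic_auth_header(props: dict) -> str:
    """Build the value of an HTTP Basic Authorization header from the properties."""
    raw = f"{props['PayloadUsername']}:{props['PayloadPassword']}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

The Sending MSH would generate the pair when it stores the payload, and the Receiving MSH would send the header with its HTTPS GET.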
There could be an agreement on the domain name or URL prefix the
Sender MSH would use, e.g. via some extension PMode parameters.
If the Receiver MSH knows the URL for an external payload will
always start with e.g.
"https://sender.example.com/ebms3/externalpayloads",
and if that URL somehow needs to be rewritten for the request to
be processed (for DNS or other networking reasons), this could be
a feature that products could offer, and it would address the
issues of the URL being part of the signed eb:Messaging header.
The receiver MSH could reject messages with payload references
that do not have the agreed value as a prefix.
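To make the prefix check and rewrite concrete, here is a minimal sketch in Python. The function name, the "local_prefix" parameter and the idea that the agreed prefix comes from an extension PMode parameter are all assumptions for illustration. The URL in the signed header is never modified; only the address actually requested changes.

```python
AGREED_PREFIX = "https://sender.example.com/ebms3/externalpayloads"


def resolve_payload_url(url, agreed_prefix=AGREED_PREFIX, local_prefix=None):
    """Reject URLs outside the agreed prefix and optionally rewrite them.

    agreed_prefix: value both parties agreed on (assumed to come from a PMode
                   extension parameter).
    local_prefix:  hypothetical replacement prefix, for cases where the Receiver
                   must reach the payload under a different address.
    """
    base = agreed_prefix.rstrip("/")
    # Require a path boundary so ".../externalpayloads-other" is not accepted.
    if url != base and not url.startswith(base + "/"):
        raise ValueError("payload reference does not start with the agreed prefix")
    if local_prefix is None:
        return url
    return local_prefix.rstrip("/") + url[len(base):]
```

For example, resolve_payload_url(AGREED_PREFIX + "/p1", local_prefix="https://10.0.0.5:8443/payloads") yields "https://10.0.0.5:8443/payloads/p1".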
Another consideration is that we need to define Reception
Awareness for messages with external payloads. I guess we would
want the Receipt to be sent only after the Receiving MSH has
successfully downloaded the payload. As AS4 Receipts also control
resends, this may cause a few resends of the original message,
but since that original message would be small, it should not be
problematic. In summary, I still think the external reference
feature is useful and probably easy to specify. The main tricky
bits in an implementation would be in the security module.
On the use of Range Requests with Pull:
I'm interested in the details of the proposal, because I'm
afraid it will be difficult to get this to comply with both the
RFCs and ebMS3. E.g. RFC 7233 states that "A server MUST
ignore a Range header field received with a request method other
than GET", but pull requests use POST.
Also, would you just be re-posting the same PullRequest, with
the same MessageId, or new PullRequests? What is the relation
between the ebMS3 MessageId and the Etag? Is the PullRequest
still processed by the MSH, or is knowledge of the Etag
sufficient? An alternative could be to encode the range in the
PullRequest, as an additional parameter, but then we have
different mechanisms for Push and Pull.
<eb3:PullRequest mpc="http://msh.example.com/mpc123">
  <extensionnamespace:RangeRequest>bytes 65982464-307502442/307502443</extensionnamespace:RangeRequest>
</eb3:PullRequest>
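If the range were carried in the PullRequest like this, the MSH serving the pull would have to parse and validate the value itself. A minimal Python sketch, assuming the "bytes first-last/total" syntax shown in the example above (the RangeRequest element is hypothetical and not part of ebMS3):

```python
import re

# Matches e.g. "bytes 65982464-307502442/307502443"; whitespace may include a line break.
_RANGE_VALUE = re.compile(r"^bytes\s+(\d+)-(\d+)/(\d+)$")


def parse_range_value(value):
    """Return (first, last, total) from the text of the hypothetical RangeRequest element."""
    match = _RANGE_VALUE.match(value.strip())
    if match is None:
        raise ValueError("unsupported range syntax")
    first, last, total = (int(group) for group in match.groups())
    if not first <= last < total:
        raise ValueError("range is not consistent with the total length")
    return first, last, total


def range_length(value):
    """Number of bytes covered by the range (both ends inclusive)."""
    first, last, _total = parse_range_value(value)
    return last - first + 1
```

For the value in the example, the requested part is 241519979 bytes long.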
On Split-Join:
Another consideration is that we already have the split-join
feature, which is mentioned in Superannuation and is implemented
in at least two commercial products. It supports push and pull and
non-repudiation. It also works very well with multihop. Is
there really a need for another mechanism to address the same
requirement?
Kind Regards,
Pim
On 06/04/2015 01:22 PM, Sander Fieten wrote:
Hi all,
as agreed at our April meeting
I would look into the options for handling large messages with a
focus on using the external payload reference and alternatively
HTTP restart.
When creating a profile for
handling large messages with external payload references I think
the target is that the MSH will completely handle the processing
of the message which includes on the sending side that the
external payload must be made available to the receiver and on
the receiving side to download the payload(s).
Making the payload available to
the receiver is, strictly speaking, not necessary because the sending
MSH could also use the already uploaded payload for its
processing. That however has two drawbacks: first the producer
application has to arrange for making the payload available to
the receiver and second the MSH has to retrieve the payload from
this location.
If the producer has to arrange
for the payload being available for download by the receiver it
gets involved in the message transport itself, if only limited,
and I think the case for using ebMS is that the business
application should not be concerned about the transport of the
message.
That the sending MSH must also
retrieve the payload from the location where it is made
available by the producer application is not very efficient
especially when the payload is hosted in the cloud. In that case
the payload is first uploaded by the producer and then
downloaded again by the MSH for processing (signing). It may
also cause issues with network security because both the
producer application and MSH must have access to external
networks. Of course this is all solvable but it gets complicated
and our target should be to create a profile that is simple.
As already noticed during the
call an exchange that uses external payloads is always a kind of
pull exchange as the payload must be retrieved by the receiver
of the message. This will limit the usability of the external
payload if the sending MSH can only push messages and cannot
operate as a server. A possible solution is to upload the
payload to the cloud. I think this should be included in a
profile although it will make it more complex (because an MSH
must be able to upload the payload to the cloud and we need to
determine which upload protocols must be supported).
However a bigger issue that
limits the usability of the external payload is that the URL
included in the user message may not be accessible by the
receiver when operating in a multi-hop context. For example
because endpoints are in different networks and can not access
resources out of their network. Because the URL is included in a
possibly signed message the intermediary can not change it.
I therefore think that a
possibility to restart a transmission on the http level is
better to meet the objective of a reliable transfer of large
messages that works in a multi-hop environment and that does not
create dependencies between endpoints. Because the restart
function applies to the transport level it can be used only for
the hops that need it.
Because the restart takes place
at the http level it is transparent to the ebMS processing and
therefore does not require changes to already implemented ebMS
processing. Only the correct configuration for http transmission
needs to be set.
Part 2 of the ebMS spec already
mentions the possibility of using AS2 Restart. The problem
however with AS2 restart is that it only applies to push
exchanges as it only defines restart for the entity of a POST
method request and not for the response entity. So for
restarting a pull additional specification is needed. This could
however be based on the http range request as defined in
RFC7233. Although this RFC only defines the range request for
the GET method it can also be implemented for the response
entity of a POST method. The restart of a pull request would
then look like in the following diagram.
[Diagram: restart of a pull request; image not preserved]
The restart request in this
diagram uses the POST method but it could also use GET to make
it a more regular range request. This is possible because the
restart is on the http level only so there is no need to resend
the PullRequest and therefore there is no entity needed in
restart request.
Enabling such a restart opens
the possibility for an attacker to send a restart request asking
for a restart from the beginning, i.e. with a "Range: bytes=0-"
http header. Possible countermeasures are
restricting restarts from a certain number of bytes or securing
the http connection with SSL/TLS. I think this is no greater
issue than normal since an attacker that can already read the
communication between the MSHs can also replay complete
PullRequests.
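The first countermeasure could look roughly like the Python sketch below. It assumes the restart request carries an open-ended "Range: bytes=N-" header, and that the server applies a minimum offset ("min_offset", a hypothetical configuration value) below which it refuses to restart.

```python
import re

# Only open-ended ranges of the form "bytes=N-" are treated as restart requests.
_OPEN_RANGE = re.compile(r"^bytes=(\d+)-$")


def restart_offset(range_header, min_offset):
    """Return the offset to resume from, or None if the restart is refused.

    min_offset is a hypothetical server setting: restarts that ask for data
    before this offset (for example "bytes=0-") are not honoured.
    """
    match = _OPEN_RANGE.match(range_header.strip())
    if match is None:
        return None
    offset = int(match.group(1))
    return offset if offset >= min_offset else None
```

With min_offset set to 1024, a request with "bytes=0-" is refused, while "bytes=65982464-" resumes from byte 65982464.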
An advantage I see with the
HTTP restart is that it can be implemented using proxies as well
so support can be implemented without modifying the current
implementation.
Because this solution fits very
well with requirements I have been told by several people, I
would like to create a specification for this http restart
function rather than for the external payload option.
Looking forward to comments.
Regards
Sander