
Subject: Re: [bdx] BDX Addressing Mechanism Requirements

From: Mike Edwards <mike_edwards@uk.ibm.com>
To: "bdx@lists.oasis-open.org" <bdx@lists.oasis-open.org>
Date: Wed, 30 Mar 2011 11:27:26 +0100

Dale,

Yes, there is some history that we clearly have not communicated well.

I've put some inline comments where appropriate as <mje>...</mje>

Yours, Mike



Dr Mike Edwards		Mail Point 137, Hursley Park
STSM		Winchester, Hants SO21 2JN
SCA & Services Standards		United Kingdom
Co-Chair OASIS SCA Assembly TC
IBM Software Group
Phone:	+44-1962 818014
Mobile:	+44-7802-467431 (274097)
e-mail:	mike_edwards@uk.ibm.com

From:	Moberg Dale <dmoberg@axway.com>
To:	"bdx@lists.oasis-open.org" <bdx@lists.oasis-open.org>
Date:	28/03/2011 22:31
Subject:	[bdx] BDX Addressing Mechanism Requirements

Hi BDXers,

Mike Edwards’ presentation on BDX addressing left me wondering what were the original requirements for the proposed mechanism for the BDX discovery service, and perhaps what were the original goals leading to the 4-corner model? It is often useful to understand the problem space for a design before assessing it or considering alternatives. Unfortunately, this leaves many of us who were not participants in a long prior effort at a disadvantage. Nevertheless, I will read off some of the questions that the addressing model “basic system elements” and beyond raise for me.

A requirement is that the system is to support sending business documents one-way “through” 2 intermediaries called “access points.” The diagram suggests that senders know how to send a message to one access point (possibly more), and how to indicate the ultimate recipient (somehow); recipients know how to publish some service metadata to a publisher. Important points reveal that participants can be either receivers or senders, a participant has an id of some sort, and the SOAP body has a “document type” (that we later learn has a URI identifier).

<mje>
First, the model is a 4 corner model, so that a sender always sends documents to an Access Point and a receiver always receives documents via an Access Point. The Access Points take responsibility for getting the documents between each other in a secure reliable way that gets the documents to the "right place". This is similar to the way in which email servers work.

Another aspect of this model is that the communication between sender and AP and between AP and receiver is NOT fixed and can (deliberately) take numerous forms that are a matter of agreement between the APs and the senders / receivers. This is perhaps similar to the case with email servers in that the clients (senders and receivers) can connect to those servers by various means (e.g. POP/SMTP clients, Web browser clients, etc).

Recipients don't have to know how to publish service metadata to an SMP - but the AP to which they are attached must know how to do this (logically the receiving endpoints are on the receiving AP - the actual receiver may have no endpoints at all, for example if the receiver uses a Web browser interface to access their incoming documents).

Participants can indeed be senders and/or receivers (most business exchanges in PEPPOL expect them to be both but for other business exchanges this is not strictly necessary).

It is expected that any participant will have a Participant ID - this is required even for a pure sender so that assertions can be made about the identity of the sender.

It is indeed also expected that what is sent is of some "document type" - and strictly a receiving AP could reject any document that is not of a type that is recognised by the AP (this gets interesting where the documents are encrypted, which is possible for the BDX network).
</mje>

Mike’s question was “how does a sender participant find where to send the document?” From the diagram, the participant needs to send documents to an access point. So one question is how a participant learns the address of the access point, which is, as I understand it, done by a registration into the “domain” of an access point. So I think the question Mike is resolving might be: “how does the access point know which access point to send my document to next?”

<mje>
Yes, you're right, it is strictly the sender's AP that does this lookup - in the same way that when sending email the sender client just attaches some target address to the email such as xxx@yyy.com and it is the sender's email server that does a lookup to find the address of the relevant receiver's email server.
</mje>

At this point Mike takes us into a detailed solution involving publishing metadata records, querying metadata records, DNS record manipulation, and then finally routing the message. I lost track of whether the message is always or sometimes forwarded by (SOAP) intermediaries (access points) or whether the sender sometimes or always goes through a metadata query, retrieval and DNS resolution preliminary process, which is then followed by a direct SOAP message to the ultimate SOAP recipient (whose URL is known and resolvable by DNS queries for A records). It is perhaps worthwhile to reflect how other messaging protocols facing problems abstractly similar to BDX make use of DNS.
<mje>
Sorry if it is confusing.

The documents are ALWAYS transmitted between the Access Points - these are the "heavyweight" guys that do all the secure, reliable stuff.

All the DNS records and lookups are the business of the APs.

SOAP messaging is strictly between the sending AP and the receiving AP. SOAP is not even required between sender and AP or between AP and recipient (although it could be used there)

Email does make some use of DNS lookup to find the address of the target server. BDX takes this a bit further since we don't resolve an address that simply belongs to the target AP, but rather one that relates directly to the recipient. The added flexibility that this implies is that a given business can move its service endpoints between different AP providers without having to change its logical address (unlike email, where a change of provider typically forces a change of email address, as I experienced in the early days of broadband before I learned not to use the broadband supplier's email but instead something less likely to change like Google & Yahoo email....). This one-level indirection is something that BDX adds as an explicit acknowledgement of the realities of the market for eCommerce.
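
To make the indirection concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only models the flow described above: `MetadataRegistry` stands in for the DNS/SMP lookup, `AccessPoint` for an AP, and the `ap-one` / `ap-two` names are invented. It is not taken from the BDX specs.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPoint:
    """Hypothetical stand-in for an AP: keeps delivered documents per participant."""
    name: str
    inbox: dict = field(default_factory=dict)

    def deliver(self, participant_id: str, document: str) -> None:
        self.inbox.setdefault(participant_id, []).append(document)


class MetadataRegistry:
    """Hypothetical stand-in for the lookup: participant ID -> receiving AP."""

    def __init__(self) -> None:
        self._home = {}

    def register(self, participant_id: str, ap: AccessPoint) -> None:
        # Re-registering the same ID models a move to another AP provider.
        self._home[participant_id] = ap

    def resolve(self, participant_id: str) -> AccessPoint:
        return self._home[participant_id]


def send_via(registry: MetadataRegistry, receiver_id: str, document: str) -> str:
    """What the sending AP does: resolve the receiver's AP, then forward to it."""
    target = registry.resolve(receiver_id)
    target.deliver(receiver_id, document)
    return target.name


registry = MetadataRegistry()
ap_one, ap_two = AccessPoint("ap-one"), AccessPoint("ap-two")
receiver = "0010:5798000001"

registry.register(receiver, ap_one)
first_hop = send_via(registry, receiver, "invoice-1")

registry.register(receiver, ap_two)  # the receiver changes provider
second_hop = send_via(registry, receiver, "invoice-2")
```

The sender addresses both documents to the same Participant ID; only the registry entry changes when the receiver moves between AP providers.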
</mje>

Both SMTP/POP and SIP, for example, can be thought of as having intermediaries (MTAs or proxies, respectively). In either case, there can be N intermediaries between sender and recipient (so 2 is easy!). An email address has a user identifier and a domain (user@somewhere), and a destination email address resolution involves using DNS by making a DNS query for the domain (somewhere.topleveldomain.) to retrieve resource records (RR) for MX (mail exchange). The MX and NS (name server) RR were generalized to SRV records way back when, and SIP, for example, uses SRV records for domains to figure out what proxy/intermediary to contact first. So the problem of finding an intermediary for a domain using a protocol is nowadays often translated into a DNS SRV query whose answer, when successful, often helpfully gives the A (or AAAA) address information for the target server (the IPv4 or IPv6 numerical address).
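
As a rough illustration of the two lookups just described, the Python sketch below builds the owner name for an SRV query and orders MX answers by preference. It performs no network queries; the helper names and the `example.com` records are invented for the example.

```python
def domain_of(address: str) -> str:
    """Return the domain part of a user@domain address."""
    user, sep, domain = address.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"expected user@domain, got {address!r}")
    return domain


def srv_query_name(service: str, proto: str, domain: str) -> str:
    """Owner name for an SRV query, in the _service._proto.domain. form."""
    return f"_{service}._{proto}.{domain.rstrip('.')}."


def order_mx(records):
    """Order (preference, exchange) pairs; the lowest preference is tried first."""
    return [exchange for _, exchange in sorted(records, key=lambda r: r[0])]


# Hypothetical data: these records are made up for the example.
domain = domain_of("user@example.com")
sip_name = srv_query_name("sip", "udp", domain)
mail_hosts = order_mx([(20, "backup.example.com"), (10, "mail.example.com")])
```

In both cases the only input is the domain part of the address, which is the point of contrast with a lookup keyed on the recipient.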

<mje>
I hope I've explained the difference that BDX has in my previous comments...
</mje>

The above assumes that there is a basic semantic to an address of record (AOR) involving a user identifier and a domain identifier. This leads me to my first question: why has BDX selected an identifier scheme (Universal Business Identifier) consisting of a scheme code and an id “0010:5798000001” and second, how do I learn these values? I am guessing that these values are the ones I feed to the Metadata Service Providers in some WSDL-defined SOAP or HTTP bound message in order to eventually get an address, through steps involving processing metadata info, to get an A record from DNS?

<mje>
First, I'm going to disappoint you by saying that the Participant Identifier scheme is more complex than that :-( - BDX does not only use the Universal Business Identifier - and the reason for this scheme code / ID combination is to allow for a great deal of flexibility in how Participants are identified.

The reality is that businesses and organizations (uniquely) identify themselves in a whole variety of ways - and this differs by country and by industry. Larger businesses may well use a 'universal' identification scheme such as DUNS numbers, but there are huge numbers of small businesses out there that don't have this kind of identity. They usually have SOME kind of clear identity - for example, they may have a Tax ID (VAT registration numbers are common throughout the EU, for example) - but this is typically country-specific - or they may have a Company Registration ID, again typically country-specific.

So, BDX allows for a large number of possible identification schemes - the "scheme code" field aims to deal with this - which of course does then require maintenance of some form of Registry of those schemes (PEPPOL has a simple form of such a Registry). The ID can then be whatever form of code is unique within that given scheme (there is an assumption of uniqueness within any given scheme).
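
A small Python sketch of the scheme code / ID pairing may help here. It assumes the colon-separated form of the example quoted earlier in this thread ("0010:5798000001"); the `ParticipantId` class and the one-entry `KNOWN_SCHEMES` table are hypothetical and do not reproduce any real registry.

```python
from dataclasses import dataclass

# Hypothetical scheme registry. The code "0010" comes from the example in this
# thread; the description is a placeholder, not a real registry entry.
KNOWN_SCHEMES = {"0010": "placeholder description"}


@dataclass(frozen=True)
class ParticipantId:
    """A scheme code plus an ID that is assumed unique within that scheme."""
    scheme: str
    value: str

    @classmethod
    def parse(cls, text: str) -> "ParticipantId":
        scheme, sep, value = text.partition(":")
        if not sep or not scheme or not value:
            raise ValueError(f"expected 'scheme:id', got {text!r}")
        if scheme not in KNOWN_SCHEMES:
            raise ValueError(f"unknown scheme code {scheme!r}")
        return cls(scheme, value)

    def __str__(self) -> str:
        return f"{self.scheme}:{self.value}"


pid = ParticipantId.parse("0010:5798000001")
```

Because equality covers both fields, the same ID value under two different scheme codes identifies two different participants, which matches the assumption of uniqueness only within a scheme.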

The Participant ID in this form must be used when identifying the sender and receiver of a document. The receiver's ID is used for the lookup of the target service addresses via the SMP (note that it is used in the form of an MD5 hash simply to ensure that the lookup only uses characters that are valid for a DNS address lookup - this can't be guaranteed for the plain ID form for all possible schemes) - but yes, ultimately it is this Participant ID that is used for the DNS lookup.
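
The hashing step can be sketched in Python as follows. Several details are assumptions made for illustration and are not stated in this message: the UTF-8 encoding of the ID, the use of the hex form of the digest, and the zone name `sml.example.org`. Any prefix or normalisation the specs may require is left out.

```python
import hashlib


def dns_label_for(participant_id: str) -> str:
    """Hex MD5 digest of the ID: 32 characters drawn from 0-9 and a-f.

    Assumption: the ID is hashed as UTF-8 bytes, exactly as written.
    """
    return hashlib.md5(participant_id.encode("utf-8")).hexdigest()


def lookup_name(participant_id: str, zone: str) -> str:
    """Join the hashed label with a (hypothetical) lookup zone."""
    return f"{dns_label_for(participant_id)}.{zone}"


# The ID contains ':', which is not valid in a host name; the hashed label is.
name = lookup_name("0010:5798000001", "sml.example.org")
```

MD5 is used here only to map arbitrary ID characters onto a fixed-length, DNS-safe label, as described above, not as a security measure.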

How do you learn the Participant ID(s) you need to use? Just the same as an email address - someone has to tell you. This is out-of-band. You have to know who you are trying to talk with. You almost certainly have some kind of business relationship with them, so this does not seem burdensome - to send a regular snailmail letter to them you had to have their mail address - how did you find that out?
</mje>

I will spare you from repeating details about how other messaging systems (like MSRP, VOIP, or GS1 ONS for the “internet of things”) leverage DNS, intermediaries, domains, domain level services, user ids, security, auth, and trust. I would like to understand whether DNS SRV records themselves might give BDX enough redirection to accomplish the routing tricks that are required. But to make this assessment, the requirements for routing, record update frequency, and service naming need to be enumerated. In addition, because “content” seems to be involved in routing (besides recipient participant id), will there be a standard notation for service.doctype values? (Mike did state that doctype alone is informationally incomplete; “service” here is the qualifier (from the process namespace) that is intended to give an informationally complete identification of the business action offered by the recipient or his agents; the location of this service can presumably come somehow from DNS.)

<mje>
I think that the input BDX specs do cover all of this - but perhaps we need to explain it better.

The fact that the connections from sender - AP and AP - recipient are left "open" may be confusing, but this is a deliberate level of flexibility that is important in the model.

</mje>

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

References:
- SMP and CPP
  - From: "Pim van der Eijk" <pvde@sonnenglanz.net>
- BDX Addressing Mechanism Requirements
  - From: Moberg Dale <dmoberg@axway.com>