bt-spec message

Subject: Re: [bt-spec] addresses and identification

From: Alastair Green <alastair.green@choreology.com>
To: Peter Furniss <peter.furniss@choreology.com>
Date: Sat, 02 Feb 2002 12:06:04 +0000

Mark, Peter:

Let me enter a caveat on all that follows: I know from bitter experience that it
is very easy to miss the precise point that someone is making in these types of
e-mail discussions, and also to miss the underlying motivation or concern. If
I've got the wrong end of the stick, or I'm violently agreeing, please be
tolerant. I am still having trouble understanding what is being put forward by
you, Mark, and Jim, so I'll recast the discussion somewhat in my terms, to check
whether we're thinking about the same problems.

The first thing I find hard to understand in this debate is "why do all of
this?".

The existing proposal that we in Choreology put forward involves a trivial
change to the specification (namely, make the ids URIs). Everything else remains
the same. The addresses vector remains in place. It uses binding strings
(spec-defined) or URIs (extender-defined) to specify the binding, from which
(inter alia) the carrier protocol can be discovered. We then know the address
type to be used (and incidentally, and very importantly, this can be any kind of
address, including addresses that have nothing to do with URIs). The additional
information allows the endpoint to locate the processor behind it, if need be
(allowing any granularity of processor, any level of fanout from endpoint to
processor).

The only thing I can see driving the HP proposal is the ability to make the
identifying URI overlap with one of the addresses. For this to occur we must
construct rules about which address to use, and/or begin to run into the
possibility of seeing multiple names (treat all addresses as equal). Now, all
problems in computer science can be solved by another level of indirection, as
Cape Clear's CTO once said to me. Of course you can do this -- I just don't see
why you would want to bother. There is no gain of functionality, there is an
increase in complexity. And we save a few bytes. I think I recall an early
meeting of BTP where one Mark Little pointed out that saving bytes was an
ambition best realized when not using XML.

Then there seems to be a proposal to create a URI scheme which is used something
like this: (imagine the LFs aren't there; I'm using element names from memory,
not from the draft spec; I haven't bothered to check what a legal URI would
actually look like):

<address-as-superior>
"btp://soap1.1-http1.1/http://trade.acme.com/servlets/choreology-btp/trid=AEC0F335"

</address-as-superior>

as opposed to what we currently have, which looks something like this

<address-as-superior>
    <binding>soap1.1-http1.1</binding>
    <address>http://trade.acme.com/servlets/choreology-btp</address>
    <additional-information>trid=AEC0F335</additional-information>
</address-as-superior>

Now, I don't want to be accused of excessive fondness for XML, but I can't see
that these alternatives are distinguished by anything other than style. One is
terser, and less intuitive or informational; the other is more longwinded and
explanatory. I could vote for either, but I still don't understand the drive to
change the status quo. Defining a new URI scheme seems like a lot of work for
little reward.
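
To make the comparison concrete, here is a minimal Python sketch (illustrative only, not taken from the draft spec). It holds the three fields of the current structure and packs them into the single-URI form shown above. The "btp:" layout and the splitting rule (binding ends at the first "/", additional information follows the last "/") are assumptions, and the names BtpAddress, to_single_uri and from_single_uri are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BtpAddress:
    """The three fields of the current address structure."""
    binding: str          # e.g. "soap1.1-http1.1"
    address: str          # carrier address, in whatever form the binding wants
    additional_info: str  # opaque to the carrier, used behind the endpoint


def to_single_uri(a: BtpAddress) -> str:
    # Hypothetical "btp:" layout copied from the example above; not a real scheme.
    return f"btp://{a.binding}/{a.address}/{a.additional_info}"


def from_single_uri(uri: str) -> BtpAddress:
    # Assumed rule: the binding ends at the first "/", and the additional
    # information starts after the last "/". Some such rule must be specified.
    rest = uri[len("btp://"):]
    binding, _, tail = rest.partition("/")
    address, _, additional_info = tail.rpartition("/")
    return BtpAddress(binding, address, additional_info)


structured = BtpAddress(
    binding="soap1.1-http1.1",
    address="http://trade.acme.com/servlets/choreology-btp",
    additional_info="trid=AEC0F335",
)
packed = to_single_uri(structured)
assert from_single_uri(packed) == structured
```

The round trip only works because a splitting rule was chosen (and it would break if the additional information itself contained a "/"). That rule is the extra specification a new URI scheme would have to carry; the information content of the two forms is the same.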

The second thing I don't get is the following kind of statement:

> It's a transaction protocol that lives in a distributed environment and
> works with (hopefully) other services and functionality provided by other
> components in that distributed system.
> It is *not* THE distributed system. Being protocol agnostic does not mean
> that we should be incorporating support for the lowest common denominator
> (e.g., "I know that protocol X doesn't support functionality Y that we may
> need Z% of the time so we must provide support for it in BTP."). If
> we go that route then we will be building our own distributed system
> architecture around BTP and eventually the only recourse would be to start
> at the bottom up.

I don't think that the BTP draft attempts at all to define "THE distributed
system".

BTP defines abstract messages, and their legitimate contents and sequences in
bilateral relationships defined by endpoint states. Then it maps those down to
the underlying "distributed system" using ... pre-existing encodings or
representations (such as XML) and pre-existing carrier protocols, or protocol
stacks that BTP can treat as having carrier (endpoint-to-endpoint transmission)
capability. In using such carrier protocols it uses their pre-existing
addressing scheme, whatever that might be. (Note the three "pre-existing"s
here.)

BTP values carrier protocol independence. The binding scheme expresses that
fact, and I think it is one of the greatest strengths of BTP by comparison with
older protocols. Here are the non-HTTP carrier protocols our prospective
customers or partners have mentioned to us in discussion on their requirements:
MQSeries, Java RMI, FIX, JMS.

Being protocol agnostic is absolutely meaningless unless we incorporate support
for the lowest common denominator. We have a terribly simple scheme in the spec
whereby the binding can be anything you can dream up, with more or less gunk
surrounding the actual carrier capability (e.g. FIX). The addressing scheme can
be anything that the bound protocol wants (a stringified CORBA object reference,
a non-URI string address of another kind, e.g. a name/directory service entry, a
binary address). The additional information field allows us to send processor
addressing information to the endpoint, if the addressing scheme for the
endpoint does not allow the insertion/extraction of such user data through the
carrier protocol itself. This allows complete freedom of implementation in terms
of processor granularity (from object per actor-in-role-per-transaction to one
object for all actors for all roles for all transactions at a site).

Another way of expressing this is: BTP requires that carrier protocols permit
payloads to be sent from A to address B and back from B to address A. That is
indeed our lowest common denominator, and it seems an excellent idea to have
such a simple requirement.
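
As a rough illustration of how little that asks of a carrier, here is a minimal Python sketch. The Carrier interface and the InMemoryCarrier toy are hypothetical names of my own, not anything defined by BTP; the only assumption is that a payload handed to send() either arrives whole at the named address or the call fails.

```python
from collections import defaultdict
from typing import Protocol


class Carrier(Protocol):
    """All that is assumed of a carrier: send a payload to an address."""

    def send(self, to_address: str, payload: bytes) -> None: ...


class InMemoryCarrier:
    """Toy carrier: delivers a payload whole or raises; no ordering promise."""

    def __init__(self) -> None:
        self.inboxes = defaultdict(list)  # address -> payloads received

    def send(self, to_address: str, payload: bytes) -> None:
        self.inboxes[to_address].append(payload)


carrier = InMemoryCarrier()
carrier.send("B", b"PREPARE from A")   # A sends to address B
carrier.send("A", b"PREPARED from B")  # B replies to address A
```

Anything that can stand behind such an interface (HTTP, a message queue, RMI) is a candidate carrier; the address strings are whatever that carrier already uses.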

With this scheme we (the BTP technical committee) have already allowed for *any*
"distributed system" to be used. It is the precise opposite of building our own
distributed system architecture in BTP. We make BTP float above all manner of
present and conceivable distributed system architectures, by defining a
message-oriented approach to communication which imposes well-stated but minimal
impositions on the underlying carrier (unordered, complete delivery or wholesale
failure to deliver).

Anything less flexible has a witting or unwitting bias in favour of some
particular stripe of distributed system, and I think that would be a grave
error.

If all you want to do is SOAP/HTTP then that's great -- the bias of the spec is
all in that direction, that's where the work's gone in collectively on the first
binding. If you want to do more, that's allowed, and that's good because a good
standard should enable competition and variation, not just homogenise
everything.

Now, there are some features that BTP must have, and that virtually no existing
carrier protocol or stack can provide to us ready made.

We need: address vectors (to accommodate multiple concurrent protocols at a
given site). This is pre-eminently a BTP feature (cannot, by definition, be
incorporated in any single underlying carrier protocol).

We need reply addresses (which some protocols may provide).

We need routing capability (which some protocols may provide).

We need redirection capability (which some protocols may provide, but which is
very unlikely to be present in the spontaneous variety needed when a REDIRECT is
sent out).

If a carrier protocol provides suitable reply addresses, routing and redirection
then the binding can take advantage of that fact. The abstract view is
unaltered.

Finally, if the carrier offers a request-response feature then, in some
circumstances, we can take advantage of that, and therefore we must express, in
the course of binding, how that optimization can be exploited and what that
imposes on the implementation.

As and when a super-carrier protocol emerges which takes account of all of these
(unavoidable) requirements, then it will be a matter of writing a binding that
causes the message payloads to be very terse. In the meantime, we have to put a
certain amount of distribution protocol into our transactional protocol in most
bindings. This is definitely the case with SOAP/HTTP, where some new proposals
are afloat, but are a long way from realization or acceptance or availability.

This seems to me to be a reasonable compromise with pre-existing reality, as we
all find it.

Yours,

Alastair

Peter Furniss wrote:

> > > Is this how your proposal works:
> > >   in a relationship where there were multiple URIs on the CONTEXT, the
> > > URI on (in the payload of) PREPARE (for example) *must* be the one used
> > > for sending the PREPARE.
> >
> > Not necessarily. As long as it identifies the specific transaction then it
> > could be something entirely different looking.
>
> Umm - still confusing. Can you state the algorithm an Inferior must use to
> select the identifying URI (for the Superior) to put in the ENROL from among
> the multiple URIs there were in the CONTEXT, given that the URI for
> addressing has (had to be) one of the URIs that was not used for
> identification on the CONTEXT.
>
> from your answer above it is not "put the URI used for the addressing in the
> payload for identification"
>
> at one point, we thought it was "use the URI that was used in the CONTEXT
> for identification", but some of your other answers appeared to contradict
> that
>
> it could be "use any of the URIs, at sender's choice"
>
> > Sorry, but it need not be baroque. That would have to be a design choice
> > you make.
>
> We are talking about specification. The TC has the design choice to be
> complicated. If the specification is complicated, the understanding or the
> implementation are complicated too (not necessarily both). If the
> specification is less complicated, implementation is easier (though could
> still be baroque if one wanted )
>
> > > The multiple matching was one of the main vices of the old approach, but
> > > it could be done. And if it has to be done, it can be. But if you start
> > > redirecting, things can get next to impossible - entity starts out with
> > > URIs A and B; then migrates so new URIs are C and D. Sends a "pro-active"
> > > REDIRECT, announcing this, but that gets lost. Then it sends a message,
> > > identifying itself as C. That arrives where it is still known as only A
> > > and B. The message is thought to be junk (protocol error) and discarded.
> > > This gets even worse if the second message is another REDIRECT - it's
> > > now at E and F !
> >
> > True, but that is assuming a) you do not ensure the migration protocol is
> > complete and avoids this(!),
>
> so now we must have REDIRECT-acks ?
>
> >         b) the URIs do not contain sufficient
> > information to allow determination of past identity,
>
> Surely explicitly disallowed by some of your previous statements -
> non-creator of phone:234355?fred may not assume it is related to
> phone:2366334?MrSmith/fred
>
> > and c) that BTP should be responsible for this. The latter one is probably
> > the most important: just as BTP should not be a protocol bridge it should
> > not be attempting to solve all of the world's distributed system woes!
>
> It isn't trying to solve the world's, only its own. It handed out addresses
> saying "this BTP entity will be accessible using these
> address(es)+identifier(s)", and that information has been invalidated.
>
> > It's a transaction protocol that lives in a distributed environment and
> > works with (hopefully) other services and functionality provided by other
> > components in that distributed system.
> > It is *not* THE distributed system. Being protocol agnostic does not mean
> > that we should be incorporating support for the lowest common denominator
> > (e.g., "I know that protocol X doesn't support functionality Y that we may
> > need Z% of the time so we must provide support for it in BTP."). If
> > we go that route then we will be building our own distributed system
> > architecture around BTP and eventually the only recourse would be to start
> > at the bottom up.
>
> At abstract or model level, the statement is "BTP requires functionality Y,
> and models this requirement with features P.", and then that P may be mapped
> to facilities of the carrier (constraining the choice of carriers), is
> always provided by BTP, may be provided by BTP in the case that the carrier
> does not, or may be provided by a shim protocol that adds the feature to the
> carrier (this last is actually a redefinition of the carrier; the previous
> option could be viewed as merging the shim protocol into BTP). But Y and P
> have to be stated at abstract level in some way.
>
> For some Y, we do state them as requirements on the carrier (e.g. messages
> shall be delivered uncorrupted or not at all). For others, the overall
> representation and binding rules would allow them to be mapped to carrier
> facilities if such are available, or to be handled by BTP. So a binding to
> carrier that did have the requisite redirection itself would map REDIRECT to
> that, and the BTP REDIRECT message itself would not be used for addresses
> with that binding.
>
> > > (analogy: you get a Christmas card from "Tom and Lucy Smith", names you
> > > have never heard of. This is because when you knew her, she was Miss
> > > Lucy Jones, and she's got married (low-key wedding :-). If she doesn't
> > > identify herself by some unchanging identifier, you're not going to be
> > > able to sort it out. (obviously lots of ways humans do that).
> >
> > Yes, but if she'd written her maiden name on the card too I'd have known
> > immediately.
>
> So a message from a relocated entity must put all (one ?) of its obsolete
> URIs on every message it sends in future, as well as the one it is using
> now ? Why is this different from having a separate identification-only URI ?
>
> > > Surely, this is the same in both cases (and a security matter anyway) -
> > > it's completely irrelevant to the issue. We have assumed we aren't
> > > dealing with malicious parties.
> >
> > No it is not irrelevant. What I'm saying is that if an entity has three
> > addresses (URIs) that it wants to publish so that it can be contacted for
> > transaction 1234 then it had better be able to match up the incoming URI
> > with the transaction. This then becomes an end-point resolution
> > problem and nothing to do with BTP or the binding protocol.
>
> Not exactly a "problem". On our scheme the binding-address takes you to
> however detailed a carrier endpoint you want, and then, if it's needed, the
> additional-information can take you further to something that does recognise
> the identifier. And also, while BTP only requires the identifier to be
> unambiguous and everyone treats it as opaque, that doesn't mean you can't
> put detail information in it, using an internal syntax of your own design
> (so http://acme.com/not-a-real-url?fred/alias=joe would be fine)
>
> (this freedom to put locally-understood stuff in the identifier URI isn't
> the same in your scheme, because the URI will need to be understood by the
> far-end that is trying to use it as an address)
>
> > > At least if there is a separate unchanging identifier, one can say "this
> > > is a message for the person whose National Insurance number is
> > > BG 63 59 K", regardless of what address or "local name" the person is
> > > using.
> >
> > However, in some environments/applications I may not be able to guarantee
> > that I can get the same id in multiple places. Have you ever tried to get
> > email addresses on yahoo, AOL and freeserve, for example? It's an
> > iterative process to get something you can live with - you'll probably
> > never get the name you want on all three. But they do point to you.
>
> yes they are addresses. I'm still the same person, and I have the same full
> name (and national insurance number) at all of them. If the NI people sent
> email, they'd always quote the NI number in all the messages, you can be
> sure.
>
> > > (just to check: we are both talking about having the multiples only on
> > > what Doug called the "initial" messages, aren't we? )
> >
> > Sorry, can't remember what the "initial" messages meant.
>
> BEGUN, CONTEXT and ENROL - the messages that are exchanged so parties in a
> relationship know where the other one is.  As far as I can make out, the
> byte-saving virtues of your approach apply only to the set of URIs on these
> messages, where one of the addressing ones can be dropped if the
> identification URI is also an address.  So all this is about saving 100
> bytes on 3 messages per transaction ?
>
> > > My discussion with Jim towards the end of the meeting yesterday included
> > > that if you want it to be
> > >    http:// ...
> > > then that can be used only one BTP binding in all of history.
> >
> > That is true if no additional information were present in the URI.
>
> What - you'd put a suffix on a URI from the http scheme, and say this isn't
> part of the http use ?
>
> > > So if we say
> > > that BTP URIs for soap-http-1 binding are http:, then soap-http-2 has
> > > got to use something different.
> >
> > Correct, which means you wouldn't do it that way, i.e., I'd expect
> > something like soap-http-1:// and soap-http-2://
>
> Exactly.
>
> > > As was said yesterday, using pre-existing URI
> > > schemes also can be problematic if there are bits of identifying
> > > information that are not going to be used by the carrier protocol (i.e.
> > > must be ignored in the addressing use).
> >
> > Don't see why it's problematic. If I don't need it I'll just ignore it.
>
> For the soap-http-1 style, you could state rules about where the http url
> inside it stops and the non-carrier information begins, so something trying
> to use it to send to you would know. But all you've really done then is
> squeeze the three fields of the current btp address structure into a URI
> with an ad-hoc format, that has to be specified for the URI scheme. What
> was the point of this again ?
>
> > > I can see that it might seem rather tiresome to send, as an address
> > > field something that is in fact a globally unambiguous name for the
> > > entity in question, and this is also being sent as the identifier. But
> > > trying to overload the identifier field with all the addressing
> > > implications doesn't seem to work in the generality. A BTP address can't
> > > neatly be just a URI that is a URL for the carrier. You need the binding
> > > name and the additional information as well.
> >
> > Again correct, but as I have shown above that is not an insurmountable
> > problem. Any information that is available in your proposed scheme can be
> > made available in our scheme. We are not arguing about losing information
> > or making it harder to understand.
>
> Yes, your scheme could work. It needs quite a lot of extra specification in
> various places, some of it equivalent to what is already there (i.e. moved
> or copied), some of it additional. But it doesn't seem to give any gain
> worth the candle.
>
> > Going back to our last proposal, which I don't think you've actually
> > addressed: what is wrong with allowing both such that if the
> > address-identifier URI is not of a specific format (defined by the
> > carrier-binding) then it must only be interpreted as an identifier and
> > additional addressing information must be present? And in any message you
> > cannot mix and match.
>
> I'm trying to understand precisely what you are proposing.
>
> By "defined by the carrier-binding", do you mean the "carrier-binding used
> to bring the message that brought the URI here from the place it refers
> to", or the "carrier-binding that the URI, by its format and content,
> indicates should be used when the URI is used as an address"?
>
> And "in any message you cannot mix and match" ? that all URIs in a message
> must be for the same carrier-binding ? that the identifier URI must match
> the carrier being used for the transmission ?
>
> Peter


Follow-Ups:
- Re: [bt-spec] addresses and identification
  - From: Mark Little <mark_little@hp.com>

References:
- RE: [bt-spec] addresses and identification
  - From: Peter Furniss <peter.furniss@choreology.com>