Re: [xri] subject matching

On Fri, Aug 21, 2009 at 3:35 PM, Eran Hammer-Lahav <eran@hueniverse.com> wrote:

No one is a big fan of this solution.

The reasons why ‘match’ was selected are that it did not require changing the schema type from URI to string, and it fit the trust model suggested in which the authority of the subject was compared to the certificate used to sign. It is very much a hack.

I strongly object to adding a type to <Subject>. XRD describes web resources and web resources use the URI namespace.

It looks like we found a "web resource" (hosts) that can't be named in this namespace, but that is a legitimate subject of an XRD. I'm not proposing to re-invent the namespace that is already defined by URIs. If the subject of the XRD can be described using a URI, we use type="uri". If it can't, then we use type="somethingelse".

Your argument sounds to me like you're saying "everything we need can be described by a URI, so there is no need to come up with a new namespace". But since the premise seems to be wrong (we can't seem to figure out how to describe the subject of a host-meta using a URI), I suspect that the conclusion ("we don't need a new namespace") may also be wrong :-).

Inventing another namespace (which is what a type attribute does) is a really bad idea. At the same time, inventing a new mechanism for subject sets is out of scope of this work because we don’t have any use cases or requirements beyond host-meta. So the solution has to be somewhere in between a new subject namespace and a new construct for subject sets.

The concerns raised below regarding the use of match in host-meta with non-http identifiers are valid, but can be “excused” by saying that host-meta is really about http resources (something many people argued for on the IETF list when the topic of email URIs came up), and that WebFinger merely uses it to store its metadata. It is not clean, but it is allowed. WebFinger is a separate protocol with its own rules and trust requirements. Does this explanation make me happy? No. But I can live with it.

But since the past few posts make it clear we don’t really have consensus about it, we should attempt to reach a better resolution. Here are the alternatives I was able to come up with:

1. host-meta will define its own element <Host> and will not use <Subject>. Since host-meta will not define a trust profile, WebFinger will need to figure out how to deal with <Host> instead of <Subject>.

I could live with that, although it seems a bit of a cop-out. Clearly, the (upper-case) Host is the (lower-case) subject of this host-meta, so why not stick it into the (upper-case) Subject element?

So the idea here would be that as far as XRD (the spec) is concerned, there would be no Subject in host-metas. And host-meta (the spec) would say "the subject of a host-meta is in the Host element". That means we would have to make Subject optional in XRD, right?

2. host-meta will use a DNS URI (something like dns:example.com or something more complex with SRV record).

I could live with that, although I predict I won't understand half of the debate that will undoubtedly break loose when the web purists get wind of this (because, you know, baby angels die when you violate dns: URIs like that).

3. host-meta will use a <Link> with extension relation type and the address of the host-meta file. Clients will need to figure out trust issues elsewhere.

Not sure I understand this proposal.

I think my vote goes to (1), for now. The more I think about it, the more I like it - perhaps even more than my own type="host" proposal :-)

Dirk.

Any more?

EHL

From: Dirk Balfanz [mailto:balfanz@google.com]
Sent: Friday, August 21, 2009 2:11 PM
To: XRI TC
Subject: [xri] subject matching

Hi guys,

I don't like the idea of "subject sets", and in particular the "beginswith" mechanism to express a certain kind of subject sets.

Let me start by explaining how I understand the feature. If I misunderstood, then much of my rant below will not make sense.

An XRD with

<Subject>http://www.example.com/foo</Subject>

is authoritative for the resource http://www.example.com/foo. If there is a <Link><Rel>author</Rel><URI>mailto:bob@gmail.com</URI></Link> in the XRD, it means that the author of http://www.example.com/foo is bob@gmail.com.

An XRD with

<Subject match="beginswith">http://www.example.com/foo</Subject>

is authoritative for all resources that begin with http://www.example.com/foo, which in this case means (1) they're http resources, (2) they're hosted on www.example.com, and (3) their paths start with /foo. If there is a <Link><Rel>author</Rel><URI>mailto:bob@gmail.com</URI></Link> in that XRD, then that means that the author for all the above-mentioned resources is bob@gmail.com.
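
To make the two readings concrete, here is a minimal Python sketch of the matching rule as described above. The function name and signature are hypothetical and not taken from the XRD spec; the sketch only assumes that a missing match attribute means exact comparison and that "beginswith" means a plain string-prefix test.

```python
from typing import Optional


def subject_matches(subject: str, resource: str, match: Optional[str] = None) -> bool:
    """Return True if an XRD with this Subject would cover `resource`.

    Hypothetical helper: `match` mirrors the match attribute on <Subject>.
    """
    if match is None:
        # No match attribute: the XRD is about exactly one resource.
        return resource == subject
    if match == "beginswith":
        # Plain string-prefix comparison, with no URI parsing at all.
        return resource.startswith(subject)
    raise ValueError(f"unsupported match mode: {match}")
```

Note that, under this reading, a resource such as http://www.example.com/foobar is also covered, because the comparison knows nothing about path segments.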

Am I getting this right so far?

As far as I can tell, this design came about as follows:

- we decided to make XRD the format of host-meta, which meant we now have XRDs for hosts (as opposed to just URI-addressable resources).

- we needed a way to specify the Subject of such a host-meta, which needs to be a URI.

- Eran tried to get support for a URI scheme for hosts (or, alternatively, was asking for better ideas), so we could say something like <Subject>host:example.com</Subject> to mean that this XRD is about a _host_, but didn't get much love.

- As an alternative, this scheme was proposed.

My first gripe is that this doesn't seem to solve the original problem, which was to find a way to say that this XRD is about a host. Instead, it allows us to say that this XRD is about a set of (usually http) resources, which is different.

My second gripe is that the idea of subject sets doesn't seem to be compatible with one of the constraints that started us down this road: that the Subject must be a URI. It is pure coincidence that the "beginswith" matching rule results in a set-describing pattern that looks like a URI. If we really believe that being able to denote a whole set of subjects is an important use case (I haven't seen evidence of this), then we should put our money where our mouth is and allow something like this:

<Subject match="regex">(http://)|(mailto:)(\s+@)?example.com</Subject>

At this point, Subject is no longer a URI. It's not too surprising that something that's supposed to describe a set of URIs is not, itself, a URI. Relying on the fact that the one set-describing pattern we're currently defining happens to result in patterns that look like URIs is IMO quite brittle.

My third gripe is that it's a hacky solution for things like OpenID or webfinger. Let's look at webfinger: You start off with an email-like identifier like joe@example.com, and want to discover meta-data about it. The steps you need to do are as follows:

(1) peel out the host from the identifier (yields "example.com")

(2) slap the string "http://" in front of it (yields "http://example.com")

(3) Look at the Subject in the host-meta that you believe is authoritative for this meta-data-resolution. If "http://example.com" starts with whatever it says in the Subject, then you're looking at the right host-meta.

(4) Look for a URITemplate in the XRD, etc., etc....

Step (2) is there for no other purpose than to make this hack work. That's just ugly.
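
Steps (1) through (3) can be sketched in Python as follows. This is only an illustration of the procedure described above; the function name is hypothetical, and fetching the host-meta and processing the URITemplate (step 4) are left out.

```python
def host_meta_applies(identifier: str, host_meta_subject: str) -> bool:
    """Check whether a host-meta with a beginswith Subject covers `identifier`."""
    # (1) Peel the host out of the email-like identifier.
    host = identifier.rpartition("@")[2]
    # (2) Slap "http://" in front of it, purely so the comparison can work.
    candidate = "http://" + host
    # (3) Prefix-compare against the Subject found in the host-meta.
    return candidate.startswith(host_meta_subject)
```

The only reason the "http://" prefix appears at all is that the Subject has to look like a URI; the identifier itself was never an http resource.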

My fourth gripe is that I don't understand the trust implications of subject sets. Trust is something that apps are supposed to develop their own profiles for, so let's pretend we're trying to do this for webfinger. With the language we're currently setting up in the spec, I would think that webfinger would want to say something like this:

(1) Extract the host from the identifier (e.g., joe@example.com -> example.com)

(2) Find the host-meta for that host (i.e., host-meta for example.com)

(3) Make sure that the Subject in the host-meta _matches_ http://example.com (we can't say "... _is_ http://example.com", because such an XRD would be about the root resource on example.com, which is not what webfinger is looking for).

(4) Check that the signature on the XRD is generated by someone authoritative for the XRD's Subject.

(5) ....

That, however, is not secure. Let's say I somehow ended up with an XRD that looks like this:

<XRD>

  <Subject match="beginswith">http://example.co</Subject>

  <Link><Rel>webfinger</Rel><URITemplate>...</URITemplate></Link>

  <Signature>...</Signature>

</XRD>

(maybe a man-in-the-middle injected it as I was fetching http://example.com/.well-known/host-meta, or I got the wrong host-meta from http://hostmetas-r-us.com/?domain=example.com - whatever). The Subject matches http://example.com (according to the current definition in the XRD spec). So now if the XRD is signed by example.co the signature checks out, and we just got hacked by the Colombian mafia.
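
The failure can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and signer_host stands in for "the host whose key successfully verified the Signature", since real signature verification is out of scope here.

```python
from urllib.parse import urlsplit


def naive_profile_accepts(identifier: str, subject: str, signer_host: str) -> bool:
    """Apply steps (1), (3) and (4) of the naive trust profile."""
    # (1) Extract the host from the identifier.
    host = identifier.rpartition("@")[2]
    # (3) The Subject must *match* http://<host> under the beginswith rule.
    subject_matches = ("http://" + host).startswith(subject)
    # (4) The signer must be authoritative for the Subject's authority.
    signer_is_authoritative = urlsplit(subject).hostname == signer_host
    return subject_matches and signer_is_authoritative


# An XRD with Subject http://example.co, signed by example.co, is accepted
# as the host-meta for joe@example.com.
attack_succeeds = naive_profile_accepts(
    "joe@example.com", "http://example.co", "example.co"
)
```

Both checks pass individually, and neither one ever compares the signer to the host that was actually extracted from the identifier.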

I'm not saying that there is no way that webfinger could possibly define a secure profile, but as you can see, the "obvious" way to define a trust profile for webfinger resulted in something bad because the "beginswith" directive interacts strangely with the trust assumptions.

Ok, I think I'm all griped out :-).

So, unless some of my assumptions here are wrong, I would like us to reconsider this beginswith business.

Since we don't have URIs that represent hosts, I think our only option is to relax the requirement that a Subject has to be a URI (something I believe we're already on the way toward if we want to allow "subject sets").

My proposal: have two subject types. One for hosts, one for URIs.

<Subject type="uri">acct:joe@example.com</Subject> // describes Joe's meta data

<Subject type="uri">http://example.com</Subject> // describes meta data of root http resource in example.com

<Subject type="uri">http://example.com/</Subject> // describes meta data of root http resource in example.com

<Subject type="host">example.com</Subject> // describes meta-data of host example.com
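
For illustration, a consumer of the proposed type="host" form might check a host-meta like this. It is a rough Python sketch, not part of the proposal: the function name is hypothetical, the type attribute exists only in this proposal, and the case-insensitive comparison is an added assumption based on DNS names being case-insensitive.

```python
import xml.etree.ElementTree as ET


def describes_host(subject_xml: str, host: str) -> bool:
    """Return True if the <Subject> element names exactly this host."""
    element = ET.fromstring(subject_xml)
    if element.get("type") != "host":
        # type="uri" subjects describe a resource, not the host itself.
        return False
    # Exact comparison: no "http://" prefix and no prefix matching.
    return (element.text or "").strip().lower() == host.lower()
```

With an exact host comparison, a Subject of example.co no longer covers example.com, and no "http://" has to be prepended to the identifier first.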

What do you guys think?

Dirk.
