Re: [xri] subject matching

On 21-Aug-09, at 5:10 PM, Dirk Balfanz wrote:

Hi guys,

I don't like the idea of "subject sets", and in particular the "beginswith" mechanism for expressing one kind of subject set.

Let me start by explaining how I understand the feature. If I misunderstood, then much of my rant below will not make sense.

An XRD with

<Subject>http://www.example.com/foo</Subject>

is authoritative for the resource http://www.example.com/foo. If there is a <Link><Rel>author</Rel><URI>mailto:bob@gmail.com</URI></Link> in the XRD, it means that the author of http://www.example.com/foo is bob@gmail.com.

An XRD with

<Subject match="beginswith">http://www.example.com/foo</Subject>

is authoritative for all resources that begin with http://www.example.com/foo, which in this case means (1) they're http resources, (2) they're hosted on www.example.com, and (3) their paths start with /foo. If there is a <Link><Rel>author</Rel><URI>mailto:bob@gmail.com</URI></Link> in that XRD, then that means that the author for all the above-mentioned resources is bob@gmail.com.
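A minimal sketch of the matching rule as described above, assuming "beginswith" is a plain string-prefix test and that a Subject without a match attribute requires equality. The function name and the "exact" default are hypothetical, not taken from the XRD spec:

```python
def subject_matches(subject: str, resource: str, match: str = "exact") -> bool:
    """Return True if the XRD Subject covers the given resource URI.

    Assumption: "beginswith" is a raw string-prefix comparison, and any
    other value of `match` means the two strings must be identical.
    """
    if match == "beginswith":
        return resource.startswith(subject)
    return resource == subject


if __name__ == "__main__":
    subject = "http://www.example.com/foo"
    for resource in (
        "http://www.example.com/foo",
        "http://www.example.com/foo/bar",
        "http://www.example.com/foobar",
        "https://www.example.com/foo",
    ):
        print(resource, subject_matches(subject, resource, "beginswith"))
```

Under this reading, the prefix test is purely textual: http://www.example.com/foobar is covered just as http://www.example.com/foo/bar is, while the https variant is not.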

Am I getting this right so far?

As far as I can tell, this design came about as follows:

- we decided to make the format of host-meta XRD, which meant we now have XRDs for hosts (as opposed to just URI-addressable resources).
- we needed a way to specify the Subject of such a host-meta, which needs to be a URI.
- Eran tried to get support for a URI scheme for hosts (or, alternatively, was asking for better ideas), so we could say something like <Subject>host:example.com</Subject> to mean that this XRD is about a _host_, but didn't get much love.
- As an alternative, this scheme was proposed.

My first gripe is that this doesn't seem to solve the original problem, which was to find a way to say that this XRD is about a host. Instead, it allows us to say that this XRD is about a set of (usually http) resources, which is different.

My second gripe is that the idea of subject sets doesn't seem to be compatible with one of the constraints that started us down this road: that the Subject must be a URI. It is pure coincidence that the "beginswith" matching rule results in a set-describing pattern that looks like a URI. If we really believe that being able to denote a whole set of subjects is an important use case (I haven't seen evidence of this), then we should put our money where our mouth is and allow something like this:

<Subject match="regex">(http://)|(mailto:)(\s+@)?example.com</Subject>

At this point, Subject is no longer a URI. It's not too surprising that something that's supposed to describe a set of URIs is not, itself, a URI. Relying on the fact that the one set-describing pattern we're currently defining happens to result in patterns that look like URIs is IMO quite brittle.

My third gripe is that it's a hacky solution for things like OpenID or webfinger. Let's look at webfinger: You start off with an email-like identifier like joe@example.com, and want to discover meta-data about it. The steps you need to do are as follows:

(1) peel out the host from the identifier (yields "example.com")
(2) slap the string "http://" in front of it (yields "http://example.com")
(3) Look at the Subject in the host-meta that you believe is authoritative for this meta-data-resolution. If "http://example.com" starts with whatever it says in the Subject, then you're looking at the right host-meta.
(4) Look for a URITemplate in the XRD, etc., etc....

Step (2) is there for no other purpose than to make this hack work. That's just ugly.
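Steps (1) through (3) can be sketched as follows. This is only an illustration under the assumptions stated in the comments; the function name is hypothetical, and fetching and parsing the host-meta is left out:

```python
def host_meta_applies(identifier: str, subject_pattern: str) -> bool:
    """Check whether a host-meta with a "beginswith" Subject covers an
    email-like identifier such as joe@example.com.

    Assumption: the identifier contains an "@" and everything after the
    last "@" is the host.
    """
    # Step (1): peel the host out of the identifier.
    host = identifier.rsplit("@", 1)[1]
    # Step (2): prepend "http://" purely so the prefix comparison works.
    probe = "http://" + host
    # Step (3): compare against the Subject as a string prefix.
    return probe.startswith(subject_pattern)


if __name__ == "__main__":
    print(host_meta_applies("joe@example.com", "http://example.com"))
    print(host_meta_applies("joe@example.com", "http://example.org"))
```

Note that the "http://" prefix in step (2) carries no information about the identifier itself; it exists only to line the string up with the Subject.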

My fourth gripe is that I don't understand the trust implications of subject sets. Trust is something that apps are supposed to develop their own profiles for, so let's pretend we're trying to do this for webfinger. With the language we're currently setting up in the spec, I would think that webfinger would want to say something like this:

(1) Extract the host from the identifier (e.g., joe@example.com -> example.com)
(2) Find the host-meta for that host (i.e., host-meta for example.com)
(3) Make sure that the Subject in the host-meta _matches_ http://example.com (we can't say "... _is_ http://example.com", because such an XRD would be about the root resource on example.com, which is not what webfinger is looking for).
(4) Check that the signature on the XRD is generated by someone authoritative for the XRD's Subject.
(5) ....

That, however, is not secure. Let's say I somehow ended up with an XRD that looks like this:

<XRD>
  <Subject match="beginswith">http://example.co</Subject>
  <Link><Rel>webfinger</Rel><URITemplate>...</URITemplate></Link>

  <Signature>...</Signature>
</XRD>

(maybe a man-in-the-middle injected it as I was fetching http://example.com/.well-known/host-meta, or I got the wrong host-meta from http://hostmetas-r-us.com/?domain=example.com - whatever). The Subject matches http://example.com (according to the current definition in the XRD spec). So now if the XRD is signed by example.co the signature checks out, and we just got hacked by the Colombian mafia.
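The failure can be reproduced with a small sketch of the "obvious" trust profile. Everything here is an assumption made for illustration: the function name is hypothetical, real signature verification is not modeled, and `signer_host` stands in for whatever host the signature was verified to come from:

```python
from urllib.parse import urlsplit


def naive_trust_check(identifier: str, subject_pattern: str, signer_host: str) -> bool:
    """Naive webfinger trust profile built on "beginswith" matching.

    Assumptions: the Subject is an http URI prefix, and "authoritative for
    the XRD's Subject" means the signer's host equals the host named in
    the Subject.
    """
    # Steps (1)-(3): the Subject must match http://<host> by prefix.
    host = identifier.rsplit("@", 1)[1]
    if not ("http://" + host).startswith(subject_pattern):
        return False
    # Step (4): the signer must be authoritative for the Subject.
    subject_host = urlsplit(subject_pattern).hostname
    return signer_host == subject_host


if __name__ == "__main__":
    # An XRD with Subject http://example.co, signed by example.co,
    # is accepted for an identifier on example.com.
    print(naive_trust_check("joe@example.com", "http://example.co", "example.co"))
```

Both checks pass for the forged XRD because each one is individually satisfied: the prefix test accepts http://example.co for http://example.com, and the signer really is authoritative for the Subject it signed.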

I'm not saying that there is no way that webfinger could possibly define a secure profile, but as you can see, the "obvious" way to define a trust profile for webfinger resulted in something bad because the "beginswith" directive interacts strangely with the trust assumptions.

Ok, I think I'm all griped out :-).

So, unless some of my assumptions here are wrong, I would like us to reconsider this beginswith business.

Since we don't have URIs that represent hosts, I think our only option is to relax the requirement that a Subject has to be a URI (something I believe we're already on the way toward if we want to allow "subject sets").

My proposal: have two subject types. One for hosts, one for URIs.

<Subject type="uri">acct:joe@example.com</Subject> // describes Joe's meta data

<Subject type="uri">http://example.com</Subject> // describes meta data of root http resource in example.com
<Subject type="uri">http://example.com/</Subject> // describes meta data of root http resource in example.com
<Subject type="host">example.com</Subject> // describes meta-data of host example.com
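A minimal sketch of how a consumer might compare typed subjects under this proposal. The function name, the case-insensitive host comparison, and the exact-string URI comparison are all assumptions for illustration, not part of the proposal itself:

```python
def subject_describes(subject_type: str, subject_value: str,
                      target_type: str, target_value: str) -> bool:
    """Return True if a typed Subject describes the given typed target.

    Assumptions: a "host" subject only ever describes a host, a "uri"
    subject only ever describes a URI, hosts compare case-insensitively,
    and URIs compare as exact strings with no prefix matching.
    """
    if subject_type != target_type:
        return False
    if subject_type == "host":
        return subject_value.lower() == target_value.lower()
    return subject_value == target_value


if __name__ == "__main__":
    # The host example.co is not the host example.com.
    print(subject_describes("host", "example.co", "host", "example.com"))
    # A URI subject never describes a host, even if the strings look related.
    print(subject_describes("uri", "http://example.com", "host", "example.com"))
```

With whole-value comparison there is no prefix relationship to exploit, so a Subject for example.co can no longer be mistaken for one about example.com.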

What do you guys think?

Dirk.
