RE: [xri] CID changes in wd11

What I am hearing from Steve and Les essentially boils down to this: a CanonicalID value should not be allowed to be polyarchical, because if it is polyarchical, it might need to change. If a CanonicalID value MUST be hierarchical (which it had to be in order to be verified in WD11 ED02 -- the draft I believe Les is proposing we revert CanonicalID to), then verification is indeed simpler, as a CanonicalID MUST be issued by the same authority that is authoritative for the XRD in which it appears. And if an authority uses a persistent hierarchical identifier as a CanonicalID, it never needs to change, because a hierarchical identifier is always under the control of the authority that issues it, whereas a polyarchical identifier is not.

Lastly, it follows that if a CanonicalID value MUST be hierarchical (which was the proposed definition of the GlobalID element), then the primary rationale for GlobalID goes away (there may still be another, secondary rationale for it, but that’s another subject).
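The hierarchical constraint discussed above can be sketched roughly as follows. This is only an illustration, not text from any draft: it assumes that "hierarchical" means the child's CanonicalID is the parent authority's CanonicalID plus exactly one additional subsegment, and the function name and i-number values are hypothetical.

```python
def is_hierarchical_child(parent_cid: str, child_cid: str) -> bool:
    """True if child_cid is parent_cid plus exactly one extra subsegment.

    Simplification (an assumption of this sketch): a subsegment is '*' or '!'
    followed by characters that are not themselves delimiters. Parenthesized
    cross-references are not handled.
    """
    if not child_cid.startswith(parent_cid):
        return False
    suffix = child_cid[len(parent_cid):]
    # The suffix must open with a delimiter and carry at least one character.
    if len(suffix) < 2 or suffix[0] not in "*!":
        return False
    # No further delimiters: exactly one subsegment was appended.
    return all(ch not in "*!" for ch in suffix[1:])


# Hypothetical i-numbers, for illustration only.
print(is_hierarchical_child("@!1000", "@!1000!2000"))  # issued under the parent
print(is_hierarchical_child("@!1000", "=!3000!2000"))  # issued by another authority
```

The point of the sketch is that the check needs nothing beyond the parent and child XRDs already obtained during resolution, which is why hierarchical verification is the cheaper case.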

However if we go this direction, it leaves us with a different problem: how can a real-world resource (such as a person) prove that they are the same resource represented by two different XRDs with two different CanonicalIDs issued by two different parent authorities?

We’d need to move the burden of this proof to our polyarchical synonyms, i.e., Refs and Backrefs. In this approach, XRD #1 from parent authority #1 could assert that it represented the same resource as XRD #2 from parent authority #2 by including a Ref element whose value was an identifier that resolved to XRD #2 (preferably the CanonicalID for XRD #2, but any absolute identifier for XRD #2 would work).

To verify that this synonym assertion was true, an XRI resolver would need to do the same thing proposed in ED03 section 12.2, i.e., confirm that a corresponding Backref element exists in XRD #2 pointing back to an identifier for XRD #1 (again, preferably the CanonicalID for XRD #1). I would argue that we should also allow a Ref element to be used for verification, i.e., if XRD #1 contains a Ref element pointing to XRD #2, and XRD #2 contains a Ref element pointing back to XRD #1, the synonyms are verified in *both* directions.
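A minimal sketch of the Ref verification check described above, under stated assumptions: each XRD is reduced to its CanonicalID plus sets of Ref and Backref values, and synonyms are matched on CanonicalID only (the proposal would accept any absolute identifier for the other XRD). The `Xrd` class and `ref_is_verified` function are hypothetical names, not part of any draft.

```python
from dataclasses import dataclass, field


@dataclass
class Xrd:
    """Hypothetical, heavily reduced view of an XRD for this sketch."""
    canonical_id: str
    refs: set = field(default_factory=set)
    backrefs: set = field(default_factory=set)


def ref_is_verified(source: Xrd, target: Xrd) -> bool:
    """source asserts a Ref to target; accept it only if target points back.

    The pointer back may be a Backref (as in ED03 section 12.2) or, per the
    suggestion above, a Ref in the opposite direction.
    """
    if target.canonical_id not in source.refs:
        return False  # source never made the assertion
    return (source.canonical_id in target.backrefs
            or source.canonical_id in target.refs)


# Hypothetical identifiers, for illustration only.
xrd1 = Xrd("=!1000", refs={"@!2000!3000"})
xrd2 = Xrd("@!2000!3000", backrefs={"=!1000"})
print(ref_is_verified(xrd1, xrd2))
```

Note that when both sides use Ref, calling the function in each direction succeeds, which is the "verified in both directions" case.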

Since this “Ref verification” only works polyarchically on Ref elements, it is a separate process from “CanonicalID verification”, which only works hierarchically on CanonicalID elements. This means we’d need to add another XRI resolution parameter for requesting Ref verification (I’d propose to call it “ref” but we already have the “refs” parameter which is used to control whether refs are followed in service endpoint selection, so another name would be better).

The key thing we lose by going this direction is the ability for the resource represented by an XRD to assert a polyarchical identifier as its canonical identifier. Let me give an example.

If I want to go into twelve different businesses today to establish an account and I want to prove to each of them that I have the same identity (for example, so they all give me good credit), I can show all twelve of them the same credential with the same identifier (say it’s my WA state driver’s license #). If they believe this credential (which they can verify), they can record this identifier in their databases and they don’t need to assign me their own local identifier (they may still want to do that, but they don’t HAVE to do that). This is the CanonicalID-can-be-polyarchical model proposed in ED03.

By contrast, if none of the twelve businesses will accept my WA state driver’s license # (or another external identifier) as their identifier for me, they all MUST assign me their own local identifiers. To prove I am the same person, they can all put in their records that I have a WA state driver’s license #, but to do this they MUST store at least two identifiers: the one they assigned me, and my WA state driver’s license #. This is the CanonicalID-must-be-hierarchical model that I believe Les and Steve are proposing.
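The storage difference between the two models in the driver's license analogy can be sketched as below. The record shape, field names, and identifier values are all hypothetical and exist only to show that one model stores a single key and the other stores two.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomerRecord:
    external_id: str                # e.g. the driver's license number
    local_id: Optional[str] = None  # assigned by the business, if it insists on one


def same_customer(a: CustomerRecord, b: CustomerRecord) -> bool:
    """Businesses can correlate a customer only through the shared external identifier."""
    return a.external_id == b.external_id


# Polyarchical model: the external identifier is the only key stored.
shop_a = CustomerRecord(external_id="WA-DL-EXAMPLE-1")
# Hierarchical model: a local key plus the external identifier for correlation.
shop_b = CustomerRecord(external_id="WA-DL-EXAMPLE-1", local_id="shop-b-0042")

print(same_customer(shop_a, shop_b))
```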

Either model will work. They have contrasting advantages/disadvantages:

CANONICALID-CAN-BE-POLYARCHICAL

Advantages:

- XRI authority can assert the same identifier everywhere if it wants

- Separate Ref verification process is not needed to prove cross-domain identity

- Consuming applications do not need to store more than one identifier to support cross-domain identification

Disadvantages:

- CanonicalID can change

- Verification of polyarchical CanonicalID value involves an extra resolution step

- GlobalID is needed for verification of polyarchical CanonicalIDs

CANONICALID-MUST-BE-HIERARCHICAL

Advantages:

- CanonicalID never needs to change

- Verification of CanonicalID values is more efficient (no extra resolution step)

- GlobalID is not needed for verification

Disadvantages:

- XRI authority cannot assert the same identifier everywhere if it wants

- Separate Ref verification process is needed to prove cross-domain identity

- Consuming applications need to store more than one identifier to support cross-domain identification

Thoughts?

=Drummond

From: Steven Churchill [mailto:steven.churchill@xdi.org]
Sent: Tuesday, August 14, 2007 10:46 AM
To: 'Chasen, Les'; 'Drummond Reed'; xri@lists.oasis-open.org
Cc: 'Andy Dale'
Subject: RE: [xri] CID changes in wd11

Les is taking the correct position in this debate.

XRI Resolution has long supported an important identity model where an XRI authority’s identity can be distinguished by its CanonicalID. For example, if resolving an XRI produces a (verifiable) CanonicalID, then, as an XRI resolution client, I can treat that XRI as a synonym to a unique XRI authority—a unique record in the global database that Les describes below. I like to think of this database as a hierarchical graph, but these are really two legitimate ways of talking about the same identity model. Each record in Les’ database is just a node in my graph. In both cases, these records/nodes can be thought of as “XRI authorities”, and in both cases the absolute identity of this XRI authority—that characteristic which distinguishes it from all other XRI authorities—is its CanonicalID.

Given this basic identity model, any resolution that produces a different verifiable CanonicalID simply addresses a different authority. This is by definition of the model. (It is the same way that in a relational model, a different PK must address a different record.) Say I resolve a given XRI with a given set of input parameters and it produces a verifiable CID. Now say I resolve it a minute later with the same set of input parameters and it produces another verifiable CID. This scenario can and does occur—especially in the face of Ref processing and people provisioning their SEPs. For example, I can (right now) simply add an SEP to @ootao*steve’s authority, and then the same resolution call a minute later will return a different verifiable CID. So, indeed, a client can get back a different XRI authority when making two consecutive (equivalent) resolution calls. But this is all fine and good because it is the way that we designed Ref processing (a long, long time ago.) Given this behavior, the (CanonicalID) identity model is still sound, because, by definition, the second resolution call simply returns a different XRI authority.

As for the CanonicalID being optional, <CanonicalID> is simply an element in the XML metadata that one XRI authority uses to describe another. The first authority can choose to use it or not. If it does not use it, then a Resolution client obviously cannot use the element to distinguish authorities. No harm, no foul. As for immutability: if resolving two XRIs produces two different verifiable CanonicalIDs then, by definition of the model, they address different authorities: two different records in Les’ global database.

I really respect and appreciate Les’ effort to protect these fundamentals. The introduction of GlobalID is a giant step in the wrong direction. It is an attempt to define a more complicated identity model in the interest of solving a newly introduced use case. If that use case is indeed important (which I doubt) then it should be solved within the existing model—not by trying to define a new one.

~ Steve

PS: For the typical disclaimer, I need to point out that XRI resolution supports many identity models, and resolution clients may not care at all about using a CanonicalID in the fashion described above.

From: Chasen, Les [mailto:les.chasen@neustar.biz]
Sent: Tuesday, August 14, 2007 12:16 AM
To: Drummond Reed; xri@lists.oasis-open.org
Subject: RE: [xri] CID changes in wd11

Hi Drummond,

Welcome back; I hope you had a nice vacation.

Yes, CID has always been optional and we cannot do any more than recommend that it be persistent. We have also never actually spelled out that it cannot change. However, the implication has always been there that it is immutable. That is, until the introduction of GlobalID and the specification, for the first time, stating that CID is editable. I think this is a huge architectural mistake given where we are in the life of XRI. We have a base of applications out there, at our insistence, using CID as a persistent key. It is too late to change that now.

I therefore propose that we take CID back to where it was in WD10 and add extra text to codify that it should be left immutable. Personally I would make it a MUST requirement, but I recognize that, for the same reason that it is an optional field and persistence is only a recommendation, we cannot really require that it MUST be immutable. So a SHOULD be immutable is fine.

contact: =les

voice: =les/(+phone)

chat: =les/skype/chat

pibb me =les/+pibb

From: Drummond Reed [mailto:drummond.reed@cordance.net]
Sent: Tuesday, August 14, 2007 1:37 AM
To: Chasen, Les; xri@lists.oasis-open.org
Subject: RE: [xri] CID changes in wd11

Les,

I have just returned from vacation and am still catching up on email and the minutes of the meetings while I was gone. But regarding your point about CIDs, here’s some initial thoughts:

1) First, CanonicalID, like all synonym elements, has always been optional. There’s no requirement that an XRD MUST assert a CanonicalID. It’s RECOMMENDED, but for obvious reasons it’s not REQUIRED at the spec level because some users of XRDS architecture don’t need CanonicalIDs at all.

2) Second, there is no requirement that a CanonicalID value be persistent. Again, it’s RECOMMENDED, but not REQUIRED, as some authorities neither want nor need persistent identifiers.

So my first point is that as much as it would be nice for all XRDs to: a) have a CanonicalID value, and b) make it a persistent identifier that never changes, we have never (in WD10 or any earlier draft) required for either to be true. An authority has always been able to assert any CanonicalID value they want, and change it anytime they want. The only change from WD10 to WD11 is that the cardinality of CanonicalID went from zero-or-more to zero-or-one.

Secondly, the main purpose of XRI synonym architecture is to model the real world in which a resource may have any number of identifiers assigned to it by any number of authorities. Each of these identifiers may be either reassignable or persistent. WD11 is the first draft in which we have, in section 11 and specifically in Table 23 (page 60 of the PDF), fully captured the semantics necessary for an authority to assert the set of identifiers it uses to identify a resource in such a manner that client applications have all the metadata they need to understand how to consume those identifiers to maintain a reference to the resource.

Your specific concern is that client applications be able to know which identifier they can use as a persistent global foreign key for a resource. Table 23 explains that of the five synonym elements available, only three fit the requirements of a global foreign key: CanonicalID, GlobalID, and Ref. LocalID and Backref do not meet the requirements because:

* LocalID is relative and not absolute.

* Backref is an assertion that another authority is referencing the synonyms in the current XRD to identify the resource.

However the other three – CanonicalID, GlobalID, and Ref -- *all* can meet the requirements of global foreign keys for a resource. This begs the question: why have three XRD synonym elements that can all serve as global foreign keys?

Table 23 provides the answer. GlobalID and Ref cleanly separate global keys for a resource into two categories for trust purposes:

1) Category #1 – GlobalIDs – are hierarchical identifiers that are assigned by the authority for the XRD and thus can be verified hierarchically.

2) Category #2 – Refs – are polyarchical identifiers that are assigned by authorities OTHER than the authority for the XRD and which thus must be verified polyarchically, i.e., by confirming the corresponding Backref.

Given that between these two categories, we’ve covered 100% of the use cases (to the best of my knowledge), what then is the purpose of the CanonicalID element? Why do we even need it?

The answer is that, because an authority can assert any number of GlobalIDs or Refs for a resource (the use cases for asserting multiple GlobalIDs are pretty weak but the use cases for asserting multiple Refs can be very strong), the additional value of the CanonicalID element is that it gives XRD authorities a way to assert which ONE of these multiple global foreign keys the authority RECOMMENDS client applications use to maintain a reference to the resource.

So the net net is that the value(s) of the GlobalID (zero-or-more), Ref (zero-or-more), and the CanonicalID (zero-or-one) elements are all absolute identifiers that can serve as global foreign keys for a resource. All the element tag tells you about these identifiers is:

* Was it assigned by the authority for the XRD (GlobalID)?

* Was it NOT assigned by the authority for the XRD (Ref)?

* Of all the options, is it the recommended global foreign key for the resource (CanonicalID)?
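To make the three questions above concrete, here is a rough sketch of how a consuming application might sort an XRD's synonyms into global foreign key candidates and pick the recommended one. It is only an illustration: the XRD below is hypothetical, omits XML namespaces in the same simplified style used in this thread, and the function names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical XRD, namespaces omitted for brevity (an assumption of this sketch).
XRD_XML = """<XRD>
  <Query>*example</Query>
  <LocalID>*example</LocalID>
  <GlobalID>@!1000!2000</GlobalID>
  <Ref>https://example.com/example/resource#1234</Ref>
  <CanonicalID>https://example.com/example/resource#1234</CanonicalID>
</XRD>"""

# Only these synonym elements can serve as global foreign keys;
# LocalID and Backref are deliberately excluded.
GLOBAL_KEY_TAGS = ("CanonicalID", "GlobalID", "Ref")


def global_key_candidates(xrd_xml: str) -> dict:
    """Map each global-key element name to the list of values found in the XRD."""
    root = ET.fromstring(xrd_xml)
    return {
        tag: [el.text.strip() for el in root.findall(tag) if el.text]
        for tag in GLOBAL_KEY_TAGS
    }


def recommended_key(xrd_xml: str):
    """Return the CanonicalID value if one is asserted, else None."""
    cids = global_key_candidates(xrd_xml)["CanonicalID"]
    return cids[0] if cids else None


print(global_key_candidates(XRD_XML))
print(recommended_key(XRD_XML))
```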

This reveals the precise reason that the value of a CanonicalID element in an XRD could change over time: the parent authority learns that the recommended global foreign key for a resource is different than the one the parent authority has heretofore been recommending. For example, a parent authority could initially publish:

<XRDS>
  <XRD>
    <Query>*example</Query>
    <Ref>http://example.com/example/resource#1234</Ref>
    <Ref>https://example.com/example/resource#1234</Ref>
    <CanonicalID>https://example.com/example/resource#1234</CanonicalID>
  </XRD>
</XRDS>

But the resource identified by these three synonyms may lose control over the domain name “example.com”. In this case, even though https://example.com/example/resource#1234 is a persistent identifier (see below), the authority may decide that at that point it is better to recommend a different persistent identifier as the CanonicalID. Thus the XRD could change to:

<XRDS>
  <XRD>
    <Query>*example</Query>
    <Ref>http://example.com/example/resource#1234</Ref>
    <Ref>https://example.com/example/resource#1234</Ref>
  </XRD>
</XRDS>

Note that the identifier “https://example.com/example/resource#1234” did NOT go away as a persistent global foreign key for the resource. It’s still there as a Ref, just as it was in the first example. The only change is that the CanonicalID now points to a different global foreign key as the preferred one.

Again note that NONE of the XRI synonym elements has the semantics that the identifier value MUST be persistent (not in WD11, WD10, or any earlier draft). The way for a consuming application to tell whether the identifier is asserted as persistent is to check for either XRI persistence semantics (! syntax for i-numbers) or URI persistence semantics (urn: or other persistent URI schemes).
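The persistence check described above might look roughly like this in a consuming application. This is a sketch under stated assumptions, not a full XRI parser: it treats an XRI as asserted-persistent only when every subsegment of the authority uses the ! delimiter, recognizes only the urn: scheme on the URI side (other persistent URI schemes are not covered), and the function name is hypothetical.

```python
def asserted_persistent(identifier: str) -> bool:
    """Best-effort syntactic test for an identifier asserted as persistent."""
    ident = identifier.strip()
    # URI persistence semantics: only the urn: scheme is recognized here.
    if ident.lower().startswith("urn:"):
        return True
    # XRI persistence semantics: drop an optional xri:// prefix.
    if ident.lower().startswith("xri://"):
        ident = ident[len("xri://"):]
    # Look at the authority part only (everything before the first '/').
    authority = ident.split("/", 1)[0]
    # Drop a leading global context symbol such as '=' or '@'.
    if authority[:1] in ("=", "@", "+", "$"):
        authority = authority[1:]
    # Persistent only if it starts with '!' and has no reassignable '*' subsegment.
    return authority.startswith("!") and "*" not in authority


# Hypothetical identifiers, for illustration only.
for candidate in ("=!F83.62B1.44F.2813", "@ootao*steve", "urn:uuid:example"):
    print(candidate, asserted_persistent(candidate))
```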

***********

I hope this helps. Clearly this issue is deep enough that it can benefit more from direct phone or f2f discussion than from email. I nominate it for the agenda for this week’s TC call, but in the meantime feel free to call me if you want to discuss further.

=Drummond

From: Chasen, Les [mailto:les.chasen@neustar.biz]
Sent: Monday, August 13, 2007 3:16 PM
To: xri@lists.oasis-open.org
Subject: [xri] CID changes in wd11

Hi all –

After reviewing the latest wd11 I have one major concern. This version allows a CID to be changed after it is already set. I believe that this is a big mistake. The CID is the persistent identifier for the queried XRD. We need to ensure that once an XRD has a CID that that CID identifies that XRD forever.

I have always thought of the CID as a primary key to the global database we have created with XRI resolution. Client applications have been and are being written that depend on the value of this primary key for the mapping of an identity described by an XRDS to their internal account structure. If we allow this primary key to be changed we have caused a major data integrity problem.
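As a rough illustration of the data integrity concern (the account table, CID values, and function name are all hypothetical), consider a client that keys its internal accounts on the CID returned by resolution:

```python
# Hypothetical client-side mapping from CID to an internal account.
accounts_by_cid = {"=!1000.2000": "account-42"}


def find_account(cid: str):
    """Look up the internal account for a resolved CID; None if unknown."""
    return accounts_by_cid.get(cid)


# First login: resolution returns the CID the account was created under.
print(find_account("=!1000.2000"))

# Later login: the authority has changed the CID for the same XRD, so the
# client no longer finds the existing account.
print(find_account("=!3000.4000"))
```

Nothing on the client side signals that the second CID refers to the same identity, so the existing account is simply orphaned.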

I propose that the definition of CID not only revert back to the WD10 definition but we also more strongly codify that a CID once set should never be changed.

Thanks

Les

contact: =les

voice: =les/(+phone)

chat: =les/skype/chat

pibb me =les/+pibb
