From: Dave McAlpin
September 05, 2003 2:43
To: 'Wachob, Gabe';
Subject: RE: [xri] Draft -07
feedback from another Visa person (responses cont'd)
From: Wachob, Gabe
August 28, 2003 4:29
Subject: RE: [xri] Draft -07 feedback
from another Visa person (responses cont'd)
Outlook continues to drive me up the wall - this should be the end
of my initial responses to the comments from Terence Spielman.
Responses continue in this email:
550-551 I didn't get the
meaning of persistent or re-assignable identifiers
out of this description.
Yes, this needs beefing up, as the inline comment mentions.
188.8.131.52 Is it
allowable to escape unicode characters? For example, if one
wanted to express an international XRI in IA5 (ASCII)? In this
case, the %AB format described in 184.108.40.206 is insufficient to support
the expanded character width.
I'll defer this question to our resident unicode & escaping guru, Dave McAlpin.
I think step 5 in 220.127.116.11
addresses this when we specify “one escaped triplet for each octet in the
UTF-8 encoding of the disallowed character”. Did you have something else
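For illustration, the "one escaped triplet for each octet" rule can be sketched in Python as below. This is a minimal sketch, not the procedure from the draft: the function name escape_disallowed and the ALLOWED set are assumptions made up for this example, and the real set of allowed characters is whatever the XRI grammar defines.

```python
# Hypothetical set of characters left unescaped; NOT the XRI spec's actual set.
ALLOWED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-._~"
)


def escape_disallowed(text, allowed=ALLOWED):
    """Replace each disallowed character with one %XX triplet per UTF-8 octet."""
    out = []
    for ch in text:
        if ch in allowed:
            out.append(ch)
        else:
            # A non-ASCII character encodes to 2-4 octets, hence 2-4 triplets.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


escaped_latin = escape_disallowed("caf\u00e9")   # e with acute accent, 2 octets
escaped_cjk = escape_disallowed("\u65e5")        # a CJK character, 3 octets
print(escaped_latin, escaped_cjk)
```

The result is pure ASCII, so a character wider than one octet is still expressible with the two-hex-digit triplet format; it simply takes more than one triplet.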
694 Does the lack of
idempotency affect semantics or syntax? I would
hope it would only be syntax.
Again, this gets deferred to Dave McAlpin.
It affects semantics. If
an XRI is inadvertently escaped twice and unescaped once, for example, the
result might be semantically different than the original XRI (this depends, of
course, on the original XRI). It’s essentially the same problem
mentioned in section 2.4.2 of 2396, which says “implementers should be
careful not to escape or unescape the same string more than once, since
unescaping an already unescaped string might lead to misinterpreting a percent
data character as another escaped character, or vice versa in the case of
escaping an already escaped string.”
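A small Python sketch of the problem, using the standard library's generic percent-encoding functions as a stand-in (an assumption for illustration; the XRI escaping rules are not identical to urllib's, and the sample strings are made up):

```python
from urllib.parse import quote, unquote

original = "50% off"                      # contains a literal percent sign

escaped_once = quote(original, safe="")   # the intended escaped form
escaped_twice = quote(escaped_once, safe="")  # inadvertent second escape

# Unescaping only once does not get back to the original string.
after_one_unescape = unquote(escaped_twice)
print(escaped_once, escaped_twice, after_one_unescape)

# The reverse mistake: data that was never escaped but happens to contain
# "%41" is misread as an escaped "A" when it is unescaped.
literal_data = "%41"
misread = unquote(literal_data)
print(misread)
```

Neither operation is idempotent, so the number of escape and unescape passes has to match for the identifier to keep its meaning.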
18.104.22.168 How about
this as an alternative?
Escape all current escapes (%s).
Escape all syntactic elements with cross references
Escape all parens.
Dave McAlpin has thought through the escaping issues quite a bit.
We are trying to track the (as-yet-not-finalized) RFC 2396bis and IRI
(internationalized resource identifiers) specs, and this adds some complexity
with the benefit of aligning with emerging best practices and architectures.
I'd leave it to Dave to explain exactly how he ended up with the escaping
procedure we have.
I don’t understand
the second step. Can you give an example of escaping “all syntactic
elements with cross references”?
878-879 Why are XRI
authorities compared in a case-insensitive manner?
That's a good question. Not sure, honestly. Dave? Drummond?
Mostly, I think, to make
the comparison rules for XRIAuthority consistent with those for URIAuthority
(as specified by section 6 of 2396). It may be confusing, though, in that it
only applies to characters in the ALPHA production. That’s fine for
URIAuthorities because they only allow characters in the ALPHA production, but
the XRIAuthority can contain international characters. Is your objection
that it’s odd that ‘e’ and ‘E’ are equivalent,
but ‘e’ with an accent mark is not equivalent to ‘E’
with the same mark? If it is, then I agree. Is there a good way to specify
case-insensitivity for all Unicode characters?
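To make the asymmetry concrete, here is a minimal Python sketch comparing an ALPHA-only comparison with one built on Unicode case folding. Both function names are hypothetical, and str.casefold is only one candidate answer, not something the draft specifies; it also does not cover normalization differences such as precomposed versus combining accents.

```python
import string

# Maps only A-Z to a-z, mimicking case-insensitivity limited to ALPHA.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def equal_alpha_only(a, b):
    """Case-insensitive for ASCII letters only; all other characters must match exactly."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


def equal_casefold(a, b):
    """Case-insensitive using Unicode case folding (no normalization step)."""
    return a.casefold() == b.casefold()


plain = equal_alpha_only("e", "E")                       # ASCII letters match
accented_alpha = equal_alpha_only("\u00e9", "\u00c9")    # accented e, lower vs upper
accented_fold = equal_casefold("\u00e9", "\u00c9")
print(plain, accented_alpha, accented_fold)
```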
Section 3 (I
still need to do some reading)
Has there been any
work on DECODING XRIs? It's not immediately
clear from the ABNF that decoding is unambiguous.
I believe the decoding is mechanical and unambiguous. Dave?
In general, the escaping/unescaping mirrors IRI work, along with
one extra step for escaping () (parentheses). We definitely wanted to make sure
the transformations were reversible.
I think the question is
actually whether the BNF is unambiguous, i.e. does an XRI exist that could be
interpreted in more than one way by the BNF? I’ve done some work in this
area, but I certainly wouldn’t consider the BNF “proven” at
In addition, aside
from unresolvable references, is it possible
to canonicalize XRIs? This is a highly desirable feature
(for equivalence, at a minimum).
We talked quite a bit about this. The decision was made to be
silent on canonicalization because equivalence is actually
unambiguous given the rules stated. Now, that doesn't mean that it's at all
I do think giving names to the escaped vs. unescaped
forms of XRI, at least, would be useful. Canonicalization would
then just be transforming an identifier into one of those forms. We didn't want
to mandate a single canonical form because different environments would need
XRIs in different levels of escaping and it would be unfortunate to require a
specific canonicalization form that would require otherwise-unneeded
Again, Dave McAlpin probably has better input on this.
representation might be useful for comparison, but it would involve a formal
definition of things like “minimally escaped”, which would be
fairly difficult to nail down. It would also depend on the existence of a
canonical form for URIs used as cross-references. In other words, an XRI
wouldn’t have a canonical form if it contained cross-references that
didn’t define a canonical form.
Note that equivalence
rules are generally problematic. The IRI proposal, for example, completely
dodges the question of equivalence when it says, “There is no general
rule or procedure to decide whether two arbitrary IRIs are equivalent or
not… Each specification or application that uses IRIs has to decide on
the appropriate criterion for IRI equivalence.” 2396bis notes that even
terms like “different” and “equivalent” are fuzzy in
the general spec and ultimately application dependent.
An XRI is not a URI
(because of the expanded syntax). But
is a URI an XRI? (no, because of different scheme (xri)).
I think it would be nice for all URIs to be valid XRIs.
Well, by definition, all URIs can't be XRIs because URIs have
different schemes - XRIs must all have the "xri:" scheme. I think
the goal of having all URIs easily and trivially transformable into XRIs
(i.e. remove the scheme and insert xri:) is laudable, though it's unclear that in
many cases this makes a lot of sense. This is because the XRIs are structured
and resolution of the XRIs (at the very least) gives special meaning to the
first segment (the authority) -- not all URIs are hierarchical
or treat the first "segment" specially. Examples include
mailto:, uuid:, cid: etc
Note also that it’s
trivial to convert any legal URI into an XRI by simply enclosing it in a
cross-reference, e.g. mailto:firstname.lastname@example.org
-> xri:(mailto:email@example.com), though
I don’t know that that’s generally useful.
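As a sketch, that conversion is a one-liner in Python. The function name and the sample address below are hypothetical, and the sketch assumes the URI needs no further escaping (for instance, of any parentheses it contains):

```python
def uri_to_xri_cross_reference(uri):
    """Wrap a URI in an XRI cross-reference: xri:(<uri>)."""
    return f"xri:({uri})"


# Hypothetical sample URI, used only for illustration.
wrapped = uri_to_xri_cross_reference("mailto:user@example.org")
print(wrapped)
```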
Hope that kicks off the conversation and gives us editors
some good pointers on where we need to focus on cleaning up the language.