
Subject: RE: [xri] Issue 1 Subthread - freeing colon for producer-specific algorithms
From: Drummond Reed <drummond.reed@cordance.net>
To: xri@lists.oasis-open.org
Date: Thu, 8 Jul 2004 19:39:33 -0700 (PDT)
[Note: I changed the subject of this thread to
introduce a subthread about an important aspect of
this decision. I just wish email would better support
this type of subthreading.]

Dave has said he believes RFC2396bisv6 does not allow
use of a reserved character within a segment if there
is no defined meaning for that character within a
segment. However, I have read section 2.2 on reserved
characters closely, and I believe it explicitly allows
reserved characters that are not defined as delimiters
to be used within a segment.

The full text of section 2.2 is quoted at the end of
this message, but the specific sentence I would
highlight is: "Thus, characters in the reserved set
are protected from normalization and are therefore
safe to be used by scheme-specific and
producer-specific algorithms for delimiting data
subcomponents within a URI."

What this means is that if colon is NOT reserved by
the XRI spec as a "scheme-specific delimiter", but
only as a subsegment decorator (to use Gabe's term for
a character that has a defined meaning only when used
in the first position after another delimiter), then
colon is free to be used elsewhere within a subsegment
as determined by "producer-specific algorithms".
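To make the decorator idea concrete, here is a minimal Python sketch of how a parser might honor colon only in first position. It assumes, as in Fen's ":12:34" example quoted below, that a leading colon marks a persistent subsegment; the names `Subsegment` and `parse_subsegment` are hypothetical and not taken from any XRI spec.

```python
from typing import NamedTuple


class Subsegment(NamedTuple):
    persistent: bool  # True when the subsegment begins with the ":" decorator
    value: str        # the rest of the text; interior colons are left as data


def parse_subsegment(text: str) -> Subsegment:
    """Hypothetical rule: ':' is meaningful only in the first position."""
    if text.startswith(":"):
        return Subsegment(persistent=True, value=text[1:])
    return Subsegment(persistent=False, value=text)


print(parse_subsegment(":12:34"))  # Subsegment(persistent=True, value='12:34')
```

The interior colon in "12:34" is never interpreted by the parser, so a producer-specific algorithm remains free to give it meaning.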

I believe this is a very significant benefit of not
defining colon as a delimiter. Because XRI syntax
already defines so many of the URI reserved chars as
scheme-specific delimiters, very few chars are left
for producer-specific algorithms to use as delimiters.
I believe colon is a particularly attractive character
for this purpose (second only to dot). About a month
ago I posted an example of one potential
producer-specific algorithm (in this case for XRI
authority subsegments) in which it would be attractive
to use colons. I can only imagine that there are many
more.

The other advantage is that this preserves backward
compatibility with XRI 1.0 XRIs: the colons that
appear in them as scheme-specific delimiters under
XRI 1.0 syntax would still be legal as
producer-specific delimiters under XRI 1.1. The only
difference is how XRI 1.1 resolvers would interpret
colons in the XRI authority segment.
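A rough sketch of that difference, under the hypothetical assumption that an XRI 1.0 resolver splits an authority string at every colon while an XRI 1.1 resolver honors colon only as a leading decorator. The functions `steps_v10` and `steps_v11` are invented for illustration; they are not real resolver APIs.

```python
def steps_v10(authority: str) -> list[str]:
    # Hypothetical XRI 1.0 reading: every ':' is a scheme-specific
    # delimiter, so each colon-separated part is its own subsegment.
    return [part for part in authority.split(":") if part]


def steps_v11(authority: str) -> list[str]:
    # Hypothetical XRI 1.1 reading: ':' is only a leading decorator; the
    # remainder is one opaque subsegment whose interior colons are
    # producer-specific data.
    body = authority[1:] if authority.startswith(":") else authority
    return [body]


xri = ":12:34"
print(steps_v10(xri))  # ['12', '34']
print(steps_v11(xri))  # ['12:34']
```

The same string is legal under both readings; only the resolution steps differ.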

=Drummond

Section 2.2 of rfc2396bis, from

http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html#reserved

is quoted below:

2.2 Reserved Characters

URIs include components and subcomponents that are
delimited by characters in the "reserved" set. These
characters are called "reserved" because they may (or
may not) be defined as delimiters by the generic
syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's
dereferencing algorithm. If data for a URI component
would conflict with a reserved character's purpose as
a delimiter, then the conflicting data must be
percent-encoded before forming the URI. 
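To illustrate the paragraph above (a minimal sketch, not part of the rfc2396bis text): Python's standard `urllib.parse.quote` can percent-encode data that would conflict with a delimiter. Passing `safe=""` overrides its default of leaving "/" alone; the wrapper name `encode_component_data` is hypothetical.

```python
from urllib.parse import quote, unquote


def encode_component_data(data: str) -> str:
    # safe="" stops quote() from leaving "/" unencoded (its default), so
    # every reserved character is percent-encoded; unreserved characters
    # (ALPHA, DIGIT, "-", ".", "_", "~") pass through unchanged.
    return quote(data, safe="")


encoded = encode_component_data("12:34/a b")
print(encoded)  # 12%3A34%2Fa%20b
assert unquote(encoded) == "12:34/a b"
```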

   reserved    = gen-delims / sub-delims

   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="

The purpose of reserved characters is to provide a set
of delimiting characters that are distinguishable from
other data within a URI. URIs that differ in the
replacement of a reserved character with its
corresponding percent-encoded octet are not
equivalent. Percent-encoding a reserved character, or
decoding a percent-encoded octet that corresponds to a
reserved character, will change how the URI is
interpreted by most applications. Thus, characters in
the reserved set are protected from normalization and
are therefore safe to be used by scheme-specific and
producer-specific algorithms for delimiting data
subcomponents within a URI. 
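To illustrate why reserved characters are "protected from normalization" (a minimal Python sketch, not from rfc2396bis): the normalizer below decodes a %XX escape only when it stands for an unreserved character, in the spirit of the percent-encoding normalization later specified in RFC 3986 section 6.2.2. It covers that single step only, not full URI normalization; the name `normalize_pct` is hypothetical.

```python
import re
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_pct(uri: str) -> str:
    """Decode %XX only when it stands for an unreserved character;
    otherwise keep the escape but uppercase its hex digits."""
    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)


print(normalize_pct("xri:@a%2Db%3a12"))  # xri:@a-b%3A12
# An encoded colon stays distinct from a literal one:
print(normalize_pct("12%3A34") == "12:34")  # False
```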

A subset of the reserved characters (gen-delims) are
used as delimiters of the generic URI components
described in Section 3. A component's ABNF syntax rule
will not use the reserved or gen-delims rule names
directly; instead, each syntax rule lists the
characters allowed within that component (i.e., not
delimiting it) and any of those characters that are
also in the reserved set are "reserved" for use as
subcomponent delimiters within the component. Only the
most common subcomponents are defined by this
specification; other subcomponents may be defined by a
URI scheme's specification, or by the
implementation-specific syntax of a URI's
dereferencing algorithm, provided that such
subcomponents are delimited by characters in the
reserved set allowed within that component. 
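A minimal Python sketch of the rule in the paragraph above, assuming the component rules of the final RFC 3986 (segment = *pchar; pchar = unreserved / pct-encoded / sub-delims / ":" / "@"; query and fragment = *( pchar / "/" / "?" )). The rfc2396bis draft quoted here may differ in detail, and `can_delimit_subcomponents` is a hypothetical helper.

```python
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS

# Reserved characters each component's ABNF admits without delimiting it.
ALLOWED_RESERVED = {
    "segment": SUB_DELIMS | {":", "@"},           # pchar
    "query": SUB_DELIMS | {":", "@", "/", "?"},   # *( pchar / "/" / "?" )
    "fragment": SUB_DELIMS | {":", "@", "/", "?"},
}


def can_delimit_subcomponents(component: str, char: str) -> bool:
    """A char may delimit producer-defined subcomponents only if it is
    reserved AND allowed within that component."""
    return char in RESERVED and char in ALLOWED_RESERVED[component]


print(can_delimit_subcomponents("segment", ":"))  # True
print(can_delimit_subcomponents("segment", "/"))  # False
```

Under these rules colon qualifies inside a path segment, which is the property the argument above relies on.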

URI producing applications should percent-encode data
octets that correspond to characters in the reserved
set. However, if a reserved character is found in a
URI component and no delimiting role is known for that
character, then it should be interpreted as
representing the data octet corresponding to that
character's encoding in US-ASCII. 
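A minimal Python sketch of the last rule above: split a component only on delimiters whose role is known, and treat any other reserved character as data. Choosing "*" as the only known delimiter is an illustrative assumption echoing the "*" discussed in this thread; `split_known` is a hypothetical helper.

```python
def split_known(component: str, known_delims: str) -> list[str]:
    """Split only on delimiters whose role is known; any other reserved
    character (e.g. ':' when it has no defined role) is kept as data."""
    parts: list[str] = []
    current: list[str] = []
    for ch in component:
        if ch in known_delims:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


print(split_known("a*12:34*b", "*"))  # ['a', '12:34', 'b']
```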



--- Dave McAlpin <Dave.McAlpin@epok.net> wrote:
> Are you suggesting that the : between 12 and 34
> would be considered a regular character, not a
> delimiter? If so, I don't think that's legal per
> 2396bis.
>
> Dave
>
> > -----Original Message-----
> > From: Fen Labalme [mailto:fen@idcommons.org]
> > Sent: Thursday, July 08, 2004 11:03 AM
> > To: Loren West
> > Cc: xri@lists.oasis-open.org
> > Subject: Re: [xri] Issue 1: Clarifying * Semantics
> >
> > Loren -
> >
> > Note that :12:34 would still be a legal persistent
> > identifier, it just would not imply a separation (or
> > delegation) between two parts. In other words, it is
> > similar to the identifier :12.34 (using the new
> > semantics for dot as a regular character).
> >
> > In my strongly held opinion, if we are going to make
> > any simplifications, they should be aimed at making
> > the semantics easier to understand and the human
> > friendly identifiers simpler and easier to read and
> > (humanly) parse. I believe that is what this proposed
> > simplification does. If it does so at a slight cost
> > to the human readability of non-human (machine)
> > friendly identifiers, that's a good decision.
> >
> > Fen
> >
> >
> > Loren West wrote:
> > > I understand how you see a single separator as a
> > > simplification, and hope you can understand how I
> > > see ":" as a simplification over "*:". They're both
> > > "simpler", but one doesn't require a change to the
> > > specification.
References:
- RE: [xri] Issue 1: Clarifying * Semantics
  - From: "Dave McAlpin" <Dave.McAlpin@epok.net>