OASIS Mailing List Archives

 



xri message



Subject: One vs. two alt hierarchy characters


I'm starting a new thread now that we're on to the "second half" of the
alternative hierarchy character issue. (Note that Dave calls this the
"secondary hierarchy character" issue, with slash being the "primary
hierarchy character", but since I believe such distinctions may depend on
your POV, I'm just going to call it the alternative hierarchy character, or
alt-hier char.)

Anyway, the second half of the alt-hier char issue is whether there should
be just one alt-hier char, or two. The XRI 1.0 spec has two - dot and colon.
Loren, Dave, Gabe, and Mike have all already expressed a preference to keep
two, which if star is adopted in place of dot, would mean star and colon.

Let me then state the case for having only one. In this design, star would
be the alt-hier char, and in XRI syntax colon would only have a special
meaning when used as the first character of a subsegment, in which case it
would designate this subsegment as persistent. (The BNF for this is at
http://xrixdi.idcommons.net/moin.cgi/ProposedXriSyntaxRevision#head-1e8213835f5624d207ee790744620ea15cf0022f).

The rationale for this approach falls into three buckets:

RATIONALE #1

The first reason is that in XRI 1.0, both dot and colon were overloaded with
two meanings: first, they were alt-hier chars, and second, they had
persistence semantics (dot = reassignable sub-segment and colon = persistent
sub-segment).

This resulted in two funky situations: 

1) Sometimes dot is optional: since the default subsegment type is
reassignable, you only need dots for subsequent reassignable subsegments.
Example: in "@foo/bar.baz", "foo" and "bar" are reassignable, but they
don't need dots because reassignable is the default. So "@foo/bar.baz" is
actually equivalent to "@.foo/bar.baz" and "@.foo/.bar.baz". This had to be
covered in the equivalence section of the spec, and thus potentially
in every XRI equivalence engine ever built.

2) In XRI 1.0 resolution, sometimes colon means delegation, sometimes it
just means persistence. Example: in "@:foo:bar/:baz", the first colon does
NOT mean delegation, it only means the subsegment "foo" is persistent, but
the second colon means delegation to the identifier authority identified as
"bar". 

By designating one XRI syntax char as the alt-hier char (star) and one as
the leading char for persistent subsegments (colon), we eliminate both funky
situations. Star would always mean alternate hierarchy, and colon would
always mean it denoted a persistent subsegment, and these rules would be
uniform everywhere.
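
To make the first funky situation concrete, here is a minimal Python sketch
of the extra step an XRI 1.0 equivalence engine has to perform: making the
implied leading dot explicit before comparing. The function names are
hypothetical, the first character is simply assumed to be the GCS char, and
cross-references and escaping are ignored entirely.

```python
def make_dots_explicit(xri: str) -> str:
    """Prefix every slash-delimited segment with its implied dot.

    Assumption: the first character is the GCS char, and a segment
    that does not already start with '.' or ':' is reassignable.
    """
    gcs, rest = xri[0], xri[1:]
    explicit = [
        seg if seg == "" or seg[0] in ".:" else "." + seg
        for seg in rest.split("/")
    ]
    return gcs + "/".join(explicit)


def equivalent(a: str, b: str) -> bool:
    """Compare two XRI 1.0 style strings after dot normalization."""
    return make_dots_explicit(a) == make_dots_explicit(b)


print(equivalent("@foo/bar.baz", "@.foo/bar.baz"))
print(equivalent("@foo/bar.baz", "@.foo/.bar.baz"))
```

With star as the only alt-hier char and colon as the only persistence
marker, there is no implied character to restore, so this step would not be
needed.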

RATIONALE #2

The second rationale for only having one alt-hier char is that simplified
rules mean simplified parsers, because they have only two hierarchy chars to
parse - slash and star - and only one way of denoting persistent segments
(they all start with colon), with no overloading. Gabe has said he believes
this is a weak reason, but Fen, who has also written an XRI resolver, thinks
it is a relatively strong one. I'll let Fen speak further to this. 
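
As a rough illustration of how small such a parser could be, here is a
Python sketch under the one-alt-hier-char design. It is not taken from the
proposed BNF: the function name, the assumption that the first character is
the GCS char, and the choice to return (text, is_persistent) pairs are all
mine, and cross-references are ignored.

```python
from typing import List, Tuple

Subsegment = Tuple[str, bool]  # (text, is_persistent)


def parse_proposed(xri: str) -> Tuple[str, List[List[Subsegment]]]:
    """Split on slash, then star; a leading colon marks persistence.

    Assumption: xri[0] is the GCS char. A colon anywhere other than
    the first character of a subsegment is kept as ordinary text.
    """
    gcs, rest = xri[0], xri[1:]
    segments = []
    for segment in rest.split("/"):
        subsegments = []
        for sub in segment.split("*"):
            persistent = sub.startswith(":")
            text = sub[1:] if persistent else sub
            subsegments.append((text, persistent))
        segments.append(subsegments)
    return gcs, segments


print(parse_proposed("@foo*:1234/bar"))
```

Every rule here is positional (slash, star, first-character colon), so the
parser needs no lookahead and no equivalence table for optional delimiters.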

RATIONALE #3

The third reason for not keeping colon as an alt-hier char, and possibly the
strongest one in my view because it speaks to actual use cases vs. aesthetic
or technical preferences, is the same as one reason for moving away from dot
as an alt-hier char, namely that it frees up colon to be used as a normal
character (after the first character) within XRI subsegments.

I have come across an increasing number of cases where colons are already
used in some fashion in existing identifiers - most specifically IPv6
addresses (which are reassignable). Just as not using dot as an alt-hier
char allows us to permit strings that already use that char (e.g. DNS names)
as XRI subsegments, not using colons as an alt-hier char would enable us to
permit strings that already include colons as XRI subsegments.
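
A short Python sketch of the difference, assuming hypothetical helper names
and using the IPv6 documentation address 2001:db8::1 as sample data: if dot
and colon are both delimiters, the address is broken apart, whereas a
star-only split leaves it whole and still valid.

```python
import ipaddress
import re


def split_dot_and_colon(segment: str) -> list:
    """Split the way a parser would if dot and colon were delimiters."""
    return re.split(r"[.:]", segment)


def split_star_only(segment: str) -> list:
    """Split the way a parser would if star were the only alt-hier char."""
    return segment.split("*")


address = "2001:db8::1"

# With colon as a delimiter the address falls apart into fragments.
print(split_dot_and_colon(address))

# With star only, the address survives as a single subsegment
# and can still be read back as an IPv6 address.
subsegments = split_star_only(address)
print(subsegments, ipaddress.IPv6Address(subsegments[0]))
```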

It would also allow us to design new strings that use colons as logical
delimiters for other human readability and usability purposes. A specific
use case is a design challenge facing those of us working on XRI registry
services. By allowing dots as logical separators for personal or
organizational e-names (reassignable XRIs registered under the = or @ GCS
chars), we open up these namespaces to many more expressive and mnemonically
distinctive compound names, e.g., "=Bob.Smith", "=Bob.H.Smith", and
"=Bob.Henry.Smith". However, for certain common personal names, these three
combinations may all be registered. For example, all it takes is for 3
gentlemen named "Bob Henry Smith" to register the 3 variants above and
suddenly the 4th "Bob Henry Smith" is stuck.

In this case it would be helpful to have a convention for how registrants
can turn their desired e-name (be it either simple or compound) into
a globally unique e-name by adding a "postfix" - another name
segment offset by a logical delimiter DIFFERENT than dot because the
namespace-specific meaning of this logical delimiter would be, "the
following segment is added only for global uniqueness".

If star becomes the alt-hier char, our only other char choices that are not
already declared in XRI 1.0 are single quote, comma, semicolon, ampersand,
dash and colon. Single quote, comma, and semicolon are all undesirable
because they are too hard to distinguish (either at all, or from other
similar chars.) Ampersand is too hard to write and say. Dash is undesirable
because it is already a natural language character used in some Western
names, e.g., "Mary Smith-Johnson" or "Bob Smith-Johnson". That leaves colon.
Given the use of dots as the other logical separator in compound names,
colons are in fact a natural choice. For example, using the "=Bob.Smith"
case, you could have:

=Bob.Smith:England
=Bob.Smith:Hawaii
=Bob.Smith:Red
=Bob.Smith:Jupiter
=Bob.Smith:Rocket

All of these would tell the parties reading the e-name that the registrant's
preferred name is "Bob Smith", and that the segment after the colon is added
only to achieve global uniqueness.
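
The convention is simple enough to read mechanically. Here is a minimal
Python sketch, assuming a hypothetical split_ename helper and assuming that
the first colon in the e-name is the one that introduces the postfix; it is
an illustration of the convention, not a registry implementation.

```python
def split_ename(ename: str) -> dict:
    """Separate the preferred name from the uniqueness postfix.

    Assumptions: ename[0] is the GCS char (= or @), dots separate the
    parts of the preferred name, and the first colon starts the postfix.
    """
    gcs, body = ename[0], ename[1:]
    preferred, sep, postfix = body.partition(":")
    return {
        "gcs": gcs,
        "preferred_name": preferred.replace(".", " "),
        "postfix": postfix if sep else None,
    }


for ename in ("=Bob.Smith", "=Bob.Smith:England", "=Bob.Smith:Rocket"):
    print(split_ename(ename))
```

All three e-names yield the same preferred name, "Bob Smith", and differ
only in the postfix (or the lack of one).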

At Cordance we have run some small tests of this syntax with end-users and
received very favorable results. They understand the distinction clearly and
feel the "meaning" of colon is appropriate here.

*****

One last point: I've seen a number of mentions that with e-numbers, the
separation of star (for alt-hierarchy) and colon (for persistence) results
in longer identifiers that don't "look" as nice because they use double
chars ("*:" instead of just ":") to delimit persistent segments.

However, this applies only to the global authority segment where the XRI spec
defines resolution rules. As others have pointed out, after the first slash,
the meaning of colon can be defined by the local authority, and any
authority is free to define colon by itself to mean BOTH persistence (as
required by the XRI spec) AND delegation to another authority in their own
namespace (i.e., give it the same meaning : has in XRI 1.0).

That means the only real effect of separating the alt-hier char and the
persistent-segment char is in the global XRI authority segment. Since I
believe more than two levels of delegation will be rare in this space (just
as more than two levels of delegation in DNS names is relatively rare, e.g.,
www.example.com), and since e-numbers are not intended for human readability
anyway, I believe this tradeoff is justified and actually makes delegation
within global e-numbers easier to read and understand by developers.

=Drummond 




