xri message

Subject: RE: [xri] URI/IRI/XRI - what should extend what?
From: "Drummond Reed" <drummond.reed@cordance.net>
To: "'Schleiff, Marty'" <marty.schleiff@boeing.com>,"'Gabe Wachob'" <gabe.wachob@amsoft.net>, <xri@lists.oasis-open.org>
Date: Tue, 28 Nov 2006 22:40:41 -0800

Marty,

This is a very good discussion about a very important element of XRI
architecture -- which is why I think the list is the perfect place for it
(I've gotten so used to exposing my ignorance on the list that I pretty much
don't think about it any more ;-)

After considering this carefully, I too come down on the side of sticking
with the current XRI-extends-IRI-extends-URI architecture. My #1 reason is
that I believe Boeing's requirements and preferences can be met by following
local policies to either: a) support only XRIs in URI-normal form (thus
restricting them to an ASCII-only character set even for representation of
Unicode characters), or b) support only XRIs that use valid RFC 3986 URI
characters (and not permitting characters outside the ASCII character set
supported by RFC 3986, even if they are percent-encoded.)

IMHO either would be a defensible local policy for Boeing or anyone else
with similar needs.
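
As a rough illustration of the two local policies (a) and (b) described
above, here is a minimal Python sketch. It is not taken from the XRI or IRI
specs: the function names and the sample identifiers are hypothetical, and
the character class is simply the RFC 3986 reserved and unreserved sets
plus "%". A real XRI-to-URI-normal-form check would involve more than this.

```python
import re

# RFC 3986 unreserved + reserved characters, plus "%" for percent-encoding.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")
# A well-formed percent-escape: "%" followed by two hex digits.
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def allowed_by_policy_a(identifier: str) -> bool:
    """Policy (a), as assumed here: ASCII URI characters only.

    Percent-encoded Unicode (e.g. %C3%A9) is accepted.
    """
    if not _URI_CHARS.fullmatch(identifier):
        return False
    # Every "%" must begin a well-formed %XX escape.
    return "%" not in _PCT.sub("", identifier)


def allowed_by_policy_b(identifier: str) -> bool:
    """Policy (b), as assumed here: like (a), but also reject any
    percent-encoded octet outside the ASCII range (0x80 and above)."""
    if not allowed_by_policy_a(identifier):
        return False
    return all(int(hexpair, 16) < 0x80 for hexpair in _PCT.findall(identifier))
```

Under these assumptions, an identifier such as "xri://=caf%C3%A9" passes
policy (a) but fails policy (b), while one containing a raw non-ASCII
character fails both.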

My other reasons:

#2: As Gabe and Wil point out, the transformation from XRI-normal form to
URI-normal form involves several other rules for encoding of XRI syntax
(specifically XRI cross-references) that are necessary to prevent the XRIs
from being misinterpreted. This means you still have a very clear separation
between XRI-normal form and URI-normal form. If you then want to be able to
internationalize an XRI, what do you do? Would you be able to go directly
from XRI-normal form to IRI-normal form? And from URI-normal form to
IRI-normal form? In that case we'd need to define two IRI-normal forms:
XRI-IRI-normal form and URI-IRI-normal form. Ugh.

The current architecture is a direct 3-step ladder: down the ladder from
XRI-normal form to IRI-normal form to URI-normal form and back up the ladder
via the same three steps.
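
To make the bottom rung of that ladder concrete, here is a minimal Python
sketch of the IRI-to-URI step only, along the lines of RFC 3987 section 3.1:
each non-ASCII character is converted to its UTF-8 octets and each octet is
percent-encoded. The function name and the sample identifier are
hypothetical, and the sketch deliberately leaves out the XRI-specific
escaping rules (for cross-references) that apply on the XRI-to-IRI rung.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character.

    ASCII characters, including existing percent-escapes, are left alone.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # One %XX triplet per UTF-8 octet of the character.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

For example, iri_to_uri("xri://=café") yields "xri://=caf%C3%A9", and an
identifier that is already ASCII-only comes back unchanged.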

#3: In the current XRI/IRI/URI transformation architecture, the new XRI 2.1
canonical form is based on the "highest" normal form: XRI-normal form. This
is in keeping with the guidance (in both IRI and XRI 2.0) to apply the
fewest transformations possible, i.e., keep the identifier closest to its native
form. If XRI was based directly on URI, and IRI was a separate form (or two
separate forms per the above), which would be the basis for the new
canonical form? If it was XRI-normal form, then any XRI that used non-ASCII
characters could not be canonical. That's not good. URI-normal form would be
a poor choice because it would force all XRIs to be transformed into
URI-normal form just to be canonical. And IRI-normal form(s) would not be a
good choice because it would force all XRIs into IRI-normal form just to be
canonical.

#4: Internationalization has become a very important feature of Web
architecture and XML architecture, and support is becoming much more
widespread. I believe we stand on much stronger ground basing XRI syntax on
the already-internationalized IRI syntax than on the non-internationalized
URI syntax.

=Drummond 

-----Original Message-----
From: Schleiff, Marty [mailto:marty.schleiff@boeing.com] 
Sent: Tuesday, November 28, 2006 7:03 PM
To: Gabe Wachob; xri@lists.oasis-open.org
Subject: RE: [xri] URI/IRI/XRI - what should extend what?

Hi Gabe (& All),

I'm really hesitant to continue broadcasting my own stupidity so widely,
so I'm tempted to leave the distribution list off the ongoing
discussion. However, the cost of my participation on the committee is
that sometimes I'll bother you with my misconceptions and/or different
viewpoints. And until you convince me that something is a misconception,
of course I consider it just a different viewpoint. Thanks for putting up
with me so far.

At this time we're still trying to figure out what XRI normal form is
--at least Drummond and I are still discussing what it should be. Gabe,
to help me better understand, can you provide an example of a
normalized XRI that would not be a legal URI?

I don't think we'd have to define how to do IRI on top of XRI (although
some examples illustrating the IRI/XRI mappings/conversions would be
helpful), because the IRI spec already defines how to do IRI on top of
URI.

I'd like for people to be able to be interested in XRI without ever
hearing mention of IRI, just like when I read about RFC2141 URN, or
RFC2254 LDAP, or address specification in RFC2822, maybe LID, or other
identifier efforts. When I want to figure out how to handle
international characters, then I can look to the IRI spec.

LDAP directories support Unicode, but at Boeing today we pretty much
stick with ASCII in searchable fields (like names and identifiers) for a
couple of reasons: the HR systems that provide much of our directory data
may not deal with non-ASCII; if we change people's names and identifiers
to support foreign character sets, most of the users would no longer know
how to enter a search string containing umlauts and other stuff; and
some unknown portion of the 1000+ production applications that rely on
the directory would likely croak. There's probably more reasons if I
think about it a bit longer. I'm not saying that Boeing directories at
some point in time won't support IRI; I'm just saying that to do so will
raise some costly, difficult, and time consuming issues. I don't think
we'll support IRI (in searchable fields) for a long, long time. 

My middle management is now bragging to higher-level execs that one of
our 2006 accomplishments is the introduction of support for XRI in our
directory service. When we cannot accept our partners' SAML assertions
containing non-ASCII XRI, I'd like to say that it's because we don't
support IRI rather than that we only partially support XRI, or that our
XRI support is broken. Adoption of XRI will be hampered if people start
to associate the IRI difficulties with XRI.

Marty.Schleiff@boeing.com; CISSP
Associate Technical Fellow - Cyber Identity Specialist
Computing Security Infrastructure
(206) 679-5933
 

> -----Original Message-----
> From: Gabe Wachob [mailto:gabe.wachob@amsoft.net] 
> Sent: Tuesday, November 28, 2006 5:07 PM
> To: Schleiff, Marty; xri@lists.oasis-open.org
> Subject: RE: [xri] URI/IRI/XRI - what should extend what?
> 
> What I hear is this:
> 
> You want to be able to say:
> 
> 1) If you are just doing us-ascii, then you can ignore 
> implementing any IRI stuff at all
> 
> 2) If you are doing XRI with more characters, then use 
> something like IRI on top of XRI - something we'd have to 
> define since XRI syntax (in XRI normal form) is a superset 
> of URI - that is, a legal us-ascii XRI in XRI normal form 
> may not be a legal URI. 
> 
> What we can say today is:
> 
> 1) If all you are doing today is us-ascii XRIs, then you 
> can ignore implementing any IRI stuff at all (but this is 
> only "partial implementation" of the XRI spec - since we 
> don't define "us-ascii-only XRIs")
> 
> 2) If you are doing anything other than us-ascii XRIs, then 
> you have to do IRI processing after XRI normalization. 
> 
> I don't see that what you are saying is actually all that 
> more attractive over what we can say today. The only change 
> we might want to add is a note saying that you can ignore all 
> the IRI stuff if you don't care about working with anything 
> but us-ascii characters...  (we'd have to actually confirm 
> that in detail, but I'm fairly sure). 
> 
> Would that satisfy your need/interest/want/desire? 
> 
>     -Gabe
> 
> > -----Original Message-----
> > From: Schleiff, Marty [mailto:marty.schleiff@boeing.com]
> > Sent: Tuesday, November 28, 2006 4:17 PM
> > To: Gabe Wachob; xri@lists.oasis-open.org
> > Subject: RE: [xri] URI/IRI/XRI - what should extend what?
> > 
> > Hi Gabe (& All),
> > 
> > I'll try again.
> > 
> > If we say XRI is a URI scheme, then we can focus on ASCII-only. I 
> > think we can (almost) ignore IRI and its issues, just like I think 
> > http is oblivious to IRI.
> > 
> > So the folks who aren't English-speakers can use IRI to represent 
> > their XRIs just like they use IRI to represent their http URIs.
> > 
> > Marty.Schleiff@boeing.com; CISSP
> > Associate Technical Fellow - Cyber Identity Specialist
> > Computing Security Infrastructure
> > (206) 679-5933
> > 
> > 
> > > -----Original Message-----
> > > From: Gabe Wachob [mailto:gabe.wachob@amsoft.net]
> > > Sent: Tuesday, November 28, 2006 12:02 PM
> > > To: Schleiff, Marty; xri@lists.oasis-open.org
> > > Subject: RE: [xri] URI/IRI/XRI - what should extend what?
> > >
> > > Marty-
> > > 	I think you may have a misconception about all these things.
> > >
> > > 	First, URI's are defined with US-ASCII only. If you don't do 
> > > US-ASCII, you don't do URI's.
> > > 	So the folks who aren't English-speakers decided they wanted to
> > > play in the URI world and so they defined IRI. IRI is basically just
> > > the way of encoding the full range of UTF-8 characters into URI-legal
> > > strings.
> > >
> > > 	So if we don't leverage IRI, we just have to rewrite IRI. I don't
> > > see any point in that.
> > >
> > > 	If you want to support XRI, you have to support the full set of
> > > internationalized characters, and the easiest way to do that is to
> > > use IRI libraries which are pretty ubiquitous now. There are a lot
> > > of Unicode corner cases and I'm fairly certain not everyone handles
> > > all of Unicode correctly.
> > > But this is one of those areas where 99.99% of the cases are handled
> > > correctly and we should be happy with that.
> > >
> > > 	So, I'm not sure it's really a big deal for a vendor to support URI
> > > and not IRI. And if they don't want to support IRI, then they
> > > *really* won't want to support XRI.
> > >
> > > 	-Gabe
> > >
> > > > -----Original Message-----
> > > > From: Schleiff, Marty [mailto:marty.schleiff@boeing.com]
> > > > Sent: Tuesday, November 28, 2006 9:28 AM
> > > > To: xri@lists.oasis-open.org
> > > > Subject: [xri] URI/IRI/XRI - what should extend what?
> > > >
> > > > Hi All,
> > > >
> > > > The XRI Syntax spec describes IRI as extending the character set
> > > > of URI, and then describes XRI as extending the syntactic elements
> > > > (but not the character set) of IRI. If I were a product vendor, it
> > > > would sound to me like in order to support XRI, my products would
> > > > first (or also) have to support IRI. I might think IRI support sounds
> > > > complex with lots of implications to my install base, so if I
> > > > decide not to support IRI it also means I wouldn't be supporting XRI.
> > > > To me it seems like IRI adds lots of complexity to XRI. I'd rather
> > > > just say XRI is a URI scheme, restricted to UTF-8 like any other URI.
> > > > In XRI let's not even worry about other encodings. When international
> > > > characters are needed in an XRI, then the IRI spec deals with how to
> > > > do it. Let's leave the complexity in the IRI spec. Of course we could
> > > > include a section in the XRI Syntax spec that gives examples of how to
> > > > convert a URI with a scheme of xri:// into an IRI according to the
> > > > steps described in RFC 3987.
> > > > I put this idea on the wiki (item #3.11 under XRI Syntax).
> > > >
> > > > Marty.Schleiff@boeing.com; CISSP
> > > > Associate Technical Fellow - Cyber Identity Specialist
> > > > Computing Security Infrastructure
> > > > (206) 679-5933
> > >
> > >
> 
>