xri-editors message

Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE: [xri-editors] Status on draft spec)
From: "Sakimura, Nat" <n-sakimura@nri.co.jp>
To: "Wachob, Gabe" <gwachob@visa.com>, "Drummond Reed" <drummond.reed@onename.com>, "Dave McAlpin" <dave.mcalpin@epokinc.com>, <xri-editors@lists.oasis-open.org>
Date: Mon, 7 Jul 2003 02:39:27 +0900
Indeed. My suggestion is to define XRIs across the scope of Unicode
characters, and then define a transform to URI-legal characters for use
of the XRI in places where URIs are expected. XRIs could be used in many
contexts, and resolution is only one of them. Some resolution mechanisms
may even be multibyte-transparent, so they might not require the
transformation to US-ASCII at all. I do not think it is always necessary
to escape into US-ASCII. When we need to, there should be a standard way
to do so; when we do not need to, we should not have to.

There is another reason for writing the spec in terms of UTF-8 and then
defining the transformation to US-ASCII. As I noted in an earlier
message, not every legal URI octet sequence can be converted back to
valid UTF-8. Thus, if we define XRI in terms of RFC 2396, we have to add
a condition such as "the octet sequence resulting from the ASCII-to-UTF-8
transformation must be a legal UTF-8 sequence." This is rather a
mouthful and not well defined. I think we can give a much simpler and
clearer definition by defining XRI in terms of UTF-8.
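The extra UTF-8 condition described above can at least be checked mechanically. This sketch uses Python's standard urllib.parse.unquote_to_bytes to %-decode a URI-form string into raw octets and then tests whether those octets are well-formed UTF-8; the function name decodes_to_utf8 is hypothetical. For example, "caf%C3%A9" passes, while "caf%E9" (é escaped as a single Latin-1 octet) is a legal URI string whose octets are not valid UTF-8.

```python
from urllib.parse import unquote_to_bytes


def decodes_to_utf8(uri: str) -> bool:
    """Return True if the %-decoded octets of uri form valid UTF-8."""
    octets = unquote_to_bytes(uri)
    try:
        # Strict decoding rejects truncated, overlong and otherwise malformed sequences.
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```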

Nat

-----Original Message-----
From: Wachob, Gabe [mailto:gwachob@visa.com] 
Sent: Saturday, July 05, 2003 4:30 AM
To: Sakimura, Nat; Wachob, Gabe; Drummond Reed; Dave McAlpin;
xri-editors@lists.oasis-open.org
Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE:
[xri-editors] Status on draft spec)


I think the underlying issue we've all had with specifying XRI syntax in
terms of Unicode characters (rather than US-ASCII characters) is that
there is a lot of infrastructure already deployed (based on a number of
specifications, such as RFC 2396 and the HTTP specification) that deals
only with (7-bit) US-ASCII characters.

The point of IRI was to make it possible to define identifier syntaxes
using Unicode characters and then have those Unicode strings converted
automatically (through IRI-defined transformations) into the URI-legal
characters (a subset of US-ASCII) that make up URIs.

When talking about resolution, I *really* need to be able to assume that
all characters being resolved are URI-legal (or escapable into URI-legal
characters), because the underlying network protocols assume the US-ASCII
character set. And even when talking about *presenting* (as opposed to
resolving) XRIs, US-ASCII or URI-legal characters are the only ones that
are legal in many environments.

The Question:

Are you merely suggesting that we define XRIs across the scope of
Unicode characters, and then define a transform to URI-legal characters
for the purpose of using the XRI in places where URIs are expected?
That sounds like it may be a reasonable approach - it may not even
affect resolution (if we simply require that the transformation to
URI-legal characters occurs before resolution).
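If the transformation to URI-legal characters is required to happen before resolution, a resolver front end could look roughly like the following Python sketch. It assumes the RFC 2396 uric set (reserved, unreserved and "%") as the definition of URI-legal, ignores "#" fragments, and uses hypothetical names (to_uri_legal, prepare_for_resolution); it is not taken from any XRI draft.

```python
import string

# RFC 2396 uric characters: reserved / unreserved (alphanum + mark) / "%" for escapes
RESERVED = ";/?:@&=+$,"
MARK = "-_.!~*'()"
URI_LEGAL = set(string.ascii_letters + string.digits + MARK + RESERVED + "%")


def to_uri_legal(xri: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 octets."""
    return "".join(
        ch if ord(ch) < 0x80 else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in xri
    )


def prepare_for_resolution(xri: str) -> str:
    """Hypothetical pre-resolution step: transform, then insist on URI-legal output."""
    uri = to_uri_legal(xri)
    bad = sorted({ch for ch in uri if ch not in URI_LEGAL})
    if bad:
        # ASCII characters outside the URI set (e.g. a space) are not escaped by
        # the transform above, so they are rejected here instead of guessed at.
        raise ValueError(f"not URI-legal after transformation: {bad!r}")
    return uri
```

With this split, resolution code only ever sees URI-legal strings, and the Unicode form remains the one defined by the syntax.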

Thanks for your patience in explaining this to us.

	-Gabe

> -----Original Message-----
> From: Sakimura, Nat [mailto:n-sakimura@nri.co.jp]
> Sent: Friday, July 04, 2003 12:20 AM
> To: Wachob, Gabe; Drummond Reed; Dave McAlpin; 
> xri-editors@lists.oasis-open.org
> Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE: 
> [xri-editors] Status on draft spec)
> 
> 
> IMHO XRI should be internationalized from the beginning. Introducing
> both XRI and IXRI would create unnecessary confusion and messiness in
> implementations, and would likely see little adoption in practice. We
> should design the system to be UTF-8 clean, and %-escaping of non-ASCII
> characters should happen only as a last resort.
> 
> Is there any problem in making the following ranges unreserved as well?
> 
>     ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
>                    / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
>                    / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
>                    / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
>                    / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
>                    / %xD0000-DFFFD / %xE1000-EFFFD
> 
> Doing this will affect the structure of the document a little; we will
> probably need a section in the I18N chapter on the UTF-8 to ASCII
> translation.
> 
> Nat
> 
> -----Original Message-----
> From: Wachob, Gabe [mailto:gwachob@visa.com]
> Sent: Friday, July 04, 2003 10:27 AM
> To: Sakimura, Nat; Drummond Reed; Wachob, Gabe; Dave McAlpin;
> xri-editors@lists.oasis-open.org
> Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE:
> [xri-editors] Status on draft spec)
> 
> Perhaps the approach should be that the stuff currently there is the
> "correctness rules" after a Unicode -> US-ASCII transformation has
> been performed. Any string that ends up as a legal XRI (in the 2396
> definition we have now) after this transformation is thus a legal IXRI
> (internationalized XRI).
> 
> Does that approach make sense?
> 
> Thanks for guiding us on this Nat - I think we'd be hopelessly lost if
> we didn't have you on board here.
> 
> 	-Gabe
> 
> > -----Original Message-----
> > From: Sakimura, Nat [mailto:n-sakimura@nri.co.jp]
> > Sent: Thursday, July 03, 2003 6:21 PM
> > To: Drummond Reed; Wachob, Gabe; Dave McAlpin; 
> > xri-editors@lists.oasis-open.org
> > Subject: [xri-editors] RE: Closure on I18N approach (was RE: 
> > [xri-editors] Status on draft spec)
> > 
> > 
> > I am basically fine with it. My point was just that the last edition
> > stated that the character set, and thus the reserved and unreserved
> > character sets, would basically come from RFC 2396. I think they should
> > come from IRI instead (for obvious reasons).
> > 
> > I am going to work on the section today.
> > 
> > Nat
> > 
> > -----Original Message-----
> > From: Drummond Reed [mailto:drummond.reed@onename.com]
> > Sent: Friday, July 04, 2003 2:46 AM
> > To: Wachob, Gabe; Sakimura, Nat; Dave McAlpin;
> > xri-editors@lists.oasis-open.org
> > Subject: Closure on I18N approach (was RE: [xri-editors] 
> > Status on draft
> > spec)
> > 
> > Nat,
> > 
> > First, +1 on Gabe's reply (glad I read it before I typed my own).
> > 
> > Second, glad you are back from your trip. From a process standpoint,
> > with Gabe's submission of the resolution portion of the spec, which
> > DaveM is incorporating into the main body of the doc today, the
> > Encoding and I18N sections are the last ones remaining to be filled in.
> > 
> > Which means closing on our overall approach to this issue is the next
> > major decision at hand.
> > 
> > Third, to reinforce one point that DaveM and I have been dealing with
> > extensively with regard to RFC 2396bis: no IETF spec at Internet-Draft
> > status can be referenced normatively by the XRI spec. That's the case
> > with 2396bis, and it's also the case with IRI. So if we want to use the
> > IRI approach, we'd have to, as Gabe says, incorporate its substantive
> > content directly.
> > 
> > What do you suggest is the best approach?
> > 
> > =Drummond
> > 
> > 
> > -----Original Message-----
> > From: Wachob, Gabe [mailto:gwachob@visa.com]
> > Sent: Thursday, July 03, 2003 9:58 AM
> > To: 'Sakimura, Nat'; Dave McAlpin; xri-editors@lists.oasis-open.org
> > Subject: RE: [xri-editors] Status on draft spec
> > 
> > Nat
> >         I'm not sure I see the distinction you are making.
> > 
> >         I think we define the XRI syntax in terms of 2396 but then
> > define a set of IRI-like transformation rules from scripts and
> > character sets other than US-ASCII (actually the more limited set of
> > URI-legal characters). In other words, do exactly what the IRI draft
> > proposes. Unfortunately, the IRI draft is not a real specification, so
> > we cannot cite it normatively, but I would strongly favor adopting its
> > approach (even if that means lifting sections word for word).
> > 
> >         For those of us in US-ASCII land, this has little or no effect.
> > For those who have more interesting character sets, this means that,
> > yes, user interfaces will have to convert XRIs from the URI-escaped form
> > to the localized form for the particular user. But in either case the
> > XRIs will be human-readable, so long as the client software performs
> > i18n unescaping and translation into local character sets.
> > 
> >         Is this #2?
> > 
> > 
> >         -Gabe
> > 
> > > -----Original Message-----
> > > From: Sakimura, Nat [mailto:n-sakimura@nri.co.jp]
> > > Sent: Thursday, July 03, 2003 2:59 AM
> > > To: Dave McAlpin; xri-editors@lists.oasis-open.org
> > > Subject: RE: [xri-editors] Status on draft spec
> > >
> > >
> > > Sorry for the delay. I am finally back from two consecutive weeks of
> > > trips.
> > >
> > >
> > > Looking at the discussion, it looks like we base most of the syntax on
> > > RFC 2396. This implies one of the following:
> > >
> > > 1) Most international XRIs will not be human-readable.
> > >     Or
> > > 2) We are talking about the URI-escaped form of XRI for machine-level
> > > handling, which a user will not see because the XRI client
> > > software will take care of the conversion.
> > >
> > > Which is true?
> > >
> > > My inclination is towards 2), by the way. 1) would not fulfill our
> > > promise of human readability. This will in turn have an impact on
> > > section 2.1. Instead of RFC 2396, we probably need to base it on IRI.
> > >
> > > Nat Sakimura
> > >
> > > -----Original Message-----
> > > From: Dave McAlpin [mailto:dave.mcalpin@epokinc.com]
> > > Sent: Thursday, July 03, 2003 4:04 AM
> > > To: xri-editors@lists.oasis-open.org
> > > Subject: [xri-editors] Status on draft spec
> > >
> > > The following sections of the draft spec are currently waiting for
> > > input.
> > >
> > > Section 2.3 Character Encoding and Internationalization (Gabe and Nat)
> > > Section 2.5.3 Internationalized XRI Equivalence (Gabe and Nat)
> > > Section 3 Resolution (Gabe, Mike and Peter)
> > > Section 3 Resolution (Gabe, Mike and Peter)
> > >
> > > I'm doing a pass through the doc and making editorial changes
> > > right now. I'll post a new version (04) this afternoon so people can
> > > see how it's shaping up and how the missing sections will fit into
> > > the doc as a whole.
> > >
> > > Dave
> > >
> > >
> > >
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: xri-editors-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: xri-editors-help@lists.oasis-open.org
> > >
>
Follow-Ups:
- RE: [xri-editors] RE: Closure on I18N approach (was RE: [xri-edit ors] Status on draft spec)
  - From: "Dave McAlpin" <dave.mcalpin@epokinc.com>