xri-editors message


Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE: [xri-editors] Status on draft spec)


My very rough first cut. 

In the use of the language selector/identifier, I also introduced the
notion of font, because it looks like determining the glyph in Unicode
requires the code point (UTF-8, in a sense), the language (which can
actually be represented by non-printing characters in UTF-8 as well),
and a font selector. Since I have not come across a UTF-8 way of
assigning a font selector, I have suggested using a cross reference such
as $l/en/Arial.

By the way, I suppose we need some kind of restriction on the character
set definition in section 2.2. Perhaps adding

   2.2.5 Legal character sequence
   Not all ASCII sequences can be derived from a UTF-8 sequence.
   A valid XRI character sequence must be derivable by escaping
   a UTF-8 sequence.

would do?
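
A minimal sketch of how the proposed 2.2.5 rule could be checked, assuming
that "escaping" means %HH escaping of octets. The function name is
hypothetical and not part of the draft; it only illustrates the idea that
the unescaped octets must form well-formed UTF-8.

```python
from urllib.parse import unquote_to_bytes


def is_derivable_from_utf8(escaped: str) -> bool:
    """Return True if the %-escaped ASCII string unescapes to valid UTF-8.

    Hypothetical helper: the name and the exact rule are assumptions made
    for illustration, not text from the XRI draft.
    """
    try:
        # The escaped form itself must be plain ASCII.
        escaped.encode("ascii")
        # Undo the %HH escaping and require well-formed UTF-8 octets.
        unquote_to_bytes(escaped).decode("utf-8", errors="strict")
    except UnicodeError:
        return False
    return True
```

For instance, "%C3%A9" (the UTF-8 octets of U+00E9) passes, while "%E9"
(the same character escaped from ISO-8859-1) does not, because a lone
0xE9 octet is not a complete UTF-8 sequence.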

Nat

------- from here -------------

2.3 Character Encoding and Internationalization
The basic character encoding of XRI is UTF-8, as recommended by
[RFC2718]. Since an XRI is a human-readable identifier, its
representation in the underlying document should use the character
encoding of that document. However, this string must be converted to
UTF-8 before any further processing. Thus, URI conversion must be
performed only after UTF-8 conversion. In general, conversion between
the local language encoding representation and the URI representation
requires the following two steps.

  1.	Conversion between Local language encoding and UTF-8
  2.	Conversion between UTF-8 and URI
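
The two steps above can be sketched as follows. This is only an
illustration under assumptions: the function names are hypothetical, the
input is treated as a single segment (so "/" is escaped too), and step 2
is taken to be plain %HH escaping of the UTF-8 octets.

```python
from urllib.parse import quote, unquote


def local_to_uri(raw: bytes, local_encoding: str) -> str:
    """Convert one segment from a local encoding to its URI-escaped form."""
    # Step 1: local language encoding -> UTF-8 (via Unicode text).
    utf8_octets = raw.decode(local_encoding).encode("utf-8")
    # Step 2: UTF-8 -> URI form by %HH escaping (assumed escaping method).
    return quote(utf8_octets, safe="")


def uri_to_local(escaped: str, local_encoding: str) -> bytes:
    """Reverse direction: URI-escaped form -> UTF-8 -> local encoding."""
    text = unquote(escaped, encoding="utf-8", errors="strict")
    return text.encode(local_encoding)
```

With a Shift_JIS document, local_to_uri("日本".encode("shift_jis"),
"shift_jis") yields "%E6%97%A5%E6%9C%AC", and uri_to_local reverses it.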

2.3.1 Local language encoding to UTF-8 conversion
To render the glyphs of a UTF-8 string correctly, language information
and font information may be required. One shortcoming of UTF-8 is that
it does not necessarily carry this information with it. On the other
hand, a local language encoding always has the language information
associated with it. Thus, to make it possible to revert to the
local language representation, there has to be a way to record the
language and font context. To accommodate this requirement, XRI
supports markup through cross references and the $l special
identifier defined in Appendix B. Once the language and font context is
set, it remains valid until it is reset by another cross reference.
[Note: It may be better to use the 14th plane of ISO 10646.]

Example: 
xri://($l/en/Times).english.($l/en/Arial)string.($l/ja).japaneseString.($l/ko).koreanString.($l/ch).chineseString

When the local language encoding is converted, the result must be a
sequence of characters from the UCS, normalized according to
Normalization Form C.
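
As a rough illustration of the normalization requirement, the sketch below
decodes a locally encoded string and applies Normalization Form C before
producing UTF-8. The function name is hypothetical; unicodedata.normalize
is the Python standard library call for Unicode normalization.

```python
import unicodedata


def to_nfc_utf8(raw: bytes, local_encoding: str) -> bytes:
    """Decode from the local encoding, normalize to NFC, encode as UTF-8."""
    text = raw.decode(local_encoding)
    # NFC composes base characters with combining marks where possible,
    # so equivalent spellings end up as the same code point sequence.
    return unicodedata.normalize("NFC", text).encode("utf-8")
```

For example, "e" followed by U+0301 (combining acute accent) and the
precomposed U+00E9 both come out as the same two octets, C3 A9.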

2.3.2 Conversion between UTF-8 and URI
To convert UTF-8 to RFC2396 format, the hostname and the other parts
need to be treated separately. For the hostname, the conversion must use
Punycode. For the other parts, the conversion must use the escaping
method defined in section 2.2.3.
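
A hedged sketch of treating the hostname and the remaining parts
separately. Section 2.2.3 is not quoted in this message, so the escaping
method is assumed here to be %HH escaping of UTF-8 octets; the function
name is hypothetical. Python's built-in "idna" codec is used for the
hostname, which applies the RFC 3490 ToASCII operation (Punycode with the
"xn--" prefix).

```python
from urllib.parse import quote


def utf8_parts_to_uri(host: str, path: str) -> str:
    """Build the ASCII-only //host/path portion from Unicode parts."""
    # Hostname: Punycode-based conversion via the standard "idna" codec.
    ascii_host = host.encode("idna").decode("ascii")
    # Other parts: %HH escaping of the UTF-8 octets (assumed method),
    # keeping "/" as the segment separator.
    ascii_path = quote(path, safe="/")
    return f"//{ascii_host}{ascii_path}"
```

For instance, utf8_parts_to_uri("bücher.example", "/日本") gives
"//xn--bcher-kva.example/%E6%97%A5%E6%9C%AC".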

-----Original Message-----
From: Sakimura, Nat 
Sent: Friday, July 04, 2003 6:47 PM
To: xri-editors@lists.oasis-open.org
Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE:
[xri-editors] Status on draft spec)


Just a note:

Not all URIs are valid IRIs. Every URI that was converted from an IRI
can be converted back to an IRI, but a URI can also be composed by
escaping a non-UTF-8 encoded string.

Nat 

-----Original Message-----
From: Sakimura, Nat 
Sent: Friday, July 04, 2003 4:20 PM
To: Wachob, Gabe; Drummond Reed; Dave McAlpin;
xri-editors@lists.oasis-open.org
Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE:
[xri-editors] Status on draft spec)


IMHO XRI should be internationalized from the beginning. Introducing
both XRI and IXRI will create unnecessary confusion and untidiness in
implementations, and will lead to no adoption in practice. We should
design the system to be UTF-8 clean, and %-escaping of non-ASCII
characters should happen only as a last resort.

Is there any problem with making the following ranges unreserved as
well?

    ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                   / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                   / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                   / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                   / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                   / %xD0000-DFFFD / %xE1000-EFFFD
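
A small sketch of testing whether a character falls in the ucschar ranges
listed above. The table is copied from the ranges in this message; the
names UCSCHAR_RANGES and is_ucschar are hypothetical and only serve the
illustration.

```python
# Ranges copied from the ucschar production quoted in this message.
UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    # Planes 1 through 13: %x10000-1FFFD up to %xD0000-DFFFD.
    + [(plane << 16, (plane << 16) | 0xFFFD) for plane in range(1, 14)]
    + [(0xE1000, 0xEFFFD)]
)


def is_ucschar(ch: str) -> bool:
    """Return True if the single character ch lies in a ucschar range."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in UCSCHAR_RANGES)
```

Under these ranges, U+65E5 (a CJK ideograph) qualifies, while ASCII "a",
the private-use code point U+E000, and U+E0001 in plane 14 do not.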

Doing this will affect the structure of the document a little, so
perhaps we will need a section in the I18N chapter on the UTF-8 to
ASCII translation.

Nat

-----Original Message-----
From: Wachob, Gabe [mailto:gwachob@visa.com] 
Sent: Friday, July 04, 2003 10:27 AM
To: Sakimura, Nat; Drummond Reed; Wachob, Gabe; Dave McAlpin;
xri-editors@lists.oasis-open.org
Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE:
[xri-editors] Status on draft spec)

Perhaps the approach should be that the stuff currently there serves as
the "correctness rules" after a Unicode -> US-ASCII transformation has
been performed. Any string that ends up as a legal XRI (in the 2396
definition we have now) after this transformation is thus a legal IXRI
(internationalized XRI). 

Does that approach make sense?

Thanks for guiding us on this Nat - I think we'd be hopelessly lost if
we didn't have you on board here.

	-Gabe

> -----Original Message-----
> From: Sakimura, Nat [mailto:n-sakimura@nri.co.jp]
> Sent: Thursday, July 03, 2003 6:21 PM
> To: Drummond Reed; Wachob, Gabe; Dave McAlpin;
> xri-editors@lists.oasis-open.org
> Subject: [xri-editors] RE: Closure on I18N approach (was RE: 
> [xri-editors] Status on draft spec)
> 
> 
> I am basically fine with it. My point was just that in the last
> edition, it was stated that the character set, and thus the reserved
> and unreserved character sets, will basically come from RFC2396. I
> think it should come from IRI instead (for obvious reasons).
> 
> I am going to work on the section today.
> 
> Nat
> 
> -----Original Message-----
> From: Drummond Reed [mailto:drummond.reed@onename.com]
> Sent: Friday, July 04, 2003 2:46 AM
> To: Wachob, Gabe; Sakimura, Nat; Dave McAlpin; 
> xri-editors@lists.oasis-open.org
> Subject: Closure on I18N approach (was RE: [xri-editors]
> Status on draft spec)
> 
> Nat,
> 
> First, +1 on Gabe's reply (glad I read it before I typed my own).
> 
> Second, glad you are back from your trip. From a process standpoint,
> with Gabe's submission of the resolution portion of the spec, which 
> DaveM is incorporating into the main body of the doc today, the 
> Encoding and I18N sections remain the last to be filled in.
> 
> Which means closing on our overall approach to this issue is the next
> major decision at hand.
> 
> Third, to reinforce one point that DaveM and I have been dealing with
> extensively with regard to RFC 2396bis: any IETF spec that is at 
> Internet Draft status can't be referenced normatively by the XRI spec.
> That's the case with 2396bis, and it's also the case with IRI. So if
> we want to use the IRI approach, we'd have to, as Gabe says, 
> incorporate its substantive content directly.
> 
> What do you suggest is the best approach?
> 
> =Drummond
> 
> 
> -----Original Message-----
> From: Wachob, Gabe [mailto:gwachob@visa.com]
> Sent: Thursday, July 03, 2003 9:58 AM
> To: 'Sakimura, Nat'; Dave McAlpin; xri-editors@lists.oasis-open.org
> Subject: RE: [xri-editors] Status on draft spec
> 
> Nat
>         I'm not sure I see the distinction you are making.
> 
>         I think we define the XRI syntax in terms of 2396 but then
> define a set of IRI-like transformation rules from scripts and 
> character sets other than US-ASCII (actually the more limited set of 
> URI-legal characters). In other words, do exactly what the IRI draft 
> proposes. Unfortunately, the IRI draft is not a real specification, so
> we cannot cite it normatively, but I would strongly favor adopting its
> approach (even if that means lifting sections word for word).
> 
>         For those of us in US-ASCII land, this has little or no
> effect. For those who have more interesting character sets, this
> means that yes, user interfaces will have to convert XRI from the
> URI-escaped form to the localized form for the particular user. But in
> either case the XRIs will be human readable, so long as the client
> software performs i18n unescaping and translation into local character
> sets.
> 
>         Is this #2?
> 
> 
>         -Gabe
> 
> > -----Original Message-----
> > From: Sakimura, Nat [mailto:n-sakimura@nri.co.jp]
> > Sent: Thursday, July 03, 2003 2:59 AM
> > To: Dave McAlpin; xri-editors@lists.oasis-open.org
> > Subject: RE: [xri-editors] Status on draft spec
> >
> >
> > Sorry for the delay. I am finally back from two weeks of consecutive
> > trips.
> >
> >
> > Looking at the discussion, it looks like we are basing most of the
> > syntax on RFC2396. This would imply one of the following:
> >
> > 1) Most international XRIs will not be human readable.
> >     Or
> > 2) We are talking about the URI-escaped form of XRI for machine-level
> > handling, which a user will not see because the XRI client software
> > will take care of the conversion.
> >
> > Which is true?
> >
> > My inclination is towards 2), by the way. 1) will not fulfill our
> > promise of human readability. This will in turn have an impact on
> > section 2.1. Instead of RFC 2396, we probably need to base it on IRI.
> >
> > Nat Sakimura
> >
> > -----Original Message-----
> > From: Dave McAlpin [mailto:dave.mcalpin@epokinc.com]
> > Sent: Thursday, July 03, 2003 4:04 AM
> > To: xri-editors@lists.oasis-open.org
> > Subject: [xri-editors] Status on draft spec
> >
> > The following sections of the draft spec are currently waiting for
> > input.
> >
> > Section 2.3 Character Encoding and Internationalization (Gabe and Nat)
> > Section 2.5.3 Internationalized XRI Equivalence (Gabe and Nat)
> > Section 3 Resolution (Gabe, Mike and Peter)
> >
> > I'm doing a pass through the doc and making editorial changes right
> > now. I'll post a new version (04) this afternoon so people can see how
> > it's shaping up and to see how missing sections will fit into the doc
> > as a whole.
> >
> > Dave
> >
> >
> >
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: xri-editors-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail:
> xri-editors-help@lists.oasis-open.org
> >

