Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE: [xri-editors] Status on draft spec)
My very rough first cut.

In the use of the language selector/identifier, I introduced the notion of font as well, because it looks like determining the glyph requires the code point (UTF-8 in a sense), the language (which can actually be represented by non-printing characters in UTF-8 as well), and a font selector. Since I have not come across a UTF-8 way of assigning a font selector, I have suggested using a cross reference such as $l/en/Arial.

By the way, I suppose we need some kind of restriction on the character set definition in section 2.2. Perhaps adding

2.2.5 Legal character sequence
Not all ASCII sequences can be derived from a UTF-8 sequence. A valid XRI character sequence must be derivable by escaping a UTF-8 sequence.

would do?

Nat

------- from here -------------

2.3 Character Encoding and Internationalization

The basic character encoding of XRI is UTF-8, as recommended by [RFC2718]. Since an XRI is a human-readable identifier, its representation in the underlying document should use the character encoding of that document. However, this string must be converted to UTF-8 before any further processing. Thus, URI conversion must be made only after UTF-8 conversion. In general, conversion between the local language encoding representation and the URI representation requires the following two steps.

1. Conversion between local language encoding and UTF-8
2. Conversion between UTF-8 and URI

2.3.1 Local language encoding to UTF-8 conversion

To represent the glyphs of a UTF-8 string correctly, language information and font information may be required. One shortcoming of UTF-8 is that it does not necessarily carry this information with it. On the other hand, a local language encoding always has the language information associated with it. Thus, to make it possible to revert to the local language representation, there has to be a way to record the language and font context.
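
As a rough illustration only (not part of the draft text), the two steps listed in 2.3 could be sketched in Python as below. The function names, the Shift_JIS sample, and the choice to escape every octet outside the unreserved set are assumptions made for the example. It handles a single non-hostname segment and ignores the language/font context.

```python
from urllib.parse import quote, unquote


def local_to_uri_segment(raw: bytes, local_encoding: str) -> str:
    """Hypothetical helper: local-encoded bytes -> URI-escaped segment."""
    # Step 1: local language encoding -> Unicode text -> UTF-8 octets.
    utf8_octets = raw.decode(local_encoding).encode("utf-8")
    # Step 2: UTF-8 octets -> %-escaped ASCII (nothing extra marked safe).
    return quote(utf8_octets, safe="")


def uri_segment_to_local(segment: str, local_encoding: str) -> bytes:
    """Hypothetical helper: the reverse direction, for display to the user."""
    # Undo step 2: %-escapes -> Unicode text, failing on non-UTF-8 octets.
    text = unquote(segment, encoding="utf-8", errors="strict")
    # Undo step 1: Unicode text -> local language encoding.
    return text.encode(local_encoding)


if __name__ == "__main__":
    sjis = "日本語".encode("shift_jis")
    escaped = local_to_uri_segment(sjis, "shift_jis")
    print(escaped)
    # Escaping the Shift_JIS octets directly gives a different string,
    # which is why the UTF-8 conversion has to come first.
    print(quote(sjis, safe=""))
    print(uri_segment_to_local(escaped, "shift_jis") == sjis)
```

The strict error handling in the reverse direction is a design choice for the sketch: a segment that was escaped from something other than UTF-8 raises an error instead of being silently misread.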
To accommodate this requirement, XRI facilitates the markup by use of cross references and the $l special identifier defined in Appendix B. Once the language and font context is set up, it remains valid until it is reset by another cross reference. [Note: It may be better to use the 14th plane of ISO 10646.]

Example:

xri://($l/en/Times).english.($l/en/Arial)string.($l/ja).japaneseString.($l/ko).koreanString.($l/ch).chineseString

When converting from the local language encoding, the string must be converted to a sequence of characters from the UCS normalized according to Normalization Form C.

2.3.2 Conversion between UTF-8 and URI

To convert UTF-8 to RFC2396 format, the hostname and the other parts need to be treated separately. For the hostname, the conversion must use Punycode. For the other parts, the conversion must use the escaping method defined in section 2.2.3.

-----Original Message-----
From: Sakimura, Nat
Sent: Friday, July 04, 2003 6:47 PM
To: xri-editors@lists.oasis-open.org
Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE: [xri-editors] Status on draft spec)

Just to make a note: not all URIs are valid IRIs. Every URI that was converted from an IRI can be reverted back to an IRI, but a URI can also be composed by escaping a non-UTF-8 encoded string.

Nat

-----Original Message-----
From: Sakimura, Nat
Sent: Friday, July 04, 2003 4:20 PM
To: Wachob, Gabe; Drummond Reed; Dave McAlpin; xri-editors@lists.oasis-open.org
Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE: [xri-editors] Status on draft spec)

IMHO XRI should be internationalized from the beginning. Introducing XRI and IXRI will create unnecessary confusion and uncleanness in the implementation, as well as a lack of adoption in practice. We should design the system UTF-8 clean, and %-escaping of non-ASCII characters should happen only as a last resort.

Is there any problem in making the following range unreserved as well?
ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
        / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
        / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
        / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
        / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
        / %xD0000-DFFFD / %xE1000-EFFFD

Doing this will affect the structure of the document a little, so perhaps we have to add a section to the I18N chapter about the UTF-8 to ASCII translation.

Nat

-----Original Message-----
From: Wachob, Gabe [mailto:gwachob@visa.com]
Sent: Friday, July 04, 2003 10:27 AM
To: Sakimura, Nat; Drummond Reed; Wachob, Gabe; Dave McAlpin; xri-editors@lists.oasis-open.org
Subject: RE: [xri-editors] RE: Closure on I18N approach (was RE: [xri-editors] Status on draft spec)

Perhaps the approach should be that the stuff currently there is the "correctness rules" after a Unicode -> US-ASCII transformation has been performed. Any string that ends up as a legal XRI (in the 2396 definition we have now) after this transformation is thus a legal IXRI (internationalized XRI).

Does that approach make sense?

Thanks for guiding us on this Nat - I think we'd be hopelessly lost if we didn't have you on board here.

-Gabe

> -----Original Message-----
> From: Sakimura, Nat [mailto:n-sakimura@nri.co.jp]
> Sent: Thursday, July 03, 2003 6:21 PM
> To: Drummond Reed; Wachob, Gabe; Dave McAlpin; xri-editors@lists.oasis-open.org
> Subject: [xri-editors] RE: Closure on I18N approach (was RE: [xri-editors] Status on draft spec)
>
> I am basically with it. My point was just that in the last edition, it was stated that the character set, and thus the reserved and unreserved character sets, will basically come from RFC2396. I think it should come from IRI instead (for obvious reasons).
>
> I am going to work on the section today.
>
> Nat
>
> -----Original Message-----
> From: Drummond Reed [mailto:drummond.reed@onename.com]
> Sent: Friday, July 04, 2003 2:46 AM
> To: Wachob, Gabe; Sakimura, Nat; Dave McAlpin; xri-editors@lists.oasis-open.org
> Subject: Closure on I18N approach (was RE: [xri-editors] Status on draft spec)
>
> Nat,
>
> First, +1 on Gabe's reply (glad I read it before I typed my own).
>
> Second, glad you are back from your trip. From a process standpoint, with Gabe's submission of the resolution portion of the spec, which DaveM is incorporating into the main body of the doc today, the Encoding and I18N sections remain the last to be filled in.
>
> Which means closing on our overall approach to this issue is the next major decision at hand.
>
> Third, to reinforce one point that DaveM and I have been dealing with extensively with regard to RFC 2396bis: any IETF spec that is at Internet Draft status can't be referenced normatively by the XRI spec. That's the case with 2396bis, and it's also the case with IRI. So if we want to use the IRI approach, we'd have to, as Gabe says, incorporate its substantive content directly.
>
> What do you suggest is the best approach?
>
> =Drummond
>
>
> -----Original Message-----
> From: Wachob, Gabe [mailto:gwachob@visa.com]
> Sent: Thursday, July 03, 2003 9:58 AM
> To: 'Sakimura, Nat'; Dave McAlpin; xri-editors@lists.oasis-open.org
> Subject: RE: [xri-editors] Status on draft spec
>
> Nat
> I'm not sure I see the distinction you are making.
>
> I think we define the XRI syntax in terms of 2396 but then define a set of IRI-like transformation rules from scripts and character sets other than US-ASCII (actually the more limited set of URI-legal characters). In other words, do exactly what the IRI draft proposes. Unfortunately, the IRI draft is not a real specification, so we cannot cite it normatively, but I would strongly favor adopting its approach (even if that means lifting sections word for word).
>
> For those of us in US-ASCII land, this has little or no effect. For those who have more interesting character sets, this means that yes, user interfaces will have to convert XRI from the URI-escaped form to the localized form for the particular user. But in either case the XRIs will be human readable, so long as the client software performs i18n unescaping and translation into local character sets.
>
> Is this #2?
>
> -Gabe
>
> > -----Original Message-----
> > From: Sakimura, Nat [mailto:n-sakimura@nri.co.jp]
> > Sent: Thursday, July 03, 2003 2:59 AM
> > To: Dave McAlpin; xri-editors@lists.oasis-open.org
> > Subject: RE: [xri-editors] Status on draft spec
> >
> > Sorry for the delay. I am finally back from two weeks of consecutive trips.
> >
> > Looking at the discussion, it looks like we base most syntax on RFC2396. This would assume/imply the following:
> >
> > 1) Most international XRI will not be human readable.
> > Or
> > 2) We are talking about the URI escape form of XRI for machine level handling, which a user will not see because the XRI client software will take care of the conversion.
> >
> > Which is true?
> >
> > My inclination is towards 2) by the way. 1) will not fulfill our promise of human readability. This will in turn have an impact on section 2.1.
> > Instead of RFC 2396, we probably need to be basing it on IRI.
> >
> > Nat Sakimura
> >
> > -----Original Message-----
> > From: Dave McAlpin [mailto:dave.mcalpin@epokinc.com]
> > Sent: Thursday, July 03, 2003 4:04 AM
> > To: xri-editors@lists.oasis-open.org
> > Subject: [xri-editors] Status on draft spec
> >
> > The following sections of the draft spec are currently waiting for input.
> >
> > Section 2.3 Character Encoding and Internationalization (Gabe and Nat)
> > Section 2.5.3 Internationalized XRI Equivalence (Gabe and Nat)
> > Section 3 Resolution (Gabe, Mike and Peter)
> >
> > I'm doing a pass through the doc and making editorial changes right now. I'll post a new version (04) this afternoon so people can see how it's shaping up and to see how missing sections will fit into the doc as a whole.
> >
> > Dave
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: xri-editors-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: xri-editors-help@lists.oasis-open.org