
Subject: RE: [xri] I18n and $ tags
From: "Drummond Reed" <drummond.reed@onename.com>
To: "Dave McAlpin" <dave.mcalpin@epokinc.com>,<xri@lists.oasis-open.org>
Date: Fri, 11 Jul 2003 21:58:26 -0700
-----Original Message-----
From: Dave McAlpin [mailto:dave.mcalpin@epokinc.com]
Sent: Friday, July 11, 2003 3:57 PM
To: xri@lists.oasis-open.org
Subject: [xri] I18n and $ tags

I assume internationalization does not apply to the $ tags. For example,
there's no internationalized version of $v. Is this correct? Is this ok?

Dave

*****Drummond replies*****

I think it's not only correct, but also a good thing. There should be no
need to internationalize the $ space for the following reason: IMHO, the
purpose of the $ space is to provide a mechanism for extending the very
limited set of reserved chars in RFC 2396 (which we've already had to bust
out of in order to add support for xrefs and sub-segments) in order to
have sufficient metadata (and extensibility) to describe identifiers in
ways that are vital to the act of identification, i.e., language, font,
version syntax, query syntax, resolvability, human-readable comment,
etc.

For this reason, I propose that in Appendix B we state a formal
requirement that the vocabulary in the $ identifier space (note that I
don't call it a namespace for the reasons I'm about to argue) be as
terse as possible, not just to enforce compactness, but to reinforce
that it is an extension of the reserved-symbol-space and not intended to
carry linguistic-level semantics.

For example, the $l (language) space should, as Nat proposed, use the
two-letter codes for languages specified in ISO standard 639 referenced
in RFC 1766. It should NOT use full-length equivalents.

The proposed $f (font) space for font names would violate this rule if
it used full-length English font names. (Furthermore, if we did that, it
would beg for internationalization). To avoid both problems, we should
try to find a compact font name abbreviation registry that we can
reference, similar to ISO 639 for language abbreviations.

If we can't find one, and we don't want to create one (at least I don't
want to), there is another solution - one that applies nicely to any $
space. In place of an exact, rigorously specified vocabulary, every $
space can also cross-reference common names in the + space. Here's an
example of how that would work for a font name:

	xri:($l/fr).($f/(+Arial)).french-word-in-Arial-font/foo

Rather than using "($f/Arial)", which would mean "Arial" was formally
registered in the "$f" space, the segment "($f/(+Arial))" simply means
"Arial" is a common name in the context of a font. I'm not a font
expert, but I'd be willing to guess that a large percentage of
typographic software would recognize that common name for a font.
Furthermore, the xri above would also tell the XRI parser that the
common name "Arial" should be interpreted not just in the context of
being a font, but specifically being a French name for a font. That
should reduce the chance of misinterpretation even further.
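
To make the distinction concrete, here is a rough Python sketch of how a
parser might tell a formally registered value such as "($l/fr)" apart from a
+ space common name such as "($f/(+Arial))". The regular expression and the
function name parse_metadata are assumptions made for illustration only;
they mirror the example above and are not a proposed XRI grammar.

```python
import re

# Hypothetical pattern for one $ cross-reference, e.g. ($l/fr) or ($f/(+Arial)).
# It covers only the shapes used in the example above, not a full XRI syntax.
XREF = re.compile(
    r"\(\$(?P<tag>[a-z]+)/"
    r"(?:\(\+(?P<common>[^()]+)\)|(?P<registered>[^()]+))\)"
)

def parse_metadata(xri):
    """Return (tag, value, kind) tuples for each $ cross-reference in xri."""
    results = []
    for m in XREF.finditer(xri):
        if m.group("common") is not None:
            # Value was wrapped in (+...): a common name, not a registered token.
            results.append((m.group("tag"), m.group("common"), "common-name"))
        else:
            # Bare value: treated as formally registered in that $ space.
            results.append((m.group("tag"), m.group("registered"), "registered"))
    return results

example = "xri:($l/fr).($f/(+Arial)).french-word-in-Arial-font/foo"
for tag, value, kind in parse_metadata(example):
    print(tag, value, kind)
```

On the example XRI this reports "fr" as a registered value in the $l space
and "Arial" as a common name in the $f space, which is the distinction
described above.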

Use of the + space for real-world common names for metadata like fonts
means there is an easy way to apply the 80/20 rule, while leaving it
open for the $f space to reference a more exhaustive and non-ambiguous
font name abbreviation registry later.

Again, I think this rule should be applied across the board to all $
spaces, including language, font, version syntax, query syntax, etc.

=Drummond