RE: [xri] I18n and $ tags (on the $l and $f proposals)

The reason $f came up beside $l was that, to represent the octet stream in a human-readable fashion, Unicode, and hence ISO/IEC 10646, requires the following information:

1. Actual octet stream

2. Language

3. Glyph selector

4. Font

I know, this sounds like a sick joke, but this is the reality.

(That’s why I was grumbling earlier that I wish we had DIS 10646 version 1 as the ISO standard.)

I believe we can go pretty far with only 1 and 2, but I do not want to pretend that I know the problems other languages will encounter, so it is better to leave some room to preserve the original amount of information. You are right that it might not be used very often in real life, but some people may need it. So why should we remove it?
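A minimal sketch of the four pieces of information as a single record, assuming (as suggested above) that only the octet stream and the language are required, while the glyph selector and font are optional fields kept so the original amount of information can be preserved when someone needs it. The class name `RenderableName` and its field names are hypothetical and not part of any XRI draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RenderableName:
    """Hypothetical record of what is needed to render an identifier for humans."""
    octets: bytes                         # 1. actual octet stream (here UTF-8)
    language: str                         # 2. language code, e.g. "ja" or "fr"
    glyph_selector: Optional[str] = None  # 3. optional glyph selector
    font: Optional[str] = None            # 4. optional font

    def text(self, encoding: str = "utf-8") -> str:
        # Decoding needs only the octets; language, glyph selector and font
        # affect presentation, not recovery of the characters themselves.
        return self.octets.decode(encoding)


name = RenderableName("東京".encode("utf-8"), "ja")
print(name.text())        # 東京
print(name.font is None)  # True: fields 3 and 4 were simply left out
```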

As far as equivalence is concerned, I believe we should be comparing either the actual octet stream itself or the terminal outcome of resolution. Equivalence is another huge topic involving normalization, which may be even harder than multilingualization itself. I would even go further and say that equivalence should not deal with normalization at all, but that might be a little too extreme. I suspect normalization will be a nightmare for implementers, because one has to maintain a mapping between composed and decomposed forms, and of course one needs the language and context information for this to happen. Also, whenever a new composed form is added, it has to be added to the mappings. It sounds too difficult to me.
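A small illustration of why the choice of comparison matters, using Python's standard `unicodedata.normalize`. The same visible character "é" can be encoded as one precomposed code point (U+00E9) or as "e" followed by a combining acute accent (U+0301). Comparing octet streams treats the two as different; comparing after NFC normalization treats them as equal. This sketch uses only Unicode's language-independent normalization forms and does not attempt the language- and context-dependent cases discussed above; the helper names are hypothetical.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single precomposed code point
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT


def octet_equal(a: str, b: str) -> bool:
    # Compare the actual octet streams (UTF-8) with no normalization.
    return a.encode("utf-8") == b.encode("utf-8")


def nfc_equal(a: str, b: str) -> bool:
    # Normalize both strings to NFC (composed form) before comparing.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed.encode("utf-8"), decomposed.encode("utf-8"))  # b'\xc3\xa9' b'e\xcc\x81'
print(octet_equal(composed, decomposed))  # False
print(nfc_equal(composed, decomposed))    # True
```

The NFC result depends on the Unicode Character Database bundled with the Python version in use (`unicodedata.unidata_version`), which is one concrete form of the mapping-maintenance burden described above.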

Nat

-----Original Message-----
From: Wachob, Gabe [mailto:gwachob@visa.com]
Sent: Saturday, July 19, 2003 4:57 AM
To: xri@lists.oasis-open.org
Subject: RE: [xri] I18n and $ tags (on the $l and $f proposals)

While I really think these proposals *could* be useful, I think they would be used (especially the $f one) in a relatively limited set of situations (i.e. those where the XRIs are presented to humans).

That's a provocative statement I've just made. Some folks have in their minds that most (many?) XRIs will be presented to humans. Some folks (me included) believe most won't.

What I truly believe is that for some applications of XRIs, a large proportion will be presented to humans, and for other applications, they won't be presented to humans. Of course, we see this sort of flexibility as a strength. But this sort of flexibility is also the source of tension when deciding when to include or exclude features.

I sound like a broken record, but I want to make sure that we are addressing a *real* need and that the solution doesn't create more complexity than it tries to eliminate.

For example, the $f/(+Arial) proposal looks good on the surface but there are several complicating factors:

1) You probably don't want a top-level +<font-name> entry, because a font name could easily conflict with another use of the same term. There are a ton of fanciful font names, and I could easily see +Modern being ambiguous between a font name and something else. So we'd end up with +font/Modern, which would appear as $f/(+font/Modern).

2) Look how complicated the XRIs get... Even if you assume the font information is inserted by the UIs (and not presented to the user), this seems to complicate equivalence rules...

3) It seems that no matter what the structure is for font names, someone is going to have to manage a list of font names. Fonts are subject to intellectual property rights (at least in some places), and this tends to mean that there is no centrally managed registry of font names that everyone agrees on. Fonts are considered "property" which is licensed (though there are "public domain" ones). This is not a problem directly, but it leads (I believe) to a situation where the universe of fonts is rather scattered and hard to survey properly. Surveying it is certainly not something we want to do anyway. Use of the +font namespace seems appropriate.
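To make point 2 above concrete, here is a hedged Python sketch. It compares two XRIs that differ only in whether a UI inserted `$f/(+font/Arial)` font metadata. Plain string comparison treats them as different; making them equivalent needs an extra rule, here a hypothetical `strip_font_segments` helper that drops top-level `($f/...)` segments before comparing. Splitting segments on "." outside parentheses is a simplification assumed only for illustration, not the draft XRI grammar.

```python
def split_top_level(body: str, sep: str = ".") -> list[str]:
    """Split on `sep`, ignoring separators nested inside parentheses (simplified)."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def strip_font_segments(xri: str) -> str:
    """Hypothetical equivalence rule: drop top-level ($f/...) segments."""
    scheme, _, body = xri.partition(":")
    kept = [s for s in split_top_level(body) if not s.startswith("($f/")]
    return scheme + ":" + ".".join(kept)


plain = "xri:($l/fr).french-word/foo"
with_font = "xri:($l/fr).($f/(+font/Arial)).french-word/foo"

print(plain == with_font)  # False
print(strip_font_segments(plain) == strip_font_segments(with_font))  # True
```

Whether such a stripping rule is desirable is exactly the open question; the sketch only shows that some rule beyond octet comparison becomes necessary once presentation metadata lives in the identifier.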

So, we need to be very clear about the problems we are solving with this $f mechanism, because if they don't outweigh the complexity, we shouldn't do them.

What's the use case? Is this driven by internationalization concerns? If so, can we be more specific about the disambiguation we are trying to address? Without a background in i18n, it strikes me as *really* odd to specify presentation information in the identifier; I know others will have the same response.

Outside of $f (against which I am specifically pushing back), I agree with Geoffrey that using + cross-references under other $ names (language, version syntax, etc.) is a Good Thing. They allow a great deal of flexibility at the cost of human readability/usability (which is a fine compromise for me, in the use cases I am biased towards).

-Gabe

> -----Original Message-----
> From: geoffrey.strongin@amd.com [mailto:geoffrey.strongin@amd.com]
> Sent: Monday, July 14, 2003 8:42 AM
> To: xri@lists.oasis-open.org
> Subject: RE: [xri] I18n and $ tags
>
>
> I like this. It really leverages the power of the + namespace.
>
> Geoffrey
>
> > -----Original Message-----
> > From: Drummond Reed [mailto:drummond.reed@onename.com]
> > Sent: Friday, July 11, 2003 11:58 PM
> > To: Dave McAlpin; xri@lists.oasis-open.org
> > Subject: RE: [xri] I18n and $ tags
> >
> >
> > -----Original Message-----
> > From: Dave McAlpin [mailto:dave.mcalpin@epokinc.com]
> > Sent: Friday, July 11, 2003 3:57 PM
> > To: xri@lists.oasis-open.org
> > Subject: [xri] I18n and $ tags
> >
> > I assume internationalization does not apply to the $ tags. For
> > example, there's no internationalized version of $v. Is this correct?
> > Is this ok?
> >
> > Dave
> >
> > *****Drummond replies*****
> >
> > I think it's not only correct, but also a good thing. There should be
> > no need to internationalize the $ space for the following reason:
> > IMHO, the purpose of the $ space is to provide a mechanism for
> > extending the very limited set of reserved chars in RFC 2396 (which
> > we've already had to bust out of in order to add support for xrefs
> > and sub-segments) in order to have sufficient metadata (and
> > extensibility) to describe identifiers in ways that are vital to the
> > act of identification, i.e., language, font, version syntax, query
> > syntax, resolvability, human-readable comment, etc.
> >
> > For this reason, I propose that in Appendix B we state a formal
> > requirement that the vocabulary in the $ identifier space (note that
> > I don't call it a namespace, for reasons I'm about to argue) be as
> > terse as possible, not just to enforce compactness, but to reinforce
> > that it is an extension of the reserved-symbol space and is not
> > intended to carry linguistic-level semantics.
> >
> > For example, the $l (language) space should, as Nat proposed, use the
> > two-letter codes for languages specified in ISO standard 639,
> > referenced in RFC 1766. It should NOT use full-length equivalents.
> >
> > The proposed $f (font) space for font names would violate this rule
> > if it used full-length English font names. (Furthermore, if we did
> > that, it would beg for internationalization.) To avoid both problems,
> > we should try to find a compact font name abbreviation registry that
> > we can reference, similar to ISO 639 for language abbreviations.
> >
> > If we can't find one, and we don't want to create one (at least I
> > don't want to), there is another solution, one that applies nicely to
> > any $ space. In place of an exact, rigorously specified vocabulary,
> > every $ space can also cross-reference common names in the + space.
> > Here's an example of how that would work for a font name:
> >
> > xri:($l/fr).($f/(+Arial)).french-word-in-Arial-font/foo
> >
> > Rather than using "($f/Arial)", which would mean "Arial" was formally
> > registered in the "$f" space, the segment "($f/(+Arial))" simply means
> > "Arial" is a common name in the context of a font. I'm not a font
> > expert, but I'd be willing to guess that a large percentage of
> > typographic software would recognize that common name for a font.
> > Furthermore, the XRI above would also tell the XRI parser that the
> > common name "Arial" should be interpreted not just in the context of
> > being a font, but specifically as a French name for a font. That
> > should reduce the chance of misinterpretation even further.
> >
> > Use of the + space for real-world common names for metadata like
> > fonts means there is an easy way to apply the 80/20 rule, while
> > leaving it open for the $f space to reference a more exhaustive and
> > non-ambiguous font name abbreviation registry later.
> >
> > Again, I think this rule should be applied across the board to all $
> > spaces, including language, font, version syntax, query syntax, etc.
> >
> > =Drummond
> >
> >
> >
> >
> > You may leave a Technical Committee at any time by visiting
> > http://www.oasis-open.org/apps/org/workgroup/xri/members/leave_workgroup.php
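A minimal Python sketch of how a parser might interpret Drummond's example `xri:($l/fr).($f/(+Arial)).french-word-in-Arial-font/foo`. It distinguishes a value claimed to be formally registered in a $ space, such as `($f/Arial)`, from a cross-reference to a common name in the + space, such as `($f/(+Arial))`, and checks that registered `$l` values are two-letter lowercase codes in the ISO 639 style. The "." splitting rule, the `Meta` type, the regular expressions and the `check_language` rule are all assumptions made for illustration, not part of any XRI specification draft.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Meta:
    tag: str       # e.g. "$l" or "$f"
    value: str     # e.g. "fr" or "Arial"
    is_xref: bool  # True when the value is a (+name) cross-reference


# Assumed shapes: "($tag/value)" for metadata, "(+name)" for a + space xref.
META = re.compile(r"^\((\$[a-z]+)/(.*)\)$")
XREF = re.compile(r"^\(\+(.+)\)$")


def split_top_level(body: str) -> List[str]:
    """Split on '.' outside parentheses (a simplified segment rule)."""
    parts, depth, current = [], 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "." and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def parse_meta(segment: str) -> Optional[Meta]:
    m = META.match(segment)
    if m is None:
        return None  # an ordinary segment, not $ metadata
    tag, value = m.groups()
    x = XREF.match(value)
    if x:
        return Meta(tag, x.group(1), True)  # common name from the + space
    return Meta(tag, value, False)          # formally registered $ value


def check_language(meta: Meta) -> bool:
    # Assumed rule: a registered $l value must be a two-letter lowercase
    # code in the ISO 639 style; + space cross-references are accepted as-is.
    return meta.is_xref or re.fullmatch(r"[a-z]{2}", meta.value) is not None


xri = "xri:($l/fr).($f/(+Arial)).french-word-in-Arial-font/foo"
for seg in split_top_level(xri.partition(":")[2]):
    meta = parse_meta(seg)
    if meta is None:
        print("segment:", seg)
    else:
        ok = meta.tag != "$l" or check_language(meta)
        print(meta.tag, meta.value, "xref" if meta.is_xref else "registered", ok)
# $l fr registered True
# $f Arial xref True
# segment: french-word-in-Arial-font/foo
```

With `($f/Arial)` instead, `parse_meta` returns `is_xref=False`, i.e. a value claiming formal registration in the $f space rather than a common name borrowed from the + space.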

