The reason $f came up beside $l is that, to represent the octet
stream in a human-readable fashion, Unicode, and hence ISO 10646, requires the
following information:
1. Actual octet stream
2. Language
3. Glyph selector
4. Font
I know, this sounds like a sick joke, but this is the reality.
(That’s why I was grumbling earlier that I wish we had DIS
10646 ver. 1 as the ISO standard.)
I believe we can go pretty far with only 1 and 2, but I do not want
to pretend that I know the problems other languages will encounter, so it is
better to leave some room to preserve the original amount of information. You
are right, it might not be used very often in real life, but some people may
need it. Then why should we remove it?
As far as equivalence is concerned, I believe that we should be
comparing either the actual octet stream itself or the terminal outcome of the
resolution. Equivalence is another huge topic involving normalization,
which may be even harder than multilingualization
itself. I would even go further and say that equivalence should not deal
with normalization at all, but that might be a little too extreme. I suspect
normalization will be a nightmare for implementers, because one has to maintain
a mapping between the composed form and the decomposed form, and of course one
needs the language and context information for this to work. Also, whenever a
new composed form is added, one has to add it to the mappings. It sounds too
difficult to me.
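
The difference between the two comparison strategies can be sketched as
follows. This is only an illustration, assuming Python's standard
unicodedata module and the NFC normalization form; the function names are
made up for this example and nothing here is proposed for the XRI spec.
The composed/decomposed mapping tables it relies on come from the Unicode
database bundled with the interpreter, so they change only when that
database is updated.

```python
import unicodedata

# The same visible character, encoded two ways (illustrative sample data).
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def octets_equal(a: str, b: str) -> bool:
    """Compare the actual UTF-8 octet streams, with no normalization."""
    return a.encode("utf-8") == b.encode("utf-8")


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare after applying a Unicode normalization form to both sides."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(octets_equal(composed, decomposed))      # False
print(normalized_equal(composed, decomposed))  # True
```

Octet comparison needs no tables at all, which is the appeal of the first
option; the second one makes every implementation depend on the same version
of the mapping data.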
Nat
-----Original Message-----
From: Wachob, Gabe [mailto:gwachob@visa.com]
Sent: Saturday, July 19, 2003 4:57 AM
To: xri@lists.oasis-open.org
Subject: RE: [xri] I18n and $ tags
(on the $l and $f proposals)
While I really think these proposals *could* be useful, I think they would be
used (especially the $f one) in a relatively limited set of situations (i.e.
those where the XRIs are presented to humans).
That's a provocative statement I've just made. Some folks have in their minds
that most (many?) XRIs will be presented to humans. Some folks (me included)
believe most won't.
What I truly believe is that for some applications of XRIs, a large proportion
will be presented to humans, and for other applications, they won't be
presented to humans. Of course, we see this sort of flexibility as a strength.
But this sort of flexibility is also the source of tension when deciding when
to include or exclude features.
I sound like a broken record, but I want to make sure that we are addressing a
*real* need and that the solution doesn't create more complexity than it tries
to eliminate.
For example, the $f/(+Arial) proposal looks good on the surface, but there are
several complicating factors:
1) You probably don't want a top-level +<font-name> entry, because I could
easily see a font name conflicting with another use of the same term. There
are a ton of fanciful font names, and +Modern, for instance, could be
ambiguous as a font name or something else. So we'd end up with
+font/Modern, which would appear as $f/(+font/Modern).
2) Look how complicated the XRIs get... Even if you assume the font
information is inserted by the UIs (and not presented to the user), this seems
to complicate equivalence rules...
3) It seems that no matter what the structure is for font names, someone is
going to have to manage a list of font names. Fonts are subject to
intellectual property rights (at least in some places), and this tends to mean
that there is no central, managed registry of font names that everyone agrees
on. Fonts are considered "property" which is licensed (though there are
"public domain" ones). This is not a problem directly, but leads (I
believe) to a situation where the universe of fonts is rather scattered and
hard to survey properly. Surveying it is certainly not something we want to do
anyway. Use of the +font namespace seems appropriate.
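
Points 1 and 2 can be made concrete with a rough sketch. Everything in it is
hypothetical: the helper names, the sample XRIs, and above all the idea of
ignoring ($f/...) cross-references when comparing, which no XRI draft
defines. It only shows that a UI-inserted font tag breaks plain string
equivalence, and that any fix means an extra rule every implementation has
to share.

```python
from typing import List, Tuple


def split_top_level(text: str, sep: str) -> List[str]:
    """Split on sep, ignoring separators nested inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def strip_metadata(xri: str, tags: Tuple[str, ...] = ("$f",)) -> str:
    """Drop cross-references such as ($f/...) from the first segment.

    Hypothetical rule, for illustration only.
    """
    body = xri[len("xri:"):] if xri.startswith("xri:") else xri
    first, *rest = split_top_level(body, "/")
    kept = [
        sub for sub in split_top_level(first, ".")
        if not any(sub.startswith("(" + tag + "/") for tag in tags)
    ]
    return "xri:" + "/".join([".".join(kept), *rest])


# Made-up XRIs: the same identifier with and without a font tag.
plain = "xri:($l/fr).french-word/foo"
with_font = "xri:($l/fr).($f/(+font/Modern)).french-word/foo"

print(plain == with_font)                                  # False
print(strip_metadata(plain) == strip_metadata(with_font))  # True
```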
So, we need to be very clear about the problems we are solving using this $f
mechanism, because if the benefits don't outweigh the complexity, we shouldn't
do it.
What's the use case? How is this driven by internationalization concerns?
If it is, can we be more specific about the disambiguation we are trying to
address? Without having the background of i18n, it strikes me as *really* odd
to specify presentation information in the identifier -- I know
others will have the same response.
Outside of $f (to which I am specifically pushing back), I agree with Geoffrey
that using + cross references under other $ names (language, version syntax,
etc) is a Good Thing. They allow a great deal of flexibility at the cost
of human readability/usability (which is a fine compromise for me, in the use
cases I am biased towards).
> -----Original Message-----
> From: geoffrey.strongin@amd.com [mailto:geoffrey.strongin@amd.com]
> Sent: Monday, July 14, 2003 8:42 AM
> To: xri@lists.oasis-open.org
> Subject: RE: [xri] I18n and $ tags
>
>
> I like this. It really leverages the power of the + namespace.
>
> Geoffrey
>
> > -----Original Message-----
> > From: Drummond Reed [mailto:drummond.reed@onename.com]
> > Sent: Friday, July 11, 2003 11:58 PM
> > To: Dave McAlpin; xri@lists.oasis-open.org
> > Subject: RE: [xri] I18n and $ tags
> >
> >
> > -----Original Message-----
> > From: Dave McAlpin [mailto:dave.mcalpin@epokinc.com]
> > Sent: Friday, July 11, 2003 3:57 PM
> > To: xri@lists.oasis-open.org
> > Subject: [xri] I18n and $ tags
> >
> > I assume internationalization does not apply to the $ tags. For example,
> > there's no internationalized version of $v. Is this correct? Is this ok?
> >
> > Dave
> >
> > *****Drummond replies*****
> >
> > I think it's not only correct, but also a good thing. There should be no
> > need to internationalize the $ space for the following reason: IMHO, the
> > purpose of the $ space is to provide a mechanism for extending the very
> > limited set of reserved chars in RFC 2396 (which we've already had to bust
> > out of in order to add support for xrefs and sub-segments) in order to
> > have sufficient metadata (and extensibility) to describe identifiers in
> > ways that are vital to the act of identification, i.e., language, font,
> > version syntax, query syntax, resolvability, human-readable comment,
> > etc.
> >
> > For this reason, I propose that in Appendix B we state a formal
> > requirement that the vocabulary in the $ identifier space (note that I
> > don't call it a namespace for the reasons I'm about to argue) be as
> > terse as possible, not just to enforce compactness, but to reinforce
> > that it is an extension of the reserved-symbol-space and not intended to
> > carry linguistic-level semantics.
> >
> > For example, the $l (language) space should, as Nat proposed, use the
> > two-letter codes for languages specified in ISO standard 639 referenced
> > in RFC 1766. It should NOT use full-length equivalents.
> >
> > The proposed $f (font) space for font names would violate this rule if
> > it used full-length English font names. (Furthermore, if we did that, it
> > would beg for internationalization). To avoid both problems, we should
> > try to find a compact font name abbreviation registry that we can
> > reference, similar to ISO 639 for language abbreviations.
> >
> > If we can't find one, and we don't want to create one (at least I don't
> > want to), there is another solution - one that applies nicely to any $
> > space. In place of an exact, rigorously specified vocabulary, every $
> > space can also cross-reference common names in the + space. Here's an
> > example of how that would work for a font name:
> >
> > xri:($l/fr).($f/(+Arial)).french-word-in-Arial-font/foo
> >
> > Rather than using "($f/Arial)", which would mean "Arial" was formally
> > registered in the "$f" space, the segment "($f/(+Arial))" simply means
> > "Arial" is a common name in the context of a font. I'm not a font
> > expert, but I'd be willing to guess that a large percentage of
> > typographic software would recognize that common name for a font.
> > Furthermore, the xri above would also tell the XRI parser that the
> > common name "Arial" should be interpreted not just in the context of
> > being a font, but specifically as a French name for a font. That
> > should reduce the chance of misinterpretation even further.
> >
> > Use of the + space for real-world common names for metadata like fonts
> > means there is an easy way to apply the 80/20 rule, while leaving it
> > open for the $f space to reference a more exhaustive and non-ambiguous
> > font name abbreviation registry later.
> >
> > Again, I think this rule should be applied across the board to all $
> > spaces, including language, font, version syntax, query syntax, etc.
> >
> > =Drummond
> >
> >
> >
> >
> > You may leave a Technical Committee at any time by visiting
> > http://www.oasis-open.org/apps/org/workgroup/xri/members/leave_workgroup.php