geolang-comment message

Subject: Re: [geolang-comment] First proposals for ISO 639 and 3166 available

From: Steve Pepper <pepper@ontopia.net>
To: geolang-comment@lists.oasis-open.org (GeoLang List)
Date: Thu, 29 Aug 2002 10:29:07 +0200

At 10:00 29/08/02 +0200, Lars Marius Garshol wrote:
>| Therefore we should have a PSI for "language collection" and
>| indicate which codes are for languages and which for language
>| collections according to 639 itself.  I do not believe the use of
>| "may" has an RFC 2119 force (= "optionally") here.
>
>That is a good point. The trouble is that if you apply this heuristic
>you come to conclusions that don't actually make any sense. The first
>problem is that this would conclude that 'mul' and 'und' represent
>languages, which they do not. Secondly, 'nor' would be a single
>language, even though according to 639 itself Norwegian can be split
>into 'nno' and 'nob'.

Surely whether things "make sense" or not is not for us to decide?

We should have two goals:

(1) to reflect 639 as accurately as possible
(2) to provide a PSI set and formal representations that are maximally
     convenient for users

Consideration (1) tells me that we should, indeed, create PSIs for
the concepts of "language" and "language collection", as defined
(implicitly or explicitly) in ISO 639. Nobody will be forced to use
those PSIs if the definitions don't suit their purposes.

Consideration (2) tells me that we should be very careful when
creating the formal representations. How about, for example, building
TWO topic maps:

(1) One that merely declares one topic for each language for which 639
has a language code, and includes both English and French names, and
occurrences for the alpha codes -- i.e. exactly as in 639-basic.xtm in
its current version, but with the alpha codes in addition, and
(possibly) without the "language" and "language group/collection"
topics.

(2) One that declares the "language" and "language group/collection"
topics and otherwise consists of a set of class-instance associations
between the topics declared in (1) and the classes to which 639
(rightly or wrongly) asserts that they belong (let's call this one
639-classes.xtm).

Most people would only have a use for (1), but we would at least have
fulfilled our goal of formally capturing all the assertions made in
639, including typing. Those who have a use for the typing would
simply use both 639-basic.xtm and 639-classes.xtm.
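
A minimal sketch of how the two-map split could behave, written in
Python rather than XTM purely for brevity. Everything here is an
assumption for illustration: the dictionary layout, the `merge`
helper, the sample names, and above all the class assignments, which
are placeholders and not a reading of 639.

```python
# Hypothetical stand-in for 639-basic.xtm: one topic per code, with
# English and French names and the alpha code as an occurrence.
# The sample entries are illustrative, not copied from the standard.
basic = {
    "nor": {"names": {"en": "Norwegian", "fr": "norvégien"}, "alpha3": "nor"},
    "gem": {"names": {"en": "Germanic (Other)"}, "alpha3": "gem"},
}

# Hypothetical stand-in for 639-classes.xtm: only the two class topics
# and the class-instance associations. Assignments are placeholders.
LANGUAGE = "language"
COLLECTION = "language-collection"
classes = {"nor": LANGUAGE, "gem": COLLECTION}


def merge(basic_map, classes_map=None):
    """Return the topics of basic_map, typed only if classes_map is given."""
    merged = {}
    for topic_id, topic in basic_map.items():
        entry = dict(topic)  # shallow copy so basic_map is left untouched
        if classes_map is not None and topic_id in classes_map:
            entry["instance_of"] = classes_map[topic_id]
        merged[topic_id] = entry
    return merged


untyped = merge(basic)          # users who only want (1)
typed = merge(basic, classes)   # users who want (1) and (2)
```

Users who load only the first map never see an `instance_of` value,
so any disagreement about the typing stays confined to the second map.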

>In other words, we could interpret the text as you write, and I think
>that might even be the intended interpretation, but it leads us into
>all manner of controversies that I would prefer not to have in the
>published subject set itself.

I would say that "the published subject set itself" is really only
the set of published subject indicators and corresponding published
subject identifiers. That is, it doesn't include the formal representations
in XTM and RDF, which are only provided for convenience.

>Instead, I could publish my Ethnologue-based type assignments as an
>individual, and you could publish your assignments based on the 4.1.1
>heuristic, and people could use whichever they preferred.

We certainly want to encourage people other than the TC to publish
useful mappings, such as to Ethnologue, so I agree with the first part
of this statement. But I don't see any reason why the TC itself shouldn't
look after the classification indicated by 4.1.1, provided it is done
in such a way that people who don't need it (or disagree with
it) can easily ignore it.

Actually, the same line of argument (ease of use, the ability to get
at just those parts of the formal representation that are actually needed
for a particular application) could be used to justify splitting
639-basic.xtm into multiple topic maps. That way people can easily get
(say) just the French names without all the other assertions (English
names, 2- and 3-letter codes) coming along as unwanted baggage. But do
we really want to go that far?

Steve

--
Steve Pepper, Chief Executive Officer <pepper@ontopia.net>
Convenor, ISO/IEC JTC1/SC34/WG3  Editor, XTM (XML Topic Maps)
Ontopia AS, Waldemar Thranes gt. 98, N-0175 Oslo, Norway.
http://www.ontopia.net/ phone: +47-23233080 GSM: +47-90827246
