geolang-comment message

Subject: Re: [geolang-comment] First proposals for ISO 639 and 3166 available

From: Lars Marius Garshol <larsga@garshol.priv.no>
To: geolang-comment@lists.oasis-open.org (GeoLang List)
Date: Thu, 29 Aug 2002 10:00:49 +0200


(John sent this to me privately, but as I don't think that was
intentional I'm replying to the list.)

* Lars Marius Garshol
|
| The 639 codes fall into three categories: language codes, collective
| language codes, and codes which don't represent either. The trouble
| is that distinguishing between these three is decidedly non-trivial
| and not done in the source standard. I feel this is something third
| parties should do, rather than the TC.

* John Cowan
| 
| From a strictly 639 viewpoint, though, there is a distinction
| between individual and collective codes.  

There is, which is why I proposed different published subjects for
those two. ('Language' and 'language group'.)

| SIL or other parties may choose to annotate some of the "individual"
| codes as actually being collective, but I think the plain language
| of clause 4.1.1 should not be ignored:
| 
| # The words <i>languages</i> or <i>(other)</i> as part of a language
| # name in the following tables may be taken to indicate that a language
| # code is a collective language code.
| 
| Therefore we should have a PSI for "language collection" and
| indicate which codes are for languages and which for language
| collections according to 639 itself.  I do not believe the use of
| "may" has an RFC 2119 force (= "optionally") here.

That is a good point. The trouble is that if you apply this heuristic
you reach conclusions that don't actually make sense. First, it would
classify 'mul' and 'und' as languages, which they are not. Second,
'nor' would be a single language, even though according to 639 itself
Norwegian can be split into 'nno' and 'nob'.
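
To make the problem concrete, here is a minimal sketch of the 4.1.1
heuristic in Python. The table is a small hand-typed excerpt, the
names in it are my assumption of how they appear in 639-2 and should
be checked against the standard, and the function names are made up
for illustration.

```python
# Hypothetical excerpt of the 639-2 code table, typed by hand for
# illustration; verify the names against the standard before relying on them.
NAMES = {
    "afa": "Afro-Asiatic (Other)",
    "bai": "Bamileke languages",
    "bad": "Banda",
    "chi": "Chinese",
    "nor": "Norwegian",
    "und": "Undetermined",
}


def is_collective(name):
    """Clause 4.1.1 heuristic: 'languages' or '(other)' in the name."""
    lowered = name.lower()
    return "languages" in lowered or "(other)" in lowered


def classify(names):
    """Map each code to 'collection' or 'language' using only its name."""
    return {
        code: "collection" if is_collective(name) else "language"
        for code, name in names.items()
    }


if __name__ == "__main__":
    for code, kind in sorted(classify(NAMES).items()):
        print(code, kind)
```

With these names the heuristic labels 'und', 'nor', 'chi' and 'bad' as
plain languages, which is where the trouble starts.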

Another problem is 'Chinese' (chi). It's not clear which language this
is supposed to be, and the Ethnologue mapping splits it into 12
different languages. Similarly, Ethnologue splits 'Banda' (bad) into
16 languages, more than it splits 'Bamileke languages' into, so it's
tempting to conclude that Banda is a language group, too.
 
In other words, we could interpret the text as you write, and I think
that might even be the intended interpretation, but it leads us into
all manner of controversies that I would prefer not to have in the
published subject set itself.

Instead, I could publish my Ethnologue-based type assignments as an
individual, and you could publish your assignments based on the 4.1.1
heuristic, and people could use whichever they preferred.
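
Roughly, the arrangement would look like the sketch below: the
published subject set carries only the codes, and each third party
layers its own type assignments on top. All names and values here are
hypothetical, chosen only to illustrate the layering; they are not
actual published assignments.

```python
# Neutral published subject set: codes only, no type information.
SUBJECTS = {"bad", "bai", "chi", "nor"}

# Hypothetical third-party annotation sets (illustrative values only).
ETHNOLOGUE_BASED = {
    "bad": "language group",
    "bai": "language group",
    "chi": "language group",
}
CLAUSE_4_1_1 = {
    "bad": "language",
    "bai": "language group",
    "chi": "language",
    "nor": "language",
}


def type_of(code, assignments):
    """Return the type a given annotation set assigns to a code.

    Returns None when the annotation set says nothing about the code;
    raises KeyError when the code is not a published subject at all.
    """
    if code not in SUBJECTS:
        raise KeyError(code)
    return assignments.get(code)


if __name__ == "__main__":
    for code in sorted(SUBJECTS):
        print(code, type_of(code, ETHNOLOGUE_BASED), type_of(code, CLAUSE_4_1_1))
```

The point of the design is that a disagreement over 'bad' or 'chi'
stays in the annotation sets and never touches the subject
identifiers themselves.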

| BTW, the normative text also indicates that the 3-letter
| bibliographic codes are the most stable, so they should be used for
| the subject indicators.  This seems to already be the case,
| fortunately.

It is, and precisely for that reason. It's a pity that 639 doesn't
have stable numeric codes like 3166 has, but on the other hand very
few changes seem to be made to the codes in practice.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >

Follow-Ups:
- Re: [geolang-comment] First proposals for ISO 639 and 3166 available
  - From: John Cowan <jcowan@reutershealth.com>
- Re: [geolang-comment] First proposals for ISO 639 and 3166 available
  - From: Steve Pepper <pepper@ontopia.net>