Subject: Re: [geolang-comment] First proposals for ISO 639 and 3166 available
(John sent this to me privately, but as I don't think that was
intentional I'm replying to the list.)

* Lars Marius Garshol
|
| The 639 codes fall into three categories: language codes, collective
| language codes, and codes which don't represent either. The trouble
| is that distinguishing between these three is decidedly non-trivial
| and not done in the source standard. I feel this is something third
| parties should do, rather than the TC.

* John Cowan
|
| From a strictly 639 viewpoint, though, there is a distinction
| between individual and collective codes.

There is, which is why I proposed different published subjects for
those two. ('Language' and 'language group'.)

| SIL or other parties may choose to annotate some of the "individual"
| codes as actually being collective, but I think the plain language
| of clause 4.1.1 should not be ignored:
|
| # The words <i>languages</i> or <i>(other)</i> as part of a language
| # name in the following tables may be taken to indicate that a language
| # code is a collective language code.
|
| Therefore we should have a PSI for "language collection" and
| indicate which codes are for languages and which for language
| collections according to 639 itself. I do not believe the use of
| "may" has an RFC 2119 force (= "optionally") here.

That is a good point. The trouble is that if you apply this heuristic
you come to conclusions that don't actually make any sense.

The first problem is that this would conclude that 'mul' and 'und'
represent languages, which they do not. Secondly, 'nor' would be a
single language, even though according to 639 itself Norwegian can be
split into 'nno' and 'nob'.

Another problem is 'Chinese' (chi). It's not clear which language this
is supposed to be, and the Ethnologue mapping splits it into 12
different languages.

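
To make the effect of the heuristic concrete, here is a rough sketch of
what applying the 4.1.1 wording mechanically might look like. This is
only an illustration under stated assumptions: the function names and
the sample table are hypothetical, the code 'bai' for 'Bamileke
languages' and the entries for 'und' and 'afa' are not taken from this
thread, and a real run would need the full name tables from the
standard.

```python
import re

# Hypothetical sample of (code -> English name) pairs. The names for
# 'nor', 'chi', 'bad' and 'Bamileke languages' appear in this thread;
# the code 'bai' and the entries for 'und' and 'afa' are assumptions
# added purely for illustration.
SAMPLE_NAMES = {
    "nor": "Norwegian",
    "chi": "Chinese",
    "bad": "Banda",
    "bai": "Bamileke languages",
    "und": "Undetermined",
    "afa": "Afro-Asiatic (Other)",
}

# Clause 4.1.1 as quoted: the word "languages" or "(other)" in the name
# may be taken to indicate a collective language code.
_COLLECTIVE_MARKER = re.compile(r"\blanguages\b|\(other\)", re.IGNORECASE)


def is_collective(name: str) -> bool:
    """Return True if the name carries one of the 4.1.1 markers."""
    return _COLLECTIVE_MARKER.search(name) is not None


def classify(names: dict) -> dict:
    """Label every code 'collective' or 'individual' by name alone."""
    return {
        code: "collective" if is_collective(name) else "individual"
        for code, name in names.items()
    }


if __name__ == "__main__":
    for code, kind in sorted(classify(SAMPLE_NAMES).items()):
        print(code, kind)
```

On this sample, 'nor', 'chi', 'bad' and 'und' all come out as
individual languages, since the rule only ever looks at the name and
has no third category for codes that are neither.
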
Similarly, Ethnologue splits 'Banda' (bad) into 16 languages, which is
more than it splits 'Bamileke languages' into, and so it's tempting to
conclude that that is a language group, too.

In other words, we could interpret the text as you write, and I think
that might even be the intended interpretation, but it leads us into
all manner of controversies that I would prefer not to have in the
published subject set itself.

Instead, I could publish my Ethnologue-based type assignments as an
individual, and you could publish your assignments based on the 4.1.1
heuristic, and people could use whichever they preferred.

| BTW, the normative text also indicates that the 3-letter
| bibliographic codes are the most stable, so they should be used for
| the subject indicators. This seems to already be the case,
| fortunately.

It is, and precisely for that reason. It's a pity that 639 doesn't
have stable numeric codes like 3166 has, but on the other hand they
seem to make very few changes in practice.

--
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >