Subject: Re: [geolang-comment] First proposals for ISO 639 and 3166 available

From: Lars Marius Garshol <larsga@garshol.priv.no>
To: geolang-comment@lists.oasis-open.org (GeoLang List)
Date: Thu, 29 Aug 2002 17:20:15 +0200


* Steve Pepper
| 
| Surely whether things "make sense" or not is not for us to decide?
 
I think that in transferring something as messy and unstructured as
ISO 639 into something as ontological as topic maps we have no choice.
To create a published subject set that mandates dubious assertions is
IMHO not at all a good idea.

| We should have two goals:
| 
| (1) to reflect 639 as accurately as possible
| (2) provide a PSI set and formal representations that are maximally
|      convenient for users

Agreed.
 
| Consideration (1) tells me that we should, indeed, create PSIs for
| the concepts of "language" and "language collection", [...]

They are in the proposal already.

| Consideration (2) tells me that we should be very careful when
| creating the formal representations. 

I certainly agree.

| How about, for example, building TWO topic maps:
| 
| (1) One that merely declares one topic for each language for which
| 639 has a language code, and includes both English and French names,
| and occurrences for the alpha codes -- i.e. exactly as in
| 639-basic.xtm in its current version, but with the alpha codes in
| addition, and (possibly) without the "language" and "language
| group/collection" topics.

I agree with this, though I would not take out the two typing topics.

| (2) One that declares the "language" and "language group/collection"
| topics and otherwise consists of a set of class-instance
| associations between the topics declared in (1) and the classes to
| which 639 (rightly or wrongly) asserts that they belong (let's call
| this one 639-classes.xtm).

You would want to do this despite the problems that I pointed out
above? If so, why?
 
| Most people would only have a use for (1), but we would at least
| have fulfilled our goal of formally capturing all the assertions
| made in 639, including typing. Those who have a use for the latter
| would simply use both 639-basic.xtm and 639-classes.xtm.

It would make life very slightly easier for people, but at the expense
of us making assertions that we know are nonsensical. If we skipped
this, we could still provide the same files unofficially, and people
would then have a choice of which one to use.
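
To make the two-file idea concrete, here is a rough sketch in Python
rather than XTM. The dict layout, the topic ids and the merge()
helper are all made up for illustration; none of it is taken from
639-basic.xtm or from any topic map API. It only shows that typing can
live in a separate layer that a user loads or leaves out.

```python
# Hypothetical stand-ins for 639-basic.xtm and 639-classes.xtm.
# Topic ids and the dict layout are illustrative assumptions only.

# (1) Basic map: one topic per coded entry, with a name and alpha codes.
basic = {
    "nor": {"name_en": "Norwegian", "alpha2": "no", "alpha3": "nor"},
    "gem": {"name_en": "Germanic (Other)", "alpha2": None, "alpha3": "gem"},
}

# (2) Classes map: only class-instance assertions about topics in (1).
classes = {
    "nor": "language",
    "gem": "language-collection",
}


def merge(basic_map, classes_map=None):
    """Return a copy of the basic map, adding a 'type' to each topic
    only if a classes map is supplied."""
    merged = {tid: dict(topic) for tid, topic in basic_map.items()}
    for tid, type_id in (classes_map or {}).items():
        if tid in merged:
            merged[tid]["type"] = type_id
    return merged


untyped = merge(basic)         # names and codes only
typed = merge(basic, classes)  # names, codes and the type assertions
```

Anyone who loads only the first structure never sees a type
assertion; anyone who merges in the second gets whatever types it
asserts, right or wrong.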
 
| I would say that "the published subject set itself" is really only
| the set of published subject indicators and corresponding published
| subject identifiers. That is, it doesn't include the formal
| representations in XTM and RDF, which are only provided for
| convenience.

I have one answer to that, but it has two parts. :)

Firstly, it's up to us to decide what is and is not part of the
published subject set (we're the publishers, after all). So one way
around this may be to just declare that the XTM files are not
"normative" (whatever that means in this context).
 
Secondly, regardless of what we say formally, most people will
consider whatever we publish as part of the PSI set to be canon, and
just use it. Do we really want to lead these innocently trusting souls
astray? 

| We certainly want to encourage people other than the TC to publish
| useful mappings, such as to Ethnologue, so I agree with the first
| part of this statement. But I don't see any reason why the TC itself
| shouldn't look after the classification indicated by 4.1.1, provided
| it is done in such a way that those people that don't need it (or
| disagree with it) can easily ignore it.

The reason we shouldn't do it is that if we did we would follow a
not-very-well-crafted standard to the letter, only to end up with a
not-very-good result. I feel we need some positive reason *for* doing
that, rather than reasons why people might ignore the result when they
know it is sub-optimal.

After all, they would still be able to ignore it if we didn't publish
the type assignments.

| Actually, the same line of argument (ease of use, the ability to get
| at just those parts of the formal representation that are actually
| needed for a particular application) could be used to justify
| splitting 639-basic.xtm into multiple topic maps. That way people
| can easily get (say) just the French names without all the other
| assertions (English names, 2- and 3-letter codes) coming along as
| unwanted baggage. But do we really want to go that far?

I was thinking the same about English/French, but I decided that would
be going too far. Ease of use doesn't really apply here, however.
Having English names hang around unused in your French application is
not so bad, but having incorrect types assigned to some of your
instances in ways that require specialist knowledge to correct *is*
bad.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >

Follow-Ups:
- Re: [geolang-comment] First proposals for ISO 639 and 3166 available
  - From: John Cowan <jcowan@reutershealth.com>
- Re: [geolang-comment] First proposals for ISO 639 and 3166 available
  - From: Murray Altheim <m.altheim@open.ac.uk>

References:
- Re: [geolang-comment] First proposals for ISO 639 and 3166 available
  - From: Steve Pepper <pepper@ontopia.net>