OASIS Mailing List Archives

 


topicmaps-comment message


Subject: Re: [topicmaps-comment] multilingual thesaurus - language, scope, and topic naming constraint


At 19:06 31/01/2002 +0100, Bernard Vatant wrote:
>Folks
>
>I need a little help from my friends here ...
>
>I'm currently working on GEMET, a multilingual thesaurus published by the European
>Environment Agency (over 9000 terms in 18 European languages, with references to hundreds
>of sources ...) and trying to provide an XTM version ... a challenge ...
>
>I stumbled on a case where different descriptors have the same name in some languages, and
>different ones in some others.
>For example, compare the two following descriptors, and their names in six languages given
>by the thesaurus.
>
>topic 1: "The social study of the production, distribution, and consumption of wealth."
>
>DAN : økonomi
>DUT : economie
>ENG : economics
>FRE : science économique
>GER : Ökonomie
>SPA : economía
>
>topic 2: "The system of activities and administration through which a society uses its
>resources to produce wealth."
>
>DAN : økonomi
>DUT : economie
>ENG : economy
>FRE : économie
>GER : Wirtschaft
>SPA : economía
>
>It looks like English, German and French make the difference, whereas Dutch, Danish and
>Spanish clearly don't ... although I'm pretty sure they do distinguish the concepts
>(social science vs economic system) but it does not show in the names the Thesaurus
>provides.

This is a classic example of the TNC causing a problem - the fact is that 
in certain languages, the same word will be used to signify multiple 
concepts. In this case, it seems obvious to me that the experts who created 
the thesaurus have identified two distinct abstract concepts and I would 
argue that it is important to retain them both.
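
To make the collision concrete, here is a minimal Python sketch. It is not the API of any real topic map engine: it simply reads the TNC as "two topics with the same base name string in the same scope must merge" and treats each language code as a one-theme scope. The function name `tnc_collisions` and the topic ids are made up for illustration; the names are the ones quoted above.

```python
from collections import defaultdict

# Names copied from the two GEMET descriptors quoted above.
# Keys are hypothetical topic ids; each language code acts as a scope.
topics = {
    "topic1": {"DAN": "økonomi", "DUT": "economie", "ENG": "economics",
               "FRE": "science économique", "GER": "Ökonomie",
               "SPA": "economía"},
    "topic2": {"DAN": "økonomi", "DUT": "economie", "ENG": "economy",
               "FRE": "économie", "GER": "Wirtschaft",
               "SPA": "economía"},
}

def tnc_collisions(topics):
    """Return {(scope, name): [topic ids]} for names shared within one scope."""
    seen = defaultdict(list)
    for topic_id, names in topics.items():
        for scope, name in names.items():
            seen[(scope, name)].append(topic_id)
    # Only pairs claimed by more than one topic would trigger a merge.
    return {key: ids for key, ids in seen.items() if len(ids) > 1}

collisions = tnc_collisions(topics)
for (scope, name), ids in sorted(collisions.items()):
    print(scope, name, ids)
```

Under these assumptions the sketch reports collisions in the DAN, DUT and SPA scopes, so a single shared language is enough to force the two descriptors together.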

>If I use languages as scopes - which is usual - how will the topic naming constraint
>apply? Should my TM engine merge topic1 and topic2, because they have the same name in the
>scope "SPA"? Does not make sense ...
>
>Suggestions?

Some ideas off the top of my head:

1) Select one language for the base name, and make all others variants - 
this is undesirable because if you choose Spanish as your "base" language 
you will still end up with an undesired merge.

2) If the thesaurus provides any notion of different contexts for the two 
definitions, use that context to create a scope for all base names.

3) Don't use a base name! Just make your name strings occurrences and scope 
those by language.

4) Use some (probably unattractive) unique identifier as the base name (e.g. 
the thesaurus entry number suitably prefixed). Then add the entries as 
variants scoped as XTM-displayable, XTM-sortable and by language.

5) Don't apply the TNC...sorry, couldn't resist that one ;-)

Depending upon the nature of the processing you will be doing later, or the 
nature of the environment into which the topic map will be deployed, I 
would choose either (3), (4) or (5).
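
As a rough illustration of option (4), the Python sketch below gives each topic an artificial base name and keeps the thesaurus names as variants scoped by language. It assumes that only base names take part in the TNC check, which is the premise of option (4). The identifiers `gemet-0001` and `gemet-0002` are placeholders, not real GEMET entry numbers, and both function names are hypothetical.

```python
from collections import defaultdict

# Option (4): each topic gets an artificial, unique base name, and the
# human-readable names become variants scoped by language.
# "gemet-0001" / "gemet-0002" are placeholders, NOT real GEMET entry numbers.
topics = {
    "topic1": {
        "base_name": "gemet-0001",
        "variants": {"DAN": "økonomi", "DUT": "economie", "ENG": "economics",
                     "FRE": "science économique", "GER": "Ökonomie",
                     "SPA": "economía"},
    },
    "topic2": {
        "base_name": "gemet-0002",
        "variants": {"DAN": "økonomi", "DUT": "economie", "ENG": "economy",
                     "FRE": "économie", "GER": "Wirtschaft",
                     "SPA": "economía"},
    },
}

def base_name_collisions(topics):
    """Group topic ids by base name; variants are deliberately not checked."""
    seen = defaultdict(list)
    for topic_id, topic in topics.items():
        seen[topic["base_name"]].append(topic_id)
    return {name: ids for name, ids in seen.items() if len(ids) > 1}

def display_name(topics, topic_id, lang):
    """Pick the variant for a language, falling back to the base name."""
    topic = topics[topic_id]
    return topic["variants"].get(lang, topic["base_name"])

print(base_name_collisions(topics))
print(display_name(topics, "topic1", "SPA"), display_name(topics, "topic2", "SPA"))
```

Both topics still display as "economía" to a Spanish reader, but nothing forces them to merge. Option (3) would look much the same, with the name strings stored as language-scoped occurrences and no base name at all.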

Cheers,

Kal


