Subject: Re: [topicmaps-comment] multilingual thesaurus - language, scope, and topic naming constraint
At 19:06 31/01/2002 +0100, Bernard Vatant wrote:
>Folks
>
>I need a little help from my friends here ...
>
>I'm currently working on GEMET, a multilingual thesaurus published by the
>European Environment Agency (over 9000 terms in 18 european languages, with
>references to hundreds of sources ...) and trying to provide an XTM version
>... a challenge ...
>
>I stumbled on a case where different descriptors have the same name in
>some languages, and different ones in some others.
>For example, compare the two following descriptors, and their names in six
>languages given by the thesaurus.
>
>topic 1: "The social study of the production, distribution, and
>consumption of wealth."
>
>DAN : økonomi
>DUT : economie
>ENG : economics
>FRE : science économique
>GER : Ökonomie
>SPA : economía
>
>topic 2: "The system of activities and administration through which a
>society uses its resources to produce wealth."
>
>DAN : økonomi
>DUT : economie
>ENG : economy
>FRE : économie
>GER : Wirtschaft
>SPA : economía
>
>It looks like english, german and french makes the difference, whereas
>dutch, danish and spanish clearly don't ... although I'm pretty sure they
>do distinguish the concepts (social science vs economical system) but it
>does not show in the names the Thesaurus provides.

This is a classic example of the TNC causing a problem - the fact is that in certain languages, the same word will be used to signify multiple concepts. In this case, it seems obvious to me that the experts who created the thesaurus have identified two distinct abstract concepts and I would argue that it is important to retain them both.

>If I use languages as scopes - which is usual - how will the topic naming
>constraint apply? Should my TM engine merge topic1 and topic2, because they
>have the same name in the scope "SPA"? Does not make sense ...
>
>Suggestions?

Some ideas off the top of my head:

1) Select one language for the basename, and make all others variants - this is undesirable because if you choose Spanish as your "base" language you will still end up with an undesired merge.

2) If the thesaurus provides any notion of different contexts for the two definitions, use that context to create a scope for all base names.

3) Don't use a base name! Just make your name strings occurrences and scope those by language.

4) Use some (probably unattractive) unique identifier as the basename (e.g. the thesaurus entry number suitably prefixed). Then add the entries as variants scoped as XTM-displayable, XTM-sortable and by language.

5) Don't apply the TNC...sorry, couldn't resist that one ;-)

Depending upon the nature of the processing you will be doing later, or the nature of the environment into which the topic map will be deployed, I would choose either (3), (4) or (5).

Cheers,

Kal
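
For illustration only, here is a minimal Python sketch of the merge discussed in this thread. It is not how any particular topic map engine works: it assumes a toy in-memory model in which a scope is a single label rather than a set of topics, and the function name `tnc_merge_groups`, the topic ids, and the `gemet-0001`/`gemet-0002` identifiers are all hypothetical. It shows that language-scoped base names make the two GEMET descriptors merge under the TNC, while a unique identifier used as the base name (suggestion 4) keeps them apart.

```python
from collections import defaultdict


def tnc_merge_groups(topics):
    """Group topic ids that a TNC-style rule would merge.

    `topics` maps a topic id to a list of (scope, base name string) pairs.
    Two topics fall into the same group when they share an identical
    base name string in an identical scope. Toy model, not a real engine.
    """
    parent = {topic_id: topic_id for topic_id in topics}

    def find(topic_id):
        # Follow parent links to the representative of the group.
        while parent[topic_id] != topic_id:
            parent[topic_id] = parent[parent[topic_id]]
            topic_id = parent[topic_id]
        return topic_id

    first_owner = {}  # (scope, name) -> first topic id seen with that name
    for topic_id, names in topics.items():
        for scope, name in names:
            key = (scope, name)
            if key in first_owner:
                # Same name in the same scope: the two topics must merge.
                parent[find(topic_id)] = find(first_owner[key])
            else:
                first_owner[key] = topic_id

    groups = defaultdict(set)
    for topic_id in topics:
        groups[find(topic_id)].add(topic_id)
    return sorted(sorted(group) for group in groups.values())


# Base names scoped by language, taken from the two descriptors quoted above.
language_scoped = {
    "economics": [("DAN", "økonomi"), ("DUT", "economie"), ("ENG", "economics"),
                  ("FRE", "science économique"), ("GER", "Ökonomie"),
                  ("SPA", "economía")],
    "economy": [("DAN", "økonomi"), ("DUT", "economie"), ("ENG", "economy"),
                ("FRE", "économie"), ("GER", "Wirtschaft"),
                ("SPA", "economía")],
}

# Suggestion 4: a unique identifier as the only base name (ids are made up);
# the language-specific strings would then be carried as variants instead.
id_based = {
    "economics": [("unconstrained", "gemet-0001")],
    "economy": [("unconstrained", "gemet-0002")],
}

print(tnc_merge_groups(language_scoped))  # [['economics', 'economy']]
print(tnc_merge_groups(id_based))         # [['economics'], ['economy']]
```

The first call reports one merged group because the Danish, Dutch and Spanish names coincide; the second keeps the two descriptors separate because no base name string is shared.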