
regrep message


Subject: Re: natural language

So what is the answer?  Can someone make a proposal?


At 01:43 PM 03/30/2000 -0600, you wrote:
>On Thu, 30 Mar 2000, Yutaka Yoshida wrote:
>>  > Date: Thu, 30 Mar 2000 12:49:53 -0600 (CST)
>>  > From: Robin Cover <robin@isogen.com>
>>  > 
>>  > 
>>  > The designation of language encoding for machine purposes is even
>>  > more critical, as we all know.  Here, it's necessary to isolate
>>  > language from script (Hebrew can be written in Arabic), and other
>>  > aspects of writing systems.
>>  Sorry, I don't understand what you said. Could you explain a little more?
>>  What I meant by encoding was 'encoding scheme', such as ISO-8859-1,
>>  EUC-JP, GB2312, etc. In that sense, for computational purposes,
>>  it doesn't matter what script is used. Hebrew is ISO-8859-8 and Arabic
>>  is ISO-8859-6, so we can process the content correctly as long as we
>>  know those encodings.
>>  regards,
>>  yuta
>This is probably off topic.  I'm talking about natural language
>processing based upon linguistic features of written text.  When
>a word/phrase is transliterated or borrowed from one
>language into another (as when a Hebrew word is written in
>Arabic script), the word/phrase in the new context has
>linguistic properties that cannot be deduced from the encoding
>or script.  While simple display might work (direction of
>character flow, kerning, etc.), other processing would
>fail (correct word wrap, spell checking, thesaurus, and
>so forth).  In brief: a script or encoding does not always
>tell you what language the text is "in".  This is why (mere)
>"localization" does not work, of itself, in a
>multilingual setting.  Multilingualism necessitates true
>linguistic knowledge, where "internationalization"
>often (believes it) does not.
>Robin Cover
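
The distinction drawn in the quoted message can be illustrated with a small sketch. It is only an illustration, not a proposal from the thread: the helper `dominant_script` and the `samples` table are hypothetical, the script is guessed crudely from the first word of each character's Unicode name, and the language tags ("ar", "fa", "he") are assumed to be supplied by a human or by metadata, because nothing in the characters themselves provides them.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Guess the script of `text` from Unicode character names.

    Crude heuristic (an assumption of this sketch): the first word of a
    letter's Unicode name, e.g. "ARABIC LETTER SEEN" -> "ARABIC".
    Returns None when the text contains no letters.
    """
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    )
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical samples: (declared language tag, text).
# The Arabic and Persian entries are the very same string.
samples = [
    ("ar", "\u0633\u0644\u0627\u0645"),  # Arabic-script word, tagged Arabic
    ("fa", "\u0633\u0644\u0627\u0645"),  # same characters, tagged Persian
    ("he", "\u05e9\u05dc\u05d5\u05dd"),  # Hebrew-script word, tagged Hebrew
]

for lang, text in samples:
    # The script can be computed from the characters; the language cannot.
    print(lang, dominant_script(text))
```

The first two samples are indistinguishable by script or by encoding, so a spell checker or thesaurus would need the declared language tag to choose the right dictionary.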

Lisa J. Carnahan
National Institute of Standards and Technology
Information Technology Laboratory
Room 562, Bldg. 820
Gaithersburg, Md. 20899

(301) 975-3362 voice 
(301) 948-6213 fax
