Subject: Re: natural language
So what is the answer? Can someone make a proposal?

--lisa

At 01:43 PM 03/30/2000 -0600, you wrote:
>
>On Thu, 30 Mar 2000, Yutaka Yoshida wrote:
>
>>
>> > Date: Thu, 30 Mar 2000 12:49:53 -0600 (CST)
>> > From: Robin Cover <robin@isogen.com>
>> >
>> > The designation of language encoding for machine purposes is even
>> > more critical, as we all know. Here, it's necessary to isolate
>> > language from script (Hebrew can be written in Arabic), and other
>> > aspects of writing systems.
>>
>> Sorry, I don't understand what you said. Could you explain a little more?
>> What I meant by encoding was 'encoding scheme', such as iso8859-1,
>> eucjp, gb2312, etc. In that sense, for a computational purpose,
>> it doesn't matter what script is used. Hebrew is 8859-8 and Arabic is
>> 8859-6, so we can process the content correctly if we knew those
>> encodings.
>>
>> regards,
>> yuta
>>
>This is probably off topic. I'm talking about natural language
>processing based upon linguistic features of written text. When
>a word/phrase is transliterated or borrowed from one
>language into another (as when a Hebrew word is written in
>Arabic script), the word/phrase in the new context has
>linguistic properties that cannot be deduced from the encoding
>or script. While simple display might work (direction of
>character flow, kerning, etc.), other processing would
>fail (correct word wrap, spell checking, thesaurus, and
>so forth). In brief: a script or encoding does not always
>tell you what language the text is "in". This is why (mere)
>"localization" does not work, of itself, in a
>multilingual setting. Multilingualism necessitates true
>linguistic knowledge, where "internationalization"
>often (believes it) does not.
>
>Robin Cover
>

Lisa J. Carnahan
National Institute of Standards and Technology
Information Technology Laboratory
Room 562, Bldg. 820
Gaithersburg, Md. 20899 USA
lisa.carnahan@nist.gov
(301) 975-3362 voice
(301) 948-6213 fax
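
The point made in the quoted exchange (an encoding or script does not always tell you what language the text is "in") can be illustrated with a minimal Python sketch. This is not a proposal from the thread. The sample strings, the language tags attached to them, and the helper name `dominant_script` are all hypothetical, and the second string is only an illustrative rendering of a Hebrew word in Arabic letters.

```python
import unicodedata


def dominant_script(text):
    """Guess the script of `text` from the first word of each letter's
    Unicode character name (e.g. "ARABIC LETTER SEEN" -> "ARABIC").
    Hypothetical helper, a rough heuristic only."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch).split()[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None


# Hypothetical samples: (assumed language tag, text in Arabic script).
samples = [
    ("ar", "\u0633\u0644\u0627\u0645"),        # an Arabic word
    ("he", "\u0634\u0627\u0644\u0648\u0645"),  # a Hebrew word written in Arabic letters (illustrative)
]

results = []
for lang, text in samples:
    raw = text.encode("iso8859_6")        # the encoding scheme handles both strings
    round_trip = raw.decode("iso8859_6")  # decoding is lossless for both
    results.append((lang, dominant_script(round_trip), round_trip == text))

for lang, script, ok in results:
    print(f"language tag={lang}  script={script}  round-trip ok={ok}")
```

Both strings encode and decode cleanly as ISO 8859-6 and both are classified as Arabic script, so neither the encoding nor the script distinguishes them. Only the separately carried language tag does, which is the information a spell checker or thesaurus would need.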