OASIS Mailing List Archives


Subject: Re: [office] word-count


Hi Robert,

On Monday, 2008-11-24 17:56:13 -0500, Robert Weir wrote:

> Also, I'm not sure that the concept of "word" is clearly stated.  It is 
> the kind of thing linguists go crazy about.  We probably want to state 
> explicitly that the word-count refers to "orthographic words",  which are 
> the groups of letters delimited by whitespace.  This works fine for modern 
> languages.  The ancient Greeks wrote their texts without spaces between 
> words.  Nothing we can do about that.  Even human experts have arguments 
> over where the word breaks go in those texts.  So we can't expect a word 
> processor to figure it out.

There are also "modern" languages that don't use whitespace at all, such
as Khmer, for example. Writers _may_ insert the ZWSP U+200B character to
help a word processing application. The situation for CTL (complex text
layout) languages may be completely different from what most are used to
or would call "normal". But also in Western languages there may be
differences in whether constructs such as here-it-is count as one word or
more. This may vary between languages. I think we should not define how
word count is to be
computed, and not state anything that may be correct for some languages
but wrong for others.
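
To illustrate why a single definition is hard to pin down, here is a minimal
Python sketch. It is not taken from any word processor or from the ODF
specification; the function names are hypothetical, and the Khmer letters in
the usage note are placeholders rather than real words. It only shows that
three plausible counting rules disagree on the same input.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, not treated as whitespace by str.split()


def count_whitespace_words(text: str) -> int:
    """Count "orthographic words": runs of characters delimited by whitespace."""
    return len(text.split())


def count_with_zwsp(text: str) -> int:
    """Same rule, but U+200B is also accepted as a word separator (assumption)."""
    return len([w for w in re.split(r"[\s\u200b]+", text) if w])


def count_hyphen_split(text: str) -> int:
    """Same rule, but ASCII hyphens also separate words (assumption)."""
    return len([w for w in re.split(r"[\s\-]+", text) if w])


# Placeholder Khmer letters joined by ZWSP; not meant to be real words.
khmer_like = "\u1780" + ZWSP + "\u1781"
hyphenated = "here-it-is"
```

With these definitions, count_whitespace_words(hyphenated) gives 1 while
count_hyphen_split(hyphenated) gives 3, and count_whitespace_words(khmer_like)
gives 1 while count_with_zwsp(khmer_like) gives 2. None of the three is
"the" correct rule; which one fits depends on the language and the
application.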

  Eike

-- 
 OpenOffice.org / StarOffice Calc core developer and i18n transpositionizer.
 SunSign   0x87F8D412 : 2F58 5236 DB02 F335 8304  7D6C 65C9 F9B5 87F8 D412
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
