OASIS Mailing List Archives


office message



Subject: word-count (was Re: [office] Data Grid Size element proposal)


"Andreas J. Guelzow" <aguelzow@math.concordia.ab.ca> wrote on 11/24/2008 
05:28:30 PM:
> 
> Warren,
> 
> It seems to me that your main argument against inclusion of an indicator
> of how large the grid was when the file was saved is that programs can
> figure it out parsing all formulae and seeing whether a cell outside the
> current range is referenced.
> 
> If this is seen as a valid reason against this inclusion I wonder why
> items such as 
> <text:page-count> 
> <text:paragraph-count> 
> <text:word-count> 
> <text:character-count> etc
> are included in the standard? All of that information can be counted
> even faster than parsing all formulas.
> 


Oh, that reminds me.  We need to define word-count and character-count 
more precisely.  This came up at the interoperability workshop we 
recently held in Beijing.  We noticed that OpenOffice and Symphony returned 
different results for these counts for the same document.  It may come 
down to whether text in headers/footers, captions, footnotes, etc. is 
counted in the totals or not.

I assume the main use of this is for those who have page limits or who 
are paid by the word.  If so, we should probably adopt whatever 
conventions are most common there.  Does anyone have an idea what the 
common practice is, if there is one?

Also, I'm not sure that the concept of "word" is clearly stated.  It is 
the kind of thing linguists go crazy about.  We probably want to state 
explicitly that the word-count refers to "orthographic words", which are 
the groups of letters delimited by whitespace.  This works fine for modern 
languages that put spaces between words.  The ancient Greeks wrote their 
texts without spaces between words, and there is nothing we can do about 
that.  Even human experts argue over where the word breaks go in those 
texts, so we can't expect a word processor to figure it out.
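
To make the "orthographic word" idea concrete, here is a rough sketch in 
Python.  The function name is made up for illustration and is not taken 
from the specification or from any implementation.  It assumes that a 
"word" is simply a maximal run of non-whitespace characters, and it says 
nothing about which parts of a document (headers, footnotes, captions) 
are fed into it.

```python
def orthographic_word_count(text: str) -> int:
    """Count whitespace-delimited tokens in text.

    Hypothetical helper: a "word" here is any maximal run of
    non-whitespace characters, so punctuation attached to a word
    does not create an extra word.
    """
    # str.split() with no arguments splits on runs of whitespace
    # and drops empty strings, so repeated spaces are harmless.
    return len(text.split())


if __name__ == "__main__":
    print(orthographic_word_count("We need to  define word-count."))
    # Text written without spaces counts as a single word.
    print(orthographic_word_count("ΜΗΝΙΝΑΕΙΔΕΘΕΑ"))
```

Under this definition a hyphenated term such as "word-count" is one word, 
and a text written without spaces is one word no matter how long it is.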

Regards,

-Rob

