OASIS Mailing List Archives

office message


Subject: Re: [office] word-count (was Re: [office] Data Grid Size element proposal)


Rob,

robert_weir@us.ibm.com wrote:
> "Andreas J. Guelzow" <aguelzow@math.concordia.ab.ca> wrote on 11/24/2008 
> 05:59:28 PM:
>   
>> But why is this information saved in the file? 
>>     
>
> The reason is probably lost in the mists of time.  This information has 
> always been stored in documents going back to precursor binary formats 
> since the late 1980's.  It may come from common librarial practice, where 
> records of hard-copy resources would include the size of the book (in 
> inches and in pages) as well as title, author, subject, etc.  My guess is 
> they added word-count to electronic documents as an analogue to that 
> practice.  Remember in those days as well, the document format itself 
> might be proprietary and undocumented, but in Windows at least it was 
> common to store the metadata as OLE Properties, which could be quickly 
> retrieved without understanding the underlying document format.  So that 
> would be useful for search engines, document servers, etc., and any other 
> programs that operated on the document metadata.
>
> But that is all in the past.  The same constraints don't necessarily exist 
> today.  In particular, with a standard document format, the entire 
> contents of the document is open for reading/scanning, not just the 
> metadata.
>
> On the other hand, I don't see any reason to remove these features from 
> ODF, since there may be applications that use them.
>
>   
I would not advocate their removal but I have posted notes in the latest 
drafts about the need to specify definitions to accompany these counts. 
Simply saying word and character count is insufficient once you move 
beyond English. Saying that they are locale specific would be better, 
although that does seem to push the question of interoperability off 
into locale. Which may be the best we can do.
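
To illustrate why the bare terms are insufficient, here is a minimal 
sketch in Python. The function names and sample strings are my own 
assumptions for illustration, not definitions taken from any ODF draft. 
It shows two plausible "word count" and "character count" definitions 
disagreeing once the text is not plain English.

```python
import unicodedata


def whitespace_word_count(text: str) -> int:
    """Count words as runs of non-whitespace (one possible definition)."""
    return len(text.split())


def code_point_count(text: str) -> int:
    """Count characters as Unicode code points, whitespace included."""
    return len(text)


def non_space_count(text: str) -> int:
    """Count characters as code points, whitespace excluded."""
    return sum(1 for ch in text if not ch.isspace())


english = "The quick brown fox"
chinese = "我喜欢阅读文档"  # written without spaces between words

# Whitespace splitting works for the English sample but treats the
# whole Chinese sentence as a single "word".
print(whitespace_word_count(english), whitespace_word_count(chinese))

# The same visible text yields different counts depending on whether
# the accent is stored precomposed (NFC) or as a combining mark (NFD).
composed = unicodedata.normalize("NFC", "café")
decomposed = unicodedata.normalize("NFD", "café")
print(code_point_count(composed), code_point_count(decomposed))

# Counting or skipping whitespace is yet another choice to pin down.
print(code_point_count(english), non_space_count(english))
```

None of these is "the" right answer, which is the point: two 
applications can each be internally consistent and still write 
different counts for the same document.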

We could expend a lot of resources trying to duplicate what is already 
standardized for some locales and probably not do as good a job for 
locales where such standards don't exist.

I do think we need to specify that these and other features (sort 
ascending/descending, for example) are explicitly locale specific.
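
As a rough sketch of the sorting case (Python again; the sample names 
and the `simplified_key` helper are hypothetical, and the key is only a 
crude stand-in for real locale collation rules, not an implementation 
of any of them), "ascending" gives different results depending on the 
ordering rule an application applies:

```python
import unicodedata

names = ["Zebra", "apple", "Émile"]


def simplified_key(text: str) -> str:
    """Hypothetical key: strip combining marks and ignore case.

    This is a simplification for illustration only; actual locale
    collation is considerably more involved.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


# Plain code point order: uppercase before lowercase, accented last.
by_code_point = sorted(names)

# Order a reader of a dictionary or index would more likely expect.
by_simplified_key = sorted(names, key=simplified_key)

print(by_code_point)
print(by_simplified_key)
```

Both lists are "sorted ascending", yet they differ, so a format that 
only says "ascending" leaves the result up to each application.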

Personally I find that unsatisfactory as different applications will 
have different levels of locale support and that has a negative impact 
on interoperability. But if we have to draw a boundary around our 
concerns as a format, I think using locale is at least defensible if not 
really satisfactory.

Hope you are having a great day!

Patrick

PS: I will be offline (traveling) most of the rest of today. Back 
online tomorrow.

-- 
Patrick Durusau 
patrick@durusau.net 
Chair, V1 - US TAG to JTC 1/SC 34 
Convener, JTC 1/SC 34/WG 3 (Topic Maps) 
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300 
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps) 



