OASIS Mailing List Archives

office message


Subject: RE: [office] Conforming OpenDocument Text Document, etc.


I share Andreas's concern, although I certainly support the idea of having
specific document-type determination, and if it takes a conformance target
to do that (because of upward compatibility issues, etc.), I can live with
that too.

I do think we need to be careful about file extensions.

1. For ODF 1.2, I think we should consider requiring strict associations of
MIME types (and the mimetype Zip item value and the office:mimetype
attribute value) with document types in the sense that the content of the
<office:body> element is strictly determined for the main document types and
the templates for those types.  This is for all ODF document conformance
targets.

2. I think the use of filename extensions should be a recommended practice,
but not a strict requirement at any level of document conformance target.

3. With regard to the document-type-specific conformance targets, I think
those should be defined as supplementing the basic document-type
determination with specific features that shall be supported (and shall not
be supported, if it comes to that).
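
As an illustration of point 1, here is a minimal sketch of what "the MIME type
strictly determines the content of <office:body>" could look like as a check.
The function name and the mapping table are hypothetical and cover only a few
document types; the namespace URI is the ODF office namespace. This is not
proposed specification text.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hypothetical, partial table: MIME type -> local name of the single
# child element expected inside <office:body>. Templates share the
# body element of their main document type.
EXPECTED_BODY_CHILD = {
    "application/vnd.oasis.opendocument.text": "text",
    "application/vnd.oasis.opendocument.text-template": "text",
    "application/vnd.oasis.opendocument.spreadsheet": "spreadsheet",
    "application/vnd.oasis.opendocument.spreadsheet-template": "spreadsheet",
}


def body_matches_mimetype(content_xml: str, mimetype: str) -> bool:
    """Return True if <office:body> holds exactly the one child element
    that the (assumed) table associates with the given MIME type."""
    expected = EXPECTED_BODY_CHILD.get(mimetype)
    if expected is None:
        return False  # MIME type not covered by this sketch
    root = ET.fromstring(content_xml)
    body = root.find(f"{{{OFFICE_NS}}}body")
    if body is None:
        return False
    children = list(body)
    return len(children) == 1 and children[0].tag == f"{{{OFFICE_NS}}}{expected}"
```

The check needs the parsed XML, so it belongs to validation rather than to
cheap dispatching.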


 - Dennis

ADDITIONAL ANALYSIS

It is true that file extensions are usable on some operating systems as a
way to automatically associate a file with the *default* application to be
used to consume and to process it.  Later versions of that common operating
system will also maintain alternative associations for a given extension.
(For example, on the machine where I am writing this, *.odt is associated
with OO.o 3.0 by default, and swriter (actually OO.o 2.4) and Zip are
immediately available by right-click selection.)

Also, it is not unusual for applications to be indifferent to the filename
extensions of the files they are given to consume, instead inspecting the
file itself for appropriate treatment.  OO.o does this.  If I give OO.o Text
an *.odt that is really a Calc sheet, Calc opens with it.  (I had that
happen by accident where a download was misnamed.)  Similarly, Microsoft
Office often doesn't care what the extension is, delving into the content to
determine what the document format is.  

As an interesting side note, the OOXML specification doesn't mention a file
naming convention of any kind.  I have tried to trick Office 2007 by lying
about file extensions and I have failed miserably in the case of OOXML
documents.

For ODF 1.1, the filename extension is not normative in any way (and
unfortunately, neither are the MIME types).  That is a mistake in the case
of the MIME types because they are instrumental in determining what single
content element there must be for the <office:body> element.

Also, for ODF 1.1, the mimetype Zip item is not required to be in the
favored magic-number position.  (I was surprised to learn that, just
recently).


-----Original Message-----
From: robert_weir@us.ibm.com [mailto:robert_weir@us.ibm.com] 
http://lists.oasis-open.org/archives/office/200903/msg00072.html
Sent: Sunday, March 15, 2009 19:40
To: office@lists.oasis-open.org
Subject: Re: [office] Conforming OpenDocument Text Document, etc.

Andreas J Guelzow <aguelzow@math.concordia.ab.ca> wrote on 03/15/2009 
09:46:09 PM:
http://lists.oasis-open.org/archives/office/200903/msg00071.html
> 
> What is the point of this exercise? The document is what happens to be
> stored in the file. Saying that a user can turn a
> Conforming OpenDocument Spreadsheet Document into a non-conforming one
> by simply changing the name of the file is just ridiculous! The mimetype
> can be deduced from the content of the file, so why does one have to
> specify the name of the file (or even part of the name)?
> 

The point is you often want to dispatch a document to a particular 
application without first going through the expense of unzipping it and 
parsing the XML.  That is why we've ended up with 4 different mechanisms 
for determining the type of the document.   In order of increasing cost 
for determining the document type, we have:

1a) MIME content type for streamed documents. This is not part of the 
document per se, but is how a properly configured web server can indicate 
the type of the document.

1b) The file extension.  This serves the same purpose as 1a, but in the 
file system case.

2) mimetype stream in the package at fixed offset in the file for 
environments where filetype is determined by "magic numbers".  This is 
inexpensive since it doesn't require unzipping or XML parsing.

3) office:mimetype attribute, especially needed for the single XML version 
of ODF.  Requires XML parsing.  Or I suppose you could try to regex it, 
but I bet that approach could be fooled.

4) Duck typing:  "If it walks like a duck and sounds like a duck, then it 
probably is a duck" such as "I have a document, and it has a table and 
some formulas, so I should probably treat it like spreadsheet".  This is 
the most flexible, but also the most expensive technique. 
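
To make the cost difference between mechanisms 1b and 2 concrete, here is a
minimal sketch. It assumes the package stores "mimetype" uncompressed as the
first Zip item, so its value can be read from the local file header without
unzipping anything. The function names and the small extension table are
hypothetical, and this is one possible consumer-side approach, not a
normative procedure.

```python
import os
import struct

# Hypothetical, partial table for mechanism 1b (filename extension).
EXTENSION_TYPES = {
    ".odt": "application/vnd.oasis.opendocument.text",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
    ".odp": "application/vnd.oasis.opendocument.presentation",
}


def guess_from_extension(filename: str):
    """Mechanism 1b: cheapest check, looks only at the file name."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TYPES.get(ext)


def sniff_mimetype(data: bytes):
    """Mechanism 2: read the 'mimetype' item from the first Zip local
    file header. Returns None when the first item is not an
    uncompressed 'mimetype' entry (the magic-number position)."""
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        return None  # not a Zip local file header
    (method,) = struct.unpack_from("<H", data, 8)      # 0 means stored
    (size,) = struct.unpack_from("<I", data, 18)       # compressed size
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    name = data[30:30 + name_len]
    if method != 0 or name != b"mimetype":
        return None
    start = 30 + name_len + extra_len
    return data[start:start + size].decode("ascii", "replace")
```

A consumer could call guess_from_extension() first and fall back to
sniff_mimetype() on the first few hundred bytes of the file; only when both
fail would it need to go on to mechanisms 3 and 4.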

The point of the conformance proposal was that a spreadsheet should 
consistently indicate its application type and that any inconsistency, at 
least in the document itself (we can't state requirements on the web server 
for 1a) would be nonconformant.  If, on the other hand, we don't require 
consistency, then we'll need to define some heuristic for resolving the 
application type.  I think it is a reasonable goal to have some way of 
doing this that does not require unzipping and XML parsing, since such an 
approach is commonly used by operating system GUIs to dispatch to the 
correct application for handling that application type.

To accomplish this we need to either ensure that producers write the data 
consistently, or that consumers use a more complicated heuristic to 
determine document type.  I'm inclined to believe that this will work best 
if we require both.

-Rob


