Subject: Re: [office] Conforming OpenDocument Text Document, etc.

From: robert_weir@us.ibm.com
To: office@lists.oasis-open.org
Date: Sun, 15 Mar 2009 22:39:33 -0400

Andreas J Guelzow <aguelzow@math.concordia.ab.ca> wrote on 03/15/2009 
09:46:09 PM:
> 
> What is the point of this exercise? The document is what happens to be
> stored in the file. Saying that a user can turn a
> Conforming OpenDocument Spreadsheet Document into a non-conforming one
> by simply changing the name of the file is just ridiculous! The mimetype
> can be deduced from the content of the file so why does one have to
> specify the name of the file (or even part of the name)?
> 

The point is you often want to dispatch a document to a particular 
application without first going through the expense of unzipping it and 
parsing the XML.  That is why we've ended up with 4 different mechanisms 
for determining the type of the document.   In order of increasing cost 
for determining the document type, we have:

1a) MIME content type for streamed documents. This is not part of the 
document per se, but is how a properly configured web server can indicate 
the type of the document.

1b) The file extension.  This serves the same purpose as 1a, but in the 
file system case.

2) The mimetype stream in the package, at a fixed offset in the file, for 
environments where file type is determined by "magic numbers".  This is 
inexpensive since it doesn't require unzipping or XML parsing.

3) office:mimetype attribute, especially needed for the single XML version 
of ODF.  Requires XML parsing.  Or I suppose you could try to regex it, 
but I bet that approach could be fooled.

4) Duck typing:  "If it walks like a duck and sounds like a duck, then it 
probably is a duck" such as "I have a document, and it has a table and 
some formulas, so I should probably treat it like spreadsheet".  This is 
the most flexible, but also the most expensive technique. 
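
As a rough illustration of mechanism 2, here is a minimal sketch in Python. 
It assumes the package is written the way the packaging conventions 
describe: the "mimetype" entry comes first in the ZIP, is stored 
uncompressed, and has no extra field, so its name sits at byte 30 and its 
content at byte 38.  The function name is hypothetical, and this is not 
taken from any particular implementation.

```python
import struct
from typing import Optional

ZIP_LOCAL_HEADER_SIG = b"PK\x03\x04"
LOCAL_HEADER_LEN = 30  # fixed-size part of a ZIP local file header


def sniff_package_mimetype(data: bytes) -> Optional[str]:
    """Return the media type held in a leading "mimetype" entry, or None.

    Only the leading bytes are inspected; nothing is unzipped or parsed.
    (Hypothetical helper, written for illustration.)
    """
    if len(data) < LOCAL_HEADER_LEN or data[:4] != ZIP_LOCAL_HEADER_SIG:
        return None
    # Header layout: flags at offset 6, method at 8, sizes and lengths at 18.
    flags, method = struct.unpack_from("<HH", data, 6)
    _comp_size, size, name_len, extra_len = struct.unpack_from("<IIHH", data, 18)
    # Assumption: the entry is stored (method 0), has an 8-byte name, no
    # extra field, and its size is recorded in the header (flag bit 3 clear).
    if method != 0 or flags & 0x08 or name_len != 8 or extra_len != 0:
        return None
    if data[30:38] != b"mimetype":
        return None
    content = data[38:38 + size]
    if len(content) != size:
        return None  # the caller did not supply enough bytes
    try:
        return content.decode("ascii")
    except UnicodeDecodeError:
        return None
```

A caller would only need to read the first hundred or so bytes of the file 
and pass them in, which is what keeps this cheaper than mechanisms 3 and 4.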

The point of the conformance proposal was that a spreadsheet should 
consistently indicate its application type and that any inconsistency, at 
least in the document itself (we can't state requirements for the web 
server in 1a), would be nonconformant.  If, on the other hand, we don't 
require consistency, then we'll need to define some heuristic for resolving 
the application type.  I think it is a reasonable goal to have some way of 
doing this that does not require unzipping and XML parsing, since such an 
approach is commonly used by operating system GUIs to dispatch to the 
correct application for handling that application type.

To accomplish this we need to either ensure that producers write the data 
consistently, or that consumers use a more complicated heuristic to 
determine document type.  I'm inclined to believe that this will work best 
if we require both.
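
To make the consumer side concrete, here is a minimal sketch in Python of 
one possible heuristic for reconciling the file extension (1b) with the 
package mimetype (2).  The precedence rule, which lets the package mimetype 
win when the two disagree, is purely an assumption for illustration, as are 
the function name and the small extension table; no such rule has been 
agreed on.

```python
import os
from typing import Optional, Tuple

# A few common ODF extensions and their registered media types.
EXTENSION_TO_MIMETYPE = {
    ".odt": "application/vnd.oasis.opendocument.text",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
    ".odp": "application/vnd.oasis.opendocument.presentation",
    ".odg": "application/vnd.oasis.opendocument.graphics",
}


def resolve_document_type(
    filename: str, package_mimetype: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Return (chosen media type, conflict flag) for a document.

    Hypothetical heuristic: if both indicators are present and disagree,
    report a conflict and prefer the package mimetype (an assumption).
    """
    extension = os.path.splitext(filename)[1].lower()
    from_extension = EXTENSION_TO_MIMETYPE.get(extension)
    if package_mimetype is None:
        # Nothing in the package to compare against; fall back to 1b.
        return from_extension, False
    if from_extension is None:
        # Unknown or missing extension; only the package speaks.
        return package_mimetype, False
    conflict = from_extension != package_mimetype
    return package_mimetype, conflict
```

Under the conformance proposal, a document for which the conflict flag 
comes back true would be the nonconformant case; without the proposal, the 
flag is where a consumer would have to decide which indicator to trust.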

-Rob

Follow-Ups:
- RE: [office] Conforming OpenDocument Text Document, etc.
  - From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
- Re: [office] Conforming OpenDocument Text Document, etc.
  - From: Andreas J Guelzow <aguelzow@math.concordia.ab.ca>

References:
- Conforming OpenDocument Text Document, etc.
  - From: robert_weir@us.ibm.com
- Re: [office] Conforming OpenDocument Text Document, etc.
  - From: Andreas J Guelzow <aguelzow@math.concordia.ab.ca>