Subject: Re: [office] Conforming OpenDocument Text Document, etc.
Andreas J Guelzow <email@example.com> wrote on 03/15/2009 09:46:09 PM:

> What is the point of this exercise? The document is what happens to be
> stored in the file. Saying that a user can turn a user can turn a
> Conforming OpenDocument Spreadsheet Document into a non-conforming one
> by simply changing the name of the file is just ridiculous! The mimetype
> can be deduced from the content of the file so why does one have to
> specify that the name of the file (or even part of the name).

The point is that you often want to dispatch a document to a particular application without first going through the expense of unzipping it and parsing the XML. That is why we've ended up with 4 different mechanisms for determining the type of the document. In order of increasing cost for determining the document type, we have:

1a) MIME content type for streamed documents. This is not part of the document per se, but is how a properly configured web server can indicate the type of the document.

1b) The file extension. This serves the same purpose as 1a, but in the file system case.

2) The mimetype stream in the package, at a fixed offset in the file, for environments where file type is determined by "magic numbers". This is inexpensive since it doesn't require unzipping or XML parsing.

3) The office:mimetype attribute, especially needed for the single XML version of ODF. Requires XML parsing. Or I suppose you could try to regex it, but I bet that approach could be fooled.

4) Duck typing: "If it walks like a duck and sounds like a duck, then it probably is a duck", such as "I have a document, and it has a table and some formulas, so I should probably treat it like a spreadsheet". This is the most flexible, but also the most expensive technique.

The point of the conformance proposal was that a spreadsheet should consistently indicate its application type, and that any inconsistency, at least in the document itself (we can't state requirements on the web server for 1a), would be nonconformant.
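To illustrate how cheap mechanism 2 is, and what a consistency check between 1b and 2 might look like, here is a minimal sketch in Python. It is not a definitive implementation. It assumes the mimetype entry is the first entry in the ZIP package, stored uncompressed and without an extra field, so that its content starts at byte offset 38. The function names and the extension table are hypothetical and only cover three document types.

```python
import os
import struct
from typing import Optional

# Hypothetical, deliberately incomplete table for illustration only.
EXTENSION_TO_MIMETYPE = {
    ".odt": "application/vnd.oasis.opendocument.text",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
    ".odp": "application/vnd.oasis.opendocument.presentation",
}


def sniff_package_mimetype(data: bytes) -> Optional[str]:
    """Read the mimetype stream from the start of a package without unzipping.

    Assumes the first ZIP entry is named "mimetype", is stored (method 0),
    and has no extra field. Returns None if any assumption does not hold.
    """
    # A ZIP local file header is 30 bytes, followed by the entry name.
    if len(data) < 38 or data[:4] != b"PK\x03\x04":
        return None
    (method,) = struct.unpack_from("<H", data, 8)       # compression method
    (stored_size,) = struct.unpack_from("<I", data, 18)  # compressed size
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    if method != 0 or name_len != 8 or extra_len != 0:
        return None
    if data[30:38] != b"mimetype":
        return None
    # The entry is stored, so its bytes follow the name directly.
    return data[38:38 + stored_size].decode("ascii", errors="replace")


def is_consistent(filename: str, data: bytes) -> bool:
    """Check that the file extension (1b) agrees with the mimetype stream (2)."""
    extension = os.path.splitext(filename)[1].lower()
    expected = EXTENSION_TO_MIMETYPE.get(extension)
    return expected is not None and sniff_package_mimetype(data) == expected
```

Only the first few dozen bytes of the file are inspected, which is why this kind of check is suitable for file managers and similar dispatchers. A package that violates the assumptions simply yields None here and would need one of the more expensive mechanisms.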
If, on the other hand, we don't require consistency, then we'll need to define some heuristic for resolving the application type. I think it is a reasonable goal to have some way of doing this that does not require unzipping and XML parsing, since such an approach is commonly used by operating system GUIs to dispatch to the correct application for handling that application type. To accomplish this we need to either ensure that producers write the data consistently, or that consumers use a more complicated heuristic to determine document type. I'm inclined to believe that this will work best if we require both.

-Rob