Subject: RE: [office] Conforming OpenDocument Text Document, etc.
I share Andreas's concern, although I certainly support the idea of having specific document-type determination, and if it takes a conformance target to do that (because of upward compatibility issues, etc.), I can live with that too. I do think we need to be careful about file extensions.

1. For ODF 1.2, I think we should consider requiring strict associations of MIME types (and the mimetype Zip item value and the office:mimetype attribute value) with document types, in the sense that the content of the <office:body> element is strictly determined for the main document types and the templates for those types. This is for all ODF document conformance targets.

2. I think the use of filename extensions should be a recommended practice, but not a strict requirement at any level of document conformance target.

3. With regard to the document-type-specific conformance targets, I think those should be defined as supplementing the basic document-type determination with specific features that shall be supported (and shall not be supported, if it comes to that).

- Dennis

ADDITIONAL ANALYSIS

It is true that file extensions are usable on some operating systems as a way to automatically associate a file with the *default* application to be used to consume and to process it. Later versions of that common operating system will also maintain alternative associations for a given extension. (For example, on the machine where I am writing this, *.odt is associated with OO.o 3.0 by default, and immediately available by right-click selection are swriter (actually OO.o 2.4) and Zip.)

Also, it is not unusual for applications to be indifferent to the filename extensions of the files they are given to consume, instead inspecting the file itself for appropriate treatment. OO.o does this. If I give OO.o Text an *.odt that is really a Calc sheet, Calc opens with it. (I had that happen by accident where a download was misnamed.)
Similarly, Microsoft Office often doesn't care what the extension is, delving into the content to determine what the document format is. As an interesting side note, the OOXML specification doesn't mention a file naming convention of any kind. I have tried to trick Office 2007 by lying about file extensions and I have failed miserably in the case of OOXML documents.

For ODF 1.1, the filename extension is not normative in any way (and unfortunately, neither are the MIME types). That is a mistake in the case of the MIME types because they are instrumental in determining what single content element there must be for the <office:body> element. Also, for ODF 1.1, the mimetype Zip item is not required to be in the favored magic-number position. (I was surprised to learn that just recently.)

-----Original Message-----
From: email@example.com [mailto:firstname.lastname@example.org]
http://lists.oasis-open.org/archives/office/200903/msg00072.html
Sent: Sunday, March 15, 2009 19:40
To: email@example.com
Subject: Re: [office] Conforming OpenDocument Text Document, etc.

Andreas J Guelzow <firstname.lastname@example.org> wrote on 03/15/2009 09:46:09 PM:
http://lists.oasis-open.org/archives/office/200903/msg00071.html

>
> What is the point of this exercise? The document is what happens to be
> stored in the file. Saying that a user can turn a
> Conforming OpenDocument Spreadsheet Document into a non-conforming one
> by simply changing the name of the file is just ridiculous! The mimetype
> can be deduced from the content of the file so why does one have to
> specify that the name of the file (or even part of the name).
>

The point is you often want to dispatch a document to a particular application without first going through the expense of unzipping it and parsing the XML. That is why we've ended up with 4 different mechanisms for determining the type of the document.
In order of increasing cost for determining the document type, we have:

1a) MIME content type for streamed documents. This is not part of the document per se, but is how a properly configured web server can indicate the type of the document.

1b) The file extension. This serves the same purpose as 1a, but in the file system case.

2) mimetype stream in the package at a fixed offset in the file, for environments where filetype is determined by "magic numbers". This is inexpensive since it doesn't require unzipping or XML parsing.

3) office:mimetype attribute, especially needed for the single XML version of ODF. Requires XML parsing. Or I suppose you could try to regex it, but I bet that approach could be fooled.

4) Duck typing: "If it walks like a duck and sounds like a duck, then it probably is a duck", such as "I have a document, and it has a table and some formulas, so I should probably treat it like a spreadsheet". This is the most flexible, but also the most expensive technique.

The point of the conformance proposal was that a spreadsheet should consistently indicate its application type and that any inconsistency, at least in the document itself (we can't state requirements on the web server for 1a), would be nonconformant.

If, on the other hand, we don't require consistency, then we'll need to define some heuristic for resolving the application type. I think it is a reasonable goal to have some way of doing this that does not require unzipping and XML parsing, since such an approach is commonly used by operating system GUIs to dispatch to the correct application for handling that application type.

To accomplish this we need either to ensure that producers write the data consistently, or that consumers use a more complicated heuristic to determine document type. I'm inclined to believe that this will work best if we require both.
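To make mechanisms 1b and 2 concrete, here is a minimal sketch of how a consumer might read the mimetype item from the start of a package without unzipping it, and then compare it with the file extension. It assumes the mimetype item is the first entry in the Zip file, is stored uncompressed, and has its size recorded in the local file header. The names sniff_mimetype, is_consistent and EXTENSION_FOR are hypothetical, and the extension table lists only a few of the conventional associations.

```python
import struct

# Fixed-size part of a Zip local file header (30 bytes, little-endian):
# signature, version, flags, method, time, date, crc32,
# compressed size, uncompressed size, name length, extra length.
_LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

# Hypothetical lookup table: conventional extensions for a few ODF MIME types.
EXTENSION_FOR = {
    "application/vnd.oasis.opendocument.text": ".odt",
    "application/vnd.oasis.opendocument.spreadsheet": ".ods",
    "application/vnd.oasis.opendocument.presentation": ".odp",
    "application/vnd.oasis.opendocument.graphics": ".odg",
}


def sniff_mimetype(head: bytes):
    """Return the MIME type from a leading, stored 'mimetype' item, else None.

    `head` is the first few hundred bytes of the file; nothing is unzipped.
    """
    if len(head) < _LOCAL_HEADER.size:
        return None
    (sig, _ver, _flags, method, _time, _date, _crc,
     _csize, usize, name_len, extra_len) = _LOCAL_HEADER.unpack_from(head)
    # Must be a local file header for an uncompressed (method 0) entry
    # whose size is known up front.
    if sig != b"PK\x03\x04" or method != 0 or usize == 0:
        return None
    name_start = _LOCAL_HEADER.size
    if head[name_start:name_start + name_len] != b"mimetype":
        return None
    data_start = name_start + name_len + extra_len
    data = head[data_start:data_start + usize]
    if len(data) != usize:
        return None  # caller did not supply enough bytes
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None


def is_consistent(filename: str, head: bytes) -> bool:
    """True if the extension matches the sniffed MIME type (mechanisms 1b vs 2)."""
    expected = EXTENSION_FOR.get(sniff_mimetype(head))
    return expected is not None and filename.lower().endswith(expected)
```

A caller would read, say, the first 256 bytes of the file and pass them in. A result of None from sniff_mimetype means this cheap check could not decide, and the consumer would have to fall back to one of the more expensive mechanisms.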
-Rob