OASIS Mailing List Archives

 



Subject: Re: [office] RE: Directories in Zip packages


Hi

I just completed a small experiment: a small program that reads through
a zip package created by OpenOffice (3.1.1) and copies the zip entries
from this file to a new zip file, leaving out any entry which is either
a directory or of zero length.
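
A minimal sketch of such a filter, written in Python with the standard
zipfile module.  This is not the program used for the experiment; the
function name and the file paths are illustrative assumptions.

```python
import zipfile


def copy_without_empty_entries(src_path, dst_path):
    """Copy a zip package, skipping directory and zero-length entries.

    Returns the names of the entries that were left out.
    (Hypothetical helper, not the program from the experiment.)
    """
    skipped = []
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            if info.is_dir() or info.file_size == 0:
                skipped.append(info.filename)
                continue
            # Reusing the original ZipInfo keeps the entry's name, timestamp
            # and compression method; entry order is preserved as well.
            dst.writestr(info, src.read(info.filename))
    return skipped
```

Calling copy_without_empty_entries("bobtest.odt", "newFile.odt") would
then return the list of entry names that were dropped.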

As I suspected, besides being marginally (1k) smaller, this remains a
fully acceptable odf file, at least so far as OOo is concerned.  What
is maybe a bit surprising is that the manifest now refers to entries
which are no longer in the zip package.  I was expecting a complaint
there but none was forthcoming.  Anyway, being a bit tidier, I
modified the manifest of the newly created file to remove the
redundant entries, and I am still left with what seems to be a
perfectly acceptable package with no information loss.
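
The manifest cleanup could be sketched along these lines, assuming the
manifest uses the usual manifest:file-entry elements with a
manifest:full-path attribute.  The function name is hypothetical, and
this is not the tool used for the attached files.

```python
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def prune_manifest(manifest_xml, present_names):
    """Drop manifest file entries whose path is absent from the package.

    manifest_xml:  bytes of META-INF/manifest.xml
    present_names: names of the entries actually present in the zip
    (Hypothetical helper for illustration.)
    """
    # Keep the conventional "manifest" prefix when serializing.
    ET.register_namespace("manifest", MANIFEST_NS)
    root = ET.fromstring(manifest_xml)
    present = set(present_names)
    for entry in root.findall(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        # "/" describes the package itself, so it is always kept.
        if path != "/" and path not in present:
            root.remove(entry)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Note that ElementTree does not preserve comments or a DOCTYPE
declaration, so a real tool may need a different serializer.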

All of which I think reinforces my earlier point.  ZIP may well have
valid use cases for storing directory information and zero length
files, and as Rob points out, the appnote totally allows it.  But from
our perspective (which is to package odf streams into a single archive
rather than to emulate a file system), there seems to be no good
reason to package these types of entries.  And at least one leading
implementation seems not to care if they are not there.  Removing
them removes ambiguity over what should or should not be signed.

So I would say that an odf producer should only produce entries in the
zipfile for non-zero-length streams (this would by default also
exclude directories), and that each of these shall be referenced in
a full document signature.

An odf consumer, when validating a signature, shall verify that the
signature references all non-zero-length entries in the package.  The
presence of other zipentries in the package could be either ignored or
treated as an error.  Following Postel, I am leaning towards the more
permissive approach.  The benefit of simply ignoring them is that it
would allow naive general purpose zip tools to produce valid odf
files, even though they would likely be violating the recommendation
above regarding odf producers.  I think this is reasonable given the
various toolchains people might construct, which might involve an
eventual packaging stage using pkzip or something similar.
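
A rough sketch of the permissive consumer check in Python.  The
function name and the `referenced` parameter are assumptions made for
illustration; extracting the referenced names from the signature itself
is not shown.

```python
import zipfile


def unsigned_entries(package_path, referenced):
    """Return non-zero-length entries not covered by the signature.

    referenced: set of entry names the signature refers to (hypothetical
    input; obtaining it from the signature is out of scope here).
    """
    with zipfile.ZipFile(package_path) as pkg:
        return [
            info.filename
            for info in pkg.infolist()
            # Directories and zero-length entries are ignored, not rejected.
            if not info.is_dir()
            and info.file_size > 0
            and info.filename not in referenced
        ]
```

An empty result means every non-zero-length entry is referenced; a
stricter consumer could instead reject the package on seeing any
directory or zero-length entry.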

Regards
Bob

PS.  test files attached

On 27 September 2010 13:58,  <robert_weir@us.ibm.com> wrote:
> I'm not sure I've read all the posts in this thread, but I believe:
>
> 1) ZIP totally allows zip items representing zero-byte files as well as
> items representing empty directories.  The latter in particular is quite
> useful in general ZIP usage.  I remember seeing some bugs in the early
> 1990s with some ZIP programs not handling this correctly.  But some uses,
> like self-extracting ZIPs that contain a pre-made empty directory for user
> data, will not work correctly without support for empty directories.
>
> 2) A zero-byte XML file is never correct.  Or at least it doesn't conform
> to the XML Recommendation since it is not well-formed XML.
>
> 3) On the other hand, except for the handful of ODF-defined ZIP items,
> like content.xml, etc., we don't have any anti-spoofing requirement,
> right?  In other words we don't have a conformance requirement that says
> that content-type in the manifest matches the zip item.  If we had that
> restriction then it would not be conforming to have a zero-byte XML with
> content type text/xml or application/xml.  This would also make
> non-conformant potentially more sinister things like an EXE pretending to
> be image/png and stuff like that.
>
> 4) However, there would be nothing wrong with a zero-byte foo.xml with a
> content type of text/plain or something similar.
>
> 5) Digital signatures apply to the contents of a file.  So you might think
> there is nothing to sign.  But in fact the zip item does bear a name and a
> time stamp, and either of these may bear information that could be harmed
> by tampering.  We cover the name by signing the manifest.  But we don't
> appear to cover tampering with the time stamp.  Of course, this is
> independent of the zero byte issue.
>
> 6) The most straightforward way for someone to implement a generic ODF
> package consumer would be to create a hashtable of each "file" in the ZIP
> and associate it with a record that contains metadata on the entry (date,
> zip, compression method, etc.) as well as access to the underlying data.
> This is very simple in most programming languages.  Then when modifying
> and saving the package, I would recreate the manifest and write out all
> the other ZIP items.
>
> My guess is authors of this straightforward approach will often fail to
> properly handle the empty directory case, both on reading and on writing.
> (We have no way of notating an empty directory in the manifest).  So I'd
> favor a recommendation against (should not) or a prohibition against
> (shall not) an ODF package containing empty directories.  We have no need
> of it, and it will probably not work well across implementations.
>
> -Rob

bobtest.odt

newFile.odt

newFile_correctedmanifest.odt


