office message

Subject: Re: [office] Encryption and data leakage
From: Bob Jolliffe <bobjolliffe@gmail.com>
To: Malte Timmermann <Malte.Timmermann@sun.com>
Date: Wed, 12 May 2010 10:22:08 +0100
On 12 May 2010 10:06, Malte Timmermann <Malte.Timmermann@sun.com> wrote:
> I agree (except for file names, at least OOo doesn't keep file names,
> but in the end the spec doesn't guide what to do anyway).
>
> I would like to avoid way 1+2, but to directly go with way 3.
>
> a) It's IMHO the better approach
> b) Don't introduce interim changes. This breaks compatibility twice, and
> I am not even sure if somebody would implement them at all.
>
> Another not yet mentioned approach would be to use/allow standard zip
> encryption including directory encryption (instead of way 1+2, but not
> as a replacement for 3).
> But I don't know if this would allow for different algorithms, nor do I
> know if the standard zip encryption is considered to be strong.
> I guess there are reasons that it hasn't been considered for ODF
> encryption from the beginning...

I think there may also be IP concerns about using "standard" zip
encryption.  From the application note:

"X. Incorporating PKWARE Proprietary Technology into Your Product
----------------------------------------------------------------

PKWARE is committed to the interoperability and advancement of the
.ZIP format.  PKWARE offers a free license for certain technological
aspects described above under certain restrictions and conditions.
However, the use or implementation in a product of certain technological
aspects set forth in the current APPNOTE, including those with regard to
strong encryption, patching, or extended tape operations requires a
license from PKWARE.  Please contact PKWARE with regard to acquiring
a license."

Regards
Bob

>
> Malte.
>
> robert_weir@us.ibm.com wrote, On 05/11/10 19:42:
>> The approach we inherited from ODF 1.1 encrypts each file in the ZIP
>> independently.  Although the contents of the files are not viewable due to
>> the encryption, there are bits of information that potentially "leak", such
>> as:
>>
>> 1) The file size
>> 2) The file date
>> 3) The file name
>> 4) The file mime type
>> 5) The hash of the first 1024 bytes of the file
>>
>> For example, even in an encrypted document I could see a file name called
>> "big-secret-takeover-june-3.jpg" and know some information that the person
>> who wrote the encrypted document might be rather surprised to see in the
>> open.
>>
>> Although not required by ODF, an implementation, if it is clever, can
>> avoid some of these leakages.  For example, the timestamp of the file can
>> be turned into the time of encryption rather than the original time stamp.
>>  And the file name can be randomized rather than indicate the original
>> file name.  This might be fine for ODF, since these time stamps and file
>> names do not need to be preserved.  So long as we preserve
>> referential integrity of the package, the names of images are not
>> significant.
>>
>> However we still should be concerned here.  First, the reason we split
>> Part 3 into its own part was the belief that it could be useful for
>> purposes other than just ODF 1.2.  Many of us hoped that it would have other
>> uses.  But I don't think we can assume that all uses can ignore the
>> original file names and time stamps.  These might be significant for some
>> uses.
>>
>> Second, even within ODF, especially if we allow package extensions, we
>> might see items added to packages where the names of files (which may
>> ultimately be end-user-defined) cannot safely be renamed to random names. For
>> example, there may be referential integrity constraints that a generic ODF
>> processor is not aware of.  Maybe there is RDF that points to a contained
>> image or other package resource.  In any case, the approach is very
>> fragile.
>>
>> Finally, even without extensions, and with the use of randomized names, we
>> still leak information, based on knowing the size and hash of the first
>> 1024 bytes of the file.  For example, if I have a copy of
>> "big-secret-takeover-june-3.jpg" I can easily check to see what encrypted
>> documents also contain that same image.  I can similarly probe for any
>> other resource where I know in advance its size and/or contents.
>>
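The probing described above can be illustrated with a minimal sketch. It assumes the visible checksum is a base64-encoded SHA-1 over the first 1024 bytes and that the attacker holds the file in the same byte form the checksum was computed over; the `entries` structure and its field names are hypothetical stand-ins for whatever metadata is readable in the package.

```python
import base64
import hashlib


def fingerprint(data: bytes) -> tuple[int, str]:
    """Return (size, base64 SHA-1 of the first 1024 bytes) for a byte string."""
    digest = hashlib.sha1(data[:1024]).digest()
    return len(data), base64.b64encode(digest).decode("ascii")


def find_matches(known: bytes, entries: list[dict]) -> list[str]:
    """Return the names of entries whose visible size and checksum match `known`.

    `entries` is a hypothetical list of dicts with "name", "size" and
    "checksum" keys, standing in for the unencrypted package metadata.
    """
    size, checksum = fingerprint(known)
    return [
        e["name"]
        for e in entries
        if e["size"] == size and e["checksum"] == checksum
    ]
```

Note that the entry name plays no part in the match, so randomizing names alone would not stop this check.
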
>> There are three ways of getting around this problem.  (Or at least three
>> that come to mind).  One is to keep a "shadow directory" for the ZIP, that
>> contains the original names, time stamps, and sizes of the files.  Encrypt
>> this "shadow directory" when the document is encrypted.  For each
>> encrypted file, prepend it with some random bytes (not sure what is
>> optimal) in order to prevent data leakage of original size and hash of
>> first 1024 bytes.
>>
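The random-prefix idea could look roughly like the sketch below. The layout (one length byte followed by the random bytes) and the 16 to 255 byte range are assumptions made for illustration only; the message itself leaves the optimal amount open. The padding would be applied to the plaintext before encryption and stripped after decryption.

```python
import secrets

# Hypothetical bounds: at least 16 random bytes so the prefix cannot be
# guessed, at most 255 so the length fits in a single byte.
MIN_PAD = 16
MAX_PAD = 255


def pad_with_random_prefix(data: bytes) -> bytes:
    """Prepend a length byte and that many random bytes to `data`."""
    pad_len = MIN_PAD + secrets.randbelow(MAX_PAD - MIN_PAD + 1)
    return bytes([pad_len]) + secrets.token_bytes(pad_len) + data


def strip_random_prefix(padded: bytes) -> bytes:
    """Remove the prefix added by pad_with_random_prefix."""
    pad_len = padded[0]
    return padded[1 + pad_len:]
```

With a prefix this small the approximate size of a large file is still visible; only the exact size and the hash of the first 1024 bytes are disturbed.
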
>> Another approach is to encrypt the original full path of the file, appended
>> with its timestamp, using the original derived key, base64 encode that,
>> and then write that out as the full path for the ZIP entry. That way you
>> do not need another file in the ZIP.
>>
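A rough sketch of the entry-name approach, reading "using the original derived key" as encryption. Python's standard library has no block cipher, so an HMAC-SHA256 keystream is swapped in purely as a stand-in for whatever cipher the package would really use; the function names, the 8-byte nonce, and the newline separator are all hypothetical. The URL-safe base64 alphabet is used because the standard alphabet contains "/", which would be read as a path separator in a ZIP entry name.

```python
import base64
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream: concatenated HMAC-SHA256(key, nonce || counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def encode_entry_name(key: bytes, path: str, timestamp: str) -> str:
    """Encrypt "path\\ntimestamp" and return it as a URL-safe base64 name."""
    plain = f"{path}\n{timestamp}".encode("utf-8")
    nonce = os.urandom(8)
    stream = _keystream(key, nonce, len(plain))
    cipher = bytes(a ^ b for a, b in zip(plain, stream))
    return base64.urlsafe_b64encode(nonce + cipher).decode("ascii")


def decode_entry_name(key: bytes, name: str) -> tuple[str, str]:
    """Recover (path, timestamp) from a name made by encode_entry_name."""
    raw = base64.urlsafe_b64decode(name)
    nonce, cipher = raw[:8], raw[8:]
    stream = _keystream(key, nonce, len(cipher))
    plain = bytes(a ^ b for a, b in zip(cipher, stream)).decode("utf-8")
    path, timestamp = plain.rsplit("\n", 1)
    return path, timestamp
```

The length of the encoded name still tracks the length of the original path, so some information remains visible even in this scheme.
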
>> The other way is to move to a whole-package encryption method, rather than
>> trying to do this file-by-file.
>>
>> -Rob
>>
>> ---------------------------------------------------------------------
>> To unsubscribe from this mail list, you must leave the OASIS TC that
>> generates this mail.  Follow this link to all your TCs in OASIS at:
>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>>
Follow-Ups:
- RE: [office] Encryption and data leakage
  - From: David LeBlanc <dleblanc@exchange.microsoft.com>
- Re: [office] Encryption and data leakage
  - From: robert_weir@us.ibm.com
References:
- Encryption and data leakage
  - From: robert_weir@us.ibm.com
- Re: [office] Encryption and data leakage
  - From: Malte Timmermann <Malte.Timmermann@Sun.COM>