
Subject: Re: [office] Encryption and data leakage
From: Malte Timmermann <Malte.Timmermann@Sun.COM>
To: robert_weir@us.ibm.com
Date: Wed, 12 May 2010 15:44:41 +0200

Added some pros/cons inline.

robert_weir@us.ibm.com wrote, On 05/12/10 15:16:
> So what are our options for #3?
> 
> Option 1) ZIP in a ZIP.  So create document as if it is encrypted, then 
> encrypt that as one file and STORE it in a container ZIP file that has 
> manifest, mimetype and nothing else.  That manifest lists the encryption 
> parameters for the "inner" zip.
> 
> PRO: 
> 
> Data leakage concerns go away. 
> 
> Better interaction with digital signatures.
> 
> Simplifies the specification.  We don't need to talk about pre-compressing 
> before encrypting.  That happens automatically.

And: not only the signatures, but no other feature at all needs to know
anything about encryption (except the encryption feature itself).

> 
> CON: 
> 
> Will this be slower because of the double ZIP?  I'm not quite sure.  I 
> think it might actually be faster because encrypting one big stream should 
> be faster than encrypting many smaller streams.  This is worth testing.

It might need more memory, in case you want to keep the decrypted zip in a
memory stream to avoid writing it to a temp file.

> 
> There is no opportunity for selective encryption.  For example, we
> cannot decide to expose metadata but not content.  But this is not
> typical.  And if really needed we could allow metadata to be shadowed in
> the outer container.
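A minimal Python sketch of Option 1, using only the standard library. It assumes the parts are already serialized as bytes and that `encrypt` is a caller-supplied function standing in for whatever algorithm the manifest would declare. The entry name `encrypted-package` and the manifest content are hypothetical; no manifest schema is implied.

```python
import io
import zipfile

def build_inner_package(files):
    """Build the 'inner' ZIP holding the real document parts, deflated as usual."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def wrap_encrypted(inner, encrypt, manifest_xml):
    """Encrypt the whole inner ZIP as one stream and STORE it in an outer
    container that holds only mimetype, the manifest and the encrypted blob."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # mimetype goes first and uncompressed, as in any ODF package
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    zipfile.ZIP_STORED)
        # hypothetical: the manifest would carry the inner ZIP's encryption parameters
        zf.writestr("META-INF/manifest.xml", manifest_xml, zipfile.ZIP_DEFLATED)
        # the parts were already compressed in the inner ZIP and ciphertext
        # does not compress, so the blob is simply STORED
        zf.writestr("encrypted-package", encrypt(inner), zipfile.ZIP_STORED)
    return buf.getvalue()
```

Only `mimetype`, the manifest and one opaque blob appear in the outer ZIP directory, so per-part names, sizes and dates are no longer visible.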
> 
> Option 2) Don't have two levels of ZIP, but maintain a shadow directory
> that is encrypted along with the concatenation of the files in the
> stream, maybe using the Unix tar method.
> 
> PRO:  Not sure it has advantages over 1)
> 
> CON: Requires us to specify more, specifically our own conventions for a 
> pre-compression, pre-encryption compound file.

And it doesn't solve the issue with encrypting a signed document.
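Option 2's "Unix tar method" could look roughly like this with Python's `tarfile` module: names, sizes and timestamps move into the tar headers, which would then be compressed and encrypted together with the data. The function names are hypothetical, and how the resulting blob is compressed, encrypted and placed in the ZIP is deliberately left out, since that is exactly the convention Option 2 would have to specify.

```python
import io
import tarfile
import time

def concat_as_tar(files, mtime=None):
    """Concatenate parts into one tar stream; the per-file names, sizes and
    times end up in the tar headers, i.e. inside what gets encrypted."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = time.time() if mtime is None else mtime
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def split_tar(blob):
    """Inverse step after decryption: recover the original parts by name."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tf:
        return {m.name: tf.extractfile(m).read() for m in tf.getmembers() if m.isfile()}
```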

Malte.

> 
> Option 3)  Is there an option 3?
> 
> -Rob
> 
> Malte.Timmermann@Sun.COM wrote on 05/12/2010 05:06:09 AM:
> 
>> From: Malte Timmermann <Malte.Timmermann@Sun.COM>
>> To: robert_weir@us.ibm.com
>> Cc: office@lists.oasis-open.org
>> Date: 05/12/2010 05:10 AM
>> Subject: Re: [office] Encryption and data leakage
>> Sent by: Malte.Timmermann@Sun.COM
>>
>> I agree (except for file names: at least OOo doesn't keep the original
>> file names, but in the end the spec doesn't say what to do anyway).
>>
>> I would like to avoid ways 1 and 2 and go directly with way 3.
>>
>> a) It's IMHO the better approach
>> b) Don't introduce interim changes.  This breaks compatibility twice, and
>> I am not even sure anybody would implement them at all.
>>
>> Another approach not yet mentioned would be to use/allow standard ZIP
>> encryption, including directory encryption (instead of ways 1 and 2, but
>> not as a replacement for 3).
>> But I don't know whether this would allow for different algorithms, nor
>> do I know whether standard ZIP encryption is considered strong.
>> I guess there are reasons it hasn't been considered for ODF
>> encryption from the beginning...
>>
>> Malte.
>>
>> robert_weir@us.ibm.com wrote, On 05/11/10 19:42:
>>> The approach we inherited from ODF 1.1 encrypts each file in the ZIP
>>> independently.  Although the contents of the files are not viewable due
>>> to the encryption, there are bits of information that potentially
>>> "leak", such as:
>>>
>>> 1) The file size
>>> 2) The file date
>>> 3) The file name
>>> 4) The file mime type
>>> 5) The hash of the first 1024 bytes of the file
>>>
>>> For example, even in an encrypted document I could see a file name
>>> called "big-secret-takeover-june-3.jpg" and know some information that
>>> the person who wrote the encrypted document might be rather surprised
>>> to see in the open.
>>>
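To illustrate the first three items, the sketch below writes a ZIP whose only entry body is random bytes (standing in for ciphertext) and then lists what any ZIP reader can see without a password. The helper names and the date are made up for the example; items 4 and 5 live in the unencrypted manifest rather than in the ZIP directory and are not shown.

```python
import io
import os
import zipfile

def build_per_file_encrypted(name, body_size):
    """Simulate per-file encryption: the entry body is opaque, but the
    ZIP directory (name, size, date) stays in the clear."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo(name, date_time=(2010, 5, 11, 19, 42, 0))
        # random bytes stand in for ciphertext of the same length
        zf.writestr(info, os.urandom(body_size))
    return buf.getvalue()

def visible_metadata(package):
    """Everything listed here is readable without the password."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return [(i.filename, i.file_size, i.date_time) for i in zf.infolist()]
```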
>>> Although not required by ODF, an implementation, if it is clever, can
>>> avoid some of these leakages.  For example, the timestamp of the file
>>> can be turned into the time of encryption rather than the original
>>> time stamp.  And the file name can be randomized rather than indicate
>>> the original file name.  This might be fine for ODF, since these time
>>> stamps and file names do not need to be preserved.  So long as we
>>> preserve referential integrity of the package, the names of images are
>>> not significant.
>>>
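A sketch of the "clever implementation" described above. For illustration it assumes that only entries under `Pictures/` are renamed and that only `content.xml` and `styles.xml` refer to them; both are assumptions of this example, not rules from the spec. It also shows why the approach is fragile: any reference outside the parts it knows about is silently left pointing at the old name.

```python
import re
import secrets
import time
import zipfile

def randomize_names(files, prefix="Pictures/", referring=("content.xml", "styles.xml")):
    """Give matching entries random names and rewrite references in the
    known referring parts. Returns (renamed files, old->new mapping)."""
    mapping = {n: prefix + secrets.token_hex(8) for n in files if n.startswith(prefix)}
    if not mapping:
        return dict(files), mapping
    # one alternation, longest names first, so a name that is a prefix of
    # another cannot clobber it and substituted names are never rewritten again
    pattern = re.compile(b"|".join(
        re.escape(old.encode()) for old in sorted(mapping, key=len, reverse=True)))
    out = {}
    for name, data in files.items():
        if name in referring:
            data = pattern.sub(lambda m: mapping[m.group().decode()].encode(), data)
        out[mapping.get(name, name)] = data
    return out, mapping

def entry_info(name):
    """ZipInfo stamped with the time of encryption instead of the original mtime."""
    return zipfile.ZipInfo(name, date_time=time.localtime()[:6])
```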
>>> However, we still should be concerned here.  First, the reason we split
>>> Part 3 into its own part was the belief that it could be useful for
>>> purposes other than just ODF 1.2.  Many of us hoped that it would find
>>> other uses.  But I don't think we can assume that all uses can ignore
>>> the original file names and time stamps.  These might be significant
>>> for some uses.
>>>
>>> Second, even within ODF, especially if we allow package extensions, we
>>> might see items added to packages where the names of files (which may
>>> ultimately be user-defined) cannot safely be renamed to random names.
>>> For example, there may be referential integrity constraints that a
>>> generic ODF processor is not aware of.  Maybe there is RDF that points
>>> to a contained image or other package resource.  In any case, the
>>> approach is very fragile.
>>>
>>> Finally, even without extensions, and with the use of randomized names,
>>> we still leak information, based on knowing the size and the hash of
>>> the first 1024 bytes of the file.  For example, if I have a copy of
>>> "big-secret-takeover-june-3.jpg" I can easily check which encrypted
>>> documents also contain that same image.  I can similarly probe for any
>>> other resource whose size and/or contents I know in advance.
>>>
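The probing attack can be sketched as follows. `SHA1/1K` is one checksum type used in ODF manifests for encrypted entries; the dictionaries standing in for parsed manifest entries are hypothetical. The sketch hashes the first 1024 bytes of whatever it is given; if an implementation hashes the compressed stream instead, the attacker would first have to reproduce that compression step.

```python
import base64
import hashlib

def sha1_1k(data):
    """Base64 SHA-1 over the first 1024 bytes, the shape of a 'SHA1/1K' checksum."""
    return base64.b64encode(hashlib.sha1(data[:1024]).digest()).decode("ascii")

def probe(known, entries):
    """Names of entries whose recorded size and checksum match a file we already hold.
    `entries` is a hypothetical list of dicts with 'name', 'size', 'checksum'."""
    size, digest = len(known), sha1_1k(known)
    return [e["name"] for e in entries if e["size"] == size and e["checksum"] == digest]
```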
>>> There are three ways of getting around this problem.  (Or at least two
>>> that come to mind.)  One is to keep a "shadow directory" for the ZIP
>>> that contains the original names, time stamps, and sizes of the files.
>>> Encrypt this "shadow directory" when the document is encrypted.  For
>>> each encrypted file, prepend some random bytes (not sure what is
>>> optimal) in order to prevent leakage of the original size and of the
>>> hash of the first 1024 bytes.
>>>
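A minimal sketch of the shadow-directory idea. `encrypt`/`decrypt` are caller-supplied placeholders, the entry name `shadow` and the JSON layout are hypothetical, and the 0 to 255 byte random prefix is an arbitrary choice (the mail itself leaves the optimal amount open).

```python
import json
import os
import secrets

def shadow_package(files, encrypt, max_pad=255):
    """files maps original name -> (data, timestamp string).
    Returns opaque entry name -> encrypted bytes, plus an encrypted 'shadow' entry."""
    shadow, entries = {}, {}
    for name, (data, stamp) in files.items():
        opaque = secrets.token_hex(8)
        pad = secrets.randbelow(max_pad + 1)
        # the real name, time and size are only recorded inside the shadow directory
        shadow[opaque] = {"name": name, "timestamp": stamp, "size": len(data), "pad": pad}
        # random prefix changes both the visible size and the first 1024 bytes
        entries[opaque] = encrypt(os.urandom(pad) + data)
    entries["shadow"] = encrypt(json.dumps(shadow).encode("utf-8"))
    return entries

def unshadow(entries, decrypt):
    """Reverse step: decrypt the shadow directory, strip each prefix, restore names."""
    shadow = json.loads(decrypt(entries["shadow"]))
    return {meta["name"]: decrypt(entries[opaque])[meta["pad"]:]
            for opaque, meta in shadow.items()}
```

The visible entry size is still the real size plus at most `max_pad` bytes, so the prefix fuzzes the size rather than hiding it.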
>>> Another approach is to encrypt the original full path of the file,
>>> appended with its timestamp, using the original derived key, base64
>>> encode that, and then write that out as the full path for the ZIP
>>> entry.  That way you do not need another file in the ZIP.
>>>
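The encrypted-name approach could be sketched like this. Python's standard library has no block cipher, so the sketch uses a toy HMAC-SHA256 keystream as a stand-in for the real algorithm (e.g. AES); it is not a vetted construction and provides no integrity. `derive_key` and its PBKDF2 parameters are assumptions, not the key derivation ODF specifies.

```python
import base64
import hashlib
import hmac
import os

def derive_key(password, salt, iterations=100_000):
    """Hypothetical key derivation; parameters chosen only for the example."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def _keystream(key, nonce, length):
    # toy stream cipher: HMAC-SHA256(key, nonce || counter) blocks; placeholder for AES
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt_entry_name(key, path, timestamp):
    plain = f"{path}\n{timestamp}".encode("utf-8")  # assumes paths contain no newline
    nonce = os.urandom(12)
    cipher = bytes(p ^ k for p, k in zip(plain, _keystream(key, nonce, len(plain))))
    # URL-safe alphabet: standard base64 can emit '/', which ZIP treats as a separator
    return base64.urlsafe_b64encode(nonce + cipher).decode("ascii").rstrip("=")

def decrypt_entry_name(key, entry):
    raw = base64.urlsafe_b64decode(entry + "=" * (-len(entry) % 4))
    nonce, cipher = raw[:12], raw[12:]
    plain = bytes(c ^ k for c, k in zip(cipher, _keystream(key, nonce, len(cipher))))
    path, timestamp = plain.decode("utf-8").split("\n", 1)
    return path, timestamp
```

The encrypted name is still as long as the original path plus the nonce, so path length leaks unless it is padded.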
>>> The other way is to move to a whole-package encryption method, rather
>>> than trying to do this file-by-file.
>>>
>>> -Rob
>