
Subject: RE: [office] Encryption and data leakage
From: David LeBlanc <dleblanc@exchange.microsoft.com>
To: "robert_weir@us.ibm.com" <robert_weir@us.ibm.com>,"office@lists.oasis-open.org" <office@lists.oasis-open.org>
Date: Tue, 11 May 2010 10:49:15 -0700
An additional leak comes when embedding additional files. I checked our implementation, and an embedded image will get changed to 'image1.jpg', which doesn't reveal the original file name, but when I embed another document into the base document, it is clear from the package contents there is an embedded document, and the type of document would be apparent.

I also agree that making the hash of the first 1024 bytes of the file public information is a fairly serious flaw.

-----Original Message-----
From: robert_weir@us.ibm.com [mailto:robert_weir@us.ibm.com] 
Sent: Tuesday, May 11, 2010 10:43 AM
To: office@lists.oasis-open.org
Subject: [office] Encryption and data leakage

The approach we inherited from ODF 1.1 encrypts each file in the ZIP independently.  Although the contents of the files are not viewable due to the encryption, there are bits of information that potentially "leak", such as:

1) The file size
2) The file date
3) The file name
4) The file mime type
5) The hash of the first 1024 bytes of the file

For example, even in an encrypted document I could see a file named "big-secret-takeover-june-3.jpg" and learn something that the person who wrote the encrypted document might be rather surprised to see in the open.
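
To make the leak concrete, here is a minimal Python sketch that reads only the ZIP directory, with no key or password involved. The helper names (visible_metadata, build_demo_package) are made up for illustration, and the demo payload is just placeholder bytes standing in for an encrypted stream.

```python
import io
import zipfile


def visible_metadata(package_bytes):
    """Return (name, size, timestamp) for every entry, read from the ZIP directory only."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return [(i.filename, i.file_size, i.date_time) for i in zf.infolist()]


def build_demo_package():
    """Build a tiny in-memory package; the payload stands in for encrypted bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo(
            "Pictures/big-secret-takeover-june-3.jpg",
            date_time=(2010, 5, 11, 10, 43, 0),
        )
        zf.writestr(info, b"\x00" * 2048)  # placeholder, not real ciphertext
    return buf.getvalue()


if __name__ == "__main__":
    for name, size, stamp in visible_metadata(build_demo_package()):
        print(name, size, stamp)
```

Nothing here touches the encrypted content itself; the name, size, and date all come from the ZIP's own bookkeeping.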

Although not required by ODF, an implementation, if it is clever, can avoid some of these leakages.  For example, the timestamp of the file can be set to the time of encryption rather than the original time stamp.  And the file name can be randomized rather than indicating the original file name.  This might be fine for ODF, since these time stamps and file names do not need to be preserved.  So long as we preserve referential integrity of the package, the names of images are not significant.
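
A rough sketch of that renaming step, in Python. The function names are hypothetical, and the plain string replacement used to keep references intact is an assumption for illustration; a real implementation would rewrite the references through the XML and manifest structures rather than by text substitution.

```python
import secrets
import time


def randomize_entries(names, encrypted_at=None):
    """Map each original entry name to a random name, plus one shared timestamp.

    encrypted_at is seconds since the epoch; None means "now".
    """
    stamp = time.localtime(encrypted_at)[:6]  # (year, month, day, hour, min, sec)
    mapping = {name: secrets.token_hex(16) for name in names}
    return mapping, stamp


def rewrite_references(text, mapping):
    """Naively replace old names with new ones so references still resolve."""
    # Longest names first, so a name that is a prefix of another is not clobbered.
    for old in sorted(mapping, key=len, reverse=True):
        text = text.replace(old, mapping[old])
    return text


if __name__ == "__main__":
    mapping, stamp = randomize_entries(["Pictures/big-secret-takeover-june-3.jpg"])
    print(rewrite_references('href="Pictures/big-secret-takeover-june-3.jpg"', mapping))
    print(stamp)
```

Note that this only works if every reference to the old name is known to the code doing the renaming.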

However, we still should be concerned here.  First, the reason we split Part 3 into its own part was the belief that it could be useful for purposes other than just ODF 1.2.  Many of us hoped that it would have other uses.  But I don't think we can assume that all uses can ignore the original file names and time stamps.  These might be significant for some uses. 

Second, even within ODF, especially if we allow package extensions, we might see items added to packages where the files (whose names may ultimately be end-user-defined) cannot safely be renamed to random names.  For example, there may be referential integrity constraints that a generic ODF processor is not aware of.  Maybe there is RDF that points to a contained image or other package resource.  In any case, the approach is very fragile.

Finally, even without extensions, and with the use of randomized names, we still leak information, based on knowing the size and the hash of the first 1024 bytes of the file.  For example, if I have a copy of "big-secret-takeover-june-3.jpg" I can easily check to see which encrypted documents also contain that same image.  I can similarly probe for any other resource where I know in advance its size and/or contents. 
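
The probe described above can be sketched in a few lines of Python. This is a simplification under stated assumptions: the checksum is taken to be SHA-1 over the first 1024 bytes, the question of exactly which bytes are hashed (for instance raw versus compressed data) is glossed over, and the (name, size, checksum) tuples are a hypothetical stand-in for whatever the package exposes in the clear.

```python
import hashlib


def fingerprint(data):
    """Size plus hash of the first 1024 bytes: the two values assumed visible."""
    return len(data), hashlib.sha1(data[:1024]).hexdigest()


def find_matches(known_file, visible_entries):
    """Return the names of entries whose public size and checksum match known_file."""
    target = fingerprint(known_file)
    return [
        name
        for name, size, checksum in visible_entries
        if (size, checksum) == target
    ]


if __name__ == "__main__":
    secret_image = b"A" * 5000
    other_image = b"B" * 5000
    # Randomized names do not help: the match is on size and checksum alone.
    entries = [
        ("3f9a", *fingerprint(secret_image)),
        ("c071", *fingerprint(other_image)),
    ]
    print(find_matches(secret_image, entries))
```

The attacker never needs the password; possession of a candidate file is enough to test for its presence.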

There are three ways of getting around this problem.  (Or at least three that come to mind.)  One is to keep a "shadow directory" for the ZIP that contains the original names, time stamps, and sizes of the files.  Encrypt this "shadow directory" when the document is encrypted.  For each encrypted file, prepend some random bytes (not sure what is optimal) in order to prevent data leakage of the original size and the hash of the first 1024 bytes.
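
As a sketch of the random-prefix part of this idea only (the shadow directory itself is not shown), the following Python pads the plaintext before it would be encrypted. The framing, a 2-byte length followed by between 16 and 1024 random bytes, is purely an assumption for illustration and not a proposal for what is optimal.

```python
import hashlib
import secrets

MIN_PAD = 16    # assumed lower bound, so the prefix is never empty
MAX_PAD = 1024  # assumed upper bound; must fit in the 2-byte length field


def pad(data):
    """Prepend a 2-byte pad length and that many random bytes to data."""
    n = MIN_PAD + secrets.randbelow(MAX_PAD - MIN_PAD + 1)
    return n.to_bytes(2, "big") + secrets.token_bytes(n) + data


def unpad(padded):
    """Strip the length field and the random prefix, returning the original data."""
    n = int.from_bytes(padded[:2], "big")
    return padded[2 + n:]


if __name__ == "__main__":
    original = b"A" * 5000
    padded = pad(original)
    print(len(original), len(padded))
    print(hashlib.sha1(original[:1024]).hexdigest())
    print(hashlib.sha1(padded[:1024]).hexdigest())
    assert unpad(padded) == original
```

With this framing the hash of the first 1024 bytes no longer matches the original file, but the size is only blurred to within about 1 KB, so a larger or size-dependent pad may be needed if size is a concern.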

Another approach is to encrypt the original full path of the file, with its timestamp appended, using the original derived key, base64 encode the result, and then write that out as the full path for the ZIP entry.  That way you do not need another file in the ZIP. 

The other way is to move to a whole-package encryption method, rather than trying to do this file-by-file. 

-Rob
