opendocument-users message



Subject: Re: [opendocument-users] simple OO.org document goes awry in MS Office 2007 w/SP2 - what went wrong?


Fully anonymizing a document takes more than replacing text with X's. 
You also need to consider metadata (document author, etc.), graphics 
(which may have integrated text), and OLE embeddings, which may contain 
both of the above as well as data that was marked as deleted but never 
removed from the file (a characteristic of how some apps use OLE 
structured storage).  Thumbnails, metafiles and other preview data need 
attention too.

It can probably be done, but would need a careful scrub.
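
As a rough illustration of where such a scrub would have to look, here is a 
minimal Python sketch that lists the parts of an ODF package (a ZIP file) 
likely to carry the leftovers mentioned above.  The function name 
audit_package and the prefix table are hypothetical; the prefixes follow 
the package layout commonly written by OpenOffice.org and are certainly 
not exhaustive.  It only reports parts, it does not remove anything.

```python
import zipfile

# Assumed mapping from package-path prefixes to the kinds of leftover data
# discussed above. The prefixes reflect a common OpenOffice.org package
# layout and are an illustration, not a complete list.
SENSITIVE_PREFIXES = {
    "meta.xml": "metadata",
    "Thumbnails/": "preview",
    "ObjectReplacements/": "preview",
    "Pictures/": "graphics",
    "Object ": "embedded object",
}


def audit_package(package):
    """Return {category: [part names]} for parts that need a closer look.

    `package` is a path or a binary file-like object holding the ODF ZIP.
    """
    findings = {}
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            for prefix, category in SENSITIVE_PREFIXES.items():
                if name.startswith(prefix):
                    findings.setdefault(category, []).append(name)
                    break
    return findings
```

Calling audit_package("report.odt") on a typical text document would be 
expected to flag at least meta.xml and the thumbnail; anything it reports 
still has to be cleaned by hand or by a dedicated tool.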

What would be cooler and more robust is if the document instance 
itself (or maybe the schema) declared which elements and attribute values 
contain "user content", as opposed to things like font names or IDs. 
Knowing this would be useful in translation, in removing personally 
identifying information, etc.  If we had this, then we could write a 
generic tool that would work for anonymizing all XML documents.  It 
would probably take more than just XSLT.  For example, you don't want to 
remove all images, but you would want to replace them with equivalently 
sized dummy images.
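
To make the idea concrete, here is a minimal Python sketch of such a 
generic tool, assuming the "user content" declaration is simply handed in 
as a set of element names.  No such declaration exists in ODF today, so 
the set, the function names anonymize and _mask, and the choice of 
masking with "x" are all assumptions made for illustration.  Attribute 
values are left alone, and the dummy-image replacement is not attempted.

```python
import re
import xml.etree.ElementTree as ET

# The ODF text namespace; TEXT_P names the <text:p> element in the
# "{namespace}local-name" form that ElementTree uses.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TEXT_P = "{%s}p" % TEXT_NS


def _mask(value):
    """Replace every non-whitespace character with 'x', keeping length."""
    return None if value is None else re.sub(r"\S", "x", value)


def anonymize(xml_text, user_content_tags):
    """Mask character data inside the elements declared as user content.

    `user_content_tags` is the hypothetical declaration: a set of element
    names whose text, including text of nested elements, is masked.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag not in user_content_tags:
            continue
        for sub in elem.iter():
            sub.text = _mask(sub.text)
            # The tail of the declared element itself lies outside it,
            # so only tails of its descendants are masked.
            if sub is not elem:
                sub.tail = _mask(sub.tail)
    return ET.tostring(root, encoding="unicode")
```

For example, anonymize(content, {TEXT_P}) would mask paragraph text while 
leaving style names and other attributes untouched, which is the kind of 
distinction a real declaration would let the tool make automatically.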

-Rob




From:
Jan H Wildeboer <jwildebo@redhat.com>
To:
Sander Marechal <s.marechal@jejik.com>
Cc:
marbux <marbux@gmail.com>, dennis.hamilton@acm.org, John.Cody@cio.ny.gov, 
opendocument-users@lists.oasis-open.org
Date:
06/17/2009 01:11 PM
Subject:
Re: [opendocument-users] simple OO.org document goes awry in MS Office 2007 w/SP2 - what went wrong?



Sander Marechal wrote:

> I'd love to develop something like this. It shouldn't be that hard at all.

I would make it a module for odftoolkit's validator, so that when you
feed the validator a document, you can optionally save the stripped
document.

The cool thing is that you could send this output (as it is guaranteed
to contain only placeholder content) to an online web service that checks
for interop.

Jan

-- 
Jan H Wildeboer          |
EMEA Open Source Affairs | Office: +49 (0)89 205071-207
Red Hat GmbH             | Mobile: +49 (0)174 33 23 249
Otto-Hahn-Str.20         | Fax:    +49 (0)89 205071-111
D-85609 Dornach/Munich   | eMail:  jan.wildeboer@redhat.com
_____________________________________________________________________

Reg. Adresse: Red Hat GmbH, Otto-Hahn-Strasse 20, 85609 Dornach bei 
Muenchen
Handelsregister: Amtsgericht Muenchen HRB 153243
Geschaeftsfuehrer: Brendan Lane,Charlie Peters,Michael Cunningham,
Charles Cachera
_____________________________________________________________________

GPG Key:     3AC3C8AB
Fingerprint: 3D1E C4E0 DD67 E16D E47A  9564 A72F 5C39 3AC3 C8AB






