Subject: Re: [opendocument-users] simple OO.org document goes awry in MS Office 2007 w/SP2 - what went wrong?
You'll need to do more than replace text with X's to fully anonymize a document. You also need to consider:

- metadata (document author, etc.)
- graphics (which may have integrated text)
- OLE embeddings, which may contain both of the above, as well as data that was marked as deleted but was not removed from the file (a characteristic of how some apps use OLE structured storage)
- thumbnails, metafiles and other preview data

It can probably be done, but would need a careful scrub.

What would be cooler and more robust would be if the document instance itself (or maybe the schema) declared which elements and attribute values contained "user content", as compared to things like font names or IDs. Knowing this would be useful in translation, in removing personally identifying information, etc. If we had this, then we could write a generic tool that would work for anonymizing all XML documents.

It is probably more than just XSLT. For example, you don't want to remove all images, but you would want to replace them with equivalently sized dummy images.

-Rob

From: Jan H Wildeboer <email@example.com>
To: Sander Marechal <firstname.lastname@example.org>
Cc: marbux <email@example.com>, firstname.lastname@example.org, John.Cody@cio.ny.gov, email@example.com
Date: 06/17/2009 01:11 PM
Subject: Re: [opendocument-users] simple OO.org document goes awry in MS Office 2007 w/SP2 - what went wrong?

Sander Marechal wrote:
> I'd love to develop something like this. It shouldn't be that hard at all.

I would make it a module for odftoolkit's validator, so that when you feed the validator a document, you can optionally save the stripped document. The cool thing is that you could send this output (as it is guaranteed to contain only placeholder content) to an online webservice that checks for interop.
Jan

--
Jan H Wildeboer | EMEA Open Source Affairs | Office: +49 (0)89 205071-207
Red Hat GmbH | Mobile: +49 (0)174 33 23 249
Otto-Hahn-Str.20 | Fax: +49 (0)89 205071-111
D-85609 Dornach/Munich | eMail: firstname.lastname@example.org
_____________________________________________________________________
Reg. Adresse: Red Hat GmbH, Otto-Hahn-Strasse 20, 85609 Dornach bei Muenchen
Handelsregister: Amtsgericht Muenchen HRB 153243
Geschaeftsfuehrer: Brendan Lane, Charlie Peters, Michael Cunningham, Charles Cachera
_____________________________________________________________________
GPG Key: 3AC3C8AB
Fingerprint: 3D1E C4E0 DD67 E16D E47A 9564 A72F 5C39 3AC3 C8AB
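For illustration, the naive "replace text with X's" step that Rob mentions could be sketched roughly as follows in Python. This is only a minimal sketch under a loud assumption: it treats every text node as user content and leaves attributes untouched, which is exactly the guess that a schema-level "user content" declaration would make unnecessary. The function name `mask_text` is hypothetical and not part of odftoolkit or any other tool named in this thread. It does nothing about metadata, images, OLE embeddings or thumbnails.

```python
import re
import xml.etree.ElementTree as ET

# Matches every character that is not whitespace.
_NON_SPACE = re.compile(r"\S")


def mask_text(xml_source: str) -> str:
    """Replace every non-whitespace character in text nodes with 'X'.

    Assumption (not from the thread): all element text and tail text is
    user content, while attribute values (style names, IDs) are not.
    Whitespace and text length are preserved so layout stays comparable.
    """
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        if elem.text:
            elem.text = _NON_SPACE.sub("X", elem.text)
        if elem.tail:
            elem.tail = _NON_SPACE.sub("X", elem.tail)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = '<doc><p style="Standard">Hello <b>Jan</b>, bye</p></doc>'
    print(mask_text(sample))
```

On a real ODF `content.xml`, ElementTree would rewrite the namespace prefixes unless they are registered first, and the result would still need the careful scrub of metadata, graphics and embedded objects described above before it could be called anonymized.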