Subject: A few specific examples
Some specific examples of how and why arbitrary proprietary extensions are evil.

Two common concerns for users are privacy and security. The issue of personally identifying metadata is increasingly in the news. Some products, like Microsoft Office, have a built-in operation that will remove such information from a Word document. There are also third-party applications that will strip such metadata from a document. So, suppose you want to write such an operation for an ODF document. What do you do? Simple enough: look in meta.xml, scrub extension elements under <office:meta>, etc. The places where metadata is stored are deterministic. The standard is clear about where they are. But allow arbitrary extensions everywhere, and you have no idea where the metadata is. Writing a generic tool like this becomes far more difficult. You can't tell whether an extension contains metadata, content, processing instructions, executable code, or whatever.

Similarly, there is the need to scan a document for viruses or malicious macros. Remember all the Word viruses from a few years ago? The risk is still there. Antivirus vendors have been somewhat successful in addressing such risks with mail gateway filters, which act in part by examining file attachments and scanning them for risky content. As a policy, some companies will not allow any external document with a macro through their firewall. So how would you do this for an ODF document? Well, ODF says scripts go into the <office:script> element. So the simple solution is to scan for that element and, if it exists, flag the document as a higher risk. But with arbitrary proprietary extensions, how do we know that they don't contain executable content? How does the virus scanner handle arbitrary elements, which may contain metadata, content, processing instructions, scripts or anything? The easiest solution would be to ban documents that contain extensions. Is that what we want?
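To make the scanning case concrete, here is a minimal sketch of the kind of gateway check described above. It is an illustration under stated assumptions, not a real scanner: the function name `assess_risk`, the allowlist of non-ODF namespaces, and the rule "anything outside those namespaces is a foreign extension" are all hypothetical choices made for this example. It works on the XML text of a single stream (for example content.xml) and ignores the rest of the package.

```python
import xml.etree.ElementTree as ET

# All ODF-defined namespaces share this URN prefix.
ODF_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"
OFFICE_NS = ODF_PREFIX + "office:1.0"

# Assumption: a small, incomplete allowlist of non-ODF namespaces
# that ODF itself references. A real tool would need the full list.
KNOWN_EXTERNAL = {
    "http://purl.org/dc/elements/1.1/",
    "http://www.w3.org/1999/xlink",
    "http://www.w3.org/1998/Math/MathML",
    "http://www.w3.org/2002/xforms",
}


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag, or '' if none."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def assess_risk(xml_text):
    """Report script elements and elements from unrecognized namespaces."""
    root = ET.fromstring(xml_text)
    script_tag = "{" + OFFICE_NS + "}script"
    has_script = False
    foreign = []
    for elem in root.iter():
        if elem.tag == script_tag:
            has_script = True
        ns = namespace_of(elem.tag)
        if not ns.startswith(ODF_PREFIX) and ns not in KNOWN_EXTERNAL:
            # We cannot tell what this is: metadata, content, or code.
            foreign.append(elem.tag)
    return {"has_script": has_script, "foreign_elements": foreign}
```

The first check is well defined because the standard says where scripts live. The second can only report that something unknown is present; it cannot classify it, which is exactly the problem.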
Similarly, a search engine will want to find all text in a document for indexing. Reading the ODF specification, it is clear what is content and what is not, so a proper indexer can be written. But with arbitrary proprietary extensions, this task is impossible: I would not know whether the extension elements should be indexed or not.

Consider also a program that translates a document from one language to another, preserving all formatting and styles. Reading the ODF spec, I can easily determine which elements are content and which are not, and then run machine translation on just the content. But with arbitrary proprietary extensions, I have no idea. I risk doing a partial translation if the extension elements represent user-visible content.

There is also the question of document referential integrity. Suppose I want to write a program that takes a large ODF document and splits it up into chapters, one ODF document per chapter. According to the ODF standard this is easy. I can trace the style dependencies, duplicate what is needed, and make several documents from a single ODF document. Similarly, I could take multiple ODF documents and combine them into a single document, merging the styles as needed. But in the presence of arbitrary proprietary extensions I cannot do either of these operations safely, since I do not understand the semantics of these extensions.

Now I can imagine a well-thought-out extensibility mechanism that would address the above concerns. I'd certainly entertain any such proposals. But merely saying "The X in XML stands for eXtensibility" is not a considered engineering approach. Extensibility requires that we think through issues such as versioning, content negotiation, fall-backs, namespacing, and round-tripping, as well as offer clear guidelines for how extensions declare whether they contain translatable text, metadata, executable code, or other categories of importance.
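The indexing and translation cases above can be sketched the same way. This is a simplified illustration, not a real indexer: the function name `split_text` is hypothetical, it treats all character data under ODF-namespace elements inside <office:body> as content, and it treats every element outside the ODF namespaces as an extension of unknown meaning. Those are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# All ODF-defined namespaces share this URN prefix.
ODF_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"
OFFICE_NS = ODF_PREFIX + "office:1.0"


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag, or '' if none."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def split_text(xml_text):
    """Return (indexable, unclassified) text fragments from the body.

    indexable:    text found directly under ODF-namespace elements.
    unclassified: text found inside non-ODF (extension) elements, which
                  may or may not be user-visible content.
    """
    indexable, unclassified = [], []

    def walk(elem):
        if not namespace_of(elem.tag).startswith(ODF_PREFIX):
            # Extension element: collect its text but do not interpret it.
            text = "".join(elem.itertext()).strip()
            if text:
                unclassified.append(text)
            return
        if elem.text and elem.text.strip():
            indexable.append(elem.text.strip())
        for child in elem:
            walk(child)
            # Tail text follows the child but belongs to this ODF element.
            if child.tail and child.tail.strip():
                indexable.append(child.tail.strip())

    body = ET.fromstring(xml_text).find("{" + OFFICE_NS + "}body")
    if body is not None:
        walk(body)
    return indexable, unclassified
```

Whatever ends up in the second list is the open question: index it or not, translate it or not. Nothing in the document tells the tool which answer is right.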
The fail-safe approach is to remove this option until such time as we can do it right. If there is sufficient interest to work on this, we could create a new subcommittee on extensibility to work on developing a detailed proposal in this area, obviously for consideration post ODF 1.2.

-Rob