
Subject: RE: [office] Conformance Clause proposal, Version 8

From: robert_weir@us.ibm.com
To: office@lists.oasis-open.org
Date: Thu, 5 Feb 2009 19:22:27 -0500

Dennis,  my critique is a bit more subtle than that.  The point is that 
extensions cannot be generically processed and that makes many common 
editing and document manipulations unsafe.

Consider some basic ODF:

<text:p text:style-name="default">Hello World</text:p>

Now suppose I am an ODF processor, say an editor, and I want to split this 
into two.  Maybe the user wants to apply a different style to each word. 
Or I want to insert another word into this paragraph.  Doing all this is 
straightforward. 

Now suppose instead I have this:

<foo:bar my-attr="3221"><text:p text:style-name="default">Hello 
World</text:p></foo:bar>


What do I do, in an editor, if the user wants to remove one of those two 
words?  Or copy/paste both of them to another part of the document? Or 
split these two words into two different documents, or combine two 
documents with similar extensions?  What should I do? 

If the user does a copy/paste, do I end up with this?

<foo:bar my-attr="3221"><text:p 
text:style-name="default">Hello</text:p></foo:bar>
.
.
.
<foo:bar my-attr="3221"><text:p 
text:style-name="default">World</text:p></foo:bar>

That sounds reasonable, and is analogous to copying styled text.  But what 
if my-attr ended up being declared in the extension schema as an XML ID? 
I'd now have duplicate IDs.  Whoops.  Without knowing the extension, how 
do I know when to simply copy something versus when I need to generate a 
unique ID?  And that is the simple case.  More general cases will have 
multiple classes of reference semantics, possibly spanning multiple XML 
documents.
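
To make the duplicate-ID hazard concrete, here is a minimal Python sketch 
of the naive split described above.  It is only an illustration: the 
foo namespace URI and the helper names (naive_split, duplicate_values) 
are made up for this example, and the duplicate check only helps a 
processor that already knows my-attr is ID-typed, which is exactly the 
knowledge a generic editor lacks.

```python
import copy
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
FOO_NS = "urn:example:foo"  # hypothetical extension namespace

SOURCE = (
    f'<foo:bar xmlns:foo="{FOO_NS}" xmlns:text="{TEXT_NS}" my-attr="3221">'
    '<text:p text:style-name="default">Hello World</text:p>'
    '</foo:bar>'
)


def naive_split(wrapper):
    """Split the wrapped paragraph at whitespace, cloning the wrapper per word.

    This mirrors what an editor does for styled text: every fragment gets a
    verbatim copy of the enclosing markup, attributes included.
    """
    para = wrapper.find(f"{{{TEXT_NS}}}p")
    pieces = []
    for word in para.text.split():
        clone = copy.deepcopy(wrapper)
        clone.find(f"{{{TEXT_NS}}}p").text = word
        pieces.append(clone)
    return pieces


def duplicate_values(elements, attr):
    """Return the attribute values that occur on more than one element."""
    seen, dupes = set(), set()
    for el in elements:
        value = el.get(attr)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


pieces = naive_split(ET.fromstring(SOURCE))
print([p.get("my-attr") for p in pieces])   # ['3221', '3221']
# Only meaningful if the extension schema declares my-attr as an ID.
print(duplicate_values(pieces, "my-attr"))  # {'3221'}
```

Nothing in the instance document tells the editor whether that 
duplication is harmless (like a style name) or a violation (like an ID).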

Without knowing the schema of the extensions, and the constraints 
expressed and implied, I am at a loss.  Are there reference semantics?  Is 
there an implied ordering?  Cardinality constraints?  I simply don't know. 
  You can't do generic editing of an unknown schema in a word processor. 
It just doesn't work.

So what happens in practice is one of two things:

A) The document extensions are understood by only the application that 
writes them, and other applications are forced to react by doing one of 
two things:

1) Automatically stripping out all extension markup just to be safe.  This 
obviously benefits no one, especially not the vendor trying to extend ODF, 
and certainly not the user who potentially suffers data loss.

2) Screwing up the document by trying to preserve extensions in a naive 
way, but actually messing up the integrity of the document by not 
fully understanding the constraints of the extension's schema.  This also 
is a disservice to the user.

or B) The document extensions are well-documented by the application 
extending the format, and other vendors implement the additional 
constraints required.  But in this case, if vendors agree on the semantics 
of an extension, then shouldn't this just be made part of the standard?
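
As a rough illustration of option A.1, a consumer that strips foreign 
markup might look something like the Python sketch below.  The 
is_odf test is a deliberate simplification (it assumes every ODF name 
lives under the urn:oasis:names:tc:opendocument:xmlns: prefix, which 
ignores namespaces ODF borrows from elsewhere), and FOO_NS and 
strip_foreign are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

ODF_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"  # simplifying assumption
TEXT_NS = ODF_PREFIX + "text:1.0"
FOO_NS = "urn:example:foo"  # hypothetical extension namespace


def is_odf(qname):
    """True if an ElementTree '{uri}local' name is in an ODF namespace."""
    return qname.startswith("{" + ODF_PREFIX)


def strip_foreign(element):
    """Return the list of elements that should replace `element`.

    Foreign elements are unwrapped: the element itself, its attributes and
    its own text are dropped, while ODF descendants are kept.
    """
    kept_children = []
    for child in list(element):
        kept_children.extend(strip_foreign(child))
    if not is_odf(element.tag):
        return kept_children
    # Drop namespaced attributes that are not ODF attributes.
    for name in list(element.attrib):
        if name.startswith("{") and not is_odf(name):
            del element.attrib[name]
    for child in list(element):
        element.remove(child)
    element.extend(kept_children)
    return [element]


SOURCE = (
    f'<text:section xmlns:text="{TEXT_NS}" xmlns:foo="{FOO_NS}" text:name="s1">'
    '<foo:bar my-attr="3221">'
    '<text:p text:style-name="default">Hello World</text:p>'
    '</foo:bar>'
    '</text:section>'
)

(cleaned,) = strip_foreign(ET.fromstring(SOURCE))
print(ET.tostring(cleaned, encoding="unicode"))
```

The paragraph survives, but foo:bar and my-attr="3221" are gone for 
good, which is the data loss described in A.1.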

That is why arbitrary extensions are evil.  They create in general an 
interoperability problem that defies solution. That is why no ODF 
implementation today uses this feature.  And this is why I've recommended 
that it be removed from the standard.

-Rob

"Dennis E. Hamilton" <dennis.hamilton@acm.org> wrote on 02/05/2009 
06:10:08 PM:

> 
> Rob,
> 
> When you listed the evils of unbridled extension, I thought it was
> over-reaching to attach all of those prospects to the presence of foreign
> elements-attributes-values.  That is for two reasons:
> 
> 1. If I wanted to attack a consumer and the assets of its user, I would not
> do so with foreign e-a-v.   Since a consumer is likely to reduce those away,
> it doesn't seem like the most-plausible choice for an exploit.  Of course,
> if there is a prominent, widely-deployed consumer that has some supported
> foreign e-a-v that is exploitable, that's perhaps more promising. 
> 
> 2. If I wanted to construct an exploit, I would do it the same way it was
> done in the past with Word, via the open and unprotected scripting, plug-in,
> and macro capabilities.  Promising targets in ODF are fully available as
> part of strictly conforming documents and I would go that route once an
> implementation was widely-deployed enough to provide a profitable target. 
> 
> 3. Likewise, if I wanted to connive a covert channel for smuggling
> information or planting scurrilous information in a document, I would do it
> using the available provisions of strictly conforming documents.
> 
> 4. My sense of your objection is that poorly-designed foreign e-a-v and
> their defective support by one or more consumers would expose those
> consumers to additional prospects for such difficulties.  I can't argue
> against that, as much as I hope that we are now much smarter about such
> things than we were in the past. 
> 
> 5. I do think that perhaps our efforts might be well-spent giving the same
> careful scrutiny to existing exposures in strictly conforming documents that
> you identify as important before considering any sort of host-language
> profile:
> 
>     If we want to create a host language profile at some point,
>     then that would also be fine with me, but we would need to 
>     address the kinds of issues I raised in my previous note 
>     regarding identification of executable code, personal content in 
>     documents, document assembly, referential integrity, etc.
> 
>  - Dennis
> 
> PS: I just had a lot of fun searching through the uses of "script" and
> "plugin" in the ODF 1.1 specification and in ODF 1.2 Part 1 draft 8.
> 
> 
> -----Original Message-----
> From: robert_weir@us.ibm.com [mailto:robert_weir@us.ibm.com] 
> http://lists.oasis-open.org/archives/office/200902/msg00061.html
> Sent: Thursday, February 05, 2009 13:49
> To: office@lists.oasis-open.org
> Subject: RE: [office] Conformance Clause proposal, Version 8
> 
> [ ... ]
> 
> In any case my preference remains to stick with a single conformance 
> class, not permitting namespace extensions.  If we want to create a host 
> language profile at some point, then that would also be fine with me, but 
> we would need to address the kinds of issues I raised in my previous note 
> regarding identification of executable code, personal content in 
> documents, document assembly, referential integrity, etc.
> 
> [ ... ]
>
