Subject: Re: [office] Conformance Clauses Proposal
From: robert_weir@us.ibm.com
To: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
Date: Fri, 21 Nov 2008 18:07:57 -0500
Thanks for the response.  No need to apologize for the delay.  I am 
certainly late in responding to your initial proposal, and I appreciate 
being able to discuss this further.

-Rob

Michael.Brauer@Sun.COM wrote on 11/21/2008 07:54:43 AM:
> 
> Hi Rob,
> 
> thank you very much for your feedback. Unfortunately, it took me a little
> longer to respond than I had hoped.
> 
> On 11/17/08 14:14, robert_weir@us.ibm.com wrote:
> > I apologize for not reviewing this earlier.  But since I woke up this
> > morning at 4am, due to jetlag, I find myself free of meetings for a few
> > hours, so I will make some comments now.  Overall I like the progress
> > over ODF 1.0/1.1.  This is an important part of the standard, and will
> > get a lot of attention during the review in OASIS, as well as in JTC1
> > when we submit ODF 1.2 for PAS approval.  So I'm hoping we can consider
> > my late comments and perhaps make another iteration over this section.
> > 
> > I reviewed the "sixth iteration" dated 11/11/2008.
> > 
> > I think it would be good if we started the conformance section with a
> > clear statement of what we will be defining.  Something like, "The
> > OpenDocument Format standard defines conformance for documents,
> > generators and processors, with two conformance classes called loose
> > and strict."  We'll need something like that for the Scope statement as
> > well.  Probably want to synch up on the language.
> 
> Yes, this sounds like a good idea.
> 
> > 
> > I'm a little confused by the term "processor", since it is defined,
> > somewhat circularly, as "a program that can parse and process
> > OpenDocument document".  What is "to process"?  What is intended,
> > beyond parsing?
> > 
> > To my thinking, we have three kinds of applications:
> > 
> > 1) generators (or writers or producers)
> > 2) parsers (or readers or consumers)
> 
> This is what is meant by the term "processor".
> What about using the term interpreter rather than parser?
> SVG uses this term:
> 
> http://www.w3.org/TR/SVG11/conform.html#ConformingSVGInterpreters
> 
> The conformance clauses for SVG interpreters state:
> 
> "A Conforming SVG Interpreter must parse any SVG document correctly. It
> is not required to interpret the semantics of all features correctly."
> 
> So, this is actually close to what is called an "ODF processor" in the
> current proposal.
> 

We can call it any of these things (processor/interpreter/parser) so
long as we explicitly define what that term means.  The level of 
description from the SVG Recommendation is fine.  My preference would be a 
term that denotes consumption of ODF, similar to how generator implies 
production of ODF.  In fact, Producer/Consumer is a natural pair, as is 
Reader/Writer, or Generator/Parser.  But Generator/Processor does not (to 
my ear) have that kind of equivalence. 

What do others think on this?


> 
> > 3) processors (that both read and write)
> > 
> > I think we may be failing to acknowledge that 3rd kind of program, the
> > ones that both read and write.  As we know, this brings in additional
> > questions related to round-tripping, which is a thorny issue.
> > But it is a legitimate concern, and we should see if there is some
> > language the TC can agree to for it, especially considering that most
> > ODF applications fit into that category, and in practice
> > interoperability would be enhanced by any firmer guidance ODF 1.2 can
> > provide in this area.
> 
> I agree. But the requirements for applications that read and write
> documents heavily depend on the purpose of the application itself. We
> also have to consider that between a read and a write some or all of the
> information that is contained in the document changes, but that we don't
> know which information and why it is changed. That makes it very
> difficult to find requirements that are testable.
> 

This is true.  But the question is whether there is a common subset of 
typical uses that is worth giving a label to?

For example, if we take a Processor as something that reads and writes 
ODF, then some obvious requirements might be:

1) It is also a conformant ODF Generator/Producer/Writer
2) It is also a conformant ODF Parser/Reader/Consumer

and maybe

3) It is capable of an identity transform, meaning it can take an
input document and write out an "equivalent" document (however we define 
that).

The idea is that although it may mainly operate by changing the input 
document, it should have a mode where it is capable of not changing the 
input document.  Then we can put shall or should around preserving 
metadata and foreign elements/attributes.

The tricky part is around what "equivalent" means.  Certainly changing 
namespace prefixes, ordering of attributes, style names, etc., should be 
flexible.  We get some of this via reference to the W3C's Canonical XML 
standard, if we want.  But we really need a definition of Canonical ODF to 
define logical equivalence of ODF documents.
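
As a rough illustration of how far Canonical XML gets us, here is a minimal Python sketch.  It assumes Python 3.8 or later (for xml.etree.ElementTree.canonicalize), and the two paragraph fragments are invented for this example.  It normalizes attribute order and, with the rewrite_prefixes option, namespace prefixes.  It does nothing about renamed styles, which would need ODF-specific rules of the kind a Canonical ODF would have to define.

```python
from xml.etree.ElementTree import canonicalize

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Two invented fragments: same content, different prefix and attribute order.
doc_a = f'<text:p xmlns:text="{TEXT_NS}" text:style-name="P1" text:id="a">Hello</text:p>'
doc_b = f'<t:p xmlns:t="{TEXT_NS}" t:id="a" t:style-name="P1">Hello</t:p>'


def same_after_c14n(xml_a, xml_b, rewrite_prefixes=True):
    """Compare two XML strings after C14N 2.0 canonicalization."""
    return (canonicalize(xml_a, rewrite_prefixes=rewrite_prefixes)
            == canonicalize(xml_b, rewrite_prefixes=rewrite_prefixes))


# Prefix rewriting on: prefix and attribute-order differences are normalized.
print(same_after_c14n(doc_a, doc_b))
# Prefix rewriting off: the original prefixes are kept in the output.
print(same_after_c14n(doc_a, doc_b, rewrite_prefixes=False))
```

The comparison is purely syntactic, so it can only be one ingredient of a definition of "equivalent".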

So I think I answered my own question here.  There is not much interesting 
we can say about a read/write processor at this point.


> I could actually imagine that this kind of conformance could be much
> better addressed by the OIC TC, where we may bind the requirements to
> profiles.
> 
> > 
> > Here are some specific comments:
> > 
> > Document processing -- 
> > 
> > "Documents that loosely conform to the OpenDocument specification..."
> > 
> > We introduce here the term "loosely conform" but we don't clearly
> > indicate what our conformance classes are.  For example, do we want
> > "loose conformance" and "strict conformance"?  or plain "conformance"?
> > Is there a better word than "loose"?  It carries a negative connotation
> > in my ears.
> 
> You are right that we use the term "loosely conform" before we define
> it. This could be solved by moving the document processing section
> behind the conformance clauses.
> 
> The two proposed conformance classes are "conformance" and "loose
> conformance", but I'm open to other suggestions for naming them. I have
> chosen the term "conformance" to emphasize that this is actually the
> type of conformance that applications should aim to achieve. We may also
> call the classes "strict conformance" and "loose conformance". I'm not
> in favor of calling them "strict conformance" and "conformance", because
> this may sound as if "conformance" is what applications should
> achieve, while "strict conformance" contains only some optional
> requirements that may or may not be achieved. That's not what was
> intended by having two conformance classes.

The thing I don't like about "loose" conformance is that the additional 
allowed data -- the foreign elements and attributes -- have no meaning 
whatsoever.  All we say about them really is that the document must be 
valid after they are removed.  So this isn't "loose" versus "strict" like 
HTML Transitional versus HTML Strict, where the Transitional form has
defined functionality beyond Strict. 

> > 
> > Also, do we need loose conformance at all?  Why not just have a single
> > conformance class and anything beyond that is not conformant?  Then, if
> > there occurs a commonly-used set of extensions to ODF, in a foreign
> > namespace, then we can either include that in ODF-Next, or (more
> > simply) define a profile for the extensions.
> 
> I have thought about this myself for a long time. One reason why I have
> added the "loose" conformance class was that we allowed foreign elements
> in ODF 1.0 and 1.1. The other reason was that vendors may have
> the issue that they have to store some information in a file
> immediately, for instance because a customer requested this, and cannot
> wait for the next ODF version. The assumption here of course would be
> that they propose the extension to the ODF TC, so that it would be used
> only temporarily. In this respect, there may indeed be better solutions
> than a separate conformance class. I will think about this.
> 

We can't stop vendors from adding extensions like this.  But we are not 
required to add a conformance class for them either.  We could just say 
that such documents are not conformant to ODF 1.2. 

It is always a trade-off and there is no clear "right answer", but at the 
extremes we can have:

A)  A very loose definition of conformance that allows many ODF vendors to 
claim conformance because the mandatory requirements are very few.

or 

B) A tight definition of conformance that may result in fewer conformant 
ODF applications, but the ones that are conformant are more likely to be 
interoperable.

I think we want to move in the direction of B in ODF 1.2.  We may not be 
able to move there all at once or suddenly.  But one step is to avoid 
introducing a conformance class for something that is inherently 
non-portable and not interoperable.

> What seems to be essential here is that we define rules for how foreign
> elements and attributes are processed. This means, we may remove the
> loose conformance class, but we must not remove the processing rules for
> foreign elements and attributes.
> 

Yes, this could be preserved, even with a single conformance class.


> In any case, I think we should remove the "loosely conformant
> OpenDocument generator" again. An application that calls itself
> conformant shall be able to store conforming files.
> 
> 
> > 
> > "Foreign elements and attributes shall not be part of a namespace that
> > is defined within this specification."
> > 
> > "part of a namespace" is rather loose.  The term used by the Namespaces
> > in XML Recommendation is "associate".  So I suggest, "Foreign elements
> > and attributes shall not be associated with a namespace that is defined
> > within this specification."
> 
> Yes, that's a good suggestion.
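
One way to read "associated with a namespace" in practice is sketched below in Python.  This is only an illustration: ODF_NAMESPACES is a deliberately partial list, the urn:example:ext namespace is hypothetical, and the sketch simply reports every element or attribute name whose namespace is not in that list.

```python
import xml.etree.ElementTree as ET

# Assumption: a partial, illustrative list of namespaces defined by ODF.
ODF_NAMESPACES = {
    "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def namespace_of(name):
    """Return the namespace URI of an ElementTree '{uri}local' name, or ''."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def foreign_names(root):
    """List element and attribute names outside the listed ODF namespaces."""
    found = []
    for elem in root.iter():
        if namespace_of(elem.tag) not in ODF_NAMESPACES:
            found.append(elem.tag)
        for attr in elem.attrib:
            if namespace_of(attr) not in ODF_NAMESPACES:
                found.append(attr)
    return found


SAMPLE = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:ext="urn:example:ext" text:style-name="P1" ext:flag="1">'
    'Hi<ext:note>n</ext:note></text:p>'
)
print(foreign_names(ET.fromstring(SAMPLE)))
```

A real check would need the complete namespace list from the specification and a decision on how to treat names such as xml:id.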
> 
> > 
> > "If a foreign element has a  or  ancestor element and is a child
> > elements of an element which may include..."   Typo.  Should be "child
> > element" (singular).  Or do we really mean "descendant element"?
> 
> It is a typo. And "child element" is correct. The case this addresses is:
> 
> <text:p>text
>    <text:frame>
>      <text:image .../>
>      <text:new-replacement>My new replacement</text:new-replacement>
>    </text:frame>
> </text:p>
> 
> In this case, "My new replacement" should not be displayed because it
> belongs to a text frame (that is displayed outside the text flow) rather
> than to the text of the paragraph.
> 
> With the current wording, it would not be displayed, because 
> <text:frame> does not allow text content. If we said "descendant",
> then the text would be displayed because <text:p> allows text content.
> 

OK.  Thanks.  That is clear now.
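
To convince myself of the difference, here is a small Python sketch of the two readings.  It is an illustration under assumptions, not the proposal's wording: the foreign element is moved into a hypothetical urn:example:ext namespace, ALLOWS_TEXT is a simplified stand-in for "elements which may include text content", and the element names otherwise follow Michael's example.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
EXT_NS = "urn:example:ext"  # hypothetical foreign namespace

P = f"{{{TEXT_NS}}}p"
H = f"{{{TEXT_NS}}}h"
# Assumption: simplified list of elements that allow text content.
ALLOWS_TEXT = {P, H, f"{{{TEXT_NS}}}span"}

SAMPLE = f"""<text:p xmlns:text="{TEXT_NS}" xmlns:ext="{EXT_NS}">text
  <text:frame>
    <text:image/>
    <ext:new-replacement>My new replacement</ext:new-replacement>
  </text:frame>
</text:p>"""


def displayed_foreign_text(root, reading):
    """Collect the text of foreign elements that would be displayed.

    reading="child": the parent itself must allow text content.
    reading="descendant": any ancestor allowing text content is enough.
    """
    shown = []

    def walk(elem, ancestors):
        chain = ancestors + [elem]
        for child in elem:
            if not child.tag.startswith(f"{{{EXT_NS}}}"):
                walk(child, chain)
                continue
            in_paragraph = any(a.tag in (P, H) for a in chain)
            if reading == "child":
                text_allowed = elem.tag in ALLOWS_TEXT
            else:
                text_allowed = any(a.tag in ALLOWS_TEXT for a in chain)
            if in_paragraph and text_allowed:
                shown.append(child.text or "")

    walk(root, [])
    return shown


root = ET.fromstring(SAMPLE)
print(displayed_foreign_text(root, "child"))
print(displayed_foreign_text(root, "descendant"))
```

With the "child" reading the frame blocks display of the replacement text; with the "descendant" reading the enclosing paragraph would let it through.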

> > 
> > "For foreign elements that occur at other locations, conforming
> > processors should not process the element's content, but may only
> > preserve its content"
> > 
> > We have not defined "process".   Does it include "preserve its
> > content"?
> 
> No, it means that the content should not be displayed, analyzed or
> whatever the application does. I think the term "interpret" is better
> here.
> 


The challenge we have with the specification now is that everything is 
either at a very low level -- XML level descriptions -- or at a very high 
level, like "preserving" or "processing".  What would tie it together 
would be something that describes the abstract text document, or 
spreadsheet, after the XML is parsed, after the styles are all resolved 
and the metadata all associated.  This abstract content model would then 
give us something that we can formally describe processing on.

I need to think about that some more. 

> > 
> > Also, why not make preservation of content of foreign elements be a
> > "should"? Is there any reason why we would not make that
> > recommendation?
> 
> I assume you mean the case that the content of a foreign element should
> be processed (or interpreted)? In that case, an application would
> interpret the text content anyway, and there would not be any difference
> to text that is contained in a paragraph but outside a foreign element.
> 
> > 
> > In any case this "Document processing" section seems misplaced.  I
> > wonder if it fits better if put in the Generator or Processor
> > conformance section, since it is defining conformance.
> 
> Well, maybe. But this would make the conformance clauses even more
> difficult to read. I will update the proposal based on your other 
> suggestions, and we may see how it reads then.
> 
> > 
> > So overall, I find this document processing area thorny and
> > troublesome. I would not be disappointed if it were entirely removed
> > from the standard, or moved to an informative annex on "How to extend
> > ODF".  There is little value to implementors or users in having a
> > conformance class for documents that are extended in nearly-arbitrary
> > ways.  Nothing written here really allows such extended documents to
> > interoperate.  On the one hand, I don't believe that there is anything
> > intrinsically evil with extensions, but I don't think that we need to
> > favor them with the label "loose conformance".
> 
> We have to distinguish between the processing rules and the "loose
> conformance class". The processing rules are something we should have.
> The "loose conformance" class is maybe something we don't need.
> 
> > 
> > 
> > Do we know what implementations today use foreign elements and
> > attributes according to this section?  Is the usage widespread?
> > 
> > "Conforming OpenDocument Documents" -- this is defined currently as a
> > condition, if/then.  But I think it should be stated as a requirement:
> > "A conforming OpenDocument document shall adhere to the specification
> > described in this document...".  Similarly, "An OpenDocument document
> > which is also an OpenDocument Package shall...".
> > 
> > Generally, we should use "conform" rather than "adhere" whenever
> > possible.
> 
> I have used the SVG conformance definition as basis here, but we may
> reword this.
> 
> > 
> > Do we want to require a specific XML character encoding? 
> 
> XML requires processors to support UTF-8 and UTF-16. I would not extend
> this.
> 

I meant, do we want to leave this open-ended in ODF, that any character 
encoding may be used?  Or should we require that a conformant document
limit itself to one of a smaller set of encodings?  For example, OOXML
restricts itself to UTF-8 or UTF-16.  This is a good thing, I think.
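
If we did restrict the encodings, a checker could look roughly like the Python sketch below.  It is a simplified illustration, not a full implementation of XML encoding detection: it only looks at a byte order mark and the encoding declaration, and it ignores UTF-32 and UTF-16 without a byte order mark.  The names ALLOWED_ENCODINGS, xml_encoding and uses_allowed_encoding are made up for this example.

```python
import codecs
import re

# Assumption: the restricted set under discussion.
ALLOWED_ENCODINGS = {"utf-8", "utf-16"}

# Matches the encoding pseudo-attribute of an XML declaration at the start.
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def xml_encoding(data: bytes) -> str:
    """Best-effort guess at the encoding an XML byte stream claims to use."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECL.match(data)
    # With no byte order mark and no declaration, XML defaults to UTF-8.
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def uses_allowed_encoding(data: bytes) -> bool:
    """True if the claimed encoding is in the restricted set."""
    return xml_encoding(data) in ALLOWED_ENCODINGS


print(uses_allowed_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(uses_allowed_encoding(b'<?xml version="1.0" encoding="UTF-8"?><a/>'))
```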

> > 
> > 
> > 2.1.3 -- "If the XML root is.... then it shall be valid with respect
> > to the strict schema defined by this specification."  What is "it"?
> > "XML root" doesn't make sense?   "Sub document" maybe?  A similar
> > question in 2.1.4.
> 
> A link to the schemas is indeed missing.
> 
> "XML root" shall actually read "XML root element".
> 
> > 
> > "Conforming OpenDocument Generators" has the first conformance
> > requirement stated as "It shall not create any non-conforming
> > OpenDocument document of any kind."  But this, stated as a negative, is
> > untestable.  How can an implementation prove that it is incapable of
> > producing a non-conforming ODF document?  What, for example, if the
> > power goes out when in the middle of saving a document?  Would that
> > render the entire application non-conforming?
> 
> That is a good point. Again I have adopted that from the SVG conformance
> clauses. The problem is that if we don't have this, then an application
> that creates one conforming ODF document and otherwise only documents
> that contain extensions may call itself conformant, too.
> 

OOXML states it as "A conforming producer shall be able to produce
conforming documents of at least one document conformance class."

That does have the loophole that you mention, that it might produce only a 
single conformant document.

If anyone has a better way of stating this, in a way that is testable, 
then I'm all ears.
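
To make the difference between the wordings concrete, here is a tiny Python sketch.  It assumes we have validation results for a finite sample of documents saved by one application; the sample values and function names are made up.  The "able to produce" wording behaves like any() over the sample, "shall produce documents which conform" behaves like all() over the sample, and "shall not create any non-conforming document" can only ever be refuted by a sample, never proven.

```python
def able_to_produce_conforming(results):
    """OOXML-style wording: at least one sampled document conforms."""
    return any(results)


def produces_conforming(results):
    """'Shall produce documents which conform': every sampled one conforms."""
    return all(results)


# Hypothetical results: one conforming document, two with extensions.
sample = [True, False, False]

print(able_to_produce_conforming(sample))  # the loophole: one is enough
print(produces_conforming(sample))         # stricter, still only a sample
```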

> > 
> > I'd state this requirement more simply (and more testably) as "It
> > shall produce documents which conform to this standard".
> > 
> > 
> > We might factor out the common requirements between normal and loose
> > conformance rather than repeating material.  For example, we are
> > currently stating the documentation requirement twice.
> 
> I noticed this when adding the documentation requirement, too, but
> repeated the text because I otherwise would have had to change the
> structure of the clauses.
> 
> > 
> > My preference would be to make the documentation requirement be a
> > "shall" rather than a "should".  I think we're giving implementors a
> > lot of rope to play with by allowing a loose conformance class, a
> > significant ability to extend the standard in incompatible and
> > proprietary ways.  As stated before, I'd be happy if this ability were
> > removed altogether.  But if we do allow it the label of "conforming"
> > then I think we should require documentation of the extensions, not
> > merely recommend it.
> 
> If we keep the loose conformance, then I have no objection to turning
> this into a shall.
> 

That's an option.  So I think I'm hearing three choices:

A) As we have it now -- Loose and "Strict" document conformance, both with 
recommended documentation requirements

B) Only strict document conformance, but application conformance would 
have document processing rules that would define how foreign 
elements/attributes are processed.  (Although such documents would not be 
conformant documents, we can certainly specify how conformant applications 
treat them.) 

C) Loose and strict document conformance, but loose application 
conformance (applications that write loose documents) would have a 
documentation requirement, not merely a recommendation.

My preference here would be for B.  What do others think?

> > 
> > "Conforming OpenDocument Processors" -- "process" is not defined.
> 
> We should use "interpreter" here, too
> 
> > "It  be able to parse and process OpenDocument documents of one or more
> > of the defined document types (defined by their MIME types) any of
> > which are represented in packages."   I don't think we need to mention
> > MIME types here.  We can just say "able to parse and process
> > OpenDocument packages of one or more of the document types defined by
> > this standard".  Better even if we can give a section reference.
> 
> Yes, this sounds reasonable.
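
For context on how a consumer can tell which document type a package holds, here is a minimal Python sketch.  It relies on the "mimetype" file that ODF packages carry; the helper names and the placeholder content.xml are invented for this example, and a package without that file simply yields None.

```python
import io
import zipfile
from typing import Optional


def package_mimetype(package_bytes: bytes) -> Optional[str]:
    """Return the content of the package's 'mimetype' file, if present."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        if "mimetype" not in package.namelist():
            return None
        return package.read("mimetype").decode("ascii").strip()


def make_demo_package(mimetype: str) -> bytes:
    """Build a tiny in-memory package for demonstration purposes only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", mimetype)
        package.writestr("content.xml", "<placeholder/>")
    return buffer.getvalue()


demo = make_demo_package("application/vnd.oasis.opendocument.text")
print(package_mimetype(demo))
```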
> 
> > Why is the single XML version a "may" rather than at least a "should"?
> > It is odd to have the single file version be present (and not
> > deprecated) if we do not feel it warrants more than a "may" for support.
> 
> If we make this a should, then applications should actually implement
> both variants. That is some effort, and I think for most
> applications this is not reasonable. On the other hand, there are
> situations where providing a single XML file is reasonable, for instance
> if it should be taken as the basis of an XSLT transformation.
> 
> Best regards
> 
> Michael
> > 
> > Regards,
> > 
> > -Rob
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe from this mail list, you must leave the OASIS TC that
> > generates this mail.  Follow this link to all your TCs in OASIS at:
> > https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 

> > 
> 
> 
> -- 
> Michael Brauer, Technical Architect Software Engineering
> StarOffice/OpenOffice.org
> Sun Microsystems GmbH             Nagelsweg 55
> D-20097 Hamburg, Germany          michael.brauer@sun.com
> http://sun.com/staroffice         +49 40 23646 500
> http://blogs.sun.com/GullFOSS
> 
> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
>       D-85551 Kirchheim-Heimstetten
> Amtsgericht Muenchen: HRB 161028
> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
> Vorsitzender des Aufsichtsrates: Martin Haering
> 