office message

Subject: Re: [office] Re: [office-metadata] Suggested Changes on the Metadata proposal
From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
To: OASIS Office <office@lists.oasis-open.org>
Date: Mon, 02 Jul 2007 13:45:09 +0200

Hi,

I think Lars is right here. I've checked what SVG does: It has the 
following classes of conformance rules:

     * G.1 Introduction
     * G.2 Conforming SVG Document Fragments
     * G.3 Conforming SVG Stand-Alone Files
     * G.4 Conforming SVG Included Document Fragments
     * G.5 Conforming SVG Generators
     * G.6 Conforming SVG Interpreters
     * G.7 Conforming SVG Viewers

They say something about conforming documents per se. They say something
about implementations that produce documents. And they say something 
about implementations that interpret and view documents. But they say 
nothing about implementations used to edit documents. Why not?

If we look at the above conformance rule classes they have something in
common. They say something about a single document instance.

In the context of metadata, we are talking about conformance of
applications used to edit documents. Editing involves two documents, the 
original one, and the document that has been edited. So we are talking 
about two document instances. Furthermore, file formats are used in 
general to encode information. When editing a document, the information 
contained in a document is changed. That means, the two documents 
actually contain different information. We, at the time we write our 
specification, don't know what is different. We only know that something 
is different, otherwise the document would not have been loaded and 
saved again.

The question is now: Can we in our conformance definitions make 
assumptions about the relation of two documents that encode different 
information? I don't think so. At least not until we have exactly 
specified what operation has taken place between loading and saving 
documents. But if we define these operations, we actually start to 
define application behavior. That is, we start to define how an 
application shall behave if the user calls a menu entry, etc. I think we 
are in agreement that this does not belong in a standard for a file 
format. Specifications for XML-based file formats define what the 
meaning of XML elements and attributes is, but they do not define how 
these elements and attributes get into a document, nor what happens to 
them if a document is modified. This of course could be standardized, 
too, but that would be a standard for office application behavior, and 
that's not what we are working on.

However, if we cannot resolve the issue using conformance clauses, how 
can we resolve it at all?

Maybe we can resolve the issue the way the accessibility SC did. The 
accessibility SC has also defined some new features for ODF. They have 
been added to ODF. They are all optional from the specification's point 
of view, and the specification itself does not say anything about how 
these features are implemented by applications. It only states what 
information they encode. But the accessibility SC is also working on 
guidelines for authors and implementors. And these guidelines are 
exactly the place where it is said how these features should be used and 
implemented so that documents become accessible.

So, maybe the solution is to work on such guidelines for metadata, too?

Just my two cents.

Michael





Lars Oppermann wrote:
> Rob,
> 
> This is an important point. The kind of conformance that is dealt with 
> in standards such as HTML is quite different. HTML is mostly authored 
> once and then rendered by various browsers. Interoperability with 
> optional or even 'foreign' elements and attributes in that case means 
> that the UA should still render the page as well as it can and it 
> shouldn't crash. HTML UAs are not round-tripping HTML.
> 
> Office documents are edited by many different people, hence the 
> situation is a very different one.
> 
> I am not saying that this cannot be resolved. However, it is not 
> appropriate to apply HTML-reasoning (one author - multiple readers) to 
> office documents (multiple authors/readers).
> 
> Consider the following example. Given a paragraph with an id (I am not 
> using xml:id here as the example is meant to be more general):
> <text:p my:id="4711">My name is Lars</text:p>
> 
> Assume someone opens this document in a text processor, selects the 
> string "Lars" and changes it to "Hamlet". Or maybe they would select all 
> the text and type "This is not a paragraph"...
> 
> Now, if the specification mandated the preservation of the my:id 
> attribute at all costs, the document might become inconsistent, 
> depending on the specific semantics of the my:id attribute.
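
A minimal sketch of the example above, assuming Python's standard xml.etree.ElementTree module. The "my" namespace URI is made up for illustration. The editor below changes only the text and blindly carries the foreign attribute over, so my:id="4711" ends up on content it may no longer describe:

```python
import xml.etree.ElementTree as ET

# The text namespace is the ODF one; "urn:example:my" is a hypothetical
# foreign namespace invented for this illustration.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
MY_NS = "urn:example:my"

SOURCE = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:my="{MY_NS}" '
    'my:id="4711">My name is Lars</text:p>'
)


def edit_text(xml: str, new_text: str) -> str:
    """Replace the paragraph text, leaving every attribute untouched."""
    ET.register_namespace("text", TEXT_NS)
    ET.register_namespace("my", MY_NS)
    para = ET.fromstring(xml)
    para.text = new_text
    return ET.tostring(para, encoding="unicode")


edited = edit_text(SOURCE, "This is not a paragraph")
print(edited)
```

Whether the surviving my:id is still correct cannot be decided by this code; it depends on what the attribute means to whoever wrote it.
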
> 
> Thus, just preserving unknown information while changing the document is 
> not always appropriate. When a paragraph is edited, does it become a new 
> paragraph? When is a document just a revision and when is it a different 
> document? In my opinion such questions cannot be answered in the 
> specification. Hence we should be very careful about mandating 
> application behavior.
> 
> /Lars
> 
> robert_weir@us.ibm.com wrote:
>>
>> I think you're reading too much into the IETF's definition of MAY.  It 
>> explicitly says that a vendor is permitted to omit the item, though it 
>> must accommodate itself and degrade functionality as necessary.  What 
>> is not permitted is that the application utterly crash when presented 
>> with an item it does not understand.   At least that is the way it 
>> works for the IETF standards I'm familiar with.
>>
>> Although intuitively we want to say, "Preserve metadata unless the 
>> user explicitly intended otherwise," I don't see how to express this 
>> in standards terms.  We can't have a conformance depend on "user 
>> intent".  And reference to a user doesn't help. Documents can be 
>> processed by automation, and I think we would equally be unhappy if 
>> metadata were arbitrarily stripped there.  In any case, I think we 
>> need to work along the lines of "shall be capable of" or "shall allow 
>> at least one mode of operation where" or something like that.   That 
>> would be testable.
>>
>> You suggested that a devious implementation might make this mode of 
>> operation hard to find in order to hurt interoperability.   But then I 
>> could also suggest a devious user who arbitrarily deletes metadata in 
>> order to hurt interoperability.  I'm not sure a document format 
>> standard can prevent either.
>>
>> -Rob
>>
>>
>> marbux <marbux@gmail.com> wrote on 07/01/2007 06:31:11 PM:
>>
>>  >
>>
>>  > On 7/1/07, robert_weir@us.ibm.com <robert_weir@us.ibm.com> wrote:
>>  >
>>  > I suppose I should throw in my $.02.
>>  >
>>  > First, we should remember that ODF mandates behavior at several
>>  > levels.  The schema itself encodes requirements in terms of what
>>  > elements or attributes are optional or mandatory, what nesting is
>>  > permitted, what restrictions there are on data types, etc.   And
>>  > then the normative text of the standard, along with external
>>  > normative references, makes additional provisions by the use of
>>  > "shall" and "shall not".
>>  >
>>  > But virtually all are undercut by the following sentence in the
>>  > conformance section:
>>  >
>>  > "There are ***no rules regarding the elements and attributes that
>>  > actually have to be supported by conforming applications,*** except
>>  > that applications should not use foreign elements and attributes for
>>  > features defined in the OpenDocument schema."
>>  >
>>  > But note that in that case, the provision is only applicable to those
>>  > who implement that feature.  A "shall" concerning the calculation of
>>  > the SUM() spreadsheet function may be totally ignored by someone who
>>  > is implementing a word processor only.  Finally, we have the
>>  > conformance clause, that defines which features and additional
>>  > constraints are required for conformance with the standard.
>>  >
>>  > Today our conformance clause designates requirements for conformant
>>  > documents, conformant applications that read, conformant
>>  > applications that write, and conformant applications that both read
>>  > and write.
>>  >
>>  > We have very few conformance *requirements,* in the sense of
>>  > mandatory requirements. Here is the sum total:
>>  >
>>  > >>>
>>  >
>>  > Documents that conform to the OpenDocument specification may contain
>>  > elements and attributes not specified within the OpenDocument
>>  > schema. Such elements and attributes **must not** be part of a
>>  > namespace that is defined within this specification and are called
>>  > foreign elements and attributes.
>>  > ...
>>  >
>>  > Conforming applications either **shall** read documents that are
>>  > valid against the OpenDocument schema if all foreign elements and
>>  > attributes are removed before validation takes place, or **shall**
>>  > write documents that are valid against the OpenDocument schema if
>>  > all foreign elements and attributes are removed before validation
>>  > takes place.
>>  >
>>  > ...
>>  >
>>  > Foreign elements may have an office:process-content attribute
>>  > attached that has the value true or false. If the attribute's value
>>  > is true, or if the attribute does not exist, the element's content
>>  > should be processed by conforming applications. Otherwise conforming
>>  > applications should not process the element's content, but may only
>>  > preserve its content. If the element's content should be processed,
>>  > the document itself ***shall*** be valid against the OpenDocument
>>  > schema if the unknown element is replaced with its content only.
>>  >
>>  > Conforming applications ***shall*** read documents containing
>>  > processing instructions and should preserve them.
>>  >
>>  > <<<
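
The "remove all foreign elements and attributes before validation" step quoted above could be sketched roughly as follows. This assumes Python's standard xml.etree.ElementTree; the two-entry namespace set is only a stand-in for the full list defined by the specification, and strip_foreign and namespace_of are names invented for this illustration. Schema validation itself is not shown.

```python
import xml.etree.ElementTree as ET

# Assumption: a small stand-in for the complete set of namespaces
# that the OpenDocument specification defines.
ODF_NAMESPACES = {
    "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def namespace_of(qname: str) -> str:
    """Return the namespace URI of an ElementTree '{uri}local' name."""
    return qname[1:].split("}", 1)[0] if qname.startswith("{") else ""


def strip_foreign(elem: ET.Element) -> None:
    """Remove foreign attributes and foreign child elements, in place.

    Names without any namespace are treated as foreign here, which is
    an assumption of this sketch.
    """
    for name in list(elem.attrib):
        if namespace_of(name) not in ODF_NAMESPACES:
            del elem.attrib[name]
    prev = None
    for child in list(elem):
        if namespace_of(child.tag) in ODF_NAMESPACES:
            strip_foreign(child)
            prev = child
        else:
            # Keep the text that followed the removed element.
            tail = child.tail or ""
            if prev is None:
                elem.text = (elem.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            elem.remove(child)


doc = ET.fromstring(
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:my="urn:example:my" my:id="4711" text:style-name="Standard">'
    "Hello <my:note>internal</my:note>world</text:p>"
)
strip_foreign(doc)
print(ET.tostring(doc, encoding="unicode"))
```

The stripped tree is what would then be handed to a schema validator. Note that this sketch drops foreign elements together with their content; it does not implement the office:process-content rule of replacing an element with its content.
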
>>  >
>>  > We should also realize that all of those "may" and "optional"
>>  > requirements keywords changed their meaning between ODF 1.0 and 1.1.
>>  > In ODF 1.0, they meant:
>>  >
>>  > >>>
>>  >
>>  > 5. MAY   This word, or the adjective "OPTIONAL", mean that an item
>>  > is truly optional.  One vendor may choose to include the item
>>  > because a particular marketplace requires it or because the vendor
>>  > feels that it enhances the product while another vendor may omit the
>>  > same item. An implementation which does not include a particular
>>  > option MUST be prepared to interoperate with another implementation
>>  > which does include the option, though perhaps with reduced
>>  > functionality. In the same vein an implementation which does include
>>  > a particular option MUST be prepared to interoperate with another
>>  > implementation which does not include the option (except, of course,
>>  > for the feature the option provides.)
>>  >
>>  > <http://www.ietf.org/rfc/rfc2119.txt>. This is the definition used
>>  > by nearly all OASIS standards.
>>  >
>>  > <<<
>>  >
>>  > At ISO's request, that definition changed to:
>>  >
>>  > >>>
>>  >
>>  > The verbal forms shown in Table G.3 shall be used to indicate a
>>  > course of action permissible
>>  > within the limits of the document.
>>  >
>>  > Table G.3 — Permission
>>  >
>>  >     Verbal form   Equivalent expressions for use in exceptional
>>  >                   cases (see 6.6.1.3)
>>  >     -----------   ---------------------------------------------
>>  >     may           is permitted
>>  >                   is allowed
>>  >                   is permissible
>>  >     need not      it is not required that
>>  >                   no … is required
>>  >
>>  > Do not use "possible" or "impossible" in this context.
>>  >
>>  > Do not use "can" instead of "may" in this context.
>>  >
>>  > NOTE 1
>>  > "May" signifies permission expressed by the  document, whereas "can"
>>  > refers to the ability of a user of the document or to a possibility
>>  > open to him/her.
>>  >
>>  > NOTE 2
>>  > The French verb "pouvoir" can indicate both permission and
>>  > possibility.
>>  > For clarity, the use of other expressions is advisable if otherwise
>>  > there is a risk of misunderstanding.
>>  >
>>  > <<<
>>  >
>>  > <http://72.14.253.104/search?q=cache:DxJI76h9l8QJ:www.iec.ch/tiss/iec/Directives-Part2-Ed4.pdf+nnex+H+of+%5BISO/IEC+Directives&hl=en&ct=clnk&cd=1&gl=us >, pg. 62.
>>  >
>>  > So in ODF 1.0 the keywords "may" and "optional" imported a
>>  > requirement of interoperability. In ODF 1.1, that requirement
>>  > disappeared with the stroke of a pen. My reading of the ISO
>>  > directives suggests that we do not have the option of going back to
>>  > the RFC 2119 definitions. But nonetheless it is my understanding
>>  > that the TC did not study the impact of the change in requirements
>>  > keyword definitions before making the change.
>>  >
>>  > For example, the use of the word "may" in the preservation of
>>  > foreign elements and attributes section would at least arguably,
>>  > under the RFC 2119 definition, **require** preservation of foreign
>>  > elements and attributes needed for interoperability purposes whether
>>  > or not an application supported foreign elements and attributes.
>>  >
>>  > But I think it might fly with ISO to use the RFC 2119 definition of
>>  > "may" and "optional" in the conformance section alone and that might
>>  > put us further down the road toward interoperability.
>>  >
>>
>>  > As you may already know, OASIS has added a new requirement for all
>>  > OASIS standards:
>>  >
>>  > "A specification that is approved by the TC at the Public Review
>>  > Draft, Committee Specification or OASIS Standard level must include
>>  > a separate section, listing a set of numbered conformance clauses,
>>  > to which any implementation of the specification must adhere in
>>  > order to claim conformance to the specification (or any optional
>>  > portion thereof) "
>>  >
>>  > I think this is particularly important because procurement officers
>>  > want to be able to simply specify that a candidate application must
>>  > produce conformant format X. They do not want to, in effect, have to
>>  > write their own file format specifications.
>>  >
>>  > When we make the changes required for the new OASIS rules, I suggest
>>  > we think about conformance in general, and consider making a more
>>  > substantial statement. For example, we could define things at a more
>>  > granular level:  a conformant ODF spreadsheet shall support
>>  > workbooks of at least a single sheet, with at least 100 rows and 25
>>  > columns and at least the Group 1 spreadsheet functions.  (Just an
>>  > example, not a real proposal).  So we have the opportunity to
>>  > specify multiple levels of conformance, either in the main text, or
>>  > as separate profiles.
>>  >
>>  > +1. I'd add that we should approach such issues with suspicion that
>>  > every option is a potential interoperability breakpoint.
>>  >
>>  > To the specific question at hand, I am concerned with the loose use
>>  > of the word "preserve."  What exactly does that mean?  For example,
>>  > must the xml:id's of the saved document be lexically identical to
>>  > those of the read document?  Or are looser versions of equivalence
>>  > allowed?  For example, if the id originally is "foo" and then it is
>>  > saved with the id "bar", is that permitted, provided that the
>>  > structure and referential integrity of the id and references are
>>  > maintained?  Remember, it will be common for an application to read
>>  > an XML document and convert id's and links into internal runtime
>>  > representations that are not at all similar to the XML.
>>  > Id/references might be converted into C-language pointer references
>>  > between objects, etc.  Then when writing out the document, new
>>  > unique ID's might be generated on-the-fly, perhaps in sequential
>>  > order.  This might vary from implementation to implementation.
>>  > Beyond referential integrity, I don't know if there is any
>>  > additional value in saying that a document created in KOffice must
>>  > have identical ID labels when that document is later saved in
>>  > OpenOffice.
>>  >
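
One way to make the looser notion of "preserve" testable is to check that the saved document's IDs equal the original's under a consistent one-to-one renaming. The sketch below assumes each document has already been reduced to the sequence of ID tokens (definitions and references alike) in document order; same_up_to_renaming is a name invented for this illustration and not part of any proposal.

```python
def same_up_to_renaming(before, after):
    """True if `after` equals `before` under a one-to-one ID renaming."""
    if len(before) != len(after):
        return False
    forward, backward = {}, {}
    for old, new in zip(before, after):
        if forward.setdefault(old, new) != new:
            return False  # one old ID was mapped to two new IDs
        if backward.setdefault(new, old) != old:
            return False  # two old IDs were collapsed into one new ID
    return True


# "foo" is defined once and referenced once; "bar" is defined once.
original = ["foo", "bar", "foo"]
print(same_up_to_renaming(original, ["id1", "id2", "id1"]))
print(same_up_to_renaming(original, ["id1", "id1", "id1"]))
```

The first call renames consistently and keeps referential integrity; the second merges two distinct IDs and would be rejected. Lexical identity is simply the special case where every ID maps to itself.
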
>>  > I do not have the technical knowledge to answer that question.
>>  > However, I request that we approach the issue from recognition that
>>  > a document may pass through many applications before wending its way
>>  > back  to the originating application. From a layman's view, it would
>>  > seem that a shifting vocabulary would interfere with
>>  > interoperability mightily in situations where it is unknown what
>>  > application will be the next to process a document.
>>  >
>>  > We should also note that it is a feature of some programs, such as
>>  > Office 2007, to have a menu item specifically for removing metadata
>>  > from a document, for privacy and security reasons.  I don't think we
>>  > want to prevent such an application from claiming conformance.
>>  >
>>  > Wouldn't an exception for user initiated actions cover this situation?
>>  >
>>  > So we need to be very careful how we word this.  Perhaps
>>  > something like "Conforming applications that read and write
>>  > documents shall be capable of "preserving" xml:id's, etc."  With the
>>  > proviso that "preserving" needs a better definition, this ensures
>>  > that conforming applications support preservation, while also
>>  > allowing that not every mode of use may actually do so, such as when
>>  > a user deletes content or metadata, etc.
>>  >
>>  > I'm not sure that "capable" helps a lot. E.g., if an application is
>>  > capable of preserving metadata but ships with that option turned off
>>  > and an arcane set of keystrokes to enable the option known only to
>>  > the developers, the app is still "capable" of preserving metadata.
>>  > Maybe call that an Easter Egg optional setting.
>>  >
>>  > While on the subject of the conformance section and requirements
>>  > keywords, we have another problem to deal with. The Notation section
>>  > currently reads:
>>  >
>>  > "Within this specification, the key words "shall", "shall not",
>>  > "should", "should not" and "may" are to be interpreted as described
>>  > in Annex H of [ISO/IEC Directives] ***if they appear in bold
>>  > letters.***"
>>  >
>>  > Between ODF 1.0 and ODF 1.1, many of the keywords lost their
>>  > boldfacing. I suspect that is because we tend to bat language back
>>  > and forth in plain text email, which strips text attributes.
>>  > 1. We could avoid much of that kind of problem in the future if we
>>  > switched to keywords in all caps rather than boldface, since they
>>  > will remain all caps in emails.
>>  > 2. Does anyone know if there are any instances of the keywords that
>>  > should ***not*** be boldfaced (or all caps)? If not, we have a
>>  > simple global search and replace task. If so, we have a tedious
>>  > review ahead of us.
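
Both outcomes of question 2 can be sketched with Python's standard re module, assuming the specification text is available as plain text. list_keywords supports the manual review, and uppercase_keywords is the blind global replace, which is only safe if every occurrence turns out to be normative. Both function names are invented for this illustration.

```python
import re

# Two-word keywords come first so "shall not" is not matched as "shall".
KEYWORDS = re.compile(r"\b(shall not|should not|shall|should|may)\b")


def list_keywords(text):
    """Return (line number, keyword) pairs for manual review."""
    return [
        (number, match.group(1))
        for number, line in enumerate(text.splitlines(), start=1)
        for match in KEYWORDS.finditer(line)
    ]


def uppercase_keywords(text):
    """Blindly upper-case every lowercase keyword occurrence."""
    return KEYWORDS.sub(lambda m: m.group(1).upper(), text)


sample = (
    "Applications shall read documents.\n"
    "They shall not crash and may preserve data."
)
for number, keyword in list_keywords(sample):
    print(number, keyword)
print(uppercase_keywords(sample))
```

The pattern is case-sensitive and works line by line, so a capitalised "May" at the start of a sentence, or a "shall not" split across a line break, would need extra handling.
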
>>  >
> 
> 


-- 
Michael Brauer, Technical Architect Software Engineering
StarOffice/OpenOffice.org
Sun Microsystems GmbH             Nagelsweg 55
D-20097 Hamburg, Germany          michael.brauer@sun.com
http://sun.com/staroffice         +49 40 23646 500
http://blogs.sun.com/GullFOSS

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
	   D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering