OASIS Mailing List Archives

office message


Subject: Re: [office] Re: [office-metadata] Suggested Changes on the Metadata proposal

On 7/1/07, robert_weir@us.ibm.com <robert_weir@us.ibm.com> wrote:

I suppose I should throw in my $.02.

First, we should remember that ODF mandates behavior at several levels.  The schema itself encodes requirements in terms of what elements or attributes are optional or mandatory, what nesting is permitted, what restrictions there are on data types, etc.  And then the normative text of the standard, along with external normative references, makes additional provisions by the use of "shall" and "shall not".

But virtually all are undercut by the following sentence in the conformance section:

"There are ***no rules regarding the elements and attributes that actually have to be supported by conforming applications,*** except that applications should not use foreign elements and attributes for features defined in the OpenDocument schema."

But note that in that case, the provision is only applicable to those who implement that feature.  A "shall" concerning the calculation of the SUM() spreadsheet function may be totally ignored by someone who is implementing a word processor only.  Finally, we have the conformance clause that defines which features and additional constraints are required for conformance with the standard.

Today our conformance clause designates requirements for conformant documents, conformant applications that read, conformant applications that write, and conformant applications that both read and write.  

We have very few conformance *requirements,* in the sense of mandatory requirements. Here is the sum total:


Documents that conform to the OpenDocument specification may contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes **must not** be part of a namespace that is defined within this specification and are called foreign elements and attributes.

Conforming applications either **shall** read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place, or **shall** write documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place.


Foreign elements may have an office:process-content attribute attached that has the value true or false. If the attribute's value is true, or if the attribute does not exist, the element's content should be processed by conforming applications. Otherwise conforming applications should not process the element's content, but may only preserve its content. If the element's content should be processed, the document itself ***shall*** be valid against the OpenDocument schema if the unknown element is replaced with its content only.

Conforming applications ***shall*** read documents containing processing instructions and should preserve them.
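The "remove foreign elements and attributes before validation" rule quoted above can be made concrete with a small sketch. This is only one possible reading, not anything the specification prescribes: it folds the office:process-content rule into the stripping step, so a foreign element is replaced by its content when the attribute is "true" or absent and is dropped entirely otherwise. The namespace set is a deliberately incomplete, hypothetical subset, un-namespaced attributes are assumed not to be foreign, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Hypothetical, incomplete subset; a real check would list every namespace
# defined by the specification.
ODF_NAMESPACES = {OFFICE_NS, TEXT_NS}
PROCESS_CONTENT = "{%s}process-content" % OFFICE_NS


def is_foreign(name):
    """True if an ElementTree '{uri}local' name lies outside the ODF namespaces."""
    if not name.startswith("{"):
        return False  # assumption: un-namespaced names are not treated as foreign
    return name[1:].split("}", 1)[0] not in ODF_NAMESPACES


def _append_text(parent, text):
    """Attach text after whatever content `parent` already holds."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_into(parent, source):
    """Append the stripped text and children of `source` to `parent`."""
    _append_text(parent, source.text)
    for child in source:
        if is_foreign(child.tag):
            # Replace the foreign element with its content, or drop it.
            if child.get(PROCESS_CONTENT, "true") == "true":
                _copy_into(parent, child)
        else:
            clean = ET.SubElement(parent, child.tag)
            for name, value in child.attrib.items():
                if not is_foreign(name):
                    clean.set(name, value)
            _copy_into(clean, child)
        _append_text(parent, child.tail)


def strip_foreign(root):
    """Return a copy of `root` without foreign elements and attributes."""
    kept = {k: v for k, v in root.attrib.items() if not is_foreign(k)}
    clean = ET.Element(root.tag, kept)
    _copy_into(clean, root)
    return clean
```

The stripped tree is what would then be handed to a schema validator; validation itself is outside the scope of this sketch.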


We should also realize that all of those "may" and "optional" requirements keywords changed their meaning between ODF 1.0 and 1.1. In ODF 1.0, they meant:


5. MAY   This word, or the adjective "OPTIONAL", mean that an item is truly optional.  One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item. An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.)

<http://www.ietf.org/rfc/rfc2119.txt>. This is the definition used by nearly all OASIS standards.


At ISO's request, that definition changed to:


The verbal forms shown in Table G.3 shall be used to indicate a course of action permissible
within the limits of the document.

Table G.3 — Permission

    Verbal form    Equivalent expressions for use in exceptional cases

    may            is permitted
                   is allowed
                   is permissible
    need not       it is not required that
                   no … is required

Do not use "possible" or "impossible" in this context.

Do not use "can" instead of "may" in this context.

"May" signifies permission expressed by the  document, whereas "can"
refers to the ability of a user of the document or to a possibility open to him/her.

The French verb "pouvoir" can indicate both permission and possibility.
For clarity, the use of other expressions is advisable if otherwise there is a risk of misunderstanding.


< >, pg. 62.

So in ODF 1.0 the keywords "may" and "optional" imported a requirement of interoperability. In ODF 1.1, that requirement disappeared with the stroke of a pen. My reading of the ISO directives suggests that we do not have the option of going back to the RFC 2119 definitions. But nonetheless it is my understanding that the TC did not study the impact of the change in requirements keyword definitions before making the change.

For example, the use of the word "may" in the preservation of foreign elements and attributes section would at least arguably, under the RFC 2119 definition, **require** preservation of foreign elements and attributes needed for interoperability purposes whether or not an application supported foreign elements and attributes.

But I think it might fly with ISO to use the RFC 2119 definition of "may" and "optional" in the conformance section alone and that might put us further down the road toward interoperability.

As you may already know, OASIS has added a new requirement for all OASIS standards:

"A specification that is approved by the TC at the Public Review Draft, Committee Specification or OASIS Standard level must include a separate section, listing a set of numbered conformance clauses, to which any implementation of the specification must adhere in order to claim conformance to the specification (or any optional portion thereof) "

I think this is particularly important because procurement officers want to be able to simply specify that a candidate application must produce conformant format X. They do not want to, in effect, have to write their own file format specifications.

When we make the changes required for the new OASIS rules, I suggest we think about conformance in general, and consider making a more substantial statement. For example, we could define things at a more granular level:  a conformant ODF spreadsheet shall support workbooks of at least a single sheet, with at least 100 rows and 25 columns and at least the Group 1 spreadsheet functions.  (Just an example, not a real proposal).  So we have the opportunity to specify multiple levels of conformance, either in the main text, or as separate profiles.

+1. I'd add that we should approach such issues with suspicion that every option is a potential interoperability breakpoint.

To the specific question at hand, I am concerned with the loose use of the word "preserve."  What exactly does that mean?  For example, must the xml:id's of the saved document be lexically identical to those of the read document?  Or are looser versions of equivalence allowed?  For example, if the id originally is "foo" and then it is saved with the id "bar", is that permitted, provided that the structure and referential integrity of the id and references are maintained?  Remember, it will be common for an application to read an XML document and convert id's and links into internal runtime representations that are not at all similar to the XML.  Id/references might be converted into C-language pointer references between objects, etc.  Then when writing out the document, new unique ID's might be generated on-the-fly, perhaps in sequential order.  This might vary from implementation to implementation.  Beyond referential integrity, I don't know if there is any additional value in saying that a document created in KOffice must have identical ID labels when that document is later saved in OpenOffice.
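The two candidate meanings of "preserve" in the paragraph above (lexically identical ids versus referential integrity only) can be told apart mechanically. The sketch below is an illustration under stated assumptions, not a proposed definition: `ref` is a hypothetical attribute standing in for whatever attribute points at an xml:id, and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
REF = "ref"  # hypothetical referencing attribute, for illustration only


def id_skeleton(root):
    """Describe the id/reference structure of a document, ignoring id labels.

    Each id is replaced by the order in which it is first defined, so two
    documents that differ only in their id labels yield the same skeleton.
    """
    order = {}
    for el in root.iter():
        if XML_ID in el.attrib:
            order.setdefault(el.attrib[XML_ID], len(order))
    skeleton = []
    for el in root.iter():
        defined = order.get(el.attrib.get(XML_ID))
        ref = el.attrib.get(REF)
        referenced = None if ref is None else order.get(ref, "dangling")
        skeleton.append((el.tag, defined, referenced))
    return skeleton


def lexically_preserved(before, after):
    """Strict reading: every xml:id label is unchanged, element by element."""
    labels = lambda root: [el.get(XML_ID) for el in root.iter()]
    return labels(before) == labels(after)


def referentially_preserved(before, after):
    """Looser reading: labels may change as long as references still resolve."""
    return id_skeleton(before) == id_skeleton(after)


original = ET.fromstring('<doc><p xml:id="foo"/><m ref="foo"/></doc>')
renamed = ET.fromstring('<doc><p xml:id="bar"/><m ref="bar"/></doc>')
broken = ET.fromstring('<doc><p xml:id="bar"/><m ref="foo"/></doc>')
```

Here `renamed` corresponds to the "foo" saved as "bar" case: it fails the lexical test but passes the referential one, while `broken` fails both because its reference no longer resolves.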

I do not have the technical knowledge to answer that question. However, I request that we approach the issue from recognition that a document may pass through many applications before wending its way back  to the originating application. From a layman's view, it would seem that a shifting vocabulary would interfere with interoperability mightily in situations where it is unknown what application will be the next to process a document.  

We should also note that it is a feature of some programs, such as Office 2007, to have a menu item specifically for removing metadata from a document, for privacy and security reasons.  I don't think we want to prevent such an application from claiming conformance.

Wouldn't an exception for user initiated actions cover this situation?

So we need to be very careful how we word this.  Perhaps something like "Conforming applications that read and write documents shall be capable of "preserving" xml:id's, etc."  With the proviso that "preserving" needs a better definition, this ensures that conforming applications support preservation, while also allowing that not every mode of use may actually do so, such as when a user deletes content or metadata, etc.

I'm not sure that "capable" helps a lot. E.g., if an application is capable of preserving metadata but ships with that option turned off and an arcane set of keystrokes to enable the option known only to the developers, the app is still "capable" of preserving metadata. Maybe call that an Easter Egg optional setting.

While on the subject of the conformance section and requirements keywords, we have another problem to deal with. The Notation section currently reads:

"Within this specification, the key words "shall", "shall not", "should", "should not" and "may" are to be interpreted as described in Annex H of [ISO/IEC Directives] ***if they appear in bold letters.***"

Between ODF 1.0 and ODF 1.1, many of the keywords lost their boldfacing. I suspect that is because we tend to bat language back and forth in plain text email, which strips text attributes.

1. We could avoid much of that kind of problem in the future if we switched to keywords in all caps rather than boldface, since they will remain all caps in emails.

2. Does anyone know if there are any instances of the keywords that should ***not*** be boldfaced (or all caps)? If not, we have a simple global search and replace task. If so, we have a tedious review ahead of us.
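The two tasks in the list above (switching to all caps, and finding keyword instances that still need review) could be sketched roughly as follows. This assumes bold is written with asterisks, as it appears in plain-text copies of this message; the specification itself uses real text formatting, so a working tool would have to inspect the document's styles instead. The function names are hypothetical.

```python
import re

# Longer phrases first so that "shall not" wins over "shall".
KEYWORDS = ("shall not", "should not", "shall", "should", "may")
_ALTERNATIVES = "|".join(KEYWORDS)
BOLD = re.compile(r"\*{2,3}(%s)\*{2,3}" % _ALTERNATIVES)
PLAIN = re.compile(r"\b(%s)\b" % _ALTERNATIVES)


def bold_to_caps(text):
    """Rewrite **shall** or ***shall*** as SHALL so the marking survives email."""
    return BOLD.sub(lambda match: match.group(1).upper(), text)


def unmarked_keywords(text):
    """List lowercase keywords left after conversion; these need human review."""
    return PLAIN.findall(bold_to_caps(text))


sample = ("Applications **shall** read them and should preserve them. "
          "They ***shall not*** fail.")
converted = bold_to_caps(sample)
needs_review = unmarked_keywords(sample)
```

The search is case-sensitive on purpose, so already-converted keywords are not reported again; a sentence-initial "May" would be missed and is one more reason the review cannot be fully automatic.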
