office-metadata message

Subject: Re: [office-metadata] Rough notes (I won't call them minutes just yet)

From: Michael Brauer <Michael.Brauer@Sun.COM>
To: patrick@durusau.net
Date: Wed, 07 Feb 2007 16:22:17 +0100

Hi Patrick,

Patrick Durusau wrote:
> Michael,
> 
> Michael Brauer wrote:
> 
>> Hi Patrick,
>>
>> the last sentence is close to my concerns. An application *may* of 
>> course preserve arbitrary meta data at arbitrary elements. But ODF 
>> would become difficult to implement if we require that applications 
>> *must* preserve or *should* preserve arbitrary meta data at arbitrary 
>> elements. So what we have to do is to identify those elements where we 
>> want to say that applications *should* preserve metadata. For all 
>> other elements, applications *may* preserve metadata.
>>
>> Is that clearer?
>>
> Only just. Sorry.
> 
> Part of my concern is that you seem to assume that an application can 
> distinguish between metadata and, I assume, non-metadata. If it is truly 
> arbitrary markup that has been added, how does an application make that 
> distinction? It could be metadata or it could be non-metadata. All the 
> application knows is that the additional material is unknown to it. (In 
> other words, it cannot know if it is "metadata" or "non-metadata.")

Well, I think the question is whether we add the meta data attributes 
and elements to the schema or not.

If we add them to the schema, then an application is able to know 
whether a certain attribute is meta data or not.

If we don't add them to the schema, then this is in fact not the case. 
But how do you add meta data support to an application if you don't know 
what meta data looks like and where it may occur?

So, basically what I am requesting is to add the meta data attributes 
explicitly to the schema, and to not make any assumption about 
attributes and elements not defined in the schema, even though they may 
be meta data for some applications.
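
To make the distinction concrete, here is a minimal sketch, assuming a toy schema. The attribute names and the `classify` helper are hypothetical and are not taken from the ODF schema. An attribute can only be recognized as meta data if the schema lists it as such; everything else is simply unknown, and no assumption is made about it.

```python
# Hypothetical attribute names for illustration; none come from the ODF schema.
SCHEMA_CONTENT_ATTRS = {"style-name", "id"}
SCHEMA_METADATA_ATTRS = {"meta-about", "meta-property"}


def classify(attr_name):
    """Classify an attribute by schema membership only."""
    if attr_name in SCHEMA_METADATA_ATTRS:
        return "metadata"
    if attr_name in SCHEMA_CONTENT_ATTRS:
        return "content"
    # Not in the schema: it might be meta data for some other application,
    # but this application has no way to tell.
    return "unknown"


labels = {name: classify(name) for name in ("style-name", "meta-about", "vendor-tag")}
```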

> 
> Why does ODF become difficult to implement if arbitrary metadata is 
> allowed? How is arbitrary metadata different from arbitrary non-metadata?
> 
> I think this is where we are missing each other. I don't understand how 
> something that is arbitrary can be divided into metadata and non-metadata.

> 
> And if we don't require preservation, how do we handle the "lite" to the 
> "rich" client scenario? Do we just have to depend on having "lite" 
> clients that do preserve what they don't understand? Even though it is 
> not required.
> 
> Obviously you are seeing an issue with preservation that is escaping me. 
> Can you say why preservation of arbitrary content (whether metadata or 
> not) is difficult?

There is a technical issue and a usability issue.

The technical issue is: Office applications in general, regardless of 
whether they are ODF applications or use other XML file formats, don't 
operate on the XML itself, but turn it into their own internal models. 
They convert this internal model back to XML only when the document is 
stored. This means office applications can only preserve data where they 
have a counterpart in their models, and they usually have counterparts 
only for those objects that are defined in the file format and that they 
support. One may of course implement counterparts for elements or 
attributes not defined in the schema (we suggest that, for instance, for 
document meta data by saying it should be preserved), but doing that in 
general is a huge effort, in particular because you not only have to 
keep the data itself, but also have to preserve the structure.
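
A minimal sketch of this round trip, assuming a toy application whose internal model knows only paragraphs with a style name. The element and attribute names, and the `load` and `save` helpers, are hypothetical and are not ODF names. Anything without a counterpart in the model is silently lost when the document is stored again.

```python
import xml.etree.ElementTree as ET


def load(xml_text):
    """Turn XML into the internal model: a list of paragraph dicts."""
    root = ET.fromstring(xml_text)
    model = []
    for p in root.iter("p"):
        # Only the data the model has a slot for is copied over.
        model.append({"style": p.get("style-name"), "text": p.text or ""})
    return model


def save(model):
    """Serialize the internal model back to XML."""
    root = ET.Element("doc")
    for para in model:
        p = ET.SubElement(root, "p")
        if para["style"] is not None:
            p.set("style-name", para["style"])
        p.text = para["text"]
    return ET.tostring(root, encoding="unicode")


# The input carries an attribute and an element the model does not know.
source = '<doc><p style-name="Body" x-meta="isbn:123">Hello</p><note>extra</note></doc>'
result = save(load(source))
```

Keeping `x-meta` and `<note>` would require the model to store unknown attributes per object and to remember where unknown elements sat relative to known ones, which is the structural effort described above.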

The usability issue is that you can neither present the unknown data to 
the user in a way that she or he understands, nor know what shall happen 
to it when the document is edited. But if one edits information without 
knowing what it is, the chances are high that the result is not 
reasonable. What's serious about this is that another application that 
does know about the additional content cannot tell whether a given 
document is the result of such an editing operation or not. It actually 
does not matter much whether this content is unknown in the sense that 
it is not defined in ODF, or unknown in the sense that it is merely not 
supported. In both cases, editing a document may destroy it. The only 
solution is to delete the unknown data when in doubt. But that 
contradicts saying it should be preserved ...

For this reason: It is fine to say that an application *may* preserve 
arbitrary elements and attributes, and it is of course reasonable to 
actually preserve these elements in many cases, but it is probably not a 
good idea to say that arbitrary elements and attributes *should* or 
*shall* be preserved in general.

I hope this helps

Michael

> 
> Hope you are having a great day!
> 
> Patrick
> 
>> Michael
>>
>> Patrick Durusau wrote:
>>
>>> Michael,
>>>
>>> Snipping to your last point:
>>>
>>> Michael Brauer wrote:
>>> <snip>
>>>
>>>>> 5. Preservation of all metadata? Means content not understood must be
>>>>>    preserved. 
>>>>
>>>>
>>>>
>>>> We have to be careful with this. What works is that we say that RDF-XML 
>>>> streams in the package should be preserved, and that we identify a 
>>>> couple of XML elements where we also say that meta data related 
>>>> attributes have to be preserved. What will not work is to preserve 
>>>> meta data at arbitrary elements.
>>>>
>>> Why not?
>>>
>>> The reason why we discussed this some months ago in SC was to deal 
>>> with the issue that "lite" applications, which may not understand 
>>> metadata that would be useful to a "richer" application (realizing 
>>> that "lite" and "rich" are relative and rather vague terms), must 
>>> preserve that metadata.
>>>
>>> However, then the issue is, since the metadata work will allow 
>>> arbitrary metadata (which the SC has avoided defining, working only 
>>> on the mechanism for adding metadata), how do we distinguish what 
>>> must be preserved?
>>>
>>> Yes, saying RDF-XML streams in the package plus attributes on defined 
>>> XML elements would work, but why?
>>>
>>> ODF 1.1 says applications may preserve content that they don't 
>>> understand.
>>>
>>> I would think if preservation of content that is not understood, 
>>> whether metadata or not, "will not work" we would not have permitted 
>>> it in ODF 1.0 and 1.1.
>>>
>>> Granted, that may "not work" with some particular implementation 
>>> strategy but that is not really our concern.
>>>
>>> Close? Or did I miss the issue? Or do you see ODF 1.2 moving towards 
>>> a more restrictive model in which everything in the package *must* 
>>> be understood?
>>>
>>> Hope you are having a great day!
>>>
>>> Patrick
>>>
>>
>>
>>
>>
>

Follow-Ups:
- Re: [office-metadata] Rough notes (I won't call them minutes just yet)
  - From: "Bruce D'Arcus" <bdarcus@gmail.com>

References:
- Re: [office-metadata] Rough notes (I won't call them minutes just yet)
  - From: Michael Brauer <Michael.Brauer@Sun.COM>
- Re: [office-metadata] Rough notes (I won't call them minutes just yet)
  - From: Patrick Durusau <patrick@durusau.net>
- Re: [office-metadata] Rough notes (I won't call them minutes just yet)
  - From: Michael Brauer <Michael.Brauer@Sun.COM>
- Re: [office-metadata] Rough notes (I won't call them minutes just yet)
  - From: Patrick Durusau <patrick@durusau.net>