office-metadata message

Subject: Re: [office-metadata] summarizing recent suggestions
From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
Date: Wed, 28 Feb 2007 08:15:18 -0500

Hi Michael,

On Feb 28, 2007, at 4:04 AM, Michael Brauer - Sun Germany - ham02 - 
Hamburg wrote:

> I have just noticed that the current suggestions all use the meta 
> namespace. Since all other fields that we have are from the "text" 
> namespace, I suggest that we use the "text" namespace for reasons of 
> consistency for this new field, too. I'm sorry that I didn't notice 
> that earlier.
>
> For the same reason, I'm also not sure whether we should add the term 
> "field" to the field's name. We currently do so only for the 
> "user-field", where "user-field" itself is a term used already by 
> office applications. For all other fields, the element name just says 
> something about the content or purpose of the field.
>
> What about calling the field just "text:meta", or "text:metadata", or 
> "text:metadata-label" (I think the term label was suggested by Bruce)? 
> If the name shall contain the term "field", then "text:meta-field" 
> would be an option.
>
> My personal favorite actually is "text:metadata" or 
> "text:metadata-label".

I really have no strong opinion on this. I just chose the element names 
to have something concrete to discuss. Does anyone else have any 
opinions on the matter?

>>> How does the ODF application 'know' who is 'responsible' for 
>>> creating the content based on metadata for this field? In our case, 
>>> how do we find the responsible plug-in?
>>> Parsing all RDF/XML streams does not seem a good option from the 
>>> point of view of an ODF application.
>>> But RDFa or even better a further optional attribute (specifying the 
>>> implementation) might give us a hint about the responsible plug-in 
>>> and would be helpful.
>> I personally think the field should be typed in some way. E.g. 
>> something like:
>> <meta:field xml:id="0874801373"
>> field:type="http://ex.net/Contact">foo</meta:field>
>
> "Typing" it somehow is a good idea. The concept we have for this 
> already is to use namespaced names (see for instance chart:class 
> attribute described in section 10.2):
>
> This would look like:
>
> <meta:field xml:id="0874801373" xmlns:contact="http://ex.net"
> field:type="contact:Contact">foo</meta:field>
>
> For consistency reasons I suggest that we reuse this concept, unless it 
> would be inconsistent with other metadata standards, or otherwise 
> inappropriate.

I didn't realize ODF already supported using shortened names for 
attribute content.

In any case, that's merely a shorthand, after all. 
http://ex.net/Contact == contact:Contact in your example. In that 
sense, we probably shouldn't care whether it's a full URI or a 
namespace-prefixed one?
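
To make the equivalence concrete, here is a minimal Python sketch of 
expanding a namespace-prefixed attribute value into a full URI by plain 
concatenation, which is how RDF tools usually treat prefixed names. The 
expand_prefixed_name helper and the prefix table are hypothetical and 
not part of ODF or of any proposal in this thread. It also assumes the 
namespace URI ends in "/": with xmlns:contact="http://ex.net" exactly 
as written in the example, concatenation would yield 
http://ex.netContact instead.

```python
def expand_prefixed_name(value, namespaces):
    """Expand 'prefix:local' to a full URI; return full URIs unchanged.

    Hypothetical helper: concatenation is the RDF-style convention and is
    an assumption here, not something the ODF specification mandates.
    """
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return value


# Assumed prefix table; the trailing slash matters for concatenation.
namespaces = {"contact": "http://ex.net/"}

short_form = expand_prefixed_name("contact:Contact", namespaces)
long_form = expand_prefixed_name("http://ex.net/Contact", namespaces)

# Both spellings name the same type once the prefix is resolved.
same_type = short_form == long_form
```

A consumer that normalizes both forms this way would not need to care 
which one a document uses.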

>> I think Elias proposed that be encoded in the RDF/XML.
>> I have to say for my citation field I'm a little nervous about 
>> leaving all of the logic for the RDF/XML.
>
> Me too. For two reasons:
>
> 1. I believe that someone who implements let's say bibliographic 
> support does not want to care about contact information, or any other 
> metadata that a document may contain, and vice versa.

And another issue is that fields of this sort are pretty basic: they 
consist of a pointer to some object (usually done via some ID; in our 
case a URI) and optional parameters. If we fix the referencing in the 
schema and allow foreign attributes to serve as those parameters (along 
with an xml:id; see below), then we provide a good balance of 
flexibility and predictability.

> 2. We shouldn't make many assumptions about how a field actually is 
> updated (that is, how often the field value is recalculated and how), 
> but we have to make sure that this can be done efficiently. I 
> therefore think it should be possible for an application to figure out 
> who (for instance what plug-in) may provide the field value from what 
> is stored in the content.xml, and the plug-in should be able to get 
> any additional data it requires efficiently, too.
>
> A type as suggested by Bruce seems to be a good solution for this. We 
> may extend this by an optional URI that links to the RDF/XML stream 
> that contains additional data (but that's only a suggestion).

Right. If we give the field an xml:id, then it provides an extension 
point to augment the description if needed.
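
As a rough illustration of the typed-field idea, the Python sketch below 
picks the responsible plug-in from the type URI alone and only loads 
extra metadata when that plug-in runs. Everything in it (the PLUGINS 
registry, register, update_field, and the fake loader) is hypothetical; 
it is one possible shape under those assumptions, not a proposal for how 
an ODF application has to do it.

```python
# Hypothetical registry mapping a field type URI to a callable that
# produces the display string. Nothing here is defined by ODF.
PLUGINS = {}


def register(type_uri):
    """Decorator that registers a plug-in function for one field type."""
    def decorator(func):
        PLUGINS[type_uri] = func
        return func
    return decorator


@register("http://ex.net/Citation")
def render_citation(field_id, load_metadata):
    # Only this plug-in asks for the extra data, and only when it runs.
    entries = load_metadata(field_id)
    return "(" + "; ".join(entries) + ")"


def update_field(type_uri, field_id, current_text, load_metadata):
    """Return new field text, or keep the stored text if no plug-in matches."""
    plugin = PLUGINS.get(type_uri)
    if plugin is None:
        return current_text
    return plugin(field_id, load_metadata)


# Stand-in for reading one RDF/XML stream on demand, keyed by xml:id.
loaded_ids = []


def fake_loader(field_id):
    loaded_ids.append(field_id)
    return ["Doe, 1999: 23", "Smith, 2004"]


citation_text = update_field(
    "http://ex.net/Citation", "0874801373", "stale", fake_loader)
contact_text = update_field(
    "http://ex.net/Contact", "0874801374", "foo", fake_loader)
```

The field with no registered plug-in keeps its stored text, and the 
loader is never called for it.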

> In any case, a solution that requires that all RDF/XML streams are 
> read to be able to update a field has the high risk that it introduces 
> performance issues. Office applications for instance for performance 
> reasons read images and embedded objects on demand only (that is, when 
> they are displayed or edited). We should allow a similar behavior for 
> metadata, too. A type plus maybe an IRI should allow that, but 
> probably is not the only solution to this problem.
>
>> My alternative would be something like:
>> <field:field field:type="http://ex.net/Citation" xml:id="0874801373">
>>   <field:source>
>>     <meta:link meta:resource="urn:isbn:98239809" cite:pages="23"/>
>>     <meta:link meta:resource="http://ex.net/1"/>
>>   </field:source>
>>   <field:body>
>>     (Doe, 1999: 23; Smith, 2004)
>>   </field:body>
>> </field:field>
>> I think it's just a practical matter how much the field should 
>> contain to best enable document portability, including across file 
>> formats (say OOXML; which looks more like the above).
>
> It's an interesting idea. For other text fields, the field description 
> itself contains all data that describes what is displayed, but not the 
> value that is displayed. Your idea seems to go into that direction. On 
> the other hand, for metadata we assume that specialized implementations 
> provide the string that is displayed. The data that describes what is 
> displayed therefore is of value only for this specialized 
> implementation. I therefore could also imagine that we actually move it 
> to the RDF/XML streams that contain the actual metadata. The only 
> thing we have to make sure is that it is easy to actually locate that 
> data (see my comment above).

The above is a generalization of the citation field, adapted for the 
new metadata support. It is also similar to how MS does this, BTW. What 
I encode in two meta:link elements, they encode in a single dumb 
attribute (which would be really ugly to process with XML tools, BTW).
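
To show why separate meta:link elements are easy to handle with 
ordinary XML tools, here is a small Python sketch that reads the field 
from the example above with the standard library. The namespace URIs 
(urn:example:...) are placeholders I made up so the snippet parses; the 
thread does not define them. Each link is read as a pointer (the 
resource) plus whatever other attributes it carries as optional 
parameters.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for illustration only.
NS = {
    "field": "urn:example:field",
    "meta": "urn:example:meta",
    "cite": "urn:example:cite",
}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """
<field:field xmlns:field="urn:example:field"
             xmlns:meta="urn:example:meta"
             xmlns:cite="urn:example:cite"
             field:type="http://ex.net/Citation" xml:id="0874801373">
  <field:source>
    <meta:link meta:resource="urn:isbn:98239809" cite:pages="23"/>
    <meta:link meta:resource="http://ex.net/1"/>
  </field:source>
  <field:body>
    (Doe, 1999: 23; Smith, 2004)
  </field:body>
</field:field>
"""

root = ET.fromstring(DOC.strip())

resource_key = "{%s}resource" % NS["meta"]
links = []
for link in root.findall("field:source/meta:link", NS):
    resource = link.get(resource_key)
    # Everything other than the pointer is treated as an optional parameter.
    params = {k: v for k, v in link.attrib.items() if k != resource_key}
    links.append((resource, params))

field_type = root.get("{%s}type" % NS["field"])
field_id = root.get(XML_ID)
body = root.find("field:body", NS).text.strip()
```

With everything packed into a single attribute string, the same job 
would need a custom parser on top of the XML one.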

>>> In this context, you forgot to mention Elias' comment about the 
>>> datatype attribute. RDFa gets the content as an XMLLiteral, not as a 
>>> string. Elias offered the datatype="plaintext" to be able to receive 
>>> only text from the ODF element. Any link on this, Elias?
>> Yes, I left that out, but agree it should be in, and would support 
>> Elias' suggestion on the datatyping.
>
> ODF has already a (limited) type support for strings, doubles, 
> date/times and durations. See section 6.7.1. This support is based on 
> xsd datatypes and already provides a support for data styles. If we 
> add type support for metadata, I suggest that we define them based on 
> what we have already (the current metadata draft actually says so 
> already).

Yes, makes sense. But do we need to add a plaintext type then?

>>> Usually we offered in the specification an own element to emphasize 
>>> such a scenario, therefore I still suggest an own element for the 
>>> first scenario.
>>> Although we would not define how an Office should handle such 
>>> sensible data, but we would at least give the ODF application a 
>>> chance to do it.
>> I think your word choice of "sensible" here is not quite right. Am 
>> not quite sure what you're trying to say here.
>
> Actually, I have no objections to allowing metadata attributes also 
> for, let's say, paragraphs or other elements, provided that all of them 
> (in particular the about and property attributes) have to appear 
> simultaneously.

No problem for me.

> I only have a few concerns regarding attaching metadata attributes to 
> <text:span> elements instead of defining a new element for metadata 
> that appears within a paragraph. Why?
>
> <text:span> currently is defined as follows:
>
> "The <text:span> element represents portions of text that are 
> attributed using a certain text style or class. The content of this 
> element is the text that uses the text style."
>
> That means, their purpose is to attach style information to text. This 
> means that new <text:span> elements get added and may be removed if 
> style information is changed. More important, for style information it 
> actually does not matter how many <text:span> elements are used to 
> attach a style to a piece of text. That is,
>
> <text:span text:style-name="T1">Michael Brauer</text:span>
>
> has the same semantics as
>
> <text:span text:style-name="T1">Michael </text:span><text:span 
> text:style-name="T1">Brauer</text:span>
>
> This would be different if we add metadata attributes to <text:span> 
> elements. When doing so, we would alter the way <text:span> elements are 
> used. For this reason (and only for this reason), I would prefer to 
> introduce a new element.

So, for example, putting the attributes on table cells would be fine?
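
The <text:span> concern can be sketched in a few lines of Python. Runs 
are modelled as (attributes, text) pairs, which is my own simplification 
and not how any ODF implementation stores them; the "ex:name" property 
value is likewise made up. For style-only runs, splitting and merging 
changes nothing, but once a run carries a property attribute, each run 
yields its own statement.

```python
def merge_runs(runs):
    """Merge adjacent (attributes, text) runs whose attributes are equal."""
    merged = []
    for attrs, text in runs:
        if merged and merged[-1][0] == attrs:
            merged[-1] = (attrs, merged[-1][1] + text)
        else:
            merged.append((attrs, text))
    return merged


def statements(runs):
    """One (property, value) pair per run that carries a property attribute."""
    return [(attrs["property"], text)
            for attrs, text in runs if "property" in attrs]


style = {"text:style-name": "T1"}
one_span = [(style, "Michael Brauer")]
two_spans = [(style, "Michael "), (style, "Brauer")]

# Style only: both forms normalize to the same thing.
style_equivalent = merge_runs(two_spans) == one_span

# With a (hypothetical) property attribute the split is visible.
tagged = dict(style, property="ex:name")
one_statement = statements([(tagged, "Michael Brauer")])
two_statements = statements([(tagged, "Michael "), (tagged, "Brauer")])
```

Here one_statement holds a single value, while two_statements holds two 
partial ones, which is the change in <text:span> usage Michael 
describes.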

Bruce