office-metadata message

Subject: Re: [office-metadata] Re: [office] Re: [office-metadata] Focus on model
From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
Date: Mon, 18 Dec 2006 09:37:42 -0500

Hi Michael,

On Dec 18, 2006, at 8:55 AM, Michael Brauer - Sun Germany - ham02 - 
Hamburg wrote:

>> But the issue is, while styles may be (often, but actually not 
>> always!) defined separately from the content in ODF, they are applied 
>> *to* content. E.g. <span style="foo">bar</span>.
>> So if we want to be consistent, we'd in fact allow the same with 
>> metadata.
>
> I'm not sure. I'm not a metadata expert, but it is my understanding 
> that metadata is information about resources.

Right.

> The resources are objects of an OpenDocument file, like paragraphs, 
> frames, etc.

Right, but also the things that those objects themselves reference.

Maybe that is a useful distinction to keep in mind.

> For that reason, it seems to be reasonable to me that the metadata 
> references the resources, and not vice versa.

I agree, IF we're only referencing the document itself, which I think 
unlikely.

> But I'm not against referencing from the objects to their metadata if 
> this is common practice.

OK.

...

>> Elias is going to show us a demo on Wednesday that involves a Calc 
>> spreadsheet, where every cell is a property. I think that will 
>> demonstrate why we shouldn't be too prescriptive about where the 
>> metadata is.
>
> Okay. Will it be in the SC meeting, or on the mailing list?

It'll be during the SC meeting; maybe there'll be some info ahead of 
time?

> Okay. I think I begin to start to understand what the issue is. What 
> exactly do you mean with "resources other than the document". Do you 
> mean resources that are contained in the document (but not the 
> document itself), or do you mean resources that are neither the 
> document, nor contained in it? Is the latter within the scope of the 
> SC?

I mean references to resources within the document that are not the 
document per se.

Think of something like:

<text:link
	meta:resource="http://ex.net/people/1"
	meta:property="http://ex.net/client">Jane Doe</text:link>

That the author wants to reference a client is not an intrinsic aspect 
of the document per se (as an artifact).

I guess the issue is this: we right now have all sorts of hard-coded 
document objects that we can hang metadata off. An image or a table is 
really straightforward that way.

The challenge here is how to allow users to define their own content 
objects to be referenced, and also to be able to tag properties of 
those objects.

>> And to best separate out the majority of the metadata proper (the 
>> bibliographic source metadata, the diagnosis description, the vCard 
>> for the person), and also to provide selectable metadata objects 
>> within the application (user right-clicks on some span to get further 
>> metadata), it's important to have some kind of metadata in content. 
>> Likewise, it
>
> Do we really need metadata in the content? Or do we need some way to 
> visualize metadata in the content, similar to the metadata fields 
> described in section 6.4 of the ODF spec?

Probably more the latter.

> In the latter case, adding new field types actually would be an option. 
> I don't know whether this actually would be a better option than 
> having the metadata in the content, but is an option.

My suggestion is we add at least one generic metadata field. E.g. if we 
take the citation field and generalize it, we can end up with something 
like:

  <text:structured-field
      meta:class="http://opendocument.xml.org/fields/citation">
    <text:link
         meta:property="http://purl.org/dc/elements/1.1/source"
         meta:resource="urn:isbn:23980912"/>
    <text:link
         meta:property="http://purl.org/dc/elements/1.1/source"
         meta:resource="urn:isbn:92130926" cite:pages="23"/>
    <text:body>(Doe, 1999: 23; Smith, 2000)</text:body>
  </text:structured-field>

Whether we reuse text:link there or invent some new element -- maybe 
called meta:reference -- doesn't matter much to me. But you'll note 
that I'm advocating using something like RDFa to encode the 
referencing.

Now note one thing about the above, though: I cannot select an author 
string (say "Doe") as a metadata-enhanced object. My granularity is 
only the main referenced resource.
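
To make the RDFa-style reading of the citation field concrete, here is a 
rough sketch of how a consumer might pull the property/resource pairs out 
of it. It is only an illustration: the namespace URIs are placeholders 
chosen so the fragment parses (the sample above leaves the prefixes 
undeclared), and field_statements is a hypothetical helper, not part of 
any proposal.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the sample markup does not
# declare its prefixes, so these exist only to make the fragment parse.
NS = {
    "text": "urn:example:text",
    "meta": "urn:example:meta",
    "cite": "urn:example:cite",
}

FIELD = """\
<text:structured-field
    xmlns:text="urn:example:text"
    xmlns:meta="urn:example:meta"
    xmlns:cite="urn:example:cite"
    meta:class="http://opendocument.xml.org/fields/citation">
  <text:link
      meta:property="http://purl.org/dc/elements/1.1/source"
      meta:resource="urn:isbn:23980912"/>
  <text:link
      meta:property="http://purl.org/dc/elements/1.1/source"
      meta:resource="urn:isbn:92130926" cite:pages="23"/>
  <text:body>(Doe, 1999: 23; Smith, 2000)</text:body>
</text:structured-field>
"""


def qname(prefix, local):
    """Build the '{namespace}local' form that ElementTree uses for names."""
    return "{%s}%s" % (NS[prefix], local)


def field_statements(xml_text):
    """Return (property, resource) pairs for each link in one field."""
    field = ET.fromstring(xml_text)
    pairs = []
    for link in field.findall("text:link", NS):
        pairs.append((link.get(qname("meta", "property")),
                      link.get(qname("meta", "resource"))))
    return pairs


if __name__ == "__main__":
    for prop, resource in field_statements(FIELD):
        print(prop, resource)
```

The only addressable units here are the two referenced resources; nothing 
in the markup lets a consumer single out "Doe" inside text:body, which is 
the granularity limit described above.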

It becomes more clear when you consider John Madden's medical use 
cases. Say I have a statement like "Dr. John prescribes Mary with X 
drug."

That statement is in essence a metadata resource (a prescription) not 
conceptually unlike my citation.

I think one of the reasons John is interested in this is because he is 
imagining being able to select "X drug" and maybe being able to get 
additional information about it.

Now, it might be we say we don't want to allow that level of 
granularity in 1.2. But that is what this discussion is about.

>> seems to me, for copy-and-paste.
>
> Copy-and-paste is an application feature, and (office) applications 
> usually don't operate on the XML itself. It therefore does not make a 
> difference whether the metadata is part of the content or not.

OK.

>> So for my use case, I expect those URIs to be in-content. I expect all
>
> Which URIs?

The ones that identify the resources that the citation refers to. For 
example "urn:isbn:23983487".

>> the source metadata to be in the package. I expect if there is a
>
> Me too.
>
>> separate bibliographic application, that the editor sends them the 
>> URIs (through an API say), and it just returns the source, without 
>> needing any knowledge of the document or ODF.
>
> I'm not sure if I do understand that.

In other words, application gets the IDs (the URIs) and returns the 
metadata; that's it. They need know nothing about ODF. Those IDs can't 
be local document IDs, but have to be global ones that an application 
might actually be able to use to locate the metadata.
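
As a minimal sketch of that exchange, assuming a bibliographic 
application that keeps its records keyed by global URI: the editor hands 
over the URIs it found in the content and gets records back. The 
SOURCES store, its placeholder record, and the resolve function are all 
hypothetical names for illustration; no such API is defined anywhere.

```python
from typing import Dict, Iterable, Optional

# Hypothetical store owned by the bibliographic application. The record
# content is a made-up placeholder, not real bibliographic data.
SOURCES: Dict[str, Dict[str, str]] = {
    "urn:isbn:23983487": {"title": "Placeholder title", "creator": "Doe"},
}


def resolve(uris: Iterable[str],
            store: Dict[str, Dict[str, str]]) -> Dict[str, Optional[Dict[str, str]]]:
    """Map each URI to its metadata record, or None if it is unknown.

    The function sees only URIs; it needs no knowledge of the document
    or of ODF, which is the point of using global identifiers.
    """
    return {uri: store.get(uri) for uri in uris}


if __name__ == "__main__":
    # The editor would collect these URIs from the content.
    found = resolve(["urn:isbn:23983487", "urn:isbn:0000000"], SOURCES)
    for uri, record in found.items():
        print(uri, record)
```

A document-local ID such as an xml:id value would be useless as a key in 
this store, since the application has no document in which to look it up.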

>> Likewise, if someone adds a client to their content, I'd expect the 
>> URI that identifies that client to be in the content along with 
>> another URI which says that she is, in fact, a client, and their full 
>> contact metadata (encoded in vCard, say) embedded in the package 
>> optionally.
>
> But if you add a client to the content, isn't what you are doing to 
> add the full client information next to the (office) content and to 
> reference a piece of the meta-data from the content? So, yes, it may 
> be required to reference from the content back to the metadata to 
> display them, but does that mean that the metadata is in the content?

Only in the sense that the reference itself may be understood as 
metadata.

>>> I further believe that metadata support is easier to implement if 
>>> metadata markup gets separated from the content markup, and if the 
>>> only link between the two are IRIs (including fragment identifiers 
>>> or xpointers).
>> The question is, what URIs?
>
> The URIs that reference from metadata to the content.
>
>> In the approach Elias and I are advocating, we are saying we need a 
>> handful of metadata attributes to be (optionally) attached to content 
>> nodes. Like xml:id, they will also be URIs. In essence, we are saying 
>> xml:id is good, but it's not sufficient.
>
> Well, I personally don't know whether it is sufficient to add a single 
> attribute. I could imagine that we need more, and I have no objections 
> to do so. I'm only not sure if we need metadata in the content.

OK.

>> What I take to be the current Sun position is that using xml:id alone 
>> is sufficient. But this will make the association mechanism to the 
>> content more complex, and also non-standard from an RDF standpoint.
>
> I don't think that there is a "Sun" position.

Not officially, of course; I'm just hearing consistent positions from 
Hamburg :-)

>> So the technical question I'd like someone from Sun to address is: 
>> what are the specific objections to the first approach? What would be 
>> wrong with defining a small number of metadata attributes? Or to turn 
>> it
>
> In my point of view nothing if they are used to identify the objects 
> that have metadata assigned. If they are used to add the metadata into 
> the content, I would like to understand why this is required.

OK.

>> around, why do you say that it is "easier to implement"?
>
> From the implementor's perspective, I think it would be best if the 
> application would have to maintain IDs and similar things, but not the 
> metadata itself, because it doesn't know anything about it.

OK, so the key is the last point, and if we can explain why more than 
one attribute is needed, then that might be fine.

Thanks,
Bruce
References:
- Focus on model
  - From: "Florian Reuter" <freuter@novell.com>
- Re: [office-metadata] Focus on model
  - From: Lars Oppermann <Lars.Oppermann@Sun.COM>
- Re: [office] Re: [office-metadata] Focus on model
  - From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>