office-metadata message

Subject: Re: [office] Re: [office-metadata] Focus on model
From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: office-metadata@lists.oasis-open.org
Date: Mon, 18 Dec 2006 07:39:56 -0500

Hi Michael,

On Dec 18, 2006, at 5:47 AM, Michael Brauer - Sun Germany - ham02 - 
Hamburg wrote:

> My understanding of this example is that the metadata shall be within 
> another package stream than "content.xml". In this case, the relative 
> IRI path for content.xml seems to be missing. If the metadata is 
> contained in a stream next to content.xml, this would result in
>
> <rdf:RDF>
>   <rdf:Description rdf:about="content.xml#A">
>     <dc:author>John Smith</dc:author>
>   </rdf:Description>
> </rdf:RDF>

Correct.

> The resolution of relative IRI paths within packages is already 
> defined by the ODF specification. The only thing that is new is the 
> fragment identifier that references an element within the stream, but 
> this seems to be a common and well-understood XML technique.

+1
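
To make that concrete, here is a minimal sketch in Python of the two steps involved: resolving the relative IRI against the stream that holds the metadata, then finding the element the fragment points at. The stream name "metadata.rdf", the sample content, and both helper names are my own assumptions for illustration; none of them come from the ODF specification.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Expanded name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_in_package(base_stream, reference):
    """Resolve a relative IRI against the package stream that contains it.

    Returns a (stream_path, fragment) pair.
    """
    path, fragment = urldefrag(reference)
    base_dir = posixpath.dirname(base_stream)
    return posixpath.normpath(posixpath.join(base_dir, path)), fragment


def find_by_id(root, fragment):
    """Return the first element whose xml:id equals the fragment, or None."""
    for element in root.iter():
        if element.get(XML_ID) == fragment:
            return element
    return None


# Hypothetical content stream with one identified paragraph.
CONTENT = '<doc><p xml:id="A">Some paragraph</p></doc>'

stream, fragment = resolve_in_package("metadata.rdf", "content.xml#A")
target = find_by_id(ET.fromstring(CONTENT), fragment)
```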

> I may be wrong, but I always thought that this is exactly the way how 
> meta data is assigned to XML elements in general (it might be that we 
> would have to use the rdf:ID attribute within the content.xml, but 
> this shouldn't be an issue either).
>
> It seems to me that another item that is discussed is whether the 
> metadata should be within the content.xml or not. Well, since ODF 
> already makes a separation between styles, content and metadata, I 
> think it would follow the existing design principles to have it 
> separate.

But the issue is, while styles may be (often, but actually not always!) 
defined separately from the content in ODF, they are applied *to* 
content. E.g. <span style="foo">bar</span>.

So if we want to be consistent, we'd in fact allow the same with metadata.

> But there is also a technical reason why I think metadata at least 
> optionally should be separate.

None of us dispute that metadata often should be separate. Certainly 
for citations, most of it would be.

Elias is going to show us a demo on Wednesday that involves a Calc 
spreadsheet, where every cell is a property. I think that will 
demonstrate why we shouldn't be too prescriptive about where the 
metadata is.

> Metadata could be assigned to documents after they have been created.

Certainly it should be possible, and even easy.

> This in my opinion should be possible without altering the 
> content.xml, provided that content.xml already contains IDs for those 
> objects, that should get metadata assigned. Altering the content.xml 
> for assigning metadata seems not only to be difficult, it may also 
> break existing signatures.

Hmm ... I think this would be metadata about the document. For example, 
annotations and such. But it doesn't work so well when you're 
referencing other things, which is often what documents do.

So if you remember that the RDF model is a directed graph, the xml:id 
approach will always be pointing into the document. The hybrid approach 
will allow going the other direction.

If a user adds content to the document -- I add a citation, someone 
else adds a medical diagnosis, still someone else client information -- 
then they are adding metadata while modifying the content, and that 
metadata is about resources other than the document.
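
Here is a rough sketch of the difference in direction, written as plain triples in Python. Every URI and the "ex:cites" property are hypothetical, invented only for illustration; "dc:author" and "John Smith" are taken from Michael's example above.

```python
DOCUMENT = "content.xml"

# xml:id-only approach: every subject is a fragment of the document.
id_only = [
    ("content.xml#A", "dc:author", "John Smith"),
]

# Hybrid approach: a content node can also carry the URI of an outside
# resource, so statements can be about that resource directly.
hybrid = [
    ("content.xml#A", "ex:cites", "http://example.org/sources/1"),
    ("http://example.org/sources/1", "dc:author", "John Smith"),
]


def subjects_outside_document(triples, document=DOCUMENT):
    """List the subjects that are not fragments of the document."""
    prefix = document + "#"
    return sorted({s for s, _, _ in triples if not s.startswith(prefix)})
```

With the first list the function returns nothing; with the second it returns the source URI, which is the kind of statement I'm saying we need to be able to make.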

And to best separate out the majority of the metadata proper (the 
bibliographic source metadata, the diagnosis description, the vCard for 
the person), and also to provide selectable metadata objects within the 
application (user right-clicks on some span to get further metadata), 
it's important to have some kind of metadata in content. Likewise, it 
seems to me, for copy-and-paste.

So for my use case, I expect those URIs to be in-content. I expect all 
the source metadata to be in the package. I expect if there is a 
separate bibliographic application, that the editor sends them the URIs 
(through an API say), and it just returns the source, without needing 
any knowledge of the document or ODF.
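
A minimal sketch of that exchange, assuming a hypothetical lookup function and an in-memory store standing in for the bibliographic application (the URIs and record fields are placeholders, not real data):

```python
# Hypothetical store owned by the bibliographic application.
SOURCES = {
    "urn:example:source:1": {"type": "book", "title": "Placeholder title"},
    "urn:example:source:2": {"type": "article", "title": "Another placeholder"},
}


def lookup_sources(uris, store):
    """Return the record for each URI the store knows about.

    The function sees only URIs; it needs no knowledge of the document
    or of ODF.
    """
    return {uri: store[uri] for uri in uris if uri in store}


# The editor collects the URIs found in the content and hands them over.
found = lookup_sources(["urn:example:source:1", "urn:example:unknown"], SOURCES)
```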

Likewise, if someone adds a client to their content, I'd expect the URI 
that identifies that client to be in the content, along with another URI 
that says she is, in fact, a client, and her full contact metadata 
(encoded in vCard, say) optionally embedded in the package.

> I further believe that metadata support is easier to implement if 
> metadata markup gets separated from the content markup, and if the 
> only link between the two are IRIs (including fragment identifiers or 
> xpointers).

The question is, what URIs?

In the approach Elias and I are advocating, we are saying we need a 
handful of metadata attributes to be (optionally) attached to content 
nodes. Like xml:id, they will also be URIs. In essence, we are saying 
xml:id is good, but it's not sufficient.

What I take to be the current Sun position is that using xml:id alone 
is sufficient. But this will make the association mechanism to the 
content more complex, and also non-standard from an RDF standpoint.

So the technical question I'd like someone from Sun to address is: what 
are the specific objections to the first approach? What would be wrong 
with defining a small number of metadata attributes? Or to turn it 
around, why do you say that it is "easier to implement"?

I realize it might introduce some processing complexity in the content 
(while making the package much simpler), but might it not be as simple 
as writing a few functions to handle this; say, when coming across a 
property URI on an object, resolve its subject?
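
Something along these lines, as a sketch only. The namespace "urn:example:meta" and the attribute names "about" and "property" are hypothetical stand-ins for whatever small set of attributes we end up defining, and the inheritance rule (nearest enclosing "about", falling back to the document) is my assumption.

```python
import xml.etree.ElementTree as ET

META = "urn:example:meta"  # hypothetical namespace
ABOUT = f"{{{META}}}about"
PROPERTY = f"{{{META}}}property"


def extract_triples(node, subject):
    """Walk the content tree and collect (subject, property, value) triples.

    A node's subject is its own 'about' URI if present, otherwise the
    subject inherited from its nearest ancestor.
    """
    subject = node.get(ABOUT, subject)
    triples = []
    prop = node.get(PROPERTY)
    if prop is not None:
        value = "".join(node.itertext()).strip()
        triples.append((subject, prop, value))
    for child in node:
        triples.extend(extract_triples(child, subject))
    return triples


CONTENT = """
<p xmlns:m="urn:example:meta" m:about="http://example.org/people/jane">
  <span m:property="http://example.org/terms/name">Jane Doe</span> is a client.
</p>
"""

# The document itself is the default subject when no 'about' is in scope.
triples = extract_triples(ET.fromstring(CONTENT), "content.xml")
```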

Bruce