
Subject: summing up my requirements
From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: office-metadata <office-metadata@lists.oasis-open.org>
Date: Fri, 11 Aug 2006 10:10:27 -0400
So we didn't really reach any conclusions in yesterday's requirements 
discussion, but at a more appropriately high level, I think the core of 
my argument is that metadata in ODF (defined, I guess, as descriptions 
of document content, as opposed to more custom processing solutions) 
must be:

1) extensible

Here, rather than rewrite it myself, I'll just quote the RSS 1.0 spec:

>  Namespace-based modules allow compartmentalized extensibility. This 
> allows RSS to be extended:
>
>     * without need of iterative rewrites of the core specification
>     * without need of consensus on each and every element
>     * without bloating RSS with elements the majority of which won't 
> be used in any particular arena or application
>     * without naming collisions

Right.
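The "without naming collisions" point comes down to XML namespaces: every element name is qualified by a namespace URI, so two vocabularies can both define a `title` without clashing. A minimal sketch using Python's standard-library ElementTree; the `urn:example:ext#` namespace is a hypothetical stand-in for any extension module.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
EXT = "urn:example:ext#"  # hypothetical extension-module namespace

# Both vocabularies define an element called "title".
doc = f"""<meta xmlns:dc="{DC}" xmlns:ext="{EXT}">
  <dc:title>Annual Report</dc:title>
  <ext:title>Dr.</ext:title>
</meta>"""

root = ET.fromstring(doc)

# ElementTree expands each prefix to its namespace URI ("{uri}local"),
# so the two "title" elements end up with distinct names.
names = [child.tag for child in root]
print(names)
print(root.find(f"{{{DC}}}title").text)
```

Neither module had to know about the other for this to work, which is the compartmentalization the quote describes.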

2) allow ad-hoc mixing

This is sort of a subset of 1, in that true extensibility implies it 
to me, but the idea is that core metadata (which might just be included 
modules) must be mixable with extension metadata in unforeseen ways.

Example: I include contact records encoded in vCard. A bibliographic or 
document description must be able to refer to those metadata items 
without any particular programming logic.

Can we agree on these goals? Really, they're pretty critical. It'll be 
hard to move forward without some consensus on this.


On RDF:

It's true these goals could be achieved without it, but you'd end up 
needlessly reinventing it (à la MS). You'd need one spec to identify 
descriptions, another to describe extension models, and another to do 
linking.

This is a BIG task, and going through all that would buy us nothing, I 
argue, over RDF.

Let me just take the smallest of examples, using the above as a use case:

vCard:

<v:VCard rdf:about="urn:x:1">
   <v:fn>Jane Doe</v:fn>
   <v:n>
     <v:Name>
       <v:given-name>Jane</v:given-name>
       <v:family-name>Doe</v:family-name>
     </v:Name>
   </v:n>
</v:VCard>

Basic contact record in vCard. You can imagine a full record yourself, 
complete with address, etc.

OK, let's say we want to refer to that from the main document metadata; 
it's as simple as:

<dc:creator rdf:resource="urn:x:1"/>

[note: there are some issues with whether this is good practice with 
dc:creator, but let's leave that aside]

The critical point to understand here is that resources are atomic bits 
of metadata, identified with URIs. Each resource in turn consists of a 
simple list of (namespaced) properties and their values; each 
resource-property-value statement is a triple.
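To make the triple idea concrete, here is a rough Python sketch (plain tuples, not any particular RDF library) of the vCard record and the dc:creator link as (subject, predicate, object) statements. Prefixed names stand in for full URIs, and `urn:x:doc` (the document itself) and `_:name1` (a blank node for the unnamed v:Name) are hypothetical identifiers.

```python
# The vCard record plus the dc:creator link, one tuple per statement.
# "urn:x:doc" and "_:name1" are hypothetical identifiers.
triples = [
    ("urn:x:doc", "dc:creator", "urn:x:1"),
    ("urn:x:1", "rdf:type", "v:VCard"),
    ("urn:x:1", "v:fn", "Jane Doe"),
    ("urn:x:1", "v:n", "_:name1"),
    ("_:name1", "rdf:type", "v:Name"),
    ("_:name1", "v:given-name", "Jane"),
    ("_:name1", "v:family-name", "Doe"),
]

def describe(subject, graph):
    """A resource's description: its list of (property, value) pairs."""
    return [(p, o) for s, p, o in graph if s == subject]

def follow(subject, *path, graph):
    """Walk a chain of properties (edges) from a subject, taking the first match at each step."""
    node = subject
    for prop in path:
        node = next(o for s, p, o in graph if s == node and p == prop)
    return node

print(describe("urn:x:1", triples))
print(follow("urn:x:doc", "dc:creator", "v:n", "v:given-name", graph=triples))
```

Following properties from one node to the next is exactly walking the directed graph: the document's creator leads to the vCard, which leads to the name, which leads to "Jane".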

This is the "directed graph" that people talk about, and it's really 
better to think of it visually, as a graph of relationships.

If you like, you can think of these as relational databases of sorts, 
but in XML.
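Taking the relational-database analogy literally, a minimal sketch with Python's built-in sqlite3: one table with a row per triple, where following a link is a self-join (the object of one row matches the subject of another). The table layout, prefixed names, and the `urn:x:doc` / `_:name1` identifiers are illustrative assumptions.

```python
import sqlite3

# One row per triple; identifiers and prefixed names are illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("urn:x:doc", "dc:creator", "urn:x:1"),
        ("urn:x:1", "v:fn", "Jane Doe"),
        ("urn:x:1", "v:n", "_:name1"),
        ("_:name1", "v:given-name", "Jane"),
        ("_:name1", "v:family-name", "Doe"),
    ],
)

# "Who is the document's creator?" as a self-join on the triple table.
row = con.execute(
    """
    SELECT fn.o
    FROM triples AS creator
    JOIN triples AS fn ON fn.s = creator.o AND fn.p = 'v:fn'
    WHERE creator.s = 'urn:x:doc' AND creator.p = 'dc:creator'
    """
).fetchone()
print(row[0])
```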

Together, these two ideas (URI-identified resources and simple property 
lists) mean that you achieve the two goals above (extensibility and ad 
hoc mixing/linking).

From a processing standpoint, it really doesn't make any difference 
whether you do the above or instead just use anonymous containment:

<dc:creator>
   <v:VCard>
     <v:fn>Jane Doe</v:fn>
     <v:n>
       <v:Name>
         <v:given-name>Jane</v:given-name>
         <v:family-name>Doe</v:family-name>
       </v:Name>
     </v:n>
   </v:VCard>
</dc:creator>

From the standpoint of RDF querying, for example, they are the same 
(the contained record simply has no URI of its own), and that's only 
because of the consistent model.
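To check the equivalence claim, here is a rough Python sketch that reads a small subset of RDF/XML (typed node elements, rdf:about, rdf:resource, nested nodes, plain literals) into triples with the standard library, then runs the same path query against both the linked and the contained form. It is not a conforming RDF/XML parser; `urn:example:vcard#` is a hypothetical stand-in for the vCard namespace, `urn:x:doc` for the document, and the vCard is trimmed to two properties.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
V = "urn:example:vcard#"  # hypothetical stand-in for the vCard namespace
NS = f'xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:v="{V}"'
CARD = "<v:fn>Jane Doe</v:fn><v:n><v:Name><v:given-name>Jane</v:given-name></v:Name></v:n>"

# Form 1: the document links to the vCard by URI.
LINKED = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="urn:x:doc"><dc:creator rdf:resource="urn:x:1"/></rdf:Description>
  <v:VCard rdf:about="urn:x:1">{CARD}</v:VCard>
</rdf:RDF>"""

# Form 2: the same vCard, anonymously contained inside dc:creator.
CONTAINED = f"""<rdf:RDF {NS}>
  <rdf:Description rdf:about="urn:x:doc"><dc:creator><v:VCard>{CARD}</v:VCard></dc:creator></rdf:Description>
</rdf:RDF>"""

def uri(tag):
    """Convert ElementTree's '{namespace}local' tag into a full URI string."""
    namespace, local = tag[1:].split("}")
    return namespace + local

def parse(xml_text):
    """Read a small subset of RDF/XML into (subject, predicate, object) triples."""
    triples, ids = [], itertools.count(1)

    def node(elem):
        # rdf:about names the resource; without it, mint a blank node label.
        subject = elem.get(f"{{{RDF}}}about") or f"_:b{next(ids)}"
        if elem.tag != f"{{{RDF}}}Description":
            triples.append((subject, RDF + "type", uri(elem.tag)))
        for prop in elem:
            ref = prop.get(f"{{{RDF}}}resource")
            if ref is not None:
                obj = ref                        # link by URI
            elif len(prop):
                obj = node(prop[0])              # contained node element
            else:
                obj = (prop.text or "").strip()  # plain literal
            triples.append((subject, uri(prop.tag), obj))
        return subject

    for top in ET.fromstring(xml_text):
        node(top)
    return triples

def follow(graph, subject, *path):
    """Answer a path query by following properties from a starting subject."""
    for prop in path:
        subject = next(o for s, p, o in graph if s == subject and p == prop)
    return subject

for label, text in [("linked", LINKED), ("contained", CONTAINED)]:
    graph = parse(text)
    creator = follow(graph, "urn:x:doc", DC + "creator")
    given = follow(graph, "urn:x:doc", DC + "creator", V + "n", V + "given-name")
    print(label, creator, given)
```

Both runs find "Jane" as the given name; the only difference is that the creator node is `urn:x:1` in the linked form and a blank node (`_:b1`) in the contained one.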

And because of that consistent model, you can also handle both forms 
with plain XML tools. Indeed, I have a generic XSLT function I borrowed 
from Norm Walsh that resolves links this way.
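Norm Walsh's XSLT function isn't reproduced here, but the general idea can be sketched in Python: index every element carrying rdf:about, then replace each rdf:resource reference to one of them with a copy of that node, which turns the linked form into the contained form. This is a hypothetical sketch using ElementTree, not his function; the vCard namespace URI is a placeholder.

```python
import copy
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
V = "urn:example:vcard#"  # hypothetical stand-in for the vCard namespace
ABOUT, RESOURCE = f"{{{RDF}}}about", f"{{{RDF}}}resource"

def resolve_links(root):
    """Inline each rdf:resource reference that points at a node in the same tree.

    Single pass: references inside the inlined copies are left as they are.
    """
    by_uri = {e.get(ABOUT): e for e in root.iter() if e.get(ABOUT) is not None}
    for prop in list(root.iter()):  # snapshot, so inserted copies aren't revisited
        target = by_uri.get(prop.get(RESOURCE))
        if target is not None:
            del prop.attrib[RESOURCE]           # drop the reference...
            prop.append(copy.deepcopy(target))  # ...and contain a copy instead
    return root

doc = ET.fromstring(f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:v="{V}">
  <rdf:Description rdf:about="urn:x:doc"><dc:creator rdf:resource="urn:x:1"/></rdf:Description>
  <v:VCard rdf:about="urn:x:1"><v:fn>Jane Doe</v:fn></v:VCard>
</rdf:RDF>""")

ns = {"rdf": RDF, "dc": DC, "v": V}
creator = resolve_links(doc).find("rdf:Description/dc:creator", ns)
print(creator.find("v:VCard/v:fn", ns).text)
```

After resolution, a processor that only understands containment can read the creator's name directly under dc:creator, without any knowledge of how the link was expressed.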

So this is what the two goals above look like in RDF.

The bottom line is that good metadata support is very difficult. The 
more help we get by reusing existing standards, the better.

Anyway, will try to post an example later.

Bruce