office-metadata message

Subject: content tagging requirements and rdf/a
From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: office-metadata <office-metadata@lists.oasis-open.org>
Date: Fri, 1 Sep 2006 15:06:30 -0400
Re: the micro-metadata I mentioned in the previous note, I was looking 
again at the RDF/A documents for the content-tagging use case.

<http://www.w3.org/TR/xhtml-rdfa-primer/>

A good practical example that could relate to office documents is this 
one of embedded vCard metadata:

     <p class="contactinfo" about="http://example.org/staff/jo">
         My name is
         <meta property="contact:fn">
             Jo Smith
         </meta>.
         I'm a
         <meta property="contact:title">
             distinguished web engineer
         </meta>
         at
         <a rel="contact:org" href="http://example.org">
             Example.org
         </a>.
         You can contact me
         <a rel="contact:email" href="mailto:jo@example.org">
             via email
         </a>.
     </p>

If you extract this as a set of triple statements, you get:

	<http://example.org/staff/jo>
		contact:fn		"Jo Smith";
		contact:title		"distinguished web engineer";
		contact:org		<http://example.org>;
		contact:email		<mailto:jo@example.org> .

This is (more-or-less) the N3 syntax for representing RDF. It describes 
an untyped resource with four properties, two of them literals and two 
of them resources (identified by URIs).
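
To make the shape of that data concrete, here is a minimal Python 
sketch of the same four statements. The names Term, format_term, and 
to_n3 are made up for illustration, and the output only approximates 
N3 (no prefix declarations, no escaping of quotes inside literals).

```python
from collections import namedtuple

# A term is either a URI reference or a plain literal (hypothetical type).
Term = namedtuple("Term", ["value", "is_uri"])


def format_term(term):
    """Render a URI in angle brackets and a literal in double quotes."""
    return f"<{term.value}>" if term.is_uri else f'"{term.value}"'


def to_n3(subject, properties):
    """Serialize one subject and its (predicate, Term) pairs, N3 style."""
    lines = [f"<{subject}>"]
    for index, (predicate, obj) in enumerate(properties):
        # Every statement but the last ends in ';', the last one in ' .'
        end = " ." if index == len(properties) - 1 else ";"
        lines.append(f"    {predicate}    {format_term(obj)}{end}")
    return "\n".join(lines)


jo = [
    ("contact:fn", Term("Jo Smith", False)),
    ("contact:title", Term("distinguished web engineer", False)),
    ("contact:org", Term("http://example.org", True)),
    ("contact:email", Term("mailto:jo@example.org", True)),
]

print(to_n3("http://example.org/staff/jo", jo))
```

The subject carries no rdf:type statement, which is what "untyped" 
means here.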

So what's to note about the mechanism for making these statements and 
associating them with document content?

For the most part, they use existing elements (a, p, div) to carry the 
content, and a small handful of attributes to convey the semantics.

Plain literals are represented by text nodes, and linked resource 
objects by an href attribute (a URI).
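
A rough Python sketch of that mapping, applied to the vCard example 
from earlier in this note, might look like the following. It is not 
the RDF/A processing model, only the three rules described here: 
"about" sets the subject, "property" takes its literal from the text 
nodes, and "rel" takes its object from href. The function name 
extract_triples is made up, and the markup is condensed onto fewer 
lines.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<p class="contactinfo" about="http://example.org/staff/jo">
    My name is
    <meta property="contact:fn">Jo Smith</meta>.
    I'm a
    <meta property="contact:title">distinguished web engineer</meta>
    at
    <a rel="contact:org" href="http://example.org">Example.org</a>.
    You can contact me
    <a rel="contact:email" href="mailto:jo@example.org">via email</a>.
</p>
"""


def extract_triples(xml_text):
    """Return (subject, predicate, object, kind) tuples in document order."""
    triples = []

    def walk(element, subject):
        # An "about" attribute sets the subject for this element and below.
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None:
            # Literal object: the element's text nodes, whitespace-collapsed.
            literal = " ".join("".join(element.itertext()).split())
            triples.append((subject, prop, literal, "literal"))
        rel, href = element.get("rel"), element.get("href")
        if rel is not None and href is not None:
            # Resource object: the URI in href.
            triples.append((subject, rel, href, "uri"))
        for child in element:
            walk(child, subject)

    walk(ET.fromstring(xml_text.strip()), None)
    return triples


for triple in extract_triples(SNIPPET):
    print(triple)
```

Note that the content itself (the visible text, the link targets) 
doubles as the metadata values; nothing is stored twice.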

There's some controversy about their use of QNames in attributes, but 
they clearly did this to a) make authoring easier, and b) make 
validation and extension easy.

For the longest time I struggled with imagining how to practically 
implement this in ODF applications, and wondered whether it really 
offered any advantages over storing the metadata separately, but the 
more I think about it, the more I can imagine it.

Like, I could imagine highlighting a paragraph and having a contextual 
menu to enter the "about" URI, and then applying properties to spans 
using styles (or maybe a dedicated "meta" tag).

There is also an example from the XHTML 2 spec (which uses RDF/A) of 
the sort of extrinsic linking Rob was mentioning:

<http://www.w3.org/TR/xhtml2/mod-meta.html#s_metamodule>

   <html xmlns:dc="http://purl.org/dc/elements/1.1/">
     <head>
       <link about="#q1" rel="dc:source" href="urn:isbn:0140449132" />
     </head>
     <body>
       <blockquote id="q1">
         <p>
           'Rodion Romanovitch! My dear friend! If you go on in this way
           you will go mad, I am positive! Drink, pray, if only a few drops!'
         </p>
       </blockquote>
     </body>
   </html>

So there the blockquote gets an id, which external metadata (in this 
case the link in the head) can then reference. I think in that case I'd 
do it the other way around (have the link embedded in the quote to 
reference outboard metadata), but it's not that different.
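
The anchor-and-reference pattern in the XHTML 2 example can be 
sketched in Python as below. The function name 
statements_about_anchors is made up, and only same-document "#id" 
references are handled; it simply pairs each link that has about, 
rel, and href attributes with the element whose id it points at.

```python
import xml.etree.ElementTree as ET

DOC = """
<html xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <link about="#q1" rel="dc:source" href="urn:isbn:0140449132" />
  </head>
  <body>
    <blockquote id="q1">
      <p>
        'Rodion Romanovitch! My dear friend! If you go on in this way
        you will go mad, I am positive! Drink, pray, if only a few drops!'
      </p>
    </blockquote>
  </body>
</html>
"""


def statements_about_anchors(xml_text):
    """Pair each extrinsic statement with the content it is about."""
    root = ET.fromstring(xml_text.strip())
    # Index every element that can serve as an anchor point.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    results = []
    for el in root.iter():
        about = el.get("about", "")
        rel, href = el.get("rel"), el.get("href")
        if about.startswith("#") and rel and href:
            target = by_id.get(about[1:])
            if target is not None:
                # Collapse whitespace in the anchored content for display.
                text = " ".join("".join(target.itertext()).split())
                results.append((about, rel, href, text))
    return results


for about, rel, href, text in statements_about_anchors(DOC):
    print(about, rel, href)
    print(text)
```

Doing it the other way around would mean the lookup starts from the 
content and follows a reference out to the metadata, but the join on 
an identifier is the same.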

What this says to me for this use case is we need to add the following 
requirements:

1)  need to define mechanisms to associate content with metadata 
descriptions

These mechanisms may include attributes and/or elements which can 
assign metadata properties or URIs to document content, or which can 
serve as anchor points to make extrinsic statements about document 
content. Metadata statements may be embedded in the content file or 
stored externally.

Bruce