
Subject: MeSH example, extensibility
From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: office-metadata <office-metadata@lists.oasis-open.org>
Date: Wed, 30 Aug 2006 22:44:24 -0400

Patrick raised the issue of MeSH (Medical Subject Headings) today for 
subject tagging of documents, so I looked into it a little more. Here's 
what I found:

MeSH subject headings do have identifiers (I'm not sure whether they 
are also represented as URIs), and DCTERMS in fact defines a term 
specifically for MeSH. See here, towards the bottom:

	<http://dublincore.org/documents/dcmi-terms/>

This seems relevant too:

	<http://www.nlm.nih.gov/tsd/cataloging/metafilenew.html>

E.g. a MeSH subject heading could presumably be represented using 
something like this:

	<dcterms:mesh>D000005</dcterms:mesh>

Not sure I have the details right (that id looks wrong for example), 
but you get the idea.
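The `dcterms:mesh` element above is tentative. As far as I know, DCMI Metadata Terms defines `MESH` as a vocabulary encoding scheme (a class) rather than a property, and DCMI's older qualified-DC RDF/XML guidelines expressed such a subject roughly as a `dc:subject` wrapping a typed `dcterms:MESH` node that carries an `rdf:value`. The minimal Python sketch below builds that shape with the standard library; treat the exact element layout as an assumption to check against the DCMI documents, not a definitive encoding.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, Dublin Core elements and DCMI terms.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Register prefixes so serialized output uses rdf:, dc: and dcterms:.
for prefix, uri in (("rdf", RDF), ("dc", DC), ("dcterms", DCTERMS)):
    ET.register_namespace(prefix, uri)


def mesh_subject(descriptor_id: str) -> ET.Element:
    """Build <dc:subject><dcterms:MESH><rdf:value>ID</rdf:value></dcterms:MESH></dc:subject>.

    Assumption: MESH is used as an encoding scheme (a typed node holding
    the identifier in rdf:value), not as a property of its own.
    """
    subject = ET.Element(f"{{{DC}}}subject")
    scheme = ET.SubElement(subject, f"{{{DCTERMS}}}MESH")
    value = ET.SubElement(scheme, f"{{{RDF}}}value")
    value.text = descriptor_id
    return subject


print(ET.tostring(mesh_subject("D000005"), encoding="unicode"))
```

Either form keeps the identifier as the essential piece, which is what matters for indexing.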

The above could then be included in the document meta.xml file for 
indexing (and is, coincidentally, a valid RDF property).  For purposes 
of indexing and search, then, that would go a long way without even 
needing to embed the MeSH source record itself. It would work quite 
elegantly with the 
approach I've been advocating.
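To illustrate indexing on identifiers alone, here is a minimal Python sketch that builds an inverted index from MeSH ID to documents. It assumes the `dc:subject`/`dcterms:MESH`/`rdf:value` layout (itself an assumption about the DCMI encoding); the `fragment()` helper, the document names and the second ID are hypothetical placeholders. No MeSH descriptor record is needed anywhere.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Path to each MeSH identifier: dc:subject / dcterms:MESH / rdf:value.
MESH_PATH = f"{{{DC}}}subject/{{{DCTERMS}}}MESH/{{{RDF}}}value"


def fragment(*ids: str) -> str:
    """Build a hypothetical meta.xml-style RDF fragment carrying only MeSH IDs."""
    subjects = "".join(
        f"<dc:subject><dcterms:MESH><rdf:value>{i}</rdf:value></dcterms:MESH></dc:subject>"
        for i in ids
    )
    return (f'<rdf:Description xmlns:rdf="{RDF}" xmlns:dc="{DC}" '
            f'xmlns:dcterms="{DCTERMS}">{subjects}</rdf:Description>')


def mesh_ids(xml_text: str) -> list[str]:
    """Extract the MeSH identifiers from one fragment."""
    root = ET.fromstring(xml_text)
    return [v.text.strip() for v in root.findall(MESH_PATH) if v.text]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each MeSH identifier to the set of documents tagged with it."""
    index = defaultdict(set)
    for name, xml_text in docs.items():
        for mesh_id in mesh_ids(xml_text):
            index[mesh_id].add(name)
    return index


docs = {
    "report.odt": fragment("D000005"),
    "notes.odt": fragment("D000005", "D000006"),  # D000006: arbitrary placeholder
}
index = build_index(docs)
print(sorted(index["D000005"]))  # ['notes.odt', 'report.odt']
```

A search for a subject is then a dictionary lookup on its identifier; labels and hierarchy can come from MeSH itself whenever a tool wants them.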

I can see the same thing working with the RDF-based SKOS vocabulary:

	<http://www.xml.com/pub/a/2005/06/22/skos.html>

In that case, I'd also imagine that the source descriptions would 
typically not be embedded in the ODF file (though of course they *could* 
be), but the identifiers would be the critical bit.
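A small sketch of that split, using plain Python tuples as RDF triples: the document carries only a `dc:subject` pointing at a concept URI, and an optional SKOS description, when one is available, supplies a `skos:prefLabel`. The document and concept URIs are hypothetical placeholders.

```python
# Property URIs used below: dc:subject and skos:prefLabel.
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical document and concept URIs, for illustration only.
DOC = "urn:example:doc1"
CONCEPT = "http://example.org/mesh/D000005"

# What the ODF file carries: just the identifier of the subject concept.
doc_triples = [(DOC, DC_SUBJECT, CONCEPT)]

# The optional SKOS source description, which may live outside the file.
skos_triples = [(CONCEPT, SKOS_PREF_LABEL, "Abdomen")]


def subject_labels(doc, triples):
    """Return a label for each subject of doc, falling back to its URI."""
    labels = {s: o for s, p, o in triples if p == SKOS_PREF_LABEL}
    return [labels.get(o, o) for s, p, o in triples if s == doc and p == DC_SUBJECT]


print(subject_labels(DOC, doc_triples))                 # ['http://example.org/mesh/D000005']
print(subject_labels(DOC, doc_triples + skos_triples))  # ['Abdomen']
```

Without the SKOS data the tool still knows exactly which concept is meant; with it, it can also show a human-readable label.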

To be clear, then, *this* is what I mean by extensibility: the ability 
to add foreign properties to a resource description that correspond to 
a common model so that *tools know what they are.*

It would be absolutely counterproductive in my view to allow developers 
to throw out the ODF metadata entirely and use their own schema: a kind 
of all-or-nothing view of extensibility.

It is NOT, then, a notion of documents and schemas; it really gets at 
Florian's discussion of the abstract API.

E.g. if we think of the model in OO terms, then a document object 
(citation, table, image, etc.) gets described by something like a 
Resource object, which includes:

	-	an (optional) URI identifier
	-	one or more (optional) types
	-	an array of Property objects

There are then two kinds of Property objects:

	-	Literal (the simple property-value case above)
	-	linked Resource
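One way to render that model in code is the Python sketch below. The names `LiteralProperty`, `ResourceProperty` and `to_triples()` are hypothetical choices made for this sketch, not part of any ODF API. The point it shows is that any property, including a foreign one such as FOAF's `name`, flattens to the same (subject, predicate, object) shape, so a generic tool can store and query it without knowing its schema.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Optional, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class LiteralProperty:
    """The simple property-value case: a property URI plus a string value."""
    predicate: str
    value: str


@dataclass
class ResourceProperty:
    """A property whose value is another (linked) Resource."""
    predicate: str
    target: Resource


Property = Union[LiteralProperty, ResourceProperty]


@dataclass
class Resource:
    uri: Optional[str] = None                                 # optional identifier
    types: list[str] = field(default_factory=list)            # optional rdf:type URIs
    properties: list[Property] = field(default_factory=list)  # Property objects


def to_triples(res: Resource, counter=None) -> tuple[str, list[tuple[str, str, str]]]:
    """Flatten res and its linked resources into (subject, predicate, object) tuples.

    Resources without a URI get generated blank-node IDs. Assumes the links
    form a tree (no cycles); objects are plain strings, so this sketch does
    not distinguish literals from URIs.
    """
    counter = counter if counter is not None else itertools.count()
    subject = res.uri or f"_:b{next(counter)}"
    triples = [(subject, RDF_TYPE, t) for t in res.types]
    for prop in res.properties:
        if isinstance(prop, LiteralProperty):
            triples.append((subject, prop.predicate, prop.value))
        else:
            child_id, child_triples = to_triples(prop.target, counter)
            triples.append((subject, prop.predicate, child_id))
            triples.extend(child_triples)
    return subject, triples


# Usage: a citation described with DCTERMS properties plus a foreign (FOAF) one.
DCTERMS = "http://purl.org/dc/terms/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

author = Resource(properties=[LiteralProperty(FOAF_NAME, "J. Doe")])
citation = Resource(
    uri="urn:example:citation1",
    types=["http://purl.org/dc/dcmitype/Text"],
    properties=[
        LiteralProperty(DCTERMS + "title", "Abdominal anatomy"),
        ResourceProperty(DCTERMS + "creator", author),
    ],
)
subject, triples = to_triples(citation)
for t in triples:
    print(t)
```

The author resource has no URI, so it becomes the blank node `_:b0`; the citation's `dcterms:creator` triple points at it.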

This is simple, but very powerful.

If we don't use a common model (e.g. "rules for extension"), then we'll 
paint ODF into a rather tight corner and introduce more problems than 
we solve.

Bruce

PS - In about 30 minutes I wrote an XSLT that converted a MeSH example 
I found into the following valid RDF. Really not hard.

<DescriptorRecord
	xmlns="http://nih.org/mesh/"
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	rdf:ID="D000005">
    <descriptorName>Abdomen</descriptorName>
    <annotation>region &amp; abdominal organs</annotation>
    <preferredConcept>
       <Concept>
          <conceptUI>M0000005</conceptUI>
          <conceptName>Abdomen</conceptName>
          <scopeNote>That portion of the body that lies
         between the thorax and the pelvis.</scopeNote>
          <term>
             <Term>
                <print>Y</print>
                <termUI>T000012</termUI>
                <value>Abdomen</value>
                <dateCreated>1999-01-01</dateCreated>
             </Term>
          </term>
          <permutedTerm>
             <Term>
                <lexicalTag>NON</lexicalTag>
                <termUI>T000012</termUI>
                <value>Abdomens</value>
             </Term>
          </permutedTerm>
       </Concept>
    </preferredConcept>
</DescriptorRecord>
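The record above reads as RDF because it follows RDF/XML's "striped" pattern: node elements (`DescriptorRecord`, `Concept`, `Term`) alternate with property elements (`descriptorName`, `preferredConcept`, `term`, and so on). Below is a minimal Python sketch that walks that pattern into triples. It assumes only the subset used here (typed nodes, `rdf:ID` or `rdf:about`, and values that are either a literal or a single nested node; no `rdf:resource`, `rdf:parseType` or language tags). The base URI is a hypothetical placeholder, and a real application should use a full RDF/XML parser instead.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ID, RDF_ABOUT = f"{{{RDF}}}ID", f"{{{RDF}}}about"
RDF_DESCRIPTION, RDF_RDF = f"{{{RDF}}}Description", f"{{{RDF}}}RDF"

# A trimmed copy of the DescriptorRecord above.
RECORD = """<DescriptorRecord
    xmlns="http://nih.org/mesh/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    rdf:ID="D000005">
  <descriptorName>Abdomen</descriptorName>
  <annotation>region &amp; abdominal organs</annotation>
  <preferredConcept>
    <Concept>
      <conceptUI>M0000005</conceptUI>
      <term>
        <Term>
          <termUI>T000012</termUI>
          <value>Abdomen</value>
        </Term>
      </term>
    </Concept>
  </preferredConcept>
</DescriptorRecord>"""


def uri_of(tag: str) -> str:
    """Turn ElementTree's '{ns}local' form into the RDF URI ns + local."""
    ns, _, local = tag.lstrip("{").partition("}")
    return ns + local


def parse_node(elem, base, counter, triples):
    """Emit triples for one node element (striped syntax); return its subject."""
    rdf_id, about = elem.get(RDF_ID), elem.get(RDF_ABOUT)
    if rdf_id is not None:
        subject = f"{base}#{rdf_id}"
    elif about is not None:
        subject = about
    else:
        subject = f"_:n{next(counter)}"  # blank node
    if elem.tag != RDF_DESCRIPTION:
        # A typed node element implies an rdf:type triple.
        triples.append((subject, RDF + "type", uri_of(elem.tag)))
    for prop in elem:  # each child is a property element
        children = list(prop)
        if children:
            # Value is a nested node element (a linked resource).
            obj = parse_node(children[0], base, counter, triples)
        else:
            # Value is a literal; whitespace is kept as-is.
            obj = prop.text or ""
        triples.append((subject, uri_of(prop.tag), obj))
    return subject


def parse_rdfxml(text, base):
    """Parse a single node element, or an rdf:RDF wrapper, into triples."""
    root = ET.fromstring(text)
    nodes = list(root) if root.tag == RDF_RDF else [root]
    counter, triples = itertools.count(), []
    for node in nodes:
        parse_node(node, base, counter, triples)
    return triples


for triple in parse_rdfxml(RECORD, base="http://example.org/mesh.rdf"):
    print(triple)
```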