office message

Subject: [Fwd: ODF Metadata]
From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
To: OpenDocument Mailing List <office@lists.oasis-open.org>
Date: Mon, 19 Dec 2005 09:45:23 +0100
Hi,

Gary asked me to forward the following mail.

Michael
-------- Original Message --------
Subject: 	ODF Metadata
Date: 	Sun, 18 Dec 2005 23:14:02 -0800
From: 	Gary Edwards <gary.edwards@OpenStack.us>
To: 	Michael Brauer <Michael.Brauer@Sun.COM>, office@lists.oasis-open.org


Hi ODF TC,

Some of you might remember Bruce D'Arcus.  When Duane Nickull (Adobe)
first submitted the XMP metadata format for consideration, I contacted
Bruce for help in understanding RDF, and the relationship between RDF
and XML.  Bruce then proceeded to take the issue to a group of
XML::RDF experts for comment.  Who can forget the fury, velocity, and
sheer expertise of the discussion that followed?

Anyway, Bruce has continued to wrestle with the issue in collaboration
with Florian, Adobe, and many others.  Much of this discussion and work
has been taking place at the Foundation and through direct email.
Now that Bruce will be joining the ODF TC perhaps we can pull together
the loose ends, even those that reach deep into the W3C.

I've been asked to post this brief introduction to Bruce.  Also
included is a summary of the metadata project work.  Hopefully his
joining the TC will help us to pull the many ideas together, and come
up with a solution that truly unlocks the enormous potential of XMP
and ODF.  I sense great things ahead, with Google in particular
benefiting enormously from the ODF metadata work.

Bruce is available for the Monday morning conference call, but I
wonder if his Foundation membership application can be processed in
time.  Is there a contingency routine for this situation?  Just
wondering.

~ge~

----- From Bruce D'Arcus -------------

Hi All,

I'm in the process of getting signed up for the TC, but am not sure if
it will be official or not for the Monday call.  I'd very much like to
get involved in the metadata discussion ASAP.


Background
============

For those that don't know me, I am co-project lead for the OpenOffice
Bibliographic Project. I am also a professional scholar, and originally
came to this work because of frustration with existing tools (and, in
retrospect, their incredibly limited metadata support).

Since then, I have become an expert on the intersections of XML,
metadata, and increasingly, RDF. I have been an active member of the
XML metadata community around the Library of Congress, for example,
where I not only learned a lot from library metadata experts, but also
contributed towards the evolution of their Metadata Objects Description
Schema (MODS). Likewise, I am part of a group sponsored by the Nature
Publishing Group which consists mostly of people from the academic
publishing industry.

I also have some background with OpenDocument.  I worked with Daniel
Vogelheim on the proposal (approved by the TC last year) to
dramatically improve the coding for citations in OpenDocument. That in
many ways prefigures this metadata conversation, as that new citation
coding consists of a pointer to external metadata. While we had not
settled on what that metadata would look like, we had from the
beginning assumed it would be stored apart from the content file in the
wrapper. It just so happens that this fits perfectly the current
metadata discussion.

I feel I have, then, both the background and the practical use case
(citations and bibliographic metadata) to help inform this discussion.


On XMP, RDF and OpenDocument
===========================


Phase 1 proposal
----------------

First, let me comment on the mapping proposal that Lars and Florian put
together. I think it's in good shape, and the only issues to resolve in
my mind are:

1) I wonder if meta:keyword should be deprecated in favor of dc:subject?

Here's the definition of the latter:

"Typically, Subject will be expressed as keywords, key phrases or
classification codes that describe a topic of the resource. Recommended
best practice is to select a value from a controlled vocabulary or
formal classification scheme."

The "recommended" practice is thus, in RDF, to do:

         <dc:subject rdf:resource="http://example.net/subjects/Software"/>

You might then use SKOS, say, to richly describe those subjects.
Indeed, that's what I'd do with my bib data.  But the definition of
dc:subject certainly doesn't preclude using string literals, and that
seems better than using meta:keyword.

2)  You can use Qualified DC to replace ODF-specific properties; both
dcq:created and dcq:modified.

The reason why this is important to me long-term is not just to
deprecate two elements in favor of more commonly used ones (though this
is in itself important), but because DCQ adds most of what we need on
top of DC for bibliographic metadata (see below).  For example, the
dcq:isPartOf relation is really, really critical to being able to
model, say, journal articles or book chapters (or chapters in ODF!).

See < http://www.dublincore.org/documents/2000/07/11/dcmes-qualifiers/>
for more.

3)  What's the purpose of "meta:initial-creator"?  Just to mark who
saved the file?

4) The big question: is it time to get rid of the user-defined
elements? E.g. if you allow the sort of rich extensibility Adobe offers
with XMP, then that would be a more robust solution than the generic
property/value pairs (which are not identified with URIs) that you
currently have. Perhaps it is not worth worrying about now, but rather
just flag this as an issue, and decide it as part of phase 2?

I think if you take care of the above, you're done.
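
As a rough illustration of point 1, the sketch below rewrites
meta:keyword elements as dc:subject string literals using only the
Python standard library. The sample document and the helper name
keywords_to_subjects are made up for this example; only the element
names and namespace URIs come from ODF and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in ODF meta.xml and by Dublin Core.
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample input, not taken from a real document.
SAMPLE = """\
<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <office:meta>
    <dc:title>Example</dc:title>
    <meta:keyword>Software</meta:keyword>
    <meta:keyword>Metadata</meta:keyword>
  </office:meta>
</office:document-meta>
"""


def keywords_to_subjects(root):
    """Rename each meta:keyword to dc:subject in place; return how many changed."""
    keywords = list(root.iter("{%s}keyword" % META_NS))
    for elem in keywords:
        elem.tag = "{%s}subject" % DC_NS
    return len(keywords)


root = ET.fromstring(SAMPLE)
converted = keywords_to_subjects(root)
subjects = [e.text for e in root.iter("{%s}subject" % DC_NS)]
print(converted, subjects)
```

The same renaming pattern would apply to the date properties in point
2, though which target properties to use is exactly what the TC would
need to decide.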

Phase 2/3
---------

Alan Lillich did a great job laying out the broader issues in his list
post.

First, let me point you to three responses to that post, from me, Leigh
Dodds (who is a well-respected expert in both XML and RDF, and
engineering manager at Ingenta, a major academic metadata and fulltext
provider), and Bob DuCharme (similar background as Leigh; works for
Lexis-Nexis):

<http://www.ldodds.com/blog/archives/000263.html>
<http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2005/12/09/odf-and-xmp-comments>
<http://www.snee.com/bobdc.blog/2005/12/using_or_not_using_adobes_xmp.html>

For me the conclusion is really that rather than start with XMP, the TC
should start with the existing metadata support, which is already very
close to RDF. While I believe interoperability with XMP should be an
important goal, I see it as a separate issue from what is best for
OpenDocument.

What ODF needs is actually fairly simple:

1) To broaden the metadata support beyond the level of the document, so
that a consistent approach is used for all metadata needs in
OpenDocument. I can provide the bibliographic use case, but there are
others.

2) To deepen the metadata support to include:

        i.      additional default support for all of Dublin Core, and
parts of Qualified Dublin Core

        ii.     a mechanism for extension (that goes beyond the current
user-defined fields)

I've already written a RELAX NG schema that formalizes the above. It's
not hard to do technically at the level of the XML schema [1]. This
approach is sort of like training wheels for RDF, where -- recognizing
that RDF tools are not yet as widely supported as XML tools -- you
constrain the XML syntax so that it can be easily processed both by RDF
and XML tools.
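
To make the "training wheels" idea concrete, here is a minimal sketch.
It assumes a constrained serialization in which every description is a
top-level rdf:Description with an rdf:about attribute, and every
property is either a string literal or an rdf:resource reference.
Under that assumption a plain XML parser can extract the triples
without an RDF toolkit. The sample data, the example.net URIs, and the
function name triples are illustrative only and are not taken from my
schema.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical metadata in the constrained form described above.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcq="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.net/articles/1">
    <dc:title>An Example Article</dc:title>
    <dc:subject rdf:resource="http://example.net/subjects/Software"/>
    <dcq:isPartOf rdf:resource="http://example.net/journals/1"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.net/journals/1">
    <dc:title>An Example Journal</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def triples(root):
    """Yield (subject, predicate, object) for each property element."""
    for desc in root.findall("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)
        for prop in desc:
            resource = prop.get("{%s}resource" % RDF)
            # A property is either a resource reference or a string literal.
            obj = resource if resource is not None else (prop.text or "")
            yield subject, prop.tag, obj


graph = list(triples(ET.fromstring(SAMPLE)))
for triple in graph:
    print(triple)
```

Because the syntax is constrained, the same file remains valid RDF/XML
for tools that do understand RDF, which is the point of the approach.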

[For an interesting discussion of issues with the RDF/XML syntax and
tools, see Dan Brickley's post:
< http://danbri.org/words/2005/09/28/137>]

Also, ODF already has a solid packaging mechanism, so storage of the
RDF metadata need only exploit that existing support, where one can
indicate a metadata file by just using the application/rdf+xml mediatype.
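
A simplified sketch of what that could look like follows. It writes an
RDF file into a zip package, lists it in META-INF/manifest.xml, and
then finds it again by media type. The file name metadata.rdf and the
helper names are hypothetical, the media type used is
application/rdf+xml (the registered type for RDF/XML), and the result
is not a complete, valid ODF package (it omits the mimetype and
content files, for instance).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
RDF_MEDIA_TYPE = "application/rdf+xml"
METADATA_PATH = "metadata.rdf"  # hypothetical file name


def q(name):
    """Qualify a local name with the manifest namespace."""
    return "{%s}%s" % (MANIFEST_NS, name)


def build_manifest(entries):
    """Serialize (path, media type) pairs as manifest:file-entry elements."""
    ET.register_namespace("manifest", MANIFEST_NS)
    root = ET.Element(q("manifest"))
    for path, media_type in entries:
        ET.SubElement(root, q("file-entry"),
                      {q("full-path"): path, q("media-type"): media_type})
    return ET.tostring(root, encoding="utf-8")


def build_package(rdf_xml):
    """Return the bytes of a zip holding the metadata file and a manifest."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr(METADATA_PATH, rdf_xml)
        pkg.writestr("META-INF/manifest.xml",
                     build_manifest([(METADATA_PATH, RDF_MEDIA_TYPE)]))
    return buf.getvalue()


def find_rdf_files(package_bytes):
    """List the paths the manifest declares with the RDF media type."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        manifest = ET.fromstring(pkg.read("META-INF/manifest.xml"))
    return [entry.get(q("full-path"))
            for entry in manifest.iter(q("file-entry"))
            if entry.get(q("media-type")) == RDF_MEDIA_TYPE]


package = build_package(
    "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>")
rdf_files = find_rdf_files(package)
print(rdf_files)
```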

The above approach would add a lot of power to ODF, but with fairly
minimal changes.

The only other detail to work out is linking from document content to
the RDF descriptions.  Again, I don't think this will be that
difficult, and the new citation coding already points the way to what
that might look like.
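
One possible shape for that linking, purely as a sketch: content
elements carry a URI, and a processor looks that URI up among the
rdf:about values in the metadata file. The <cite ref="..."/> markup
below is invented for this example and is not the actual citation
coding; the sample data and function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical content markup; <cite ref="..."/> is invented for this sketch.
CONTENT = """\
<doc>
  <p>As argued elsewhere <cite ref="http://example.net/articles/1"/>, and
  again later <cite ref="http://example.net/articles/404"/>.</p>
</doc>
"""

# Hypothetical metadata file stored elsewhere in the package.
METADATA = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.net/articles/1">
    <dc:title>An Example Article</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def index_titles(rdf_root):
    """Map each described resource URI to its dc:title (None if absent)."""
    return {desc.get("{%s}about" % RDF): desc.findtext("{%s}title" % DC)
            for desc in rdf_root.findall("{%s}Description" % RDF)}


def resolve_citations(content_root, titles):
    """Pair each cited URI with its title, or None when it is not described."""
    return [(cite.get("ref"), titles.get(cite.get("ref")))
            for cite in content_root.iter("cite")]


titles = index_titles(ET.fromstring(METADATA))
resolved = resolve_citations(ET.fromstring(CONTENT), titles)
print(resolved)
```

A reference with no matching description comes back as None, which is
the case an application would need to flag to the user.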

Finally, if the TC is interested, I am more than willing to work on a
formal proposal over the next few weeks to present for consideration,
and would be happy to work with others on this.

Bruce

[1] See two further blog posts of mine for how I was thinking about
this a couple of months ago (the details have changed, but not the
general approach):

<http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2005/10/01/opendocument-mixing-metadata>
<http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2005/10/30/opendocument-and-rdf-storing-what-metadata-where>