
Subject: Re: [office] metadata and XMP (was: [office] metadata (OpenDocument TC Meeting Minutes ...))

From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: office@lists.oasis-open.org
Date: Thu, 12 Jan 2006 09:51:41 -0500

On Jan 12, 2006, at 5:36 AM, Lars Oppermann wrote:

> I am also not saying that XMP should be the framework. However, I 
> believe that the default metadata model can be mapped to an RDF model 
> that is consistent with the subset of RDF that can be modeled by XMP.

OK, sounds good.

[...]

> One of the things that I like about XMP is that it is out there and 
> actually being used.

Sure, and I wouldn't discount that.

OTOH, most of the RDF experts I've been talking to are also 
implementing real world projects and standards based on RDF. Leigh 
Dodds, for example, was involved in the development of RSS/RDF and is 
now overseeing Ingenta's move to a 200 million triple RDF backend, the 
SIMILE project at MIT is behind the PiggyBank Firefox extension, the 
Nature Publishing Group has been heavily exploiting RSS/RDF to 
distribute metadata, etc.

Granted, they are mostly coming from an internet-based perspective, and 
Adobe (and perhaps most of the TC) from a traditional application 
perspective, but I think there is some common ground between them.

> So as I stated above: while I don't think that XMP as it is today 
> should be the normative framework for OpenDocument metadata, I would 
> very much like the XMP framework to be able to use as much of 
> the metadata in an OpenDocument file as possible.

Sure.

> E.g. if a document management system has built-in support for XMP, you 
> can use the XMP data in your PDF documents stored in that system. If 
> the XMP framework were able to use the metadata (at least the part that 
> is attached to the whole document) in an OpenDocument file, you could 
> use OpenDocument as a storage format in the system just like you would 
> do with PDF.

I'm glad you brought up this example.

How many XMP-based document management systems are there really, and 
how widely are they deployed?  I've personally never heard of one, 
though I don't doubt they exist.

I'm interested in promoting server-based solutions around this issue, 
and as I look around at the current software and standards landscape, I 
see an explosion of applications based on agile development frameworks 
tied to relational databases (LAMP, Ruby on Rails, etc.), and wide 
deployment of syndication standards like RSS and (now) Atom.

XMP, by contrast, is a closed standard that is -- beyond Adobe -- not 
widely deployed.  I note, for example, that Flickr offers RSS and Atom 
feeds, but not XMP export. Apple's new professional photo application 
Aperture has rich metadata support, but does not support XMP (I think 
they should, but that's another matter).

All of this is to say that this space is still unsettled, and that 
there are other legacy and interoperability issues to account for 
beyond XMP (including ODF's existing metadata support).

Let me outline briefly the limitations of XMP as they apply to 
OpenDocument:

1. It does not support the RDF class system (e.g. rdf:type or typing 
nodes), so that while you can say that a given item has a title, you 
cannot say what that item is (a book, document, spreadsheet, figure, 
etc.). That seems a problem in a context like OpenDocument.

2. It does not support the RDF linking system. To me this is probably 
the most important limitation, because they've basically thrown out the 
relational part of the RDF model.

3. It does not support XML content in literals (easily done with RDF by 
adding an rdf:parseType="Literal" attribute on a property), except 
through the highly problematic practice of using escaped content.

4. XMP insists on repeated properties being wrapped in outdated 
structures like rdf:Alt and rdf:Bag. I understand this provides some 
convenience from a GUI perspective, but it also introduces problems in 
other areas.

5. Following from 4, XMP abuses core Dublin Core properties like 
dc:creator (which is defined as a literal that represents the name of 
the agent who created a resource) or dc:subject by *always* embedding 
an rdf collection element within. Bob DuCharme noted this problem here:

<http://www.xml.com/pub/a/2004/09/22/xmp.html>
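To make items 1 through 5 concrete, here is a minimal sketch in Python (standard-library xml.etree.ElementTree only) that writes the same description two ways: plain RDF/XML using rdf:type, rdf:resource, rdf:parseType="Literal" and simply repeated properties, and an XMP-style form in which the repeated dc:subject values are wrapped in an rdf:Bag, the kind of container described in items 4 and 5. The urn:example: identifiers, the ex:Book class and its namespace, and the function names are hypothetical, and the XMP-style output only approximates the wrapping pattern; it is not a conforming XMP packet.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XHTML = "http://www.w3.org/1999/xhtml"
EX = "http://example.org/terms#"  # hypothetical class vocabulary

for prefix, uri in (("rdf", RDF), ("dc", DC), ("xhtml", XHTML), ("ex", EX)):
    ET.register_namespace(prefix, uri)


def q(ns: str, local: str) -> str:
    """Build an ElementTree qualified name: {namespace}local."""
    return f"{{{ns}}}{local}"


def plain_rdf(about: str, subjects: list[str]) -> ET.Element:
    """Plain RDF/XML using the features the post says XMP leaves out."""
    root = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): about})
    # 1. Typing: state what kind of thing the resource is.
    ET.SubElement(desc, q(RDF, "type"), {q(RDF, "resource"): EX + "Book"})
    # 2. Linking: the value is another resource, not a string.
    ET.SubElement(desc, q(DC, "relation"), {q(RDF, "resource"): "urn:example:chapter-1"})
    # 3. XML literal: markup stays markup instead of escaped text.
    title = ET.SubElement(desc, q(DC, "title"), {q(RDF, "parseType"): "Literal"})
    title.text = "Notes on "
    ET.SubElement(title, q(XHTML, "em")).text = "Metadata"
    # 4/5. Repeated property: simply repeat the element, one value each.
    for s in subjects:
        ET.SubElement(desc, q(DC, "subject")).text = s
    return root


def xmp_style(about: str, subjects: list[str]) -> ET.Element:
    """XMP-style: one dc:subject whose value is an rdf:Bag of rdf:li items."""
    root = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): about})
    bag = ET.SubElement(ET.SubElement(desc, q(DC, "subject")), q(RDF, "Bag"))
    for s in subjects:
        ET.SubElement(bag, q(RDF, "li")).text = s
    return root


if __name__ == "__main__":
    for tree in (plain_rdf("urn:example:doc", ["RDF", "XMP"]),
                 xmp_style("urn:example:doc", ["RDF", "XMP"])):
        print(ET.tostring(tree, encoding="unicode"))
```

Printing both trees shows the difference: in the plain form each subject is its own dc:subject statement, while in the XMP-style form there is a single dc:subject whose value is a container node.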

There is also a fairly new structure in RDF -- 
rdf:parseType="Collection" -- that could be useful, but which XMP does 
not support.
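For comparison, a minimal sketch of what rdf:parseType="Collection" stands for: in RDF/XML it abbreviates an ordered, closed list built from rdf:first/rdf:rest nodes and terminated by rdf:nil. The Python below (standard library only) expands such a property into those triples. The ex:authorList property and its namespace, the urn:example: identifiers, and the _:bN blank-node labels are hypothetical, and unlike a full RDF/XML parser the sketch only handles members written as elements carrying rdf:about.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/terms#"  # hypothetical vocabulary
NIL = RDF + "nil"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="urn:example:book">
    <ex:authorList rdf:parseType="Collection">
      <rdf:Description rdf:about="urn:example:person/a"/>
      <rdf:Description rdf:about="urn:example:person/b"/>
    </ex:authorList>
  </rdf:Description>
</rdf:RDF>"""


def q(ns: str, local: str) -> str:
    """ElementTree qualified name: {namespace}local."""
    return f"{{{ns}}}{local}"


def tag_to_uri(tag: str) -> str:
    """Turn ElementTree's '{ns}local' into the full property URI ns + local."""
    ns, local = tag[1:].split("}")
    return ns + local


def collection_triples(subject: str, prop: ET.Element) -> list[tuple[str, str, str]]:
    """Expand a parseType="Collection" property into rdf:first/rdf:rest triples."""
    if prop.get(q(RDF, "parseType")) != "Collection":
        raise ValueError('not a parseType="Collection" property')
    members = [m.get(q(RDF, "about")) for m in prop]
    nodes = [f"_:b{i}" for i in range(len(members))]  # arbitrary blank-node labels
    # The property points at the list head, or at rdf:nil for an empty list.
    triples = [(subject, tag_to_uri(prop.tag), nodes[0] if nodes else NIL)]
    for i, member in enumerate(members):
        rest = nodes[i + 1] if i + 1 < len(nodes) else NIL
        triples.append((nodes[i], RDF + "first", member))
        triples.append((nodes[i], RDF + "rest", rest))
    return triples


if __name__ == "__main__":
    desc = ET.fromstring(DOC).find(q(RDF, "Description"))
    prop = desc.find(q(EX, "authorList"))
    for triple in collection_triples(desc.get(q(RDF, "about")), prop):
        print(triple)
```

Unlike rdf:Bag or rdf:Alt, the list is closed by rdf:nil, so a consumer can tell that it has seen every member.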

> A user could then use OpenDocument for files that are in the 
> edit-cycle and use PDF for published documents. Plus, the metadata of 
> an OpenDocument file would also be usable in its PDF version.

As I said before, I think interoperability (if not full compliance) 
with XMP is highly desirable, and I really wish that Adobe would 
consider revising XMP, and perhaps opening up its development. The 
impression I got from Alan's post, however, is they are not interested 
in doing that. That doesn't fill me with much confidence.

Like I said, this is a complicated issue, which is why I wonder whether it 
would be better to tackle these details last.

For the bibliographic project, we really need standardized metadata 
support below the document level. Would be nice if we could agree on 
that now.

Bruce


References:
- OpenDocument TC Meeting Minutes 2006-01-09 and 2005-12-19
  - From: Lars Oppermann <Lars.Oppermann@Sun.COM>
- metadata (OpenDocument TC Meeting Minutes ...)
  - From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
- Re: [office] metadata and XMP (was: [office] metadata (OpenDocument TC Meeting Minutes ...))
  - From: Lars Oppermann <Lars.Oppermann@Sun.COM>