Subject: RDF metadata as an extension mechanism
From: Michael Stahl <Michael.Stahl@Sun.COM>
To: office@lists.oasis-open.org
Date: Fri, 13 Feb 2009 16:18:06 +0100

Hi all,

in this mail i'd like to share some thoughts on using RDF metadata as an 
extension mechanism.
First, i would like to address an example that Doug Mahugh has posted:

Doug Mahugh wrote:
<quote>
It's worth noting that the ODF metadata mechanisms don't allow for the use 
of a private/custom schema to tag content within a document.  And that use 
case has value to many users.  So if we decide that ODF won't be able to 
support those types of scenarios, for whatever reason, we should not be 
surprised to find that users who need such capabilities will look elsewhere.

Consider the trivial example of a pre-existing document, created years 
ago, which needs to be logged in to a content management system that 
requires an abstract to be identified for each document.  If the format of 
the document is HTML, then a div with class="abstract" can be used to tag 
the appropriate paragraph(s) as the abstract.  If the format of the 
document is DOCX, a customXml element with element="abstract" can be used 
for the same purposes.  In both cases the document content remains valid 
HTML or WordprocessingML, while the user adds the custom semantics 
required for their purpose.  The custom semantics can be (and should be) 
ignored by others.  The user is free to innovate quickly, and does not 
have to think in terms of a tradeoff between strict compliance and 
flexibility/business value.  They can, and do, have the best of both 
worlds in such scenarios: strict compliance to a standard, and freedom to 
innovate quickly for their own specialized purposes.
</quote>


It seems quite simple to implement this with the current RDF metadata 
support, as specified by ODF 1.2.
Basically, we need to identify the paragraph in content.xml that contains 
the abstract, and annotate it with an RDF property that expresses its, 
umm, "abstract-ness". We assume that our hypothetical CMS wants to keep
all of the metadata that it cares about in a separate RDF graph; at
least, that is what i would recommend. This graph, stored in the stream
"mymetafile.rdf" in the ODF package, contains the RDF statement which 
annotates the paragraph. In order for the CMS to find its RDF graph, we 
list it in the "manifest.rdf", and declare it to be of a user-defined type 
that the CMS understands.
Spelling things out explicitly, this gives:

Doug's example, implemented with RDF metadata:
(RDF examples are in N3 syntax, because RDF/XML is ... unintuitive)

in manifest.rdf:

# the file contains metadata
<mymetafile.rdf> rdf:type pkg:MetadataFile .
# the file is of interest to my CMS
<mymetafile.rdf> rdf:type myns:CMSAnnotations .

in content.xml:

<text:p xml:id="id42">In this treatise we discuss the fooness of 
bars.</text:p>

in mymetafile.rdf:

# identify the abstract
<content.xml#id42> myns:isAbstract xs:true .


Now, something like this SPARQL query gets the URI of the element 
containing the abstract from the RDF graphs:

SELECT ?node
WHERE {
     GRAPH <...baseURI.../manifest.rdf> { ?g rdf:type myns:CMSAnnotations }
     GRAPH ?g { ?node myns:isAbstract xs:true }
}
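
For readers who want to trace the query by hand, here is a minimal Python
sketch of the same lookup. It does not use a real RDF store or SPARQL
engine: the named graphs are modelled as plain sets of (subject,
predicate, object) tuples, the prefixed names are kept as opaque strings,
and the helper names find_abstract_nodes and resolve are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Prefixed names from the example, kept as opaque strings (an assumption;
# a real processor would expand them to full URIs).
RDF_TYPE = "rdf:type"
CMS_ANNOTATIONS = "myns:CMSAnnotations"
IS_ABSTRACT = "myns:isAbstract"
TRUE = "xs:true"

# Named graphs keyed by stream name; each graph is a set of triples.
graphs = {
    "manifest.rdf": {
        ("mymetafile.rdf", RDF_TYPE, "pkg:MetadataFile"),
        ("mymetafile.rdf", RDF_TYPE, CMS_ANNOTATIONS),
    },
    "mymetafile.rdf": {
        ("content.xml#id42", IS_ABSTRACT, TRUE),
    },
}

# A one-paragraph stand-in for content.xml.
CONTENT_XML = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xml:id="id42">In this treatise we discuss the fooness of bars.</text:p>'
)

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_abstract_nodes(graphs):
    """Mirror the two GRAPH patterns: bind ?g from the manifest, then ?node."""
    result = []
    for g, p, o in sorted(graphs["manifest.rdf"]):
        if p == RDF_TYPE and o == CMS_ANNOTATIONS:
            for node, p2, o2 in sorted(graphs.get(g, ())):
                if p2 == IS_ABSTRACT and o2 == TRUE:
                    result.append(node)
    return result


def resolve(node_uri, content_xml):
    """Return the text of the element whose xml:id equals the URI fragment."""
    fragment = node_uri.split("#", 1)[1]
    root = ET.fromstring(content_xml)
    for elem in root.iter():
        if elem.get(XML_ID) == fragment:
            return "".join(elem.itertext())
    return None


abstract_nodes = find_abstract_nodes(graphs)
abstract_text = resolve(abstract_nodes[0], CONTENT_XML)
```

The two nested loops correspond to the two GRAPH clauses: the outer one
finds every graph typed as myns:CMSAnnotations, the inner one looks for
the myns:isAbstract statement inside that graph only.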


Imho, given that ODF already has a quite powerful mechanism for extension 
(RDF), any other proposed extension mechanisms should be closely 
scrutinized as to whether they actually add some expressive power, or 
merely add additional complexity by introducing different ways of doing 
the same thing. Or, even if they add some additional expressive power, 
whether that _addition_ is really worth the added complexity.

Having just read a couple of the weblog articles about the customXml
feature of OOXML that Doug has posted here in another mail, i do not see
anything that would be obviously impossible to do with RDF metadata.
The main difference between the two approaches is the data model:
with customXml, the metadata is an XML tree, while with RDF, it is an RDF 
graph. Granted, some data is expressed more easily with a flexible graph, 
and other data more easily with a strict hierarchy, but i don't think that 
would be a deal-breaker either way.

One interesting feature of customXml is the 2-way data binding, with the 
data to be bound specified by an XPath expression. i assume this binding 
mechanism is standardized, yes?
In ODF we currently have text:meta-field, which is a field whose contents 
are given by RDF metadata, but we do not specify in any way how the 
content of such a field is generated from the metadata, or even from which 
_particular_ metadata, except for the prefix/suffix properties.
(i do not know why that is so, because most of the RDF metadata stuff in 
ODF was designed before i got involved.)
Maybe we could use a SPARQL query for bindings...

So, it seems to me that customXml and RDF metadata address (with
different mechanisms) problem areas that overlap significantly, so much
so that i would have serious doubts about having both in the same
document format. (But of course, i am not sufficiently knowledgeable
about customXml to be certain about this.)


Furthermore:
RDF metadata allows us to specify not only data that describes data (i.e.
metadata); we can iterate this once more to get data that describes
data that describes data.
Allow me to illustrate a potential solution to the (imho very serious) 
problem raised by Rob, namely, how can an application tell whether it is 
possible to copy the non-standard properties that are attached to a 
standard ODF element, and whether these non-standard properties may be 
invalidated by edits to the element's content.

The solution is to define a couple of standard properties that can be used 
to describe user-defined properties. These descriptions (or 
meta-properties) must be put into an RDF/XML file that is referenced from
the ODF document's manifest.rdf.


my:property copyable [boolean]

Assume an ODF processor copies an element for which a statement of the 
form   <content.xml#id42> my:property "foo"   exists in some RDF graph.

If copyable is true, then copying that element will cause the inserted
element to get a unique xml:id attribute, say id24, and the following
statement to be inserted in the same RDF graph as the other statement:
<content.xml#id24> my:property "foo"
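
The copy rule can be sketched in a few lines of Python. This is only an
illustration under assumptions: graphs are sets of tuples rather than a
real RDF store, the COPYABLE table stands in for the meta-property
statements, my:checksum is an invented second property, and treating
properties without a copyable statement as not copyable is my guess, not
something stated above.

```python
import itertools

# Hypothetical meta-property table: is a user-defined property copyable?
COPYABLE = {"my:property": True, "my:checksum": False}

# Naive id generator starting at 24 so the first new id is "id24";
# a real processor must check the id is unused in the document.
_id_counter = itertools.count(24)


def copy_element(graphs, old_id, stream="content.xml"):
    """Duplicate copyable statements for a copied element; return new xml:id."""
    new_id = f"id{next(_id_counter)}"
    old_subject = f"{stream}#{old_id}"
    new_subject = f"{stream}#{new_id}"
    for triples in graphs.values():
        # Build the copies first so the set is not changed while iterating.
        copies = {
            (new_subject, p, o)
            for s, p, o in triples
            if s == old_subject and COPYABLE.get(p, False)
        }
        # In-place union: the copy lands in the same graph as the original.
        triples |= copies
    return new_id


graphs = {
    "mymetafile.rdf": {
        ("content.xml#id42", "my:property", "foo"),
        ("content.xml#id42", "my:checksum", "abc123"),
    }
}
new_id = copy_element(graphs, "id42")
```

After the call, the graph holds a my:property statement for the new
element, while the non-copyable my:checksum statement stays attached to
the original element only.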


my:property isDigest [boolean]

Assume an ODF processor modifies the content of an element for which a 
statement of the form   <content.xml#id42> my:property "foo"   exists in 
some RDF graph.

If isDigest is true, then modifying that element will cause the statement
to be removed (assuming that the semantics of my:property are not
understood by the processor in question, of course).
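
A comparable Python sketch for the isDigest rule, again under
assumptions: graphs are sets of tuples, the IS_DIGEST table stands in for
the meta-property statements, my:label is an invented second property,
and UNDERSTOOD is a hypothetical set of properties the processor knows
how to update itself.

```python
# Hypothetical meta-property table: does the property digest the content?
IS_DIGEST = {"my:property": True, "my:label": False}

# Properties whose semantics this processor understands (none here).
UNDERSTOOD = set()


def on_content_modified(graphs, element_id, stream="content.xml"):
    """Drop digest statements about a modified element; return what was dropped."""
    subject = f"{stream}#{element_id}"
    removed = []
    for triples in graphs.values():
        stale = {
            t
            for t in triples
            if t[0] == subject
            and IS_DIGEST.get(t[1], False)
            and t[1] not in UNDERSTOOD
        }
        # In-place difference: remove the invalidated statements.
        triples -= stale
        removed.extend(sorted(stale))
    return removed


graphs = {
    "mymetafile.rdf": {
        ("content.xml#id42", "my:property", "foo"),
        ("content.xml#id42", "my:label", "intro"),
    }
}
removed = on_content_modified(graphs, "id42")
```

A processor that does understand my:property would add it to UNDERSTOOD
and recompute the value instead of dropping the statement.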

I am not aware of a way to do the equivalent with arbitrary XML elements 
and attributes. Thus, i would claim that using RDF metadata as an 
extension mechanism has the potential for improving interoperability.

Of course, these meta-properties i just made up here are currently not 
standardized. But the ODF TC has the power to do that, right?

regards,
michael (not the one you are used to, we've got more than one here :) )


-- 
Michael Stahl            mailto:michael.stahl@sun.com
http://www.sun.de        OpenOffice.org/StarOffice Writer
Sun Microsystems GmbH    Nagelsweg 55, 20097 Hamburg, Germany
-----------------------------------------------------------------------
Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering