office-comment message

Subject: On RDF in ODF

From: Søren Roug <soren.roug@eea.europa.eu>
To: "office-comment@lists.oasis-open.org" <office-comment@lists.oasis-open.org>
Date: Sat, 27 Feb 2010 18:50:52 +0100

Dear ODF committee,

I’ve reviewed the RDF metadata part of the ODF 1.2 draft after it was sent out for public comment. I do this because we’re building a Semantic Web search engine, and I’d like it to be able to understand ODF. Since it was somewhat unclear to me what the intention behind the <text:meta> and <text:meta-field> elements is, I went to Hamburg to talk to Svante Schubert and Michael Stahl to learn about these elements.

Having reflected on what I learned, I have some concerns about the elements and some of the paradigms behind them.

For <text:meta-field> I learned that it is meant to display data from an RDF file external to content.xml. The exact way in which the content of the <text:meta-field> is generated from RDF metadata is unspecified and requires a plug-in. From my search-engine viewpoint, this is fine, as the element does not produce any triples to import. The search engine can ignore all elements of this type.

The second thing I learned is that the <text:meta> element is patterned after RDFa, but is to be used only for triples that have a literal as the object. This means that RDFa attributes such as @typeof and @resource are not part of the specification. The argument is that such triples are much better stored in an external RDF file. They did not mention whether the user could also choose to store a triple with a literal object in the RDF file. My question is: why stop there? If the application has to store some triples (such as rdf:type) in an RDF file, why not store all of them there, and then display a triple from the RDF file? <text:meta> would then not need to exist. Instead there would be a <text:meta-get> that behaves like <text:user-field-get>, but for metadata. I have to invent a new element because neither of the two existing ones can display a metadata property without plug-ins.

Before I argue further for <text:meta-get>, I want to mention that, as I’m not privy to all the use cases collected by the ODF authors, I can’t know whether I’ve accidentally made some important use cases impossible. Consider <text:meta-get> a wish.

If we scrap <text:meta>, then Semantic Web systems don’t have to parse content.xml for triples, because there won’t be any. As described earlier, the other element, <text:meta-field>, doesn’t generate triples either. I consider that a good thing.

<text:meta-get> would have four attributes: text:resource, text:property, style:data-style-name and text:display. Just like for <text:user-field-get> the content will be a copy of the object with the data style applied.

<text:p text:style-name="Standard">
Albert Einstein was born <text:meta-get
style:data-style-name="N81"
text:resource="people.rdf#Einstein"
text:property="schema.rdf#born">14 March 1879</text:meta-get>.
</text:p>
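
To make the intended behaviour concrete, here is a minimal sketch in Python of how a consumer might refresh such a field. It is only an illustration under stated assumptions: <text:meta-get> is the element proposed here and not part of ODF, the TRIPLES dictionary is a hypothetical stand-in for whatever the application has loaded from the package's RDF files, and applying the data style (N81) is left out entirely.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"

# The paragraph from the example, with namespace declarations added so it
# parses on its own. <text:meta-get> is the proposed element, not part of ODF.
PARAGRAPH = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:style="{STYLE_NS}" '
    'text:style-name="Standard">Albert Einstein was born '
    '<text:meta-get style:data-style-name="N81" '
    'text:resource="people.rdf#Einstein" '
    'text:property="schema.rdf#born">14 March 1879</text:meta-get>.</text:p>'
)

# Hypothetical stand-in for the triples loaded from the package's RDF files:
# (subject, predicate) -> literal object.
TRIPLES = {("people.rdf#Einstein", "schema.rdf#born"): "1879-03-14"}


def refresh_meta_get(paragraph_xml, triples):
    """Replace the text of each <text:meta-get> with the literal found in
    `triples`; leave the stored text alone when no triple matches.
    Returns the plain text of the paragraph."""
    root = ET.fromstring(paragraph_xml)
    for element in root.iter(f"{{{TEXT_NS}}}meta-get"):
        key = (
            element.get(f"{{{TEXT_NS}}}resource"),
            element.get(f"{{{TEXT_NS}}}property"),
        )
        if key in triples:
            element.text = triples[key]
    return "".join(root.itertext())


print(refresh_meta_get(PARAGRAPH, TRIPLES))
```

A real implementation would format the literal according to the data style before displaying it; the sketch only shows the lookup by resource and property.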

The <text:meta-get> would also be able to display values from meta.xml. You just give it the URL of the document and the URL of the property. To display the title of the document:

<text:meta-get text:resource=""
text:property="http://purl.org/dc/elements/1.1/title">My document</text:meta-get>

This construction is functionally identical to <text:title>. In fact, there are about 23 XML elements you can remove in ODF-Next. As Svante said: a standard isn’t done until there is nothing you can take away. I think adding <text:meta-get> to take away roughly two dozen elements is a generous trade.
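
The empty text:resource in the title example relies on ordinary relative-reference resolution: an empty reference resolves to the base URL, that is, the document itself. A small Python sketch, where the base URL is a made-up assumption for illustration (how a real ODF consumer determines the base for content.xml is not addressed here):

```python
from urllib.parse import urljoin

# Hypothetical location of the document; an assumption for illustration only.
DOCUMENT_URL = "http://example.org/docs/report.odt"

# An empty reference resolves to the base, so text:resource="" names the
# document itself.
print(urljoin(DOCUMENT_URL, ""))

# A relative reference resolves against the same base.
print(urljoin(DOCUMENT_URL, "people.rdf#Einstein"))
```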

But <text:meta-get> can do even more. What would happen if text:resource points to a URL on the Internet? It would behave just as when the xlink:href attribute in <draw:image> points to an Internet resource: the value is loaded from the Internet. Showing Einstein’s birth date, as declared on DBPedia.org, would work like this:

<text:meta-get
text:resource="http://dbpedia.org/resource/Albert_Einstein"
text:property="http://dbpedia.org/ontology/birthDate">14 March 1879</text:meta-get>

But unlike <draw:image>, if the resource is not reachable, it will just show the old content. In effect, an ODF document can now treat the World Wide Semantic Web as a giant database of structured data.
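
The fallback rule can be sketched in a few lines of Python. This is an assumption about how an application might implement it, not a description of any existing one: the fetch argument is a hypothetical callable that returns the literal for a (resource, property) pair, returns None if the triple is absent, and raises OSError when the resource cannot be reached.

```python
def refresh_value(resource, prop, cached, fetch):
    """Return the freshly fetched literal, or `cached` (the text already
    stored inside <text:meta-get>) when the fetch fails or finds nothing."""
    try:
        fresh = fetch(resource, prop)
    except OSError:
        # Network failures (connection errors, urllib's URLError) derive
        # from OSError; keep showing the old content.
        return cached
    return fresh if fresh is not None else cached


# Hypothetical fetchers standing in for a real HTTP/RDF lookup.
def unreachable(resource, prop):
    raise ConnectionError("host not reachable")


def reachable(resource, prop):
    return "1879-03-14"


EINSTEIN = "http://dbpedia.org/resource/Albert_Einstein"
BIRTH_DATE = "http://dbpedia.org/ontology/birthDate"

print(refresh_value(EINSTEIN, BIRTH_DATE, "14 March 1879", unreachable))
print(refresh_value(EINSTEIN, BIRTH_DATE, "14 March 1879", reachable))
```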

Finally, you might have noticed that I’ve not used CURIEs in my examples. They come from RDFa, and I don’t think they are appropriate for ODF. Why shorten URLs that are going to be zip-compressed anyway? It just makes software applications more complex.

Best regards,

Søren Roug