Subject: Re: [office-metadata] RDF/XML and XPath
From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
To: Elias Torres <eliast@us.ibm.com>
Date: Thu, 08 Feb 2007 11:59:43 +0100

Elias, All,

thank you very much for the many examples. They are very helpful.

The conclusion I draw from them actually is not that XForms + RDF/XML 
won't work at all, but that we have to make some restrictions on the 
RDF/XML that we include if we want it to be accessible from XForms. 
Making these restrictions is valid, because the purpose of the SC is not 
to add RDF/XML (or RDF, RDFa), but to provide a solution for certain 
metadata-related use cases.

When I look at your examples, I furthermore believe (at least for the 
moment) that we have to make similar restrictions for the RDF/XML+RDFa 
approach. Why? Let's assume we don't do so, and you have a certain RDF 
triple in the content.xml, using RDFa. What do you have to do to make 
sure that this triple is in sync with the triples in the RDF/XML 
streams, or how do you provide a user interface to add additional 
triples to content.xml? You have to parse all RDF/XML files and 
convert them into an RDF model. This may not only be time-consuming; if 
RDF/XML has as many variants as you say, it may also be a huge effort to 
implement all of them. I therefore suggest that we restrict the RDF/XML 
support to those things that we need to implement the use cases, instead 
of opening the door so wide that we get huge flexibility but run the 
risk that we don't get a running implementation.
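
To make the trade-off concrete, here is a minimal Python sketch (standard 
library only). It assumes an XPath lookup that matches rdf:about against 
the full URI, and runs it on two trimmed-down variants of the examples 
quoted below. The helper names find_subject and resolved_subjects are made 
up for illustration; they are not part of any proposal.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF}
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
SUBJECT = "http://www.w3.org/TR/rdf-syntax-grammar"

# Variant 1: the subject is written as a full URI.
FULL_URI = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
                   dc:title="RDF/XML Syntax Specification (Revised)"/>
</rdf:RDF>"""

# Variant 2: same triple, but the subject is relative to xml:base.
WITH_BASE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xml:base="http://www.w3.org/TR/">
  <rdf:Description rdf:about="rdf-syntax-grammar"
                   dc:title="RDF/XML Syntax Specification (Revised)"/>
</rdf:RDF>"""


def find_subject(xml_text, uri):
    """Naive lookup: match the literal rdf:about value with an XPath predicate."""
    root = ET.fromstring(xml_text)
    return root.findall("rdf:Description[@rdf:about='%s']" % uri, NS)


def resolved_subjects(xml_text):
    """Resolve rdf:about against xml:base on the root element only.

    Assumption for this sketch: nested xml:base declarations are ignored.
    """
    root = ET.fromstring(xml_text)
    base = root.get(XML_BASE, "")
    return [urljoin(base, d.get("{%s}about" % RDF))
            for d in root.findall("rdf:Description", NS)]


print(len(find_subject(FULL_URI, SUBJECT)))   # literal match succeeds
print(len(find_subject(WITH_BASE, SUBJECT)))  # literal match finds nothing
print(resolved_subjects(WITH_BASE))           # subject after URI resolution
```

The literal match only works for the first variant. Either every consumer 
does the URI resolution (and handles all the other variants), or we 
restrict what may be written, for instance by requiring full URIs in 
rdf:about.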

Below you say

 > RDF/XML is trying to represent a graph,
 > not a tree, like HTML. RDFa takes advantage of the tree properties of
 > HTML to define an extraction mechanism but only of triples that then
 > together will make up a graph.

I don't doubt that RDFa fits well into HTML, but we have to consider 
that ODF is not HTML, and that ODF is used in different ways than HTML.

Authors that create/edit HTML documents usually have a basic HTML 
knowledge, and I assume that's in particular the case if they add 
semantic information to documents.

Authors that create/edit ODF documents usually don't have any knowledge 
of ODF. They just edit office documents, and an office document is 
what they see in the office application's GUI. They are not aware of the 
hierarchical structure of an ODF document, and office applications don't 
expose that structure to the user. That's a large difference from how HTML 
documents are edited. So, while making use of the hierarchical structure 
of documents may work in the HTML case, I have severe doubts that this 
will work well in the ODF case. However, this does not mean that RDFa, 
or RDFa + RDF/XML, cannot work for ODF at all, but we have to adapt it 
to the office document world, even if this means that we have some 
restrictions regarding the variants we are supporting. Once again, that 
is a valid solution, because it is not the purpose of the SC to provide 
RDF/XML or RDFa support in general, but to provide a solution for certain 
metadata-related use cases.

How do we proceed? My suggestion is that those who are in favor of the 
RDF/XML+XForms approach work on a detailed proposal for their approach, 
and that those who are in favor of the RDF/XML + RDFa approach work on a 
detailed proposal for their approach as well. Once we have both, I suggest 
that we use the excellent experience of this group to identify the 
(metadata and office document related) gaps and issues of the two 
approaches, with the goal of resolving them and harmonizing the two. I 
further suggest that we take Bernd's example document as a test case for 
our use cases.

Does that sound reasonable?

Best regards

Michael

Elias Torres wrote:
> Svante,
> 
> I'll go through as many problems as I can point out with thinking that
> XPath is feasible with just a few variations on the RDF/XML serializations
> of an RDF model.
> 
> During the call I heard you mention "walking up the tree" looking for the
> top most @rdf:about. I just want to point out how that itself wouldn't work
> using examples from the RDF/XML specification.
> 
> I'm going to assume that to find out the subject of ex:fullName you'd go up
> and look for @rdf:about which points to
> <http://www.w3.org/TR/rdf-syntax-grammar>
> 
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>          xmlns:dc="http://purl.org/dc/elements/1.1/"
>          xmlns:ex="http://example.org/stuff/1.0/">
>   <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
>                dc:title="RDF/XML Syntax Specification (Revised)">
>     <ex:editor ex:fullName="Dave Beckett" />
>   </rdf:Description>
> </rdf:RDF>
> 
> In N3:
> 
> <http://www.w3.org/TR/rdf-syntax-grammar> dc:title "..." ;
>       ex:editor [
>             ex:fullName "Dave Beckett"
>       ] .
> 
> Meaning that ex:fullName "Dave Beckett" is a property on an anonymous node
> and NOT on the <http://www.w3.org/TR/rdf-syntax-grammar> itself. There's an
> extra node in the graph w/o a name, so you can't assume the "highest"
> rdf:about is the one that pertains. RDF/XML is trying to represent a graph,
> not a tree, like HTML. RDFa takes advantage of the tree properties of HTML
> to define an extraction mechanism but only of triples that then together
> will make up a graph.
> 
> Let's take another variation on the previous one.
> 
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>          xmlns:dc="http://purl.org/dc/elements/1.1/"
>          xmlns:ex="http://example.org/stuff/1.0/"
>         xml:base="http://www.w3.org/TR/">
>   <rdf:Description rdf:about="rdf-syntax-grammar"
>                dc:title="RDF/XML Syntax Specification (Revised)">
>     <ex:editor ex:fullName="Dave Beckett" />
>   </rdf:Description>
> </rdf:RDF>
> 
> Notice how I've added xml:base to the document. Now your XPath would break
> because you were matching for the full URI.
> 
> Another short variation....
> 
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>          xmlns:dc="http://purl.org/dc/elements/1.1/"
>          xmlns:ex="http://example.org/stuff/1.0/"
>         xml:base="http://www.w3.org/">
>   <rdf:Description xml:base="TR/" rdf:about="rdf-syntax-grammar"
>                dc:title="RDF/XML Syntax Specification (Revised)">
>     <ex:editor ex:fullName="Dave Beckett" />
>   </rdf:Description>
> </rdf:RDF>
> 
> Notice yet another xml:base declaration in rdf:Description. Remember, all
> of these serializations generate an isomorphic RDF model.
> 
> Also note how I have not yet even used RDF/XML vs RDF/ABBREV, completely
> throwing out the argument of using a "simplified/constrained" RDF/XML
> serialization.
> 
> Here's another example that shows the love/hate relationship between XPath
> and RDF.
> 
> Notice how ex:fullName property is expressed using an attribute and then in
> the second version, it's done using an element.
> 
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>          xmlns:dc="http://purl.org/dc/elements/1.1/"
>          xmlns:ex="http://example.org/stuff/1.0/">
>   <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
>                dc:title="RDF/XML Syntax Specification (Revised)">
>     <ex:editor>
>       <rdf:Description ex:fullName="Dave Beckett">
>         <ex:homePage rdf:resource="http://purl.org/net/dajobe/" />
>       </rdf:Description>
>     </ex:editor>
>   </rdf:Description>
> </rdf:RDF>
> 
> <?xml version="1.0"?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>          xmlns:dc="http://purl.org/dc/elements/1.1/"
>          xmlns:ex="http://example.org/stuff/1.0/">
>   <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
>                dc:title="RDF/XML Syntax Specification (Revised)">
>     <ex:editor rdf:parseType="Resource">
>       <ex:fullName>Dave Beckett</ex:fullName>
>       <ex:homePage rdf:resource="http://purl.org/net/dajobe/"/>
>     </ex:editor>
>   </rdf:Description>
> </rdf:RDF>
> 
> Here's another one showing a completely different structure. Notice the
> channel1 and channel2 references.
> 
> <?xml version="1.0"?>
> <rdf:RDF
>    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>    xmlns="http://testuri.org#">
> <tv-guide>
>    <channels>
>       <rdf:List>
>          <rdf:li>
>             <channel rdf:ID="channel1"/>
>          </rdf:li>
>          <rdf:li>
>             <channel rdf:ID="channel2"/>
>          </rdf:li>
>       </rdf:List>
>    </channels>
> </tv-guide>
> </rdf:RDF>
> 
> <?xml version="1.0"?>
> <rdf:RDF
>    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>    xmlns="http://testuri.org#">
> <tv-guide>
>    <channels>
>       <rdf:List>
>          <rdf:li rdf:resource="#channel1"/>
>          <rdf:li rdf:resource="#channel2"/>
>       </rdf:List>
>    </channels>
> </tv-guide>
> <channel rdf:ID="channel1"/>
> <channel rdf:ID="channel2"/>
> </rdf:RDF>
> 
> Here's one showing long and abbreviated format examples:
> 
> <?xml version="1.0" encoding="iso-8859-1"?>
> 
> <rdf:RDF
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:dc="http://purl.org/dc/elements/1.0/">
> <rdf:Description rdf:about="http://www.mcgill.ca/libraries-techserv/"
>       dc:title="McGill University Libraries Library Technical Services Home Page"
>       dc:creator="Karen Jensen"
>       dc:subject="..."
>       dc:description="..."
>       dc:publisher="McGill University Libraries. Library Technical Services."
>       dc:contributor="..."
>       dc:date="2003-03-03"
>       dc:type="Text"
>       dc:format="text/html"
>       dc:identifier="http://www.mcgill.ca/libraries-techserv/"
>       dc:language="en"/>
> </rdf:RDF>
> 
> and the same data again, this time with a typed node element instead of
> rdf:Description, and again everything went from attributes to elements.
> 
> <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.0/"
>     xmlns:log="http://www.w3.org/2000/10/swap/log#"
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
> 
>     <Text xmlns="file:/tmp/"
>         rdf:about="http://www.mcgill.ca/libraries-techserv/">
>         <dc:contributor>...</dc:contributor>
>         <dc:creator>Karen Jensen</dc:creator>
>         <dc:date>2003-03-03</dc:date>
>         <dc:description>...</dc:description>
>         <dc:format>text/html</dc:format>
>         <dc:identifier>http://www.mcgill.ca/libraries-techserv/</dc:identifier>
>         <dc:language>en</dc:language>
>         <dc:publisher>McGill University Libraries. Library Technical Services.</dc:publisher>
>         <dc:subject>...</dc:subject>
>         <dc:title>McGill University Libraries Library Technical Services Home Page</dc:title>
>     </Text>
> </rdf:RDF>
> 
> I hope these examples help you see why using XPath is just not a good idea
> to link between ODF and RDF/XML serializations of our metadata.
> 
> -Elias
> 
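As a small check of the attribute/element pair quoted above, here is a 
hedged Python sketch (standard library only). The two documents are 
trimmed-down versions of the quoted examples, and the helper names 
names_by_element and names_by_attribute are invented for this illustration. 
It shows that a lookup written for one form finds nothing in the other, 
although both are meant to express the same ex:fullName property.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/stuff/1.0/",
}

# ex:fullName written as a property attribute on a nested rdf:Description.
AS_ATTRIBUTE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
    <ex:editor>
      <rdf:Description ex:fullName="Dave Beckett"/>
    </ex:editor>
  </rdf:Description>
</rdf:RDF>"""

# ex:fullName written as a property element under rdf:parseType="Resource".
AS_ELEMENT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
    <ex:editor rdf:parseType="Resource">
      <ex:fullName>Dave Beckett</ex:fullName>
    </ex:editor>
  </rdf:Description>
</rdf:RDF>"""


def names_by_element(xml_text):
    """Look for ex:fullName as an element anywhere below the root."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(".//ex:fullName", NS)]


def names_by_attribute(xml_text):
    """Look for ex:fullName as an attribute on any element."""
    root = ET.fromstring(xml_text)
    key = "{%s}fullName" % NS["ex"]
    return [e.get(key) for e in root.iter() if e.get(key) is not None]


for label, doc in (("attribute form", AS_ATTRIBUTE), ("element form", AS_ELEMENT)):
    print(label, names_by_element(doc), names_by_attribute(doc))
```

A restricted profile that allows only one of the two forms would let a 
single path expression cover both documents.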


-- 
Michael Brauer, Technical Architect Software Engineering
StarOffice/OpenOffice.org
Sun Microsystems GmbH             Nagelsweg 55
D-20097 Hamburg, Germany          michael.brauer@sun.com
http://sun.com/staroffice         +49 40 23646 500
http://blogs.sun.com/GullFOSS
References:
- RDF/XML and XPath
  - From: Elias Torres <eliast@us.ibm.com>