Subject: Re: [office-metadata] Multiple content nodes representing on RDFsubject
From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
To: "Bruce D'Arcus" <bdarcus@gmail.com>
Date: Wed, 03 Jan 2007 14:03:12 +0100

Hi Bruce,

Bruce D'Arcus wrote:
> 
> Hi Michael,
> 
> On Dec 28, 2006, at 12:09 PM, Michael Brauer wrote:
> 
>> the intention behind Svante's example was to show that there is no
>> one-to-one relation between XML elements and RDF subjects. An RDF 
>> subject actually may be composed of several XML elements. This is at 
>> least the case if the subject is a text fragment. That's the reason 
>> why Svante proposed to add something like a <content> element that 
>> combines the individual <text:span> elements into a single RDF subject.
> 
> Yes, I understand the problem (I think). I just don't agree with the 
> solution :-)
> 
> The way that one binds statements to a subject in RDF is through the 
> names you give the subjects; the URIs.
> 
> For example, from the RDF perspective, this ....
> 
> <rdf:Description rdf:about="x">
>   <ex:foo>one</ex:foo>
> </rdf:Description>
> 
> <rdf:Description rdf:about="x">
>   <ex:bar>two</ex:bar>
> </rdf:Description>
> 
> ... is fully equivalent to this:
> 
> <rdf:Description rdf:about="x">
>   <ex:foo>one</ex:foo>
>   <ex:bar>two</ex:bar>
> </rdf:Description>
> 
> The reason is because you're using the name/URI to do the association, 
> and each property is -- from the model standpoint -- a discrete 
> statement about that resource. It doesn't matter where it is.

I agree, but how does this relate to my example where the subject (in RDF terminology)
exists in the ODF document and consists of several XML elements?

I'm currently not referring to any use cases where the subject does not exist in the ODF 
document, because in my view these are a different class of use cases.

In general, I think we have three classes of use cases:

a) Use cases where the subject exists in the document (it could be a piece of text, a 
text box, a table cell), and where the objects do not exist in the document.
b) Use cases where only the objects exist in the document.
c) Use cases where both subjects and objects exist in the document.

Right now, I'm only talking about use case a), because it is a very simple case, but it 
helps us to understand what kind of "ODF objects" (I'm using the term "ODF objects" not in 
RDF terminology here, but to denote things like paragraphs, text boxes, table cells etc.) 
we have in office documents that may become subjects (and maybe also objects) in RDF.

> 
> Svante was wanting to use a completely different mechanism (XPointer) to 
> do association that will not mix well here.

I'm not sure where Svante wants to use XPointer in the example I'm referring to. But since 
RDF subjects are identified by IRIs, they at least syntactically permit the use of XPointers. 
I personally could imagine that using XPointers to reference subjects is particularly 
useful if the same metadata is applied to multiple subjects. A real example of this could 
be marking text as important, where one probably does not want to assign an id to every 
piece of text that is important.

In general, to resolve use case a) we have to
a1) define what the possible subjects in an ODF document are,
a2) make sure that these subjects can be referenced by the IRIs that are contained
in RDF's rdf:about attribute.

Is that correct, or have I overlooked something?

To solve task a2), for most ODF objects it's probably sufficient to add an id attribute to 
the XML element that represents the ODF object. For technical reasons, however, this does 
not work for text, because the text may be distributed across several elements that cannot 
be combined into a single one. That's exactly the situation Svante and I were talking 
about.
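
To illustrate the simple half of task a2), here is a minimal Python sketch of how a 
consumer might resolve an IRI such as "content.xml#x" to the single element carrying that 
id. The content fragment, the ids "x" and "y", and the helper name resolve() are 
assumptions made for illustration only; they are not part of any proposal in this thread.

```python
import xml.etree.ElementTree as ET

# Attribute name ElementTree uses for xml:id after parsing.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical content.xml fragment: each ODF object is exactly one element,
# so a single id per element is enough to make it addressable.
CONTENT = f"""
<text:section xmlns:text="{TEXT_NS}">
  <text:p xml:id="x">A paragraph that is one ODF object.</text:p>
  <text:p xml:id="y">Another paragraph.</text:p>
</text:section>
"""


def resolve(root, iri):
    """Return the element a 'content.xml#id' style IRI points at, or None."""
    _, _, fragment = iri.partition("#")
    for element in root.iter():
        if element.get(XML_ID) == fragment:
            return element
    return None


root = ET.fromstring(CONTENT)
subject = resolve(root, "content.xml#x")
print(subject.text)
```

This only works because the subject is one element. As soon as the subject is text that 
spans several elements, a single id can no longer identify it.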


> 
>> I agree that in his example it would be valid to interpret the three 
>> <text:span> elements as individual subjects, but I don't think this is 
>> valid in all cases. If the meta data is for instance a description of 
>> some text (for instance for accessibility purposes), then you must not 
>> break the text into pieces, because this changes the meaning.
> 
> I think the practical use case examples here are useful. Can you explain 
> this a bit more?

Some examples follow below.

> 
> When I wrote the accessibility use case, I was thinking of fairly simple 
> examples, like an embedded image includes a fuller RDF/XML description 
> in the package that can be used to present further information to the user.
> 
> An analog for the in-content case might be if a document has a link to a 
> client resource. So maybe we have:
> 
> <text:p xml:id="1">Some client <text:link 
> meta:resource="http://ex.net/people/1" meta:property="ex:client">Jane 
> Doe</text:link>

Just to make sure I understand that example: here, the RDF subject is 
"http://ex.net/people/1", the predicate is "ex:client", and the object is "Jane Doe"?

Which means, this is case b) of my classification above?

> 
> There might be further metadata in the package that can be useful for 
> accessibility purposes to present further information about the resource 
> named <http://ex.net/people/1> (and labeled "Jane Doe").
> 
> So the question is, how to identify the subject, such that one can make 
> further statements about that subject elsewhere in the document.
> 
> One answer is meta:about.

I agree. If my understanding of this example is correct, then the content.xml of an office 
document defines properties of the resource "http://ex.net/people/1". Technically that's 
possible. But in practice: Would it be the office document that defines these properties? 
Or would they exist already somewhere else, and the office document would either contain a 
local copy of them, or would just reference them? And wouldn't it be better to have all 
information about "http://ex.net/people/1" in one place within the office document instead 
of having it spread over the content.xml?

ODF has a package concept, so there is no need to mix the meta data with the content for 
the purpose of having them in one physical file. And ODF has text fields, which allow 
displaying (typed) information defined somewhere within the content. So, I actually don't 
know yet whether we need in-content meta information or not, but we should consider the 
existing possibilities of ODF when modeling the use cases where a document contains meta 
data for subjects not defined in the content.xml.

> 
>> Other cases are annotations that may be added to arbitrary text the 
>> user selects, and which may include paragraph breaks. And I'm sure 
>> there are other use cases where an RDF subject must consist of 
>> several XML elements. Please let me know if you would like to have 
>> more detailed examples for this, and I will provide them next year.
> 
> See above. Some more detailed examples of the problem would be helpful.

Example 1:

Take the source code

<rdf:Description rdf:about="content.xml#x">
   <ex:foo>one</ex:foo>
</rdf:WrongDescription>

In ODF, this would be

<text:p text:style-name="Example">&lt;rdf:Description rdf:about="content.xml#x"&gt;</text:p>
<text:p text:style-name="Example">&lt;ex:foo&gt;one&lt;/ex:foo&gt;</text:p>
<text:p text:style-name="Example">&lt;/rdf:WrongDescription&gt;</text:p>

There are three paragraphs, which require three <text:p> elements.

Now, let's assume there is an annotation feature that is implemented based on metadata. A 
user reads the document and notices that the start tag does not match the end tag, and 
therefore adds an annotation "start tag does not match end tag". This annotation is only 
meaningful if applied to the three paragraphs simultaneously.
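
A minimal Python sketch of why the annotation needs all three paragraphs as one subject. 
The wrapping <text:section> element and the helper name covers_mismatch() are assumptions 
added only to make the fragment parseable and testable.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The three paragraphs from Example 1, wrapped in a section element
# (an assumption) so that the namespace prefix can be declared.
CONTENT = f"""
<text:section xmlns:text="{TEXT_NS}">
<text:p text:style-name="Example">&lt;rdf:Description rdf:about="content.xml#x"&gt;</text:p>
<text:p text:style-name="Example">&lt;ex:foo&gt;one&lt;/ex:foo&gt;</text:p>
<text:p text:style-name="Example">&lt;/rdf:WrongDescription&gt;</text:p>
</text:section>
"""

START_TAG = "<rdf:Description"
END_TAG = "</rdf:WrongDescription>"


def covers_mismatch(text):
    """True if the text contains both the start tag and the wrong end tag."""
    return START_TAG in text and END_TAG in text


root = ET.fromstring(CONTENT)
paragraphs = [p.text for p in root.iter(f"{{{TEXT_NS}}}p")]

# No single paragraph contains both tags the annotation talks about ...
singles = [covers_mismatch(p) for p in paragraphs]
# ... only the three paragraphs taken together do.
combined = "\n".join(paragraphs)

print(singles, covers_mismatch(combined))
```

Attaching the annotation to any one <text:p> would therefore point at text that does not 
show the mismatch on its own.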

Example 2:

Let's assume a document contains the name "Micheal Brauer" (please note the spelling 
error). "Micheal Brauer" has a text style and meta data assigned. The meta data in this 
example is only provided as "[meta data]":

<text:p><text:span text:style-name="Name" [meta data]>Micheal Brauer</text:span></text:p>

Now, the spelling error gets corrected with change tracking enabled. Without caring about 
meta data, this example typically would turn into

<text:p><text:span text:style-name="Name">Mich</text:span><text:change-start 
text:change-id="ct28274720"/><text:span 
text:style-name="Name">ae</text:span><text:change-end 
text:change-id="ct28274720"/><text:change text:change-id="ct10771728"/><text:span 
text:style-name="Name">l Brauer</text:span></text:p>

This means the <text:span> element gets split, and the same would happen to the meta 
data. This could be avoided, but it would make implementations more difficult, and 
calculating appropriate <text:span> elements is already difficult. The example above is 
in fact a simple one. The text may also contain different styles, hyperlinks, cross 
references, etc. So it's probably not impossible to require that a piece of text that 
occurs within a paragraph is contained in a single XML element, but this means that 
other elements need to be split, and this may have an impact on those elements, too. So, 
from the implementation perspective, it's much easier if text that has meta data attached 
can be split, and this is required for the above example anyway.
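
A minimal Python sketch of the effect, using the change-tracked markup from Example 2. The 
xmlns:text declaration on <text:p> is added here only so that the fragment parses on its 
own; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Markup from Example 2 after the spelling correction with change tracking.
AFTER = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    '<text:span text:style-name="Name">Mich</text:span>'
    '<text:change-start text:change-id="ct28274720"/>'
    '<text:span text:style-name="Name">ae</text:span>'
    '<text:change-end text:change-id="ct28274720"/>'
    '<text:change text:change-id="ct10771728"/>'
    '<text:span text:style-name="Name">l Brauer</text:span>'
    '</text:p>'
)

root = ET.fromstring(AFTER)

# Text carried by each individual <text:span> element.
pieces = [span.text for span in root.iter(f"{{{TEXT_NS}}}span")]

# The subject the meta data describes is the concatenation of all pieces.
subject_text = "".join(pieces)

print(pieces)
print(subject_text)
```

None of the three <text:span> elements contains the full name any more, so meta data 
bound to exactly one element would describe only a fragment of it.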



> 
> On annotations, for example, I would expect to do:
> 
> <text:p xml:id="1">...</text:p>
> 
> ... and:
> 
> <rdf:Description rdf:about="content.xml#x">
>   <ex:foo>one</ex:foo>
> </rdf:Description>
> 
> IF we want to allow the user to select text that spans paragraphs and 
> then to annotate THOSE (do we??), then I agree we might have to do some 

Yes, I think we do.

> gymnastics. But I'd prefer to keep them standard RDF (and get some 
> feedback from the RDF experts on it).
> 
> Not sure, but maybe something like:
> 
> <text:p xml:id="1">...<text:span xml:id="2">beginning, and 
> ...</text:span></text:p>
> <text:p xml:id="3"><text:span xml:id="4">end.</text:span>...</text:p>
> 
> <rdf:Description rdf:about="content.xml#2">
>   <ex:foo>one</ex:foo>
> </rdf:Description>
> 
> <rdf:Description rdf:about="content.xml#4">
>   <owl:sameAs rdf:resource="content.xml#2"/>
> </rdf:Description>

<owl:sameAs> means that #4 is the same as #2, but that's not the case.

> 
> As I said, I'd want some feedback from people like Elias and Dan on 
> this, but I do think there are ways to address it.
> 
>> I think we have to consider this, regardless whether the preferred 
>> solution is RDF/A or RDF/XML files in a package.
> 
> Agreed.
> 
>> Earlier in this thread you wrote
>>
>> > Note that there's no way to derive this statement from what you
>> > presented above. All you are doing above in the XML is identifying the
>> > node (as a *possible* subject).
>>
>> I personally think to have the possibility to add IDs (or similar 
>> attributes) to identify *possible* subjects is important, because this 
>> allows to add meta data to a document without touching the content.xml 
>> stream.
> 
> Correct.

So we are in agreement that a reasonable solution for the use case class a) above is to 
put the meta data in a separate stream in the package and to reference the subjects in the 
content.xml using URIs?

Where we are not in agreement are the use case classes b) and c)?

> 
>> That's why I personally think that we must have the possibility to 
>> assign meta data to subjects in the content.xml by IDs. Please note 
>> that this does not mean that we may not have something like RDF/A's 
>> inline attributes in addition, if we find use cases where they are 
>> advantageous or required (that's something we should discuss 
>> separately), but only that I think that RDF/A's inline attributes are 
>> not sufficient. If I did understand Elias correctly, then RDF/A 
>> includes the possibility to assign meta data using IDs.
> 
> Yes. The xml:id identifies the content node. The meta:about would 
> identify other subjects that are described with the content node. In his 
> example, you would use the about attribute to identify the row.
> 
> Think also of:
> 
> <text:p xml:id="1" meta:about="http://ex.net/patients/1">...</text:p>
> 
> ... and:
> 
> <rdf:Description rdf:about="content.xml#1">
>   <ex:status rdf:resource="http://ex.net/status/Important"/>
> </rdf:Description>
> 
> <v:VCard rdf:about="http://ex.net/patients/1">
>   <v:fn>Jane Doe</v:fn>
> </v:VCard>
> 
> Does that make sense?

From the metadata perspective this example makes sense. However, I'm not sure whether in 
practice one wouldn't prefer to have all data about http://ex.net/patients/1 in one place, 
which means having a link from the "http://ex.net/patients/1" meta data to the paragraph 
with id "1":

<text:p xml:id="1">...</text:p>

and

<rdf:Description rdf:about="http://ex.net/patients/1">
   <ex:data-yxz rdf:resource="content.xml#1"/>
</rdf:Description>

I don't know whether this is proper RDF or not, but in my view this approach has 
several advantages:

- The data about "http://ex.net/patients/1" would be in one place; one could find all data 
  that is available without scanning the content.xml.
- The object of "ex:data-yxz" could be updated by updating the URI in the 
  "http://ex.net/patients/1" record only.
- The only extension to the schema for content.xml, and therefore the only required 
  implementation, is IDs. Meta data and content are kept separate.
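
A minimal Python sketch of the first point, assuming the metadata sits in a separate 
RDF/XML stream in the package. The namespace URI bound to "ex" and the helper name 
content_references() are hypothetical; "ex:data-yxz" is the placeholder property from the 
example above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://ex.net/ns#"  # hypothetical namespace for the "ex" prefix

# Hypothetical metadata stream, stored separately from content.xml.
METADATA = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="http://ex.net/patients/1">
    <ex:data-yxz rdf:resource="content.xml#1"/>
  </rdf:Description>
</rdf:RDF>
"""


def content_references(metadata_xml, subject):
    """List the content.xml IRIs that a subject's record points at."""
    root = ET.fromstring(metadata_xml)
    references = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        if description.get(f"{{{RDF_NS}}}about") != subject:
            continue
        for prop in description:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource and resource.startswith("content.xml#"):
                references.append(resource)
    return references


print(content_references(METADATA, "http://ex.net/patients/1"))
```

The lookup reads only the metadata stream; content.xml has to be opened only when one of 
the returned IRIs is actually followed.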

Another approach would be to have the content of the paragraph with id "1" in the metadata 
itself, and to display it with a text field. However, this would mean duplicating the data.

In other words: I don't doubt that your example provides a valid solution for the problem 
you are describing, but there are probably other solutions that we should consider, too.


Best regards

Michael