office-metadata message

Subject: Re: [office] Re: [office-metadata] [issue] split object literals

From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: Michael Brauer <Michael.Brauer@Sun.COM>
Date: Wed, 10 Jan 2007 09:38:01 -0500

On Jan 10, 2007, at 9:06 AM, Michael Brauer wrote:

...

> ... the case I'm talking about is that I want to say something about 
> the text "Some Title", for instance, add the author or an additional 
> description. Maybe I have overlooked something, but isn't the subject in 
> this case the text "Some Title"? If so, how do you represent this?

Practically speaking, with the example of a title, the subject is 
either the current document, or some other document; it would not be 
the span of text. Moreover, that string is an object.

Aside: in RDF, there is a generic rdf:value property which can serve as 
a display value if needed. The SIMILE project at MIT uses this in their 
PiggyBank browser. We could say a subject without an explicit property 
is an rdf:value property, but I really dislike that idea. If there is 
no property explicitly denoted in the content, then there is no triple.

>>> What do you mean by "worry"? That one doesn't have to care about 
>>> split property nodes, or that there will be a solution?
>> I mean it is not possible. RDF/XML is a more structured syntax. If 
>> you have two ex:foo nodes, you have two properties, without 
>> exception.
>
> I trust you that two ex:foo nodes result in two properties in RDF/XML. 
> But in my example, I only have one <ex:title> element. I therefore have 
> one property. Or what is wrong with this example?

My only point was that you cannot split objects in RDF/XML, which is why 
I was focusing on this as a problem of mixing the metadata with the 
content.

...

>>> In the in-package case the metadata fragment would appear in some 
>>> stream next to the content.xml. For the in-content case one could 
>>> simply move the metadata fragment into the content.xml and adapt the 
>>> "about" URI. It is probably also possible to mix the metadata markup 
>>> with the content (I assume that is what RDFa does), but I don't know 
>>> what this will look like. Bruce, can you provide an example for this?
>> Yes, the example above does this. In RDFa, your example would not
>
> So, do you think it is a reasonable example, if we set aside that it 
> is not RDFa?

Yes.

>>> Anyway, I like the idea of identifying the spans that belong to a 
>>> certain object by a single id as it is the case in your option 1:
>>>
>>> <office:meta-subject xml:id="xyz"/>
>>>
>>> <text:p>
>>>   <text:span office:belongs-to="xyz">Some </text:span>
>>> </text:p>
>>> <text:p>
>>>   <text:span office:belongs-to="xyz">Title</text:span>
>>> </text:p>
>>>
>>> The attribute that defines the id (in terms of XML) is xml:id. The 
>>> office:belongs-to attributes are references to this id only. 
>>> Although only a single id is used here, we cannot omit the 
>>> <office:meta-subject> element, because we need to define the id that 
>>> has to be unique.
>> Why this last requirement?
>
> XML ids have to be unique.

Yes, but why do we need the id?

> That is, there must not be more than one attribute of datatype ID 
> that has the same value. We may of course take some other attribute 
> datatype, but then fragment identifiers do not work anymore, 
> because they operate on attributes of type ID only.

Correct, but at least in RDF, one would never reference a property; that 
is not part of the model. One references resources (subjects).

>>> The advantage this example has is that the application that saves 
>>> the document does not need to know in advance how many <text:span> 
>>> elements make up a subject. ...
>> Yes, but I'm not clear why we need the separate subject structure. 
>> Isn't it enough to know that two nodes share the same id?
>
> For the office application it is enough, because it knows that the 
> spans belong together. But an RDF application that processes the 
> document takes the two spans as different properties (correct me if 
> I'm wrong, but that's what you said above). So it would interpret the 
> metadata differently than intended.

I would expect that an RDF application would process an ODF file like 
this:

1) read the manifest and look for RDF/XML files; load them up directly
2) read the content file for the triples there, and load them in 
(merging happens automatically via the URIs)

In step 2, if it comes across a property node with the attribute we've 
been talking about, it knows it first has to merge that node's literal 
with those of the other node(s) sharing the same attribute value.
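
To make the merge step concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: it reuses the office:belongs-to attribute and office:meta-subject element from the example quoted above (both are proposals from this thread, not part of any published schema), it wraps the fragment in an office:text element so it parses on its own, and the function name merge_split_literals is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by ODF 1.x content files.
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The fragment from the quoted example. office:meta-subject and
# office:belongs-to are hypothetical names proposed in this thread.
CONTENT = f"""\
<office:text xmlns:office="{OFFICE}" xmlns:text="{TEXT}">
  <office:meta-subject xml:id="xyz"/>
  <text:p>
    <text:span office:belongs-to="xyz">Some </text:span>
  </text:p>
  <text:p>
    <text:span office:belongs-to="xyz">Title</text:span>
  </text:p>
</office:text>
"""


def merge_split_literals(xml_text):
    """Join, in document order, the text of all spans that share
    the same office:belongs-to value. Returns {id: merged literal}."""
    root = ET.fromstring(xml_text)
    merged = {}
    # Element.iter() walks the tree in document order.
    for span in root.iter(f"{{{TEXT}}}span"):
        key = span.get(f"{{{OFFICE}}}belongs-to")
        if key is None:
            continue  # an ordinary span, not part of a split literal
        merged[key] = merged.get(key, "") + "".join(span.itertext())
    return merged


if __name__ == "__main__":
    print(merge_split_literals(CONTENT))
```

A consumer would then emit one triple per merged literal rather than one per span; which subject and property that triple uses is exactly the part we still have to define.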

In either case, we basically have to define the processing mechanism. 
But the solution I'm suggesting here has the virtue that it solves 
these problems in a simple way that is friendly to both RDF and regular 
XML tools.

Bruce
