office-metadata message

Subject: Re: [office-metadata] Focus on model
From: Lars Oppermann <Lars.Oppermann@Sun.COM>
To: Elias Torres <eliast@us.ibm.com>
Date: Fri, 15 Dec 2006 17:00:11 +0100
Hello Elias,

Some more thoughts from me...

So it looks like, for indexing applications that want to store 
information about documents, there needs to be a defined 'function' 
that describes how to extract the meta-data model from a given ODF 
document, so that statements made about specific parts of the document 
can be related back to the document as well as to the specific parts 
they relate to. The syntax by which this model is represented inside 
the document is largely (if not entirely) irrelevant, as long as a 
function exists that can extract it. This function should also be 
implementable with reasonable effort on a wide range of platforms.
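
To make the idea of such a 'function' a bit more concrete, here is a
minimal sketch in Python using only the standard library. It assumes
that the addressable parts of the content carry an xml:id attribute and
that the package is an ordinary zip archive containing content.xml. The
names extract_fragments and make_demo_package are made up for
illustration, and the demo document is heavily simplified compared to a
real ODF content.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:id attribute as ElementTree reports it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def extract_fragments(package_bytes):
    """Map each fragment identifier ('#' + xml:id) found in content.xml
    to the text content of the element that carries it."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        root = ET.fromstring(pkg.read("content.xml"))
    return {
        "#" + el.get(XML_ID): "".join(el.itertext())
        for el in root.iter()
        if el.get(XML_ID) is not None
    }


def make_demo_package():
    """Build a tiny in-memory package (assumption: simplified structure,
    not a complete or valid ODF document)."""
    content = (
        "<office:document-content "
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
        '<text:p xml:id="B">Lars Oppermann</text:p>'
        '<text:p xml:id="C">2006-12-15</text:p>'
        "</office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("content.xml", content)
    return buf.getvalue()


if __name__ == "__main__":
    for fragment, text in sorted(extract_fragments(make_demo_package()).items()):
        print(fragment, "->", text)
```

Nothing here depends on more than a zip reader and an XML parser, which
is what I mean by implementable with reasonable effort.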

I can imagine both a function that does the 'special' resource 
resolution which I used in my RDF/XML examples and another function 
that pulls resources that are marked as such directly from the content. 
Both functions would have the same result.
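
As a rough illustration of two such functions arriving at the same
result, consider the following Python sketch. Everything in it is an
assumption made for the example: the m:about and m:property attributes
in the urn:example:meta namespace stand in for an RDFa-like in-content
syntax, the element names doc and p are placeholders rather than ODF
elements, and dc:creator is used as the property.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
META_NS = "{urn:example:meta}"  # hypothetical in-content annotation namespace

# Hypothetical content carrying both an xml:id and an RDFa-like annotation.
CONTENT = (
    '<doc xmlns:m="urn:example:meta" xml:id="A">'
    '<p xml:id="B" m:about="#A" '
    'm:property="http://purl.org/dc/elements/1.1/creator">Lars Oppermann</p>'
    "</doc>"
)

# Separate RDF/XML that refers to the content by fragment identifier.
META = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<rdf:Description rdf:about="#A"><dc:creator rdf:resource="#B"/>'
    "</rdf:Description></rdf:RDF>"
)


def triples_by_resolution(meta_xml, content_xml):
    """Read the RDF/XML and replace references to content fragments
    by the text of the referenced element."""
    text_by_id = {
        "#" + el.get(XML_ID): "".join(el.itertext())
        for el in ET.fromstring(content_xml).iter()
        if el.get(XML_ID) is not None
    }
    triples = set()
    for desc in ET.fromstring(meta_xml).iter("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)
        for prop in desc:
            target = prop.get("{%s}resource" % RDF)
            # '{namespace}local' -> 'namespacelocal', i.e. the predicate URI
            predicate = prop.tag[1:].replace("}", "", 1)
            triples.add((subject, predicate, text_by_id.get(target, target)))
    return triples


def triples_from_content(content_xml):
    """Pull statements that are marked as such directly in the content."""
    return {
        (el.get(META_NS + "about"), el.get(META_NS + "property"),
         "".join(el.itertext()))
        for el in ET.fromstring(content_xml).iter()
        if el.get(META_NS + "property") is not None
    }


if __name__ == "__main__":
    print(triples_by_resolution(META, CONTENT) == triples_from_content(CONTENT))
```

The sketch only covers the simple case of a fragment whose text becomes
a literal; it says nothing about which of the two is easier to keep
consistent while the content is being edited.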

Now, if we approach the matter from the generation side rather than from 
the consumption side, it should also be easy to a) create the meta-data 
content while working on the content and b) preserve the meta-data's 
integrity when the content is modified.

It looks to me like all the approaches we have looked at are equally 
suited to defining a function that pulls the meta-data from the document.

Is this understanding consistent with how the rest of you think about 
this matter, or are there other aspects that should have an influence on 
this?

Bests,
Lars

Elias Torres wrote:
> Lars.Oppermann@Sun.COM wrote on 12/14/2006 12:17:42 PM:
> 
>> Thanks Elias, very helpful example...
> 
> You are welcome.
> 
>> What is (or what are) the standard way(s) for RDF tools to resolve
>> resources? If we assume that #somethingElse can be resolved how is it
>> done. If the metadata is to be useful, being able to resolve the objects
>> is certainly vital.
> 
> I might have used the wrong word here: resolve. I didn't mean resolve in
> the sense of resolution/locating/etc. I meant something more like
> interpreting. In RDF, a resource is just that, a resource, which can be
> the subject, predicate, or object of any number of triples. The basic
> processing of it means looking for other triples that contain it. The
> suggestion of linking to resources that are just pointers to literals
> without indicating so in the model is not straightforward and to a
> certain extent undefined.
> 
> 
>> For my example, a tool would need to be able to 'xpointer' into an ODF
>> document and retrieve the resource based on the fragment identifier.
> 
> I think we need to not think of xpointer in discussions relating to the
> model because pointing to data structures as opposed to named resources
> breaks rather quickly.
> 
>> Resolving resources is no longer a problem, if they are kept as
>> literals. However, how do we avoid redundancy? I see how RDFa or an
>> RDFa-like approach can be one solution to this. However whether an
>> external ODF tool extracts metadata statements from the content of an
>> ODF file or the values of resources that need to be dereferenced doesn't
>> seem to be a terribly different thing - it's both about extracting xml
>> from an xml file in a zip archive. I might very well oversimplify here;
>> please let me know.
> 
> Absolutely, at some point we have to process the XML files in the package
> to extract a logical model. I'm just suggesting we do the least amount of
> processing to satisfy the requirements.
> 
> Logically, I see this as follows:
> 
> - Using our own defined method of extracting RDF from content.xml
> (RDFa-like for example) we end up with an RDF in-memory model.
> - We then just *read* entries in the manifest file for other metadata
> resources of type RDF/XML and merge them (based on formal RDF mechanisms
> not defined by us and already implemented).
> 
> Our result will be a complete model of the ODF package represented as RDF
> ready to be queried by plugins in a unified way.
> 
> If we were to do the xml:id only approach, we would have to do extra
> processing on each RDF/XML file to resolve literals from the content, in a
> yet unspecified way. That does not seem worth the effort to me, especially
> when we have outlined advantages of storing metadata in-context within the
> content.xml and no model-level disadvantages to doing so.
> 
> BTW, I'm really pleased with this level of the discussion. I think Florian
> is a genius for pointing out the flaw in our discussions focusing on syntax
> and not on the model.
> 
> -Elias
> 
>> Bests,
>> Lars
>>
>> Elias Torres wrote:
>>>> What use-cases can't be implemented by this simplistic approach? Why is
>>>> there a need to introduce anything beyond a mechanism for referring to
>>>> fragments in the content?
>>> This is exactly what I detailed in a previous email. The problems are:
>>>
>>> - We are stepping outside of RDF standard processing and assuming that the
>>> person knows that #B is not a real resource but in fact a special type of
>>> resource called OD element that needs to be dereferenced and its content
>>> turned into a literal.
>>> - We lose the ability for the RDF to be processed stand-alone by other RDF
>>> tools.
>>> - If you look at the example below, you don't have a way to differentiate
>>> RDF "resources" from OD resources.
>>> - Not all ontologies allow us to substitute literals with resources.
>>> - We lose some copy and pasteability because it's not a trivial process to
>>> extract a subset of the graph from the meta(s).xml files.
>>>
>>> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>          xmlns:dc="http://purl.org/dc/elements/1.1/">
>>>    <rdf:Description rdf:about="#A">
>>>      <dc:author rdf:resource="#B"/>
>>>      <dc:date rdf:resource="#C"/>
>>>    </rdf:Description>
>>>    <rdf:Description rdf:about="#C">
>>>      <rdf:type rdf:resource="#SomethingElse"/>
>>>    </rdf:Description>
>>> </rdf:RDF>
>>>
>> --
>> Lars Oppermann <lars.oppermann@sun.com>               Sun Microsystems
>> Software Engineer                                         Nagelsweg 55
>> Phone: +49 40 23646 959                         20097 Hamburg, Germany
>> Fax:   +49 40 23646 550                  http://www.sun.com/staroffice
>