office-metadata message

Subject: Re: [office-metadata] RDFa model and xml:id

From: Svante Schubert <Svante.Schubert@Sun.COM>
To: Elias Torres <eliast@us.ibm.com>
Date: Thu, 14 Dec 2006 02:57:04 +0100

Elias Torres wrote:
> Patrick Durusau <patrick@durusau.net> wrote on 12/13/2006 03:43:55 PM:
>
>   
>> Elias,
>>
>> Elias Torres wrote:
>>
>> <snip>
>>
>>     
>>>> I would have to ask one of the engineers what the cost of traversing the
>>>> DOM tree would be versus simply having the required data in a metadata
>>>> statement.
>>>>
>>> Sure. I'm one of them :). I'd hope that this is not an issue since it is
>>> what anybody dealing with XML has to do for a living, it's called the XML
>>> DOM API after all. I believe it's a good thing to ask whether this is a
>>> *major* concern or not, but I think this is a very low-level detail.
>>>
>>>> Noting that one of the tradeoffs would be that if all the RDF triples
>>>> are in one or more metadata files, you don't have to process the
>>>> content.xml file unless you have some compelling reason to do so. Nor do
>>>>
>>> You have to traverse the DOM no matter what, because you need to look for
>>> xml:ids.
>>>
>> Err, that is one use case where I am processing the document instance
>> for editing/viewing.
>>
>> Another use case is that I am processing all the metadata files only and
>> not the content.xml files.
>>
>> Trivial example: All patient records are stored as ODF and the metadata
>> for those files should include snomed:birthdate and snomed:age metadata
>> statements, plus I assume snomed:insurer (I assume there is one in the
>> snomed vocabulary. Sorry John, could not resist.)
>>
>> In other words, if this data is actually missing from the file, the
>> metadata properties are missing too. I don't have to process content.xml to
>> discover these errors.
>>
>> Depending on what metadata you store in the metadata files, like Bruce's
>> bibliographic data, you could extract all that data in RDF without ever
>> touching the content.xml files.
>>
>> Simply a question of how much overhead you think you will incur in
>> processing a set of documents. Doing one to ten documents is probably
>> trivial with either solution. Doing 100,000 documents or more, well, I
>> think there would be performance differences.
>>
>> Granted that you and Bruce are arguing that people can choose one or the
>> other in terms of representation. On the other hand, I don't see any
>> tangible benefit to the choice. If we can indeed do with one what can be
>> done with the other, my instincts say go with the one that we know is
>> likely to scale.
>>     
>
> I see a tangible benefit, and that is avoiding content duplication. We tried
> explaining this on the call, but I guess we didn't make progress on that.
> Let me repeat this again. RDF by nature deals very well with specifying
> metadata externally from the content, so technically I can't argue with an
> external-only approach. However, content duplication is something very
> important that Svante, Barnd, John and others have expressed concerns about.
> I'm not sure who else, but at the moment you are the only one stating it is
> not a problem.
>
> Svante wants to avoid content duplication but I believe he is not
> necessarily for RDFa, so I'll look forward to seeing how he solves the problem
> in meta.xml of duplicating content.
>
>   
I believe everyone would like to avoid duplication, as it carries the
risk of inconsistency.
Elias might say - as so often - that because the application handles the
data and no human edits it by hand, the risk of inconsistency between the
duplicated data in content and metadata is about zero. Still, we have to
be aware that there is a risk as soon as we have duplication.
On the other hand, as soon as we start to deal with metadata, there is
always a risk that the RDF subject and RDF object are no longer
consistent. For example, the name of the person in the content might be
changed and no longer match the vcard data in the metadata.
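
To make the name example concrete, here is a minimal sketch in Python. The
content fragment, the metadata tuples and the helper names are all invented
for illustration and are not taken from any ODF proposal; "vcard:fn" is used
only as a label. It assumes the metadata keeps a copy of the text that an
xml:id in the content points to, and it reports the entries where the copy
has drifted.

```python
import xml.etree.ElementTree as ET

# Expanded attribute name that ElementTree uses for xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical content fragment: one element tagged with an xml:id.
CONTENT = '<doc><p xml:id="author1">Jane Doe</p></doc>'

# Hypothetical metadata: (xml:id of the subject, property, duplicated literal).
METADATA = [("author1", "vcard:fn", "Jane Doe")]


def text_by_id(root):
    """Map each xml:id found in the content to the text of its element."""
    return {
        el.get(XML_ID): "".join(el.itertext())
        for el in root.iter()
        if el.get(XML_ID) is not None
    }


def find_inconsistencies(content_xml, metadata):
    """Return metadata entries whose literal no longer matches the content."""
    texts = text_by_id(ET.fromstring(content_xml))
    return [
        (subject, prop, literal, texts.get(subject))
        for subject, prop, literal in metadata
        if texts.get(subject) != literal
    ]
```

With the fragment above the check returns an empty list. After the name in
the content is edited, the same call returns the stale metadata entry
together with the new content text. Note that the check itself has to read
the content, which is the cost being discussed in this thread.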

In the end we have to compare and weigh use cases to arrive at our desired
design.

Have a nice night,
Svante

Follow-Ups:
- Content dublication and ODF related RDF vocabulary
  - From: Svante Schubert <Svante.Schubert@Sun.COM>
- Re: [office-metadata] RDFa model and xml:id
  - From: Elias Torres <eliast@us.ibm.com>

References:
- Re: [office-metadata] RDFa model and xml:id
  - From: Elias Torres <eliast@us.ibm.com>