OASIS Mailing List Archives
office-metadata message



Subject: Re: [office-metadata] RDFa model and xml:id


Elias,

Elias Torres wrote:

<snip>

>>I would have to ask one of the engineers what the cost of traversing the
>>DOM tree would be versus simply having the required data in a metadata
>>statement.
>>    
>>
>
>Sure. I'm one of them :). I'd hope that this is not an issue since anybody
>dealing with XML has to do it for a living; it's called the XML DOM API after
>all. I believe it's a good thing to ask whether this is a *major* concern
>or not, but I think this is a very low-level detail.
>
>  
>
>>Noting that one of the tradeoffs would be that if all the RDF triples
>>are in one or more metadata files, you don't have to process the
>>content.xml file unless you have some compelling reason to do so. Nor do
>>    
>>
>
>You have to traverse the DOM no matter what, because you need to look for
>xml:ids.
>
>  
>
Err, that is one use case where I am processing the document instance 
for editing/viewing.
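
For that editing/viewing case, the xml:id hunt Elias describes is a single full-tree walk. A rough Python sketch of the idea (ElementTree standing in for a real DOM implementation; the document structure is purely illustrative):

```python
import xml.etree.ElementTree as ET

# Fully qualified attribute name for xml:id under the predeclared xml prefix.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def collect_xml_ids(content_xml):
    """Walk every element of the content tree once, recording each
    xml:id found -- this is the per-document traversal cost under
    discussion."""
    root = ET.fromstring(content_xml)
    return {el.get(XML_ID): el.tag for el in root.iter() if el.get(XML_ID)}
```

The point is only that the walk touches every element of content.xml, whether or not any element carries an xml:id.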

Another use case is that I am processing all the metadata files only and 
not the content.xml files.

Trivial example: all patient records are stored as ODF, and the metadata 
for those files should include snomed:birthdate and snomed:age metadata 
statements, plus, I assume, snomed:insurer. (I assume there is one in the 
SNOMED vocabulary. Sorry John, could not resist.)

In other words, if this data is actually missing from a file, the 
metadata statements will be missing as well, and I don't have to process 
content.xml to discover these errors.

Depending on what metadata you store in the metadata files, like Bruce's 
bibliographic data, you could extract all that data in RDF without ever 
touching the content.xml files.
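
To make that concrete, here is a minimal Python sketch of pulling RDF statements out of a separate metadata part of an ODF package without ever opening content.xml. The part name meta/metadata.rdf and the property vocabulary are illustrative assumptions on my part, not names from any draft:

```python
import zipfile
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

def metadata_statements(odf_package, metadata_part="meta/metadata.rdf"):
    """Read RDF/XML statements from a hypothetical metadata part of an
    ODF package, leaving content.xml untouched inside the zip."""
    with zipfile.ZipFile(odf_package) as pkg:
        root = ET.fromstring(pkg.read(metadata_part))
    statements = []
    for desc in root.iter(RDF + "Description"):
        subject = desc.get(RDF + "about")
        for prop in desc:  # one statement per child property element
            statements.append((subject, prop.tag, prop.text))
    return statements
```

Checking 100,000 documents for a missing snomed:birthdate then means reading only the (small) metadata parts, never the content trees.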

It is simply a question of how much overhead you expect to incur in 
processing a set of documents. Processing one to ten documents is 
probably trivial with either solution. Processing 100,000 documents or 
more, well, I think there would be real performance differences.

Granted, you and Bruce are arguing that people can choose one 
representation or the other. On the other hand, I don't see any 
tangible benefit to the choice. If we can indeed do with one what can be 
done with the other, my instincts say go with the one that we know is 
likely to scale.

I was reminded of the need to plan for the long term by a public 
television show recently in which a researcher was studying records from 
the 1665 plague in London. You can't always know what people will want 
to do or how many records they will need to process. (In this particular 
case, if he could have tracked everyone in London during the various 
disease outbreaks and related them to living descendants, I suspect he 
would have done so. That would be a lot of metadata to process, and even 
more content.xml.)

Hope you are having a great day!

Patrick

PS: I am about to bounce into a multi-hour conference call and have to 
go to a Christmas party connected with my wife's employer tonight. I 
will pick this back up tomorrow.

>>you have to walk the DOM tree. We are, after all, specifying the rules
>>and if we don't want to allow syntax that could cause us to hunt for the
>>about attribute, we are not obligated to do so.
>>    
>>
>
>But we hunt for xml:id.
>
>  
>
>>Hope you are having a great day!
>>
>>Patrick
>>

-- 
Patrick Durusau
Patrick@Durusau.net
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005

Topic Maps: Human, not artificial, intelligence at work! 



