office-metadata message

Subject: Re: [office-metadata] RDFa model and xml:id

From: Elias Torres <eliast@us.ibm.com>
To: patrick@durusau.net
Date: Wed, 13 Dec 2006 15:57:34 -0500

Patrick Durusau <patrick@durusau.net> wrote on 12/13/2006 03:43:55 PM:

> Elias,
>
> Elias Torres wrote:
>
> <snip>
>
> >>I would have to ask one of the engineers what the cost of traversing the
> >>DOM tree would be versus simply having the required data in a metadata
> >>statement.
> >>
> >
> >Sure. I'm one of them :). I'd hope that this is not an issue, since
> >anybody dealing with XML does this for a living; it's called the XML DOM
> >API, after all. I believe it's a good thing to ask whether this is a
> >*major* concern or not, but I think this is a very low-level detail.
> >
> >>Noting that one of the tradeoffs would be that if all the RDF triples
> >>are in one or more metadata files, you don't have to process the
> >>content.xml file unless you have some compelling reason to do so. Nor do
> >>
> >
> >You have to traverse the DOM no matter what, because you need to look for
> >xml:ids.
> >
> Err, that is one use case where I am processing the document instance
> for editing/viewing.
>
> Another use case is that I am processing all the metadata files only and
> not the content.xml files.
>
> Trivial example: all patient records are stored as ODF, and the metadata
> for those files should include snomed:birthdate and snomed:age metadata
> statements, plus, I assume, snomed:insurer (I assume there is one in the
> snomed vocabulary. Sorry John, could not resist.)
>
> In other words, if this data is actually missing from the file, the
> metadata properties won't be there either, so I don't have to process
> content.xml to discover these errors.
>
> Depending on what metadata you store in the metadata files, like Bruce's
> bibliographic data, you could extract all that data in RDF without ever
> touching the content.xml files.
>
> Simply a question of how much overhead you think you will incur in
> processing a set of documents. Doing one to ten documents is probably
> trivial with either solution. Doing 100,000 documents or more, well, I
> think there would be performance differences.
>
> Granted that you and Bruce are arguing that people can choose one or the
> other in terms of representation. On the other hand, I don't see any
> tangible benefit to the choice. If we can indeed do with one what can be
> done with the other, my instincts say go with the one that we know is
> likely to scale.

I see a tangible benefit, and that is avoiding content duplication. We tried
explaining this on the call, but I guess we didn't make progress on that.
Let me repeat it. RDF by nature handles metadata specified externally to
the content very well, so technically I can't argue with an external-only
approach. However, content duplication is a very important issue about
which Svante, Barnd, John and others have expressed concerns. I'm not sure
who else has, but at the moment you are the only one stating it is not a
problem.

Svante wants to avoid content duplication, but I believe he is not
necessarily in favor of RDFa, so I look forward to seeing how he solves the
problem of duplicating content in meta.xml.
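
To make the processing question concrete, here is a rough sketch (Python,
standard library only) of the two paths being compared: walking
content.xml to collect xml:id values versus reading a separate RDF file
without opening content.xml at all. It is only an illustration under
assumptions: the file name `metadata.rdf`, xml:id on text:p, the
`content.xml#p1` subject URIs and all function names are hypothetical, not
anything the SC has agreed on or any existing ODF API.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
RDF_DESCRIPTION = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description"
METADATA_FILE = "metadata.rdf"  # hypothetical name for an external RDF file


def collect_xml_ids(package: zipfile.ZipFile) -> set[str]:
    """Path 1: parse content.xml and visit every element looking for xml:id."""
    root = ET.fromstring(package.read("content.xml"))
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}


def read_external_subjects(package: zipfile.ZipFile) -> set[str]:
    """Path 2: parse only the external RDF/XML file; content.xml is never read."""
    root = ET.fromstring(package.read(METADATA_FILE))
    return {el.get(RDF_ABOUT) for el in root.iter(RDF_DESCRIPTION) if el.get(RDF_ABOUT)}


def dangling_references(package: zipfile.ZipFile) -> set[str]:
    """Subjects whose fragment matches no xml:id; this needs both paths."""
    ids = collect_xml_ids(package)
    return {s for s in read_external_subjects(package) if s.split("#", 1)[-1] not in ids}


def build_demo_package() -> zipfile.ZipFile:
    """Build a tiny in-memory, ODF-like ZIP with hypothetical content."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(
            "content.xml",
            "<office:document-content"
            ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
            ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
            "<office:body><office:text>"
            '<text:p xml:id="p1">Birthdate: 1950-01-01</text:p>'
            "<text:p>No identifier on this paragraph</text:p>"
            "</office:text></office:body></office:document-content>",
        )
        z.writestr(
            METADATA_FILE,
            '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
            '<rdf:Description rdf:about="content.xml#p1"/>'
            "</rdf:RDF>",
        )
    buf.seek(0)  # rewind so the archive can be reopened for reading
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    pkg = build_demo_package()
    print(sorted(collect_xml_ids(pkg)))         # ['p1']
    print(sorted(read_external_subjects(pkg)))  # ['content.xml#p1']
    print(sorted(dangling_references(pkg)))     # []
```

The external-only store can be read on its own, but the cross-check in
dangling_references() has to parse content.xml as well, because verifying
that a subject still points at a real xml:id means walking the DOM anyway.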

-Elias

>
> I was reminded of the need to plan long term by a public television show
> recently that had a researcher looking at plague records from the 1665
> plague in London. Can't always know what people will want to do or how
> many records they will need to process. (In this particular case, if he
> could have tracked everyone in London during the various disease
> outbreaks and related them to living descendants I suspect he would have
> done so. That would be a lot of metadata to process and even more
> content.xml.)
>
> Hope you are having a great day!
>
> Patrick
>
> PS: I am about to bounce into a multi-hour conference call and have to
> go to a Christmas party connected with my wife's employer tonight. I
> will pick this back up tomorrow.
>
> >>you have to walk the DOM tree. We are, after all, specifying the rules,
> >>and if we don't want to allow syntax that could cause us to hunt for the
> >>about attribute, we are not obligated to do so.
> >>
> >>
> >
> >But we hunt for xml:id.
> >
> >
> >
> >>Hope you are having a great day!
> >>
> >>Patrick
> >>
>
> --
> Patrick Durusau
> Patrick@Durusau.net
> Chair, V1 - Text Processing: Office and Publishing Systems Interface
> Co-Editor, ISO 13250, Topic Maps -- Reference Model
> Member, Text Encoding Initiative Board of Directors, 2003-2005
>
> Topic Maps: Human, not artificial, intelligence at work!
>
>

Follow-Ups:
- Re: [office-metadata] RDFa model and xml:id
  - From: Svante Schubert <Svante.Schubert@Sun.COM>
- Re: [office-metadata] RDFa model and xml:id
  - From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>

References:
- Re: [office-metadata] RDFa model and xml:id
  - From: Patrick Durusau <patrick@durusau.net>