Subject: Re: [office-metadata] RDFa model and xml:id
Patrick Durusau <patrick@durusau.net> wrote on 12/13/2006 03:43:55 PM:

> Elias,
>
> Elias Torres wrote:
>
> <snip>
>
> >>I would have to ask one of the engineers what the cost of traversing the
> >>DOM tree would be versus simply having the required data in a metadata
> >>statement.
> >
> >Sure. I'm one of them :). I'd hope that this is not an issue since anybody
> >dealing with XML has to do for a living, it's called the XML DOM API after
> >all. I believe it's a good thing to ask whether this is a *major* concern
> >or not, but I think this is very low-level detail.
> >
> >>Noting that one of the tradeoffs would be that if all the RDF triples
> >>are in one or more metadata files, you don't have to process the
> >>content.xml file unless you have some compelling reason to do so. Nor do
> >
> >You have to traverse the DOM no matter what, because you need to look for
> >xml:ids.
>
> Err, that is one use case where I am processing the document instance
> for editing/viewing.
>
> Another use case is that I am processing all the metadata files only and
> not the content.xml files.
>
> Trivial example: All patient records are stored as ODF and the metadata
> for those files should include snomed:birthdate and snomed:age metadata
> statements, plus I assume snomed:insurer (I assume there is in the
> snomed vocabulary. Sorry John, could not resist.)
>
> In other words, if this data is actually missing from the file, the
> metadata properties don't either. I don't have to process content.xml to
> discover these errors.
>
> Depending on what metadata you store in the metadata files, like Bruce's
> bibliographic data, you could extract all that data in RDF without ever
> touching the content.xml files.
>
> Simply a question of how much overhead you think you will incur in
> processing a set of documents. Doing one to ten documents is probably
> trivial with either solution. Doing 100,000 documents or more, well, I
> think there would be performance differences.
>
> Granted that you and Bruce are arguing that people can choose one or the
> other in terms of representation. On the other hand, I don't see any
> tangible benefit to the choice. If we can indeed do with one what can be
> done with the other, my instincts say go with the one that we know is
> likely to scale.

I see a tangible benefit, and that is avoiding content duplication. We tried
explaining this on the call, but I guess we didn't make progress on it, so
let me repeat it.

RDF by nature deals very well with specifying metadata externally from the
content, so technically I can't argue with an external-only approach.
However, content duplication is something very important that Svante, Barnd,
John and others have expressed concerns about. I'm not sure who else, but at
the moment you are the only one stating that it is not a problem. Svante
wants to avoid content duplication, but I believe he is not necessarily for
RDFa, so I look forward to seeing how he solves the problem of duplicating
content in meta.xml.

-Elias

>
> I was reminded of the need to plan long term by a public television show
> recently that had a researcher looking at plague records from the 1656
> plague in London. Can't always know what people will want to do or how
> many records they will need to process. (In this particular case, if he
> could have tracked everyone in London during the various disease
> outbreaks and related them to living descendants I suspect he would have
> done so. That would be a lot of metadata to process and even more
> content.xml.)
>
> Hope you are having a great day!
>
> Patrick
>
> PS: I am about to bounce into a multi-hour conference call and have to
> go to a Christmas party connected with my wife's employer tonight. I
> will pick this back up tomorrow.
>
> >>you have to walk the DOM tree. We are, afterall, specifying the rules
> >>and if we don't want to allow syntax that could cause us to hunt for the
> >>about attribute, we are not obligated to do so.
> >
> >But we hunt for xml:id.
> >
> >>Hope you are having a great day!
> >>
> >>Patrick
>
> --
> Patrick Durusau
> Patrick@Durusau.net
> Chair, V1 - Text Processing: Office and Publishing Systems Interface
> Co-Editor, ISO 13250, Topic Maps -- Reference Model
> Member, Text Encoding Initiative Board of Directors, 2003-2005
>
> Topic Maps: Human, not artificial, intelligence at work!
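
A rough sketch of the two processing paths argued over in this thread, written in Python purely for illustration. It is not taken from any ODF implementation. It assumes the package is an ordinary zip archive containing content.xml; the member name metadata.rdf and the sample document are hypothetical stand-ins for whatever external metadata file the group settles on.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_xml_ids(package_bytes):
    """Path 1: parse content.xml and walk every element looking for xml:id."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        root = ET.fromstring(pkg.read("content.xml"))
    return [el.attrib[XML_ID] for el in root.iter() if XML_ID in el.attrib]


def has_member(package_bytes, name):
    """Path 2: consult only the zip directory; no XML is parsed at all."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        return name in pkg.namelist()


def make_sample_package():
    """Build a tiny in-memory package (hypothetical content, not real ODF)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr(
            "content.xml",
            '<doc><p xml:id="p1">Jane</p><p>text</p><p xml:id="p2">1970</p></doc>',
        )
        pkg.writestr("metadata.rdf", "<rdf/>")  # hypothetical metadata file
    return buf.getvalue()


if __name__ == "__main__":
    sample = make_sample_package()
    print(collect_xml_ids(sample))
    print(has_member(sample, "metadata.rdf"))
```

The first function touches every element of the content; the second never opens content.xml. Whether that difference matters at 100,000 documents is exactly the open question above, and this sketch does not measure it.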