Subject: Re: [office-metadata] Our discussion on the Wiki example
Hi Elias,

Elias Torres wrote:

It surprises me that your statement sounds like RDFa or nothing.

I would like to remind us all that we don't have much time left, and only meeting weekly doesn't help either. I don't mind answering any questions on the issues and am willing to work with the rest of the task force on tweaking/changing/fixing the proposal presented here by Bruce and me.

Especially as RDFa is not a standard. Even if it is on its way to becoming one, it will certainly be subject to changes. OASIS/ISO won't be able to reference it yet. But what really bothers me is that it was designed for XHTML, which is a flat format. RDFa is about embedding the meta data. ODF is compound and, anybody correct me if I am wrong, I was sure everybody had figured out that these should be separated (without redundancy). Could you describe the type of scenario that is not possible?

However, I would like to say that the only reason why we suggest an RDFa-like approach is because we believe it has the chance of solving a large percentage of the use case requirements in our charter without resorting to custom XML schemas for each of the scenarios. If the suggestions/questions presented to us indicate a fundamental enough change in direction (like some of Svante's thoughts on structure and grammars), I would like to request that those be presented as separate proposals to the group so we can discuss them separately. I'm worried that we will spend our little time discussing points in emails as opposed to a specific approach or draft and never get anything accomplished; I've seen it happen before.

Elias, here you state it is not productive to work on >1 proposals, later you ask to write a different proposal.

Also, I'm hoping that we can make a decision as a group on which proposal we select moving forward, because I don't think it is productive to work on 2 or more proposals at the same time in order to reach a single specification.
Unless, of course, the current proposal is flawed or insufficient with respect to our use cases and requirements.

We might as well step back and define the scenarios we would like to show in the Wiki as examples. For instance:
Analyzing their dis/advantages and choosing one. That does not sound too complicated or time consuming. As Bruce was so kind as to start with one example, I commented on it and asked for changes. I see no delay with this process. Long story short, I said do not require 'urn:uuid' for an internal reference between content and meta, as you have used it in the example.

Now onto Svante's email.

"Bruce D'Arcus" <bdarcus@gmail.com> wrote on 12/01/2006 04:37:57 PM:

On Dec 1, 2006, at 1:53 PM, Svante Schubert wrote:

Instead of meta:about="urn:uuid:fe107eb0-7704-11db-9fe1-0800200c9a66", we might use in our case meta:id="citation". It's mnemonic, and the value of the meta:id (which is not an xml:id, as it does not have to be unique when expressing a type) would be offered during meta data creation by the ODF application component that is responsible for this type of meta data.

A good point you make here is the fact that xml:id must be unique within an XML document, and that's not necessarily what we are after in our RDFa (or should we call it ODFa? :D) approach. The meta:about attribute we are suggesting is simply to denote the subject of the relationship we are trying to establish within the content. My experience with metadata tells me we have two basic options: we either embed the metadata in the content, or we store it separately and link to it somehow. RDFa addresses the first, and our requirements and environment suggest we look at the second. I think we need to treat them somewhat separately. Let me explain why.

Approach #1

In content.xml:

    <meta about="foo.jpg" property="dc:creator">Elias Torres</meta>

In meta.xml:

    Nothing.

Approach #2

In content.xml:

    <img src="foo.jpg"/>

In meta.xml:

    <rdf:Description about="foo.jpg">
      <dc:creator>Elias Torres</dc:creator>
    </rdf:Description>

As you can see, they really don't have much in common.
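To make the contrast concrete, here is a minimal Python sketch that reads the statement about foo.jpg from each placement. It is only an illustration: the helper names `read_inline` and `read_separate`, the `rdf:RDF` wrapper element, and the namespace URIs bound to `rdf:` and `dc:` are assumptions added so the fragments parse; none of them come from the proposal itself.

```python
import xml.etree.ElementTree as ET

# Assumed namespace bindings (the fragments in the mail declare none).
DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Approach #1: the statement is embedded in content.xml.
INLINE_CONTENT = '<meta about="foo.jpg" property="dc:creator">Elias Torres</meta>'

# Approach #2: content.xml only identifies the image; meta.xml holds the statement.
SEPARATE_META = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description about="foo.jpg">
    <dc:creator>Elias Torres</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def read_inline(xml_text):
    """Read one (subject, predicate, object) statement from an inline <meta>."""
    element = ET.fromstring(xml_text)
    return {(element.get("about"), element.get("property"), element.text.strip())}


def read_separate(xml_text):
    """Read dc:creator statements from rdf:Description blocks."""
    statements = set()
    root = ET.fromstring(xml_text)
    for description in root.iter(f"{{{RDF}}}Description"):
        subject = description.get("about")
        for child in description:
            # ElementTree expands element names to "{namespace}local".
            if child.tag == f"{{{DC}}}creator":
                statements.add((subject, "dc:creator", child.text.strip()))
    return statements


if __name__ == "__main__":
    print(read_inline(INLINE_CONTENT))
    print(read_separate(SEPARATE_META))
```

In this toy case both readers produce the same statement; what differs is where the statement lives and what the content itself has to carry.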
In #1 we need a way to model and express our metadata needs, and we draw from RDFa one of the ways of doing that (about, rel, rev, property, content, datatype attributes). In #2 we only need a way to identify objects/resources within the document and leave it up to the meta.xml to contain all of the possible information. The main reason why I like #1 is because there's a lot of data already in the document that we would like to avoid duplicating, except that I don't believe we can avoid it 100% of the time (e.g. the content issue). I think I failed to separate these two approaches enough on telecons, but I hope we can get back on track.

Anyways, back to the question: does meta:about/xml:id need to be mnemonic? My answer is no. Let me show by example:

    <link about="http://torrez.us/who#elias" rev="dc:creator" href="foo.jpg"/>

yields

    <foo.jpg> dc:creator <http://torrez.us/who#elias>

As you can see, the about attribute has nothing to do with mnemonics or anything of that nature. It's about uniquely identifying resources in both a closed and open world. The only reason why xml:id came into the discussion is because we want to leverage things already identified in our current documents, such as <table table:name="table1">...</table>. In RDFa and other HTML approaches we make use of both @id and @name to locate things within the same document.

    <meta about="table1" property="dc:creator">Elias Torres</meta>
    <body>
      <x />
      <y />
      <table name="table1"> ... </table>
    </body>

Also, is ODF content/source copy and paste a requirement for our metadata proposal? I didn't think it was. I hope we are not expecting people to hand write ODF (e.g. no need for mnemonics).

A mnemonic approach is helpful for the writer and should be recommended, but it is not and cannot be required. I prefer - as already stated - the approach of attribute references between content.xml and metadata, which is not one of your approaches above - neither #1 nor #2.

In content.xml:
============
..
    <text:p meta:class="date">
      <text:span meta:class="month">May</text:span>
      <text:span meta:class="day">8th</text:span> at
      <text:span meta:class="time">10am</text:span>
    </text:p>
..

[NOTE: I changed meta:id to meta:class to avoid the impression that meta:id is unique. The naming 'meta:class' is not important for now. And the value of meta:id is just an arbitrary string, here only provided as a mnemonic default string by a brave plugin programmer.]

In meta package:
============
something RDF compatible

This is a very simple approach. Everything seems to be accomplished by it, so what advantage do #1 or #2 have? I will explain the linking to meta data in more detail in a different mail, as this has little to do with the rest.

But how is it that being "mnemonic" is intrinsically valuable? I don't think it is.

By this it is imaginable that even the implementation of metadata is exchanged in meta.xml without a byte changing in the content.xml. Imagine implementations like vcard vs. hcard.

By us supporting both #1 and #2 we support people exchanging metadata using different schemas or ontologies.

Am not following here. Can you restate?

Second, someone would like to link to meta data.

Again, by using URIs people can link to meta data. If we use xml:id="short" we have no way of linking to resources (e.g. external documents or web pages).

Summed up in two sentences, my wish for linking is the following: I would like to be able to create a link to a document pointing to a certain semantic, not to a structure. Like pointing to the node set of all XML nodes having a certain class of meta data, like Bruce's citation.

Interesting, it is worth discussing this separately.

I still don't know what this means.

Instead of referencing a certain structure (e.g. third paragraph of the body), a link to the type of meta data in the package is closer to what is desired.

BTW, I'm not advocating we reference structure. Referencing structure sucks (e.g.
2nd paragraph, 3rd table after the 1st paragraph, etc). I'd like to reference objects/resources.

Sorry, again, am not understanding. Been a long day I guess.

Although I have no foolproof implementation at hand (XPointer?), such an approach would solve the problem of changing structure.

Can you explain what you mean by "changing structure"? Is this the split-paragraph example?

I'm not sure how linking to the type of metadata in the package solves the linking problem at all. BTW, I have tried extensively to deal with annotations in Office documents in a product we built for Life Sciences organizations. I wrote plugins for Word, Excel and Powerpoint, and just linking to structure never worked. I even tried this on HTML using my own XPointer implementations, but it simply does not work for changing documents. It might for read-only versions of documents, but that's about it.

This is not too complicated.

This approach might exist alongside a newly introduced xml:id, which could be generated by the user when the document is ready to publish. xml:id should be stable, similar to the API / interface of a piece of software, and therefore handled with care. And finally, when our goal is to weave arbitrary metadata into ODF in the most simple, generic way, I was distracted by @content - as Bernd was before:

    <meta property="cal:dtstart" content="20060508T1000-0500"> May 8th at 10am </meta>

There is detailed redundant information in the attribute, and there is a blob of data as well.

I would like to help us see that it's simply not feasible today to expect all human-entered data to be machine readable. It's definitely the case for dates, one of the most complicated pieces of data we deal with in computers today. If I'm not making any sense, think about dates in Japan, where they don't use a Gregorian calendar and use Emperors' reigns for their years. Anyways, let me try a few more examples to see if "more" data could be extracted from the human-entered text.
    <span property="amount" content="1000000">one million</span>
    <span property="dtstart" content="15:05">5 past 3</span>
    This year's <span property="net" content="-1600000">loss 1.6 million dollars.</span>

I hope this is enough to understand that we should not be in the business of removing the @content attribute, except for noting that it might be best to re-use as much content from the text as possible.

Mapping meta data to one another should be a common problem, which is (more or less easily) solved. At least the mapping of a Gregorian calendar to the dates in Japan is quite simple. And mapping the logic from written numbers to the decimal system is no rocket science, either.

Anything in particular in mind?

Why is this a problem? One is for machines, and one for people.

In general I would prefer something like:

    <text:p meta:id="date">
      <text:span meta:id="month">May</text:span>
      <text:span meta:id="day">8th</text:span> at
      <text:span meta:id="time">10am</text:span>
    </text:p>

No, Svante. That's certainly not how you'd do it in RDFa. The ID is just that: an id that allows one to then associate something else with it (a link, metadata descriptions, etc.). It indicates no semantics at all. Using dumb strings of text for semantics is no more useful than just using styles.

I would like to look past a few of the minor issues with Svante's example. I'll stay away from the naming issue, since I hope I've addressed that earlier in my email. However, I would like to note a much more important point that I would like everyone to study closely. Svante, I think you are thinking too much about the structure of the data as opposed to specifying metadata in a very granular way. In our RDFa date example we are NOT focused on the structure.
Let me give you an example:

    <p about="event1">
      The party will start at
      <span property="cal:dtstart" content="2006-12-12T15:05Z">5 past 3 on saturday</span>
      at <span property="location">my house.</span>
    </p>

In RDF we are not focused on "structure"; we are interested in the statements made in the model. In the example above, we have two statements being asserted.

    <#event1> cal:dtstart "2006-12-12T15:05Z" .
    <#event1> cal:location "my house" .

The statements stand completely on their own, and they are not necessarily part of a greater structure. We can even try to solve the really-really-really hard moving content problem. Here we go:

    <p about="event1">
      The party will start at
      <span property="cal:dtstart" content="2006-12-12T15:05Z">5 past 3 on saturday</span>.
    </p>
    <p about="somethingelse">
      Some text about something else.... ... and before I forget the party will be at
      <span about="event1" property="location">my house.</span>
    </p>

If you notice, we moved the content around, but in this case we were able to maintain the metadata because the cut completely encompassed the <span> element.

There are specific reasons for us wanting to use RDF and RDFa. In RDFa, triples are only generated when a rel, rev or property attribute is encountered. This property allows us to make statements about resources in different parts of the document without having to worry about maintaining structure, because the model extracted from the content.xml in the "changed" example is isomorphic to the first one. I know there's an equivalent scenario for Svante's scenario if done right (e.g. we use properties ex:month, ex:day, ex:time). However, I think in his case, he meant "date" to denote (let's say) the dtstart of the document. This means that month, day and time are dependent on their document location, and any movement of the content would absolutely destroy our metadata hopes.
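The claim that both versions of the event text yield the same model can be checked with a small Python sketch. This is a deliberately simplified reading of the markup, not an RDFa processor: `extract_triples` is a hypothetical helper, it only understands `about`, `property` and `content`, the fragments are wrapped in a made-up `<doc>` root so they parse, and the location property is written as `cal:location` to match the listed statements.

```python
import xml.etree.ElementTree as ET


def extract_triples(xml_text):
    """Collect (subject, predicate, object) triples from simplified RDFa-like markup."""
    triples = set()

    def walk(element, subject):
        # An `about` attribute sets the subject for this element and its descendants.
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None and subject is not None:
            # Prefer the machine-readable @content; fall back to the visible text.
            value = element.get("content")
            if value is None:
                value = "".join(element.itertext()).strip()
            triples.add((subject, prop, value))
        for child in element:
            walk(child, subject)

    walk(ET.fromstring(xml_text), None)
    return triples


ORIGINAL = """<doc>
<p about="event1">The party will start at
<span property="cal:dtstart" content="2006-12-12T15:05Z">5 past 3 on saturday</span>
at <span property="cal:location">my house.</span></p>
</doc>"""

MOVED = """<doc>
<p about="event1">The party will start at
<span property="cal:dtstart" content="2006-12-12T15:05Z">5 past 3 on saturday</span>.</p>
<p about="somethingelse">Some text about something else....
... and before I forget the party will be at
<span about="event1" property="cal:location">my house.</span></p>
</doc>"""

if __name__ == "__main__":
    before = extract_triples(ORIGINAL)
    after = extract_triples(MOVED)
    for triple in sorted(before):
        print(triple)
    print("same statements:", before == after)
```

Because the moved `<span>` carries its own `about`, the set of statements does not depend on which paragraph it ends up in. Without that attribute, the span would inherit `somethingelse` as its subject and the statement would change.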
I guess we could introduce some "merge" rules to try to solve this, but the higher-level the structures, the more complicated the rules can become; think something along the lines of XML diff/merge, all within the same document.

or shorter, using the default namespace (and none for the attribute), as

    <text:p s="date">
      <a s="month">May</a>
      <a s="day">8th</a> at
      <a s="time">10am</a>
    </text:p>

By doing so, other software besides the correct plugin would have a chance to interpret the data.

I think that if you look at the work that Dan Connolly has done in the area, you'll believe me that, having this information:

    <#event1> cal:dtstart "2006-12-12T15:05Z" .
    <#event1> cal:location "my house" .

any plugin can interpret that data and do as it pleases (e.g. convert it to some other format for display).

Wiki is a good point, we should keep this up.

OK, I think you need to step back and ask what problem you are trying to solve here. It seems to me you want to be able to bind a GUI to data, and then to particular application behavior (which could be a plug-in). E.g. say someone wants to add custom content processing; how do you do that? How does a plug-in know which content and which metadata to deal with? Is that right? If yes, the manifest can also be used for this, as well as some similar typing on custom fields.

For example, the fall-back plugin of an ODF application (which assists the user in showing / editing meta data when the correct plugin is not installed / found) would be able to assist the user. Even more, when the possible set of data (e.g. all months) is defined in an embedded grammar,

OK, remember, what you are calling an "embedded grammar" is exactly what RDFa provides.

further features would be possible. For instance, 'auto completion' or a 'drop down list' for the content of such a field are conceivable for the future, even for the fall-back plugin (but most likely not in its first version).

Sure, though where the metadata is is not significant; is it?
Bruce

I hope I have given you decent answers/arguments to your questions and proposals. I touched on a few of the reasons why we are suggesting RDF to address the requirements of this task force. We are not trying to invent a new, unproven mechanism to embed structure in the documents, because that would reduce to embedding any XML within ODF XML. We want to focus on the requirements and believe we have a solution for 80% of the requirements in a standards-based proposal. We'll continue showing examples that address the current use cases and requirements. For the sake of time, I would like to see any new use cases and requirements added to the wiki, along with more formal proposals that go with them, in order to guarantee progress.

Best regards,
-Elias

Svante