office-metadata message

Subject: Re: [office-metadata] xml:id and attributes
From: Bruce D'Arcus <bdarcus@gmail.com>
To: patrick@durusau.net
Date: Wed, 13 Dec 2006 09:34:11 -0500

On Dec 13, 2006, at 8:58 AM, Patrick Durusau wrote:

...

>> It does, with the caveat, of course, that you are only able to label 
>> resources within the content, and that those resources are about the 
>> document proper (no referencing of external resources as subjects 
>> within the document).
>>
>> In other words, you do lose granularity with this approach.
>>
> Sorry, you lose me with the loss of granularity argument.
>
> I think we are both assuming that some markup is going to be placed in 
> content.xml to delimit the text that is the focus (to avoid saying 
> subject) of the metadata. Yes?

Yes.

> Suppose that I want to attach metadata to some arbitrary part of text 
> in a <p> element. If I use <span>someText</span> to enclose it, how 
> have I lost granularity?

Because that is not identified as a property (a patient, a drug, a 
title, author, etc.). The xml:id only identifies the resource (say a 
diagnosis).

>> Yes, this is the granularity I mentioned. For example, say John has 
>> some references to prescriptions, diagnoses, and patients in his 
>> document. It might be valuable for him to right-click on one of those 
>> pieces of content (the patient "Jane Doe") and be able to get 
>> additional metadata about them.
>>
>> I'm not really sure how you'd do that without the inline information. 
>> This is the critical question, really. How would you propose that 
>> would work, Patrick?
>>
> Just as having notes in a separate file works. There is a marker in 
> content.xml that is the "target" of the metadata held in a separate 
> file, just as we do notes now.
>
> That the metadata is held in a separate file is an implementation 
> detail and not something that would be apparent to the user.

I'm saying you have to be specific here. The xml:id (or meta:about) 
approach allows you to attach properties to resources, but you cannot 
attach properties to properties. And a lot of the granular content that 
John has been interested in is exactly that sort of content.

For comparison, think about objects in OO programming. The xml:id 
approach is like an object identifier, which then allows you to 
reference it. It makes no sense to add an identifier to an attribute; 
you can't do it. The same here.
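
To make the distinction concrete, here is a minimal Python sketch. It is an illustration only, not part of any proposal: the <p>/<span> fragments, the example.org property URIs, the dictionary standing in for the separate metadata file, and the use of a meta:property attribute on a span are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared in XML and maps to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# ODF meta namespace; using it for a "property" attribute is an assumption.
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"

# Hypothetical content fragment: the span carries only an xml:id.
CONTENT_ID_ONLY = (
    '<p>Patient <span xml:id="r1">Jane Doe</span> was seen today.</p>'
)

# Hypothetical inline variant: the span also names the property its text expresses.
CONTENT_INLINE = (
    f'<p xmlns:meta="{META_NS}">Patient '
    '<span xml:id="r1" meta:property="http://example.org/terms/patient">'
    'Jane Doe</span> was seen today.</p>'
)

# Hypothetical stand-in for the separate metadata file, keyed by xml:id.
EXTERNAL_META = {
    "r1": {"http://example.org/terms/diagnosis": "hypertension"},
}


def describe(fragment, external):
    """Return what can be learned about each span in the fragment."""
    root = ET.fromstring(fragment)
    found = []
    for span in root.iter("span"):
        resource_id = span.get(XML_ID)
        found.append({
            "id": resource_id,
            "text": span.text,
            # Which property the visible text is a value of, if stated inline.
            "role": span.get(f"{{{META_NS}}}property"),
            # Properties attached to the identified resource from outside.
            "properties": external.get(resource_id, {}),
        })
    return found
```

In this sketch, describe(CONTENT_ID_ONLY, EXTERNAL_META) can look up properties of the resource "r1", but its "role" is None: nothing says the text "Jane Doe" is a patient. Only the inline variant carries that.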

>>> Moreover, xml:id supports our use case of separation of metadata 
>>> from the content.xml file.
>>>
>>> Note that I would suggest that we only use this mechanism for adding 
>>> metadata to elements.
>>
>>
>> So you are advocating option 1 above. That's fine. Let's discuss the 
>> compromises in the call.
>>
>>> Consider John's use cases again: We want to sweep all the files for 
>>> metadata added by physicians. If we allow users to choose where that 
>>> metadata is going to be located, the search might have to occur in 
>>> two locations: content.xml and the meta file. Really more efficient 
>>> to simply decide that the metadata will *always* be in the meta 
>>> file, which avoids the overhead of processing content.xml and 
>>> provides an opportunity to write software specifically for that 
>>> purpose.
>>
>>
>> The way I think of it, inline metadata is by definition about visible 
>> content, so analogous to styles. That ought not to be required to 
>> be removed.
>>
> Hmmm, but styles are removed, yes? Stored only in styles?

I mean the tagged content.

> Not really sure about your visible/invisible argument. Why should 
> presentation make a difference in terms of the internal structure of 
> the document?

Because if it's visible, why should a user care if the metadata is 
stripped? They've already exposed it.

>> But certainly invisible metadata ought to be, and so ought to be 
>> separate from the content.
>>
>> Also, I think the citation case (the field) shows the value of some 
>> metadata being in-content.
>>
> Can you be a bit more explicit? Is this the issue of entering 
> information twice?
>
> If so, that to me is again a question of presentation. Whether the 
> citation content is stored as metadata and "presented" to the user as 
> inline content or no, really is a presentation issue. The user only 
> has to enter it once and get the display they want. What more could 
> they ask?

I updated the example page to have this:

   <text:structured-field
       meta:class="http://opendocument.xml.org/fields/citation">
     <text:link
          meta:property="http://purl.org/dc/elements/1.1/source"
          meta:resource="urn:isbn:23980912"/>
     <text:link
          meta:property="http://purl.org/dc/elements/1.1/source"
          meta:resource="urn:isbn:92130926" cite:pages="23"/>
     <text:body>(Doe, 1999: 23; Smith, 2000)</text:body>
   </text:structured-field>

My logic is:

Those property links are intrinsic to the citation, and a user needs to 
be able to copy-and-paste these intact across documents. It's also 
important that the method for encoding this information be carefully 
documented, and preferably validated in the schema.

That's not to say the links can't be stored in the package (per previous), 
but that seems awfully fragile to me.
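
As a rough check of the copy-and-paste point, here is a small Python sketch. It is only an illustration under stated assumptions: the namespace URIs are declared just to make the fragment parseable (the cite namespace in particular is made up), text:structured-field is the proposed element from the example page rather than an existing ODF element, and the serialize-then-reparse step merely simulates pasting the field into another document.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
CITE_NS = "http://example.org/ns/cite"  # hypothetical namespace for cite:pages

# The citation field from the example page, with namespace declarations added.
FIELD = f"""
<text:structured-field xmlns:text="{TEXT_NS}" xmlns:meta="{META_NS}"
    xmlns:cite="{CITE_NS}"
    meta:class="http://opendocument.xml.org/fields/citation">
  <text:link meta:property="http://purl.org/dc/elements/1.1/source"
             meta:resource="urn:isbn:23980912"/>
  <text:link meta:property="http://purl.org/dc/elements/1.1/source"
             meta:resource="urn:isbn:92130926" cite:pages="23"/>
  <text:body>(Doe, 1999: 23; Smith, 2000)</text:body>
</text:structured-field>
"""


def citation_links(field):
    """Collect (property, resource, pages) for each link inside the field."""
    links = []
    for link in field.findall(f"{{{TEXT_NS}}}link"):
        links.append((
            link.get(f"{{{META_NS}}}property"),
            link.get(f"{{{META_NS}}}resource"),
            link.get(f"{{{CITE_NS}}}pages"),  # None when no pages are given
        ))
    return links


original = ET.fromstring(FIELD)
# Simulate copy-and-paste: serialize the field alone, then parse it again
# as if it had landed in a different document.
pasted = ET.fromstring(ET.tostring(original))
```

Because the links live inside the field element, citation_links(pasted) returns the same list as citation_links(original) with no lookup in any other file. With the links held elsewhere in the package, the paste operation would also have to find and carry over the matching entries.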

>> So I don't think there's anything yet that logically excludes option 
>> 2.
>>
> Well, no, but I wasn't arguing that "logic" excludes option 2. And 
> while not an implementation requirement or even a "new" requirement, I 
> do think we should consider the impact of our proposals on processing 
> of ODF documents.

Certainly.

> Having mixed models for metadata (some inline, some in meta.xml) seems 
> to be a particularly bad choice to me. Granted we may encounter cases 
> where that cannot be avoided but I haven't seen one, yet.

Hopefully my explanation above shows it. The RDFa examples do too.

> I was suggesting some form of xml:id as a way to avoid messing around 
> with adding attributes to content.xml whatever metadata we want to 
> associate with elements or content. Just seems like a clean solution 
> to me. Opinions will no doubt differ. ;-)

As I said, it *is* a clean solution, but it's also more limited. The 
RDFa attributes aren't some random accident (except for maybe rel and 
rev); they are there because they need to be for their use cases and 
requirements.

> I realize that RDFa is supposed to make it easy to use RDF with XHTML 
> but I don't think models for authoring XHTML are really relevant for 
> ODF applications. Different format and different considerations in 
> play.

Umm, yes and no. The only significant difference between the formats is 
that ODF is compound. But you still can't present information to users 
if it's not there.

Bruce