office-metadata message

Subject: Re: [office-metadata] xml:id and attributes
From: Patrick Durusau <patrick@durusau.net>
To: Bruce D'Arcus <bdarcus@gmail.com>
Date: Wed, 13 Dec 2006 10:04:14 -0500
Bruce,

Bruce D'Arcus wrote:

>
> On Dec 13, 2006, at 8:58 AM, Patrick Durusau wrote:
>
> ...
>
>>> It does, with the caveat, of course, that you are only able to label 
>>> resources within the content, and that those resources are about the 
>>> document proper (no referencing of external resources as subjects 
>>> within the document).
>>>
>>> In other words, you do lose granularity with this approach.
>>>
>> Sorry, you lose me with the loss of granularity argument.
>>
>> I think we are both assuming that some markup is going to be placed 
>> in content.xml to delimit the text that is the focus (to avoid saying 
>> subject) of the metadata. Yes?
>
>
> Yes.
>
>> Suppose that I want to attach metadata to some arbitrary part of text 
>> in a <p> element. If I use <span>someText</span> to enclose it, how 
>> have I lost granularity?
>
>
> Because that is not identified as a property (a patient, a drug, a 
> title, author, etc.). The xml:id only identifies the resource (say a 
> diagnosis).
>
Ah, I was afraid we were completely missing each other.

No, xml:id only provides the linking mechanism to the metadata.

But in the metadata itself you can identify the property (a patient, a 
drug, a title, author, etc.) and you can say whatever you like.

Note that I am not objecting to the use of RDF in the metadata but 
trying to separate out the issue of linking from metadata.
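
To make the separation concrete, here is a rough sketch of what I have 
in mind. It is only an illustration: the <metadata>, <about> and 
<property> element names and the "ref" attribute are invented for this 
example and are not part of any proposal. The content carries nothing 
but an xml:id; everything said about that content lives in the separate 
metadata.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical content: the span carries only an xml:id, no metadata.
CONTENT = '<p>Seen today: <span xml:id="m1">Jane Doe</span>.</p>'

# Hypothetical metadata file; element and attribute names are invented.
META = """<metadata>
  <about ref="m1">
    <property name="role">patient</property>
    <property name="addedBy">physician</property>
  </about>
</metadata>"""


def metadata_for(content_xml, meta_xml):
    """Map each xml:id in the content to (text, properties from the metadata)."""
    content = ET.fromstring(content_xml)
    meta = ET.fromstring(meta_xml)
    # Index the metadata by the id it points at.
    props = {
        about.get("ref"): {p.get("name"): p.text for p in about.findall("property")}
        for about in meta.findall("about")
    }
    result = {}
    for el in content.iter():
        ident = el.get(XML_ID)
        if ident is not None:
            result[ident] = ("".join(el.itertext()), props.get(ident, {}))
    return result


print(metadata_for(CONTENT, META))
```

The point of the sketch is that the lookup never needs to know what 
kind of thing "m1" is; that is stated only on the metadata side.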

I think confusing xml models with RDF/RDFa is a serious mistake.

>>> Yes, this is the granularity I mentioned. For example, say John has 
>>> some references to prescriptions, diagnoses, and patients in his 
>>> document. It might be valuable for him to right-click on one of 
>>> those pieces of content (the patient "Jane Doe") and be able to get 
>>> additional metadata about them.
>>>
>>> I'm not really sure how you'd do that without the inline 
>>> information?  This is the critical question, really. How would you 
>>> propose that would work Patrick?
>>>
>> Just as having notes in a separate file works. There is a marker in 
>> content.xml that is the "target" of the metadata held in a separate 
>> file, just as we do notes now.
>>
>> That the metadata is held in a separate file is an implementation 
>> detail and not something that would be apparent to the user.
>
>
> I'm saying you have to be specific here. The xml:id (or meta:about) 
> approach allows you to attach properties to resources, but you cannot 
> attach properties to properties. And a lot of the granular content 
> that John has been interested in is exactly that sort of content.
>
> For comparison, think about objects in OO programming. The xml:id 
> approach is like an object identifier, which then allows you to 
> reference it. It makes no sense to add an identifier to an attribute; 
> you can't do it. The same here.
>
Does my separation of linking from the metadata help here?

All I was proposing for xml:id was to be the "tie that binds" ;-) to the 
metadata. You can do whatever you like, well, within the metadata model 
we specify, such as properties for properties, but in the metadata.
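
As a sketch of "properties for properties" held in the metadata rather 
than in content, assume a metadata store keyed by xml:id where each 
property may carry further properties of its own. The Property class, 
the store layout and the sample values below are all hypothetical and 
are only meant to show that the nesting does not require any extra 
markup in content.xml.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Property:
    name: str
    value: str
    # Properties about this property (for example, who asserted it).
    properties: List["Property"] = field(default_factory=list)


# Hypothetical store: xml:id in content.xml -> properties held in the metadata.
STORE: Dict[str, List[Property]] = {
    "m1": [
        Property(
            "diagnosis",
            "hypertension",
            [Property("assertedBy", "Dr. Smith"), Property("status", "provisional")],
        )
    ]
}


def describe(ident: str, store: Dict[str, List[Property]]) -> List[str]:
    """Flatten the properties attached to one xml:id into indented lines."""
    lines: List[str] = []

    def walk(props: List[Property], depth: int) -> None:
        for p in props:
            lines.append("  " * depth + p.name + " = " + p.value)
            walk(p.properties, depth + 1)

    walk(store.get(ident, []), 0)
    return lines


print("\n".join(describe("m1", STORE)))
```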

>>>> Moreover, xml:id supports our use case of separation of metadata 
>>>> from the content.xml file.
>>>>
>>>> Note that I would suggest that we only use this mechanism for 
>>>> adding metadata to elements.
>>>
>>>
>>>
>>> So you are advocating option 1 above. That's fine. Let's discuss the 
>>> compromises in the call.
>>>
>>>> Consider John's use cases again: We want to sweep all the files for 
>>>> metadata added by physicians. If we allow users to choose where 
>>>> that metadata is going to be located, the search might have to 
>>>> occur in two locations: content.xml and the meta file. Really more 
>>>> efficient to simply decide that the metadata will *always* be in 
>>>> the meta file, which avoids the overhead of processing content.xml 
>>>> and provides an opportunity to write software specifically for that 
>>>> purpose.
>>>
>>>
>>>
>>> The way I think of it, inline metadata is by definition about 
>>> visible content, so analogous to styles. That ought not to be 
>>> required to be removed.
>>>
>> Hmmm, but styles are removed, yes? Stored only in styles?
>
>
> I mean the tagged content.
>
>> Not really sure about your visible/invisible argument. Why should 
>> presentation make a difference in terms of the internal structure of 
>> the document?
>
>
> Because if it's visible, why should a user care if the metadata is 
> stripped? They've already exposed it.
>
>>> But certainly invisible metadata ought to be, and so ought to be 
>>> separate from the content.
>>>
>>> Also, I think the citation case (the field) shows the value of some 
>>> metadata being in-content.
>>>
>> Can you be a bit more explicit? Is this the issue of entering 
>> information twice?
>>
>> If so, that to me is again a question of presentation. Whether the 
>> citation content is stored as metadata and "presented" to the user as 
>> inline content or no, really is a presentation issue. The user only 
>> has to enter it once and get the display they want. What more could 
>> they ask?
>
>
> I updated the example page to have this:
>
>   <text:structured-field 
> meta:class="http://opendocument.xml.org/fields/citation">
>     <text:link
>          meta:property="http://purl.org/dc/elements/1.1/source"
>          meta:resource="urn:isbn:23980912"/>
>     <text:link
>          meta:property="http://purl.org/dc/elements/1.1/source"
>          meta:resource="urn:isbn:92130926" cite:pages="23"/>
>     <text:body>(Doe, 1999: 23; Smith, 2000)</text:body>
>   </text:structured-field>
>
> My logic is:
>
> Those property links are intrinsic to the citation, and a user needs 
> to be able to copy-and-paste these intact across documents. It's also 
> important that the method for encoding this information be carefully 
> documented, and preferably validated in the schema.
>
> That's not to say that can't be stored in the package (per previous), 
> but it seems awfully fragile to me?
>
Not any more fragile than what you propose.

It would mean that a copy operation is defined as including any 
metadata about the content being copied. That is likely to be 
meaningful only when pasting into an ODF-conformant application or a 
Word application that can import ODF content.
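
A minimal sketch of such a copy operation, assuming the metadata is 
held in a store keyed by xml:id. The function names, the clipboard 
layout and the id-renaming scheme are my own inventions for 
illustration, not anything defined by ODF.

```python
def copy_with_metadata(text, ids, meta_store):
    """Build a clipboard payload: the copied text plus metadata for its ids."""
    return {
        "text": text,
        "meta": {i: dict(meta_store[i]) for i in ids if i in meta_store},
    }


def paste_with_metadata(clip, target_meta):
    """Merge clipboard metadata into the target store.

    Returns a mapping of old id -> new id so the caller can rewrite the
    xml:id values in the pasted content when an id was already in use.
    """
    mapping = {}
    for old_id, props in clip["meta"].items():
        new_id, n = old_id, 1
        while new_id in target_meta:
            new_id = "%s-%d" % (old_id, n)
            n += 1
        target_meta[new_id] = props
        mapping[old_id] = new_id
    return mapping


# Hypothetical source and target documents that both already use the id "c1".
source_meta = {"c1": {"source": "urn:isbn:23980912"}}
target_meta = {"c1": {"source": "urn:isbn:92130926"}}

clip = copy_with_metadata("(Doe, 1999: 23)", ["c1"], source_meta)
print(paste_with_metadata(clip, target_meta))
```

Whether the metadata travels inline or alongside the content, the paste 
target still has to understand it, which is why I don't see a 
difference in robustness.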

Having it inline doesn't add or subtract from robustness. I suppose you 
could say it does with XHTML but that is a red herring. We are not 
obligated to support pasting into XHTML documents.

>>> So I don't think there's anything yet that logically excludes option 2.
>>>
>> Well, no, but I wasn't arguing that "logic" excludes option 2. And 
>> while not an implementation requirement or even a "new" requirement, 
>> I do think we should consider the impact of our proposals on 
>> processing of ODF documents.
>
>
> Certainly.
>
>> Having mixed models for metadata (some inline, some in meta.xml) 
>> seems to be a particularly bad choice to me. Granted we may encounter 
>> cases where that cannot be avoided but I haven't seen one, yet.
>
>
> Hopefully my explanation above shows it. The RDFa examples do too.
>
>> I was suggesting some form of xml:id as a way to avoid messing around 
>> with adding attributes to content.xml whatever metadata we want to 
>> associate with elements or content. Just seems like a clean solution 
>> to me. Opinions will no doubt differ. ;-)
>
>
> As I said, it *is* a clean solution, but it's also more limited. The 
> RDFa attributes aren't some random accident (except for maybe rel and 
> rev); they are there because they need to be for their use cases and 
> requirements.
>
I haven't said you could not have them, but have suggested they not be 
in content. I will pass over the notion of accident in favor of 
resolving issues we are more likely to agree upon. ;-)

>> I realize that RDFa is supposed to make it easy to use RDF with XHTML 
>> but I don't think models for authoring XHTML are really relevant for 
>> ODF applications. Different format and different considerations in play.
>
>
> Umm, yes and no. The only significant difference between the formats 
> is ODF is compound. But you still can't present users information if 
> it's not there.
>
What do you mean by "not there?" In what sense is information stored 
separately from content.xml not "there?" It certainly seems to be 
judging from the other cases where that happens in ODF.

Hope you are having a great day!

Patrick

> Bruce
>
>
>
>

-- 
Patrick Durusau
Patrick@Durusau.net
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005

Topic Maps: Human, not artificial, intelligence at work!