office-metadata message

Subject: Re: [office-metadata] Finding a common proposal..

From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: Svante Schubert <Svante.Schubert@Sun.COM>
Date: Tue, 5 Dec 2006 18:34:15 -0500

On Dec 5, 2006, at 5:55 PM, Svante Schubert wrote:

>> I certainly don't think it's in the scope of our work to reconsider 
>> the requirements. The whole point of that process was to come to a 
>> final agreement.
> These are fine and should not be touched.
> I am just looking for something to weight different design ideas.
> Evaluation might be done by comparing the different low-level 
> requirements or scenarios, which each design fulfills/enables.
> Otherwise how shall we decide, if and how much we should separate 
> content from meta data?

I think our requirements already allow us to decide that. See below ...

>>> Agreed Design Decisions:
>>>   * RDF compatible (is this agreed, any protest?)
>>>
>>> Uncertain Design Decisions:
>>>   * No redundancy by referencing content used as meta data (no
>>>      repetition of data from the content in the meta data)
>>>   * Content.xml should contain all text (content) to be viewed
>>>   * As much meta data as possible (apart of the metadata being shown)
>>> should be stored in a package aside
>>
>> I don't like the "no redundancy" requirement (e.g. in the spec "there 
>> shall be no redundancy") at all. By that logic, the citation field 
>> could not have an author name or date (e.g. in-text content of "(Doe, 
>> 1999)"),
> Indeed, no data blobs should be allowed. When parts of the blob are 
> meta data pieces there is no chance to validate them against the 
> content (aside of parsing the blob). No machine is able to see 
> (easily) if the text is still consistent with the meta data.

Why would the "validation" you note be a requirement for us? I mean, if 
I say "personal names shall be represented with given names 
initialized" then the rendered text will deviate from the metadata. 
It's not only not a bad thing to allow that, but a good thing. Or 
consider a blind user; how do you validate that sound is equivalent to 
text?
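
To make the "initialized given names" case concrete, here is a minimal sketch in Python. The function name and the sample record are made up for illustration; nothing here comes from ODF or from any proposal on the table. It only shows that a perfectly legitimate rendering rule produces text that a naive string comparison against the metadata would reject.

```python
def render_name(family, given):
    """Render a personal name with given names reduced to initials.

    Hypothetical formatting rule, used only to illustrate the point.
    """
    initials = " ".join(part[0] + "." for part in given.split())
    return f"{family}, {initials}"


# Hypothetical metadata record for one author.
metadata = {"family": "Doe", "given": "Jane Marie"}

rendered = render_name(metadata["family"], metadata["given"])
print(rendered)

# The stored given name does not appear verbatim in the rendered text,
# so a literal "text matches metadata" check would fail here.
print(metadata["given"] in rendered)
```

The rendered string is derived from the metadata, but it is not equal to it, and that is by design.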

If you enforce this no-content restriction, the rendered text will 
still have to be put somewhere. In my case, you'll be forcing me to add 
a property like this to every citation (never mind the bibliographic 
entry) in the RDF/XML:

	<rdfs:label>(Doe, 1999; Smith, 2004:23)</rdfs:label>

... and for us to say in the spec "RDF/XML metadata must include an 
rdfs:label property to represent the rendered content."
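
For a sense of what that would mean in practice, here is a rough Python sketch that writes such a label into RDF/XML using only the standard library. The "#cite1" identifier and the function name are placeholders I am assuming for illustration; only the rdf and rdfs namespace URIs are the real W3C ones.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs for RDF and RDF Schema.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)


def citation_with_label(about, rendered_text):
    """Build an rdf:RDF element describing one citation.

    The rendered in-text form is carried as an rdfs:label, which is
    what the restriction would force on every citation.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about}
    )
    label = ET.SubElement(desc, f"{{{RDFS}}}label")
    label.text = rendered_text
    return root


# "#cite1" is a hypothetical identifier for the citation.
doc = citation_with_label("#cite1", "(Doe, 1999; Smith, 2004:23)")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Every citation in the document would need one of these, in addition to whatever structured data it already carries.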

Do you really want to do that? If yes, for what benefit?

>> and I see that kind of restriction as counter-productive. Moreover, 
>> ODF already has many structures which include both presentation and 
>> machine-oriented content (links, fields, etc.).
>>
>> You know my view on the second point. Maybe John's promised medical 
>> example can shed further light here.
> I am looking forward for John's proposal as well. But I strongly 
> advise to clarify basic design decisions in parallel.

OK, so long as we are not adding new requirements (and in your list, 
you are). But see below ...

>> But I'm actually fine with the last point as a best practices design 
>> suggestion (though wouldn't want to try to mandate it in the spec). 
>> In fact, I think it a good idea that metadata in general be stored in 
>> the package.
> It is not sufficient to simply say that you or anybody think it is a 
> good idea to store it in the package.
> Why do you think it is a good idea, what is the improvement by doing 
> so?

Because it fulfills one of our requirements: that the metadata be 
capable of being separately processed, removed, etc.

If I may, let me focus in on this statement of yours:

> By separating meta data as much as possible from the content, we ease 
> the transformation of meta data and encapsulate it into a different 
> stream.

... and perhaps suggest that we might clarify how this relates to the 
4.5 requirement I mention above. It seems to me this might be the crux 
of the matter.

To my mind, the metadata that definitely must belong in the content.xml 
file is that which is associated with some displayed content. These are 
the statements that say this piece of content has this relationship to 
the document: "this is an event I am hosting" or "this is a citation" or 
"this is a medical diagnosis."

Everything else can be put in the package.

But let's be clear: when you move the metadata out of the content, you 
lose other functionality; for example, the user's ability to access 
specific chunks of content as metadata-enhanced. Imagine being able to 
hover over a span of metadata-enhanced text that represents a client 
-- "Jane Doe" -- and having the application display additional 
information about them.

Also, I take it Elias would argue that from an API or even user 
perspective, it really shouldn't matter where the metadata is. One adds 
statements either to the document or to content within the document 
and then accesses them. Where they're serialized won't matter much 
(though it will to external tools such as XSLT).
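
A rough sketch of that API view, in Python. All of the names here (MetadataStore, the "displayed" flag, the "ex:" prefix) are hypothetical and not part of any proposal; the point is only that callers add and query statements the same way, and the decision about content.xml versus the package is made at serialization time.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical in-memory store of (subject, predicate, object) statements."""

    statements: list = field(default_factory=list)

    def add(self, subject, predicate, obj, displayed=False):
        # 'displayed' marks statements tied to some displayed content.
        self.statements.append((subject, predicate, obj, displayed))

    def about(self, subject):
        # Callers query by subject; storage location is invisible here.
        return [(p, o) for s, p, o, _ in self.statements if s == subject]

    def partition(self):
        # Only the serializer cares where each statement ends up.
        inline = [s[:3] for s in self.statements if s[3]]
        package = [s[:3] for s in self.statements if not s[3]]
        return inline, package


store = MetadataStore()
store.add("#span1", "rdf:type", "ex:Client", displayed=True)
store.add("document", "dc:creator", "Jane Doe")

inline, package = store.partition()
print(inline)   # candidates for content.xml
print(package)  # candidates for a separate file in the package
```

An external tool such as an XSLT stylesheet sees only the serialized result, which is why the split still matters there.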

Bruce
