office-metadata message

Subject: Re: [office-metadata] Finding a common proposal..

From: Svante Schubert <Svante.Schubert@Sun.COM>
To: "Bruce D'Arcus" <bdarcus@gmail.com>
Date: Tue, 05 Dec 2006 23:55:36 +0100

Hi Bruce,

Bruce D'Arcus wrote:
>
> Hi Svante,
>
> On Dec 5, 2006, at 4:24 PM, Svante Schubert wrote:
>
>> Low-level Requirements:
>
> Just a reminder: the TC has already agreed to a list of requirements:
>
>     <http://www.oasis-open.org/committees/download.php/20493/UCR.pdf>
>
> I certainly don't think it's in the scope of our work to reconsider 
> the requirements. The whole point of that process was to come to a 
> final agreement.
These are fine and should not be touched.
I am just looking for a way to weigh different design ideas against 
each other.
Evaluation might be done by comparing the low-level requirements or 
scenarios that each design fulfills or enables.
Otherwise, how shall we decide whether, and how much, we should separate 
content from meta data?

>> Agreed Design Decisions:
>>   * RDF compatible (is this agreed, any protest?)
>>
>> Uncertain Design Decisions:
>>   * No redundancy by referencing content used as meta data (no
>>      repetition of data from the content in the meta data)
>>   * Content.xml should contain all text (content) to be viewed
>>   * As much meta data as possible (apart from the metadata being shown)
>>      should be stored in a package aside
>
> I don't like the "no redundancy" requirement (e.g. in the spec "there 
> shall be no redundancy") at all. By that logic, the citation field 
> could not have an author name or date (e.g. in-text content of "(Doe, 
> 1999)"), 
Indeed, no data blobs should be allowed. When parts of a blob are pieces 
of meta data, there is no way to validate them against the content 
(aside from parsing the blob). No machine can easily see whether the 
text is still consistent with the meta data.
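
To illustrate the validation problem, here is a minimal sketch in Python. It assumes a hypothetical metadata record and a parser that knows exactly one citation format; the names `metadata` and `blob_is_consistent` are made up for this example and are not part of any proposal.

```python
import re

# Hypothetical metadata record for one citation (illustrative names only).
metadata = {"author": "Doe", "year": "1999"}


def blob_is_consistent(blob: str, meta: dict) -> bool:
    """Check a free-text citation blob against metadata by parsing it.

    Assumption: the blob follows the single pattern "(Author, YYYY)".
    Any other citation style is reported as inconsistent, even if a
    human reader would consider it correct.
    """
    match = re.fullmatch(
        r"\((?P<author>[^,]+),\s*(?P<year>\d{4})\)", blob.strip()
    )
    if match is None:
        # The parser does not recognise this format at all.
        return False
    return match["author"] == meta["author"] and match["year"] == meta["year"]
```

With this sketch, "(Doe, 1999)" passes, "(Doe, 2001)" fails, and "Doe (1999)" also fails, although only the formatting changed. Every additional citation style would need its own parsing rule, which is the cost of checking a blob.
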
> and I see that kind of restriction as counter-productive. Moreover, 
> ODF already has many structures which include both presentation and 
> machine-oriented content (links, fields, etc.).
>
> You know my view on the second point. Maybe John's promised medical 
> example can shed further light here.
I am looking forward to John's proposal as well. But I strongly advise 
clarifying the basic design decisions in parallel.
>
> But I'm actually fine with the last point as a best practices design 
> suggestion (though wouldn't want to try to mandate it in the spec). In 
> fact, I think it a good idea that metadata in general be stored in the 
> package.
It is not sufficient to simply say that you or anybody else thinks it is 
a good idea to store it in the package.
Why do you think it is a good idea? What is improved by doing so?
What scenario (low-level requirement) is satisfied by your design?
Is it worth more than the scenario (low-level requirement) fulfilled by 
a different approach, such as merging meta data with content?

>
> So it seems to me the debate here is NO metadata in content vs. SOME 
> metadata in content.
Yes, this is currently the major design decision we are arguing about.
My point is that there can be a million versions of meta data for the 
same semantics.
Transforming meta data into a different grammar will be a common scenario.
By separating meta data from the content as much as possible, we ease 
the transformation of meta data and encapsulate it in a separate stream.
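
As a rough sketch of that scenario in Python: the content stream holds only text, and a separate metadata stream holds triples that point at the content by id. The ids, the `VOCAB_MAP` table, and the `transform` function are assumptions made up for illustration; only `dc:creator` and `dc:date` are taken from an existing vocabulary (Dublin Core).

```python
# Content stream: text only, addressed by hypothetical ids.
content = {"cite1": "(Doe, 1999)"}

# Separate metadata stream: (subject, predicate, object) triples
# that refer to the content by id instead of repeating it.
metadata = [
    ("cite1", "author", "Doe"),
    ("cite1", "year", "1999"),
]

# Hypothetical mapping from one metadata grammar to another.
VOCAB_MAP = {"author": "dc:creator", "year": "dc:date"}


def transform(triples, vocab_map):
    """Rewrite predicates into another vocabulary.

    Predicates without a mapping are kept unchanged. The content
    stream is never read or modified.
    """
    return [(s, vocab_map.get(p, p), o) for s, p, o in triples]


converted = transform(metadata, VOCAB_MAP)
```

The transformation touches only the metadata stream. If the same metadata were embedded in the content, each target grammar would require rewriting the content as well.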


Regards,
Svante

Follow-Ups:
- Re: [office-metadata] Finding a common proposal..
  - From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>

References:
- Finding a common proposal..
  - From: Svante Schubert <Svante.Schubert@Sun.COM>
- Re: [office-metadata] Finding a common proposal..
  - From: "Bruce D'Arcus" <bdarcus@gmail.com>