
Subject: metadata in new ECMA OXML spec
From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: OpenDocument <office@lists.oasis-open.org>
Date: Mon, 4 Sep 2006 23:59:46 -0400

FWIW, here's what the standard document metadata looks like in OXML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
   xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:dcterms="http://purl.org/dc/terms/"
   xmlns:dcmitype="http://purl.org/dc/dcmitype/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <dc:title>Title</dc:title>
   <dc:subject/>
   <dc:creator>John Doe</dc:creator>
   <cp:keywords/>
   <dc:description/>
   <cp:lastModifiedBy>doejb</cp:lastModifiedBy>
   <cp:revision>6</cp:revision>
   <dcterms:created xsi:type="dcterms:W3CDTF">2006-06-13T14:33:00Z</dcterms:created>
   <dcterms:modified xsi:type="dcterms:W3CDTF">2006-06-15T23:42:00Z</dcterms:modified>
</cp:coreProperties>

So:

1) like ODF, they use DC
2) also like ODF, this is VERY close to RDF. But rather than reuse the  
standard, there's some NIH here (RDF has an equivalent mechanism to  
assign a datatype to a property, but they've created their own; more  
below)
3) unlike ODF, they also use Extended DC (dcterms)
4) also unlike ODF, they separate metadata about the document from  
metadata about the application.
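
To make point 2 concrete, here is a rough Python sketch (not taken from the spec) that reads the datatype of dcterms:created from the OXML form and from a hand-written RDF/XML rendering of the same property. The RDF/XML fragment, the PREFIXES table, and the function names are assumptions for illustration only. ElementTree does not keep prefix bindings, so the QName in xsi:type is expanded with a hard-coded map.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical prefix table: ElementTree drops prefix bindings after parsing,
# so the QName inside xsi:type has to be expanded by hand.
PREFIXES = {"dcterms": DCTERMS}

# Trimmed copy of the OXML core properties shown above.
OXML = """\
<cp:coreProperties
   xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
   xmlns:dcterms="http://purl.org/dc/terms/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <dcterms:created xsi:type="dcterms:W3CDTF">2006-06-13T14:33:00Z</dcterms:created>
</cp:coreProperties>"""

# Assumed RDF/XML equivalent: the datatype is a full URI in rdf:datatype.
RDF_XML = """\
<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:dcterms="http://purl.org/dc/terms/">
   <rdf:Description rdf:about="">
      <dcterms:created rdf:datatype="http://purl.org/dc/terms/W3CDTF">2006-06-13T14:33:00Z</dcterms:created>
   </rdf:Description>
</rdf:RDF>"""


def created_element(xml_text):
    """Return the first dcterms:created element in the document."""
    root = ET.fromstring(xml_text)
    return next(root.iter(f"{{{DCTERMS}}}created"))


def oxml_datatype(xml_text):
    """Datatype URI and value, read from the OXML xsi:type QName."""
    created = created_element(xml_text)
    prefix, local = created.get(f"{{{XSI}}}type").split(":", 1)
    return PREFIXES[prefix] + local, created.text


def rdf_datatype(xml_text):
    """Datatype URI and value, read from the rdf:datatype attribute."""
    created = created_element(xml_text)
    return created.get(f"{{{RDF}}}datatype"), created.text


print(oxml_datatype(OXML))
print(rdf_datatype(RDF_XML))
```

Both calls yield the same datatype URI and value. The only difference is where the type lives: a QName in xsi:type on the OXML side, a URI in rdf:datatype on the RDF side.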

Here's an example of the latter:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties
   xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
   xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
   <Template>Normal</Template>
   <TotalTime>272</TotalTime>
   <Pages>1</Pages>
   <Words>57</Words>
   <Characters>329</Characters>
   <Application>Microsoft Office Word</Application>
   <DocSecurity>0</DocSecurity>
   <Lines>2</Lines>
   <Paragraphs>1</Paragraphs>
   <ScaleCrop>false</ScaleCrop>
   <HeadingPairs>
     <vt:vector size="2" baseType="variant">
       <vt:variant>
         <vt:lpstr>Title</vt:lpstr>
       </vt:variant>
       <vt:variant>
         <vt:i4>1</vt:i4>
       </vt:variant>
     </vt:vector>
   </HeadingPairs>
   <TitlesOfParts>
     <vt:vector size="1" baseType="lpstr">
       <vt:lpstr/>
     </vt:vector>
   </TitlesOfParts>
   <Company>MU</Company>
   <LinksUpToDate>false</LinksUpToDate>
   <CharactersWithSpaces>385</CharactersWithSpaces>
   <SharedDoc>false</SharedDoc>
   <HyperlinksChanged>false</HyperlinksChanged>
   <AppVersion>12.0000</AppVersion>
</Properties>

I actually like the notion of separating out the metadata like this,
though it's probably not important enough for us to change. Supporting
dcterms does make sense, however.

Picking up on the comparison with RDF and the need for a model,
again, they've invented their own solution for extension. From the spec  
(part 4*):

> 7.4 Variant Types
> Variant types define storage elements for a comprehensive list of data  
> types. These elements serve as the framework for representing and  
> round-tripping complex properties and custom file properties. Each  
> variant type is defined as an element where the element name indicates  
> the type and element value represents the data stored. Variant type  
> elements may contain other variant type elements as child elements.

In other words, these are complex extension properties. The
vt-namespaced elements above (vt:vector, vt:variant, vt:lpstr, vt:i4)
are this sort of content.
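
As a rough illustration of how that nesting reads in practice, here is a minimal Python sketch that decodes the two vt:vector fragments from the example above into plain Python values. The decode function is hypothetical and handles only the four element names that appear in this message (vector, variant, lpstr, i4). The mapping of i4 to int and lpstr to str is an assumption based on the element names, not on a close reading of the spec.

```python
import xml.etree.ElementTree as ET

VT = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"

# The HeadingPairs vector from the extended properties example.
HEADING_PAIRS = f"""\
<vt:vector xmlns:vt="{VT}" size="2" baseType="variant">
  <vt:variant><vt:lpstr>Title</vt:lpstr></vt:variant>
  <vt:variant><vt:i4>1</vt:i4></vt:variant>
</vt:vector>"""

# The TitlesOfParts vector from the same example.
TITLES_OF_PARTS = f"""\
<vt:vector xmlns:vt="{VT}" size="1" baseType="lpstr">
  <vt:lpstr/>
</vt:vector>"""


def decode(elem):
    """Turn a variant-type element into a Python value (sketch only)."""
    name = elem.tag.rpartition("}")[2]  # local name without the namespace
    if name == "lpstr":
        return elem.text or ""          # empty element becomes empty string
    if name == "i4":
        return int(elem.text)           # assumed: 4-byte signed integer
    if name == "variant":
        return decode(elem[0])          # a variant wraps one typed child
    if name == "vector":
        return [decode(child) for child in elem]
    raise ValueError(f"variant type not handled in this sketch: {name}")


print(decode(ET.fromstring(HEADING_PAIRS)))
print(decode(ET.fromstring(TITLES_OF_PARTS)))
```

The element name carries the type and the element content carries the value, which is what the quoted passage describes. Any other variant type raises an error here rather than being guessed at.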

I actually find the spec here quite confusing.

Bruce

* <http://www.ecma-international.org/news/TC45_current_work/tc45-2006-338.pdf>