
Subject: Metadata: Sample use cases and classification

I'd also like to offer a "thank you" to Patrick Durusau
for chairing the metadata subcommittee.

Since the first step is to identify use cases and classifications,
here are a few use cases and a trivial metadata classification
which I offer to the metadata subcommittee. I hope this email
may be useful to the subcommittee as it starts out.

First, the MOST IMPORTANT use case to me -- and possibly to
many others -- is to greatly improve handling references &
citations.  I want to have automatically generated bibliographies
based on what I happen to cite, in whatever the format du jour is.
Current implementations get tantalizingly close, yet don't quite
make it.  This isn't rocket science; there are existing systems
that do this (BibTeX, etc.).  Yet having this capability would
put OpenDocument systems FAR beyond what current typical
word processors can do, and would be a "killer app"-type reason for
many people (particularly in research and academe) to switch.
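
To make the idea concrete, here is a minimal sketch of what I mean,
assuming a made-up "[@key]" citation marker and a made-up reference
table (none of these names or formats come from OpenDocument or any
existing tool).  The bibliography contains only what was actually
cited, in order of first citation, and the output style is a parameter.

```python
import re

# Hypothetical reference table; keys and entries are placeholders.
REFERENCES = {
    "smith2001": {"author": "Smith, A.", "year": 2001, "title": "Example Title One"},
    "jones1999": {"author": "Jones, B.", "year": 1999, "title": "Example Title Two"},
    "never_cited": {"author": "Doe, C.", "year": 2003, "title": "Example Title Three"},
}


def cited_keys(text):
    """Return citation keys in order of first appearance, without duplicates."""
    seen = []
    for key in re.findall(r"\[@([A-Za-z0-9_]+)\]", text):
        if key not in seen:
            seen.append(key)
    return seen


def format_entry(entry, style):
    """Render one reference in the requested (illustrative) style."""
    if style == "author-year":
        return f"{entry['author']} ({entry['year']}). {entry['title']}."
    if style == "title-first":
        return f"{entry['title']}, {entry['author']}, {entry['year']}."
    raise ValueError(f"unknown style: {style}")


def bibliography(text, references, style="author-year"):
    """Build the bibliography from whatever the text happens to cite."""
    return [format_entry(references[key], style) for key in cited_keys(text)]


if __name__ == "__main__":
    sample = "See [@smith2001] and [@jones1999]; again [@smith2001]."
    for line in bibliography(sample, REFERENCES):
        print(line)
```

Switching the "format du jour" is then just a matter of passing a
different style; the cited data itself does not change.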

Another use case is to allow far more information about the
document itself to be stored inside the document; already we
have author, etc.  I believe Dublin Core is already supported
completely; if not, completing that support would make sense.
ISBN and ISSN numbers
would be sensible to store.  HOWEVER, I think this should NOT
be our initial focus at this time.  Twelve years ago I would
have said this was the most critical kind of metadata, since
finding stuff used to be so hard.  But Google and other search
engines have become so good at finding things that the need
for self-identifying metadata simply isn't as pressing as
other needs.  If someone wants to work further in this area,
well and good, but that should not impede in any way the work on
improving automatically-generated bibliographies.

Another use case, though far less important I think,
would be "security classification" of paragraphs and headings.
Some government documents have paragraphs of different
classifications (Unclassified, For Official Use Only,
Confidential, Secret, Top Secret) with possible categories as well
(e.g., REL UK=Releasable to UK, WEIRDNAME=only people with the right to
know about WEIRDNAME stuff can see it).  Then you can do stuff
like "only show unclassified material", etc.  Usually each
paragraph is marked at the end, e.g., "(U)", and each heading
at the beginning, e.g., "(S)".  This use is specialized,
but it's useful as an example of a different kind of metadata.
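
As a rough illustration of "only show unclassified material", here is
a small sketch.  The Paragraph type, the level abbreviations, and the
visible() function are all my own invention for this example, and I
model categories only as restrictions (the reader must hold every
category on the paragraph); release markings such as "REL UK" would
need different handling.

```python
from dataclasses import dataclass

# Levels from lowest to highest; the abbreviations are illustrative.
LEVELS = ["U", "FOUO", "C", "S", "TS"]


@dataclass
class Paragraph:
    text: str
    level: str                          # one of LEVELS
    categories: frozenset = frozenset()  # restrictive categories only


def visible(paragraphs, max_level, held_categories=frozenset()):
    """Keep paragraphs at or below max_level whose categories the reader holds."""
    limit = LEVELS.index(max_level)
    return [
        p for p in paragraphs
        if LEVELS.index(p.level) <= limit and p.categories <= held_categories
    ]


if __name__ == "__main__":
    doc = [
        Paragraph("Introduction.", "U"),
        Paragraph("Sensitive details.", "S"),
        Paragraph("Special program note.", "U", frozenset({"WEIRDNAME"})),
    ]
    for p in visible(doc, "U"):
        print(f"{p.text} ({p.level})")
```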

As far as classifications of metadata go, I can see two major
categories of metadata, each subdivided into 2 major subcategories:
* Data about THIS document
   + Data about this document as a whole (author, etc.).
     This is useful for aiding search.
   + Data about specific sections of this document
     (security classifications, etc.).
* Data about ANOTHER document
   + Data about another document as a whole (author, etc.).
     The bibliography/citation stuff is an example.
   + Data about a PIECE of another document
     (e.g., XLink etc. to allow selection or transclusion
     of a piece of another document).
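
The classification above is really two independent axes (which
document, and how much of it), so it can be sketched as a pair of
enumerations.  The type names and the example labels below are
hypothetical, chosen only to mirror the list above.

```python
from enum import Enum
from typing import NamedTuple


class Subject(Enum):
    THIS_DOCUMENT = "this"
    ANOTHER_DOCUMENT = "another"


class Scope(Enum):
    WHOLE = "whole"
    PART = "part"


class MetadataKind(NamedTuple):
    subject: Subject
    scope: Scope


# One example per cell of the 2x2 classification, taken from the list above.
EXAMPLES = {
    "author": MetadataKind(Subject.THIS_DOCUMENT, Scope.WHOLE),
    "paragraph security classification": MetadataKind(Subject.THIS_DOCUMENT, Scope.PART),
    "bibliography entry": MetadataKind(Subject.ANOTHER_DOCUMENT, Scope.WHOLE),
    "XLink to a piece of a document": MetadataKind(Subject.ANOTHER_DOCUMENT, Scope.PART),
}


def examples_of(subject, scope):
    """List the example names that fall in the given cell."""
    wanted = MetadataKind(subject, scope)
    return sorted(name for name, kind in EXAMPLES.items() if kind == wanted)


if __name__ == "__main__":
    for subject in Subject:
        for scope in Scope:
            print(subject.name, scope.name, examples_of(subject, scope))
```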

Since the above is a classification of metadata, I could
tongue-in-cheek call the above my metametadata, and if different people
have different approaches to classifying metadata, we may need
to classify the classifications (creating a meta-meta-metadata).
Yes, I'm abusing the terminology, but I couldn't resist :-).

More seriously, I envision these sorts of proposals to be
discussed and worked out inside the metadata subcommittee.
Hopefully this email will help things get started. I also think
that this shows that the metadata subcommittee charter is a
sensible one, since it IS possible to identify use cases and
classifications.

--- David A. Wheeler

