office-metadata message

Subject: use case plan?
From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
To: office-metadata <office-metadata@lists.oasis-open.org>
Date: Fri, 9 Jun 2006 11:04:42 -0400

It's been a month since I left for South America, with no visible 
progress since. So what's our plan to finish the use cases? And when's 
the next conference call?

I've tried to assemble most of the wiki use cases into a single 
document, which can then be converted to HTML.  See below ... it'll 
need editing.

Bruce

# Introduction



# Use Cases

## Enhanced Search

### Overview

The most pressing problem in data mining, web searching, however you
want to term the problem, is that data is basically very dumb. If I
search for job on Google, I am going to get results that include job
as in employment, as well as Job, as in the Book of Job.

While enormous strides have been made in any number of automated
techniques for mining data, the fact remains that current results are
far from ideal. Surely the originators of data meant something when
they originated the data. So why not give them the ability to say
what they meant?

### Scenario

A genetics researcher, for example, is writing a paper and wants to
use a name that is common to the mouse and human genomes. Marking
each occurrence of the term in the document is a lot of trouble. If
the author could declare a vocabulary for the document, that is, what
a particular word or words mean in that document, any search engine
could interpret those words as having those meanings. Declaring what
words mean once, without the labor of annotating individual
occurrences, would let the document provide rich metadata for
searching and indexing.

With the vocabulary of the document specified by metadata, search
engines can distinguish between terms based upon information
specified by the creator of the document.
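
One way to picture a document-level vocabulary is as a small mapping from terms to the concepts the author intends. The sketch below is purely illustrative: the `vocabulary` dictionary, the `example.org` identifiers, and the `resolve_term` and `index_document` helpers are all assumptions made for this example, not part of any OpenDocument proposal.

```python
# Hypothetical document-level vocabulary: each declared term maps to an
# identifier for the intended concept. The URIs are placeholders.
vocabulary = {
    "job": "http://example.org/concepts/employment",
}


def resolve_term(word, vocabulary):
    """Return the declared concept for a word, or None if undeclared."""
    return vocabulary.get(word.lower())


def index_document(text, vocabulary):
    """Build a concept -> word positions index for declared terms only."""
    index = {}
    for position, word in enumerate(text.split()):
        concept = resolve_term(word.strip(".,;:!?"), vocabulary)
        if concept is not None:
            index.setdefault(concept, []).append(position)
    return index


index = index_document("The job market affects every job seeker.", vocabulary)
```

Because the meaning is declared once for the whole document, an indexer along these lines would not need per-word markup to tell "job" as employment from the Book of Job.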

## Bibliographies and Citations

### Overview

Most textual documents include references to content from elsewhere.
That referenced content might be quoted excerpts, data summaries, or
paraphrased findings or conclusions. In fields where attribution of
such referenced content is essential, such as law and academic
research, citations and reference lists associate referenced document
content with its source. And yet, formatted reference lists
typically represent a subset of the source metadata, and may need to
be reformatted for different audiences. In this sense, citations and
reference list items can be understood as dynamic text fields whose
content is generated from linked metadata descriptions.

A standard metadata infrastructure would therefore significantly
improve the prospects for user collaboration and application
interoperability. Bibliographic metadata is also more complex than
the simple document metadata commonly found in productivity
applications, which is often just a series of key/value pairs.
Consider a simple example of a journal article, which
involves relations between a document and a periodical, one or more
people who author that document, and so forth. Beyond
standardization, then, it is important to have a metadata approach
that can support that sort of richer description.

### Scenario

Three users collaborate on a paper, each using different OpenDocument-
compatible applications.

As they write the paper and add citations, the citations and
bibliography are automatically generated from the embedded metadata.
Because the metadata is embedded, it's also portable. When the users
pass the document around, the logic is always there so that the
formatting can be regenerated. And because the metadata is based on a
standard model, it would also facilitate interoperability between
different third-party bibliographic applications.

When the authors finish the paper, they send it to a publisher, who can
extract the metadata and make it available to search engines and
journal providers. A standard metadata model also allows the
publisher to regenerate the citations in a variety of standard styles
(such as MLA, APA, Chicago).
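
To make the "dynamic text field" idea concrete, here is a minimal sketch in which one structured description of a journal article is rendered in two different ways. The class names, the sample article, and both formatting functions are assumptions for illustration; the two output styles are simplified stand-ins and do not reproduce MLA, APA, or Chicago rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    family: str
    given: str


@dataclass
class Periodical:
    title: str


@dataclass
class Article:
    title: str
    authors: List[Person]
    periodical: Periodical  # a relation to another resource, not a flat string
    year: int


def author_date(article):
    """Render a simplified author-date entry (not a real published style)."""
    names = ", ".join(f"{p.family}, {p.given[0]}." for p in article.authors)
    return f"{names} ({article.year}). {article.title}. {article.periodical.title}."


def author_title(article):
    """Render a simplified author-title entry (not a real published style)."""
    names = " and ".join(f"{p.given} {p.family}" for p in article.authors)
    return f'{names}. "{article.title}." {article.periodical.title}, {article.year}.'


# Made-up sample data.
article = Article(
    title="An Example Article",
    authors=[Person("Doe", "Jane"), Person("Roe", "Richard")],
    periodical=Periodical("Journal of Examples"),
    year=2006,
)
```

Both renderings draw on the same record, which is why key/value pairs alone fall short: the authors and the periodical are related resources with their own properties.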


## Intellectual Property

### Overview

Published documents of all kinds often include content from
elsewhere: images, data, and so forth. This content typically has
rights information associated with it. Yet managing such information
is currently a manual task. An author or production editor must
obtain the file(s) and separately manage the rights information. In
turn, they must manually add such information to the published text
in the form of captions with copyright information and so forth.
This can be both tedious and error-prone. Allowing such metadata to
be attached to the content itself would allow for more automated solutions.

### Scenario

A government agency prepares a report that includes summary tables of
data acquired from a third party. The document author embeds the
table data in the document, and captions—including copyright and
source information—are automatically generated.

### Scenario 2

A student includes a Creative Commons-licensed photograph in their
report. The license and attribution are automatically extracted from
the image metadata by the application and appended to the image caption.
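
A rough sketch of how a caption might be assembled from rights metadata attached to a table or image follows. The field names (`source`, `holder`, `year`, `license`), the `build_caption` helper, and the sample values are assumptions for illustration only.

```python
def build_caption(title, rights):
    """Compose a caption from a title and a dict of rights metadata."""
    parts = [title]
    if rights.get("source"):
        parts.append(f"Source: {rights['source']}")
    if rights.get("holder"):
        year = rights.get("year")
        holder = rights["holder"]
        parts.append(f"Copyright {year} {holder}" if year else f"Copyright {holder}")
    if rights.get("license"):
        parts.append(f"License: {rights['license']}")
    return ". ".join(parts) + "."


# Made-up rights metadata for an embedded table.
table_rights = {
    "source": "Example Data Provider",
    "holder": "Example Data Provider",
    "year": 2006,
}
caption = build_caption("Table 1: Summary statistics", table_rights)
```

Missing fields are simply skipped, so the same routine would serve both the table in the first scenario and the licensed photograph in the second.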

## Content Tagging

### Overview

Allow OpenDocument document objects such as paragraphs, words,
figures, and so on to be tagged with metadata.

### Scenario

For example, consider an OpenDocument text document in which a
paragraph is marked as important, or a figure is tagged with
information about the copyright owner. Objects that should be able to
serve as a tag anchor are:

  * spans
  * paragraphs
  * figures
  * tables

### Scenario 2

In legal publishing (and presumably other domains) it is quite common
to take an existing document (usually published legislation) and
manually tag it with semantic information. It is generally critical
that the presentation be preserved exactly for legal reasons.

For example, a paragraph or series of paragraphs may constitute a
legal definition of a term. A span of text may actually be a cross-
reference within the same legislation, a reference to case law, or an
amendment to another act.

Tagging is often a precursor to transforming a document into a domain-
specific format.

In the more general sense, activities such as indexing and cross-
referencing can be considered content tagging and should probably use
the same mechanism.
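
The tagging scenarios above could be sketched as follows, with tags kept apart from the content and pointing at anchors by identifier so the text and its presentation stay untouched. The element names, the `id` attribute, the tag fields, and the `tagged_text` helper are assumptions for this example and are not taken from the OpenDocument schema.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for document content (not real OpenDocument markup).
content = ET.fromstring(
    '<doc>'
    '<p id="p1">In this Act, <span id="s1">vehicle</span> means any conveyance.</p>'
    '<p id="p2">See <span id="s2">section 4</span> for exemptions.</p>'
    '</doc>'
)

# Tags live outside the content, keyed by anchor id.
tags = [
    {"anchor": "p1", "type": "definition", "term": "vehicle"},
    {"anchor": "s2", "type": "cross-reference", "target": "section-4"},
]


def tagged_text(content, tags, tag_type):
    """Return the text of every anchor carrying a tag of the given type."""
    anchors = {el.get("id"): el for el in content.iter() if el.get("id")}
    return [
        "".join(anchors[t["anchor"]].itertext())
        for t in tags
        if t["type"] == tag_type and t["anchor"] in anchors
    ]


definitions = tagged_text(content, tags, "definition")
cross_refs = tagged_text(content, tags, "cross-reference")
```

A later transformation into a domain-specific format could walk the same tag list, which is one reason indexing and cross-referencing might share the mechanism.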

## Realtime Collaborative Editing

### Overview

The main idea is expressed in
[http://en.wikipedia.org/wiki/Collaborative_real-time_editor Wikipedia's]
writeup on the topic.

For metadata, we need to keep the ramifications of this in mind. The
same content (word, paragraph, page) may receive multiple instances
of the same metadata element, each from a different author.

One practical ramification of this may be that metadata will always
need to be expressed as XML elements, not as XML attributes, since
you cannot have multiple instances of the same attribute on the same
element.
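
The attribute restriction is easy to check with any XML parser. The short sketch below uses Python's standard `xml.etree.ElementTree`; the `note` element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two authors annotate the same paragraph: repeated child elements parse fine.
with_elements = ET.fromstring(
    '<p><note author="alice">check figure</note>'
    '<note author="bob">cite source</note>Text</p>'
)
notes = [(n.get("author"), n.text) for n in with_elements.findall("note")]

# Repeating an attribute on one element is not well-formed XML,
# so the parser rejects the document outright.
try:
    ET.fromstring('<p note="check figure" note="cite source">Text</p>')
    duplicate_attribute_ok = True
except ET.ParseError:
    duplicate_attribute_ok = False
```

The element form keeps both authors' annotations, while the attribute form cannot even be parsed.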


## Workflow Management


## Roundtrip improvement

### Overview

Use the metadata mechanism to preserve "roundtrip information" from
foreign formats.

### Scenario

Suppose you have a specialized XML format that should be converted
to OpenDocument and back without loss of information, i.e.
"roundtripping". Since not all information can be converted directly
to OpenDocument objects, metadata could be used to store the
additional information so that the roundtrip succeeds.
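
As a minimal sketch of the idea, the functions below move fields with no direct document equivalent into a metadata slot on import and restore them on export. The field names, the `roundtrip` key, and the set of "supported" fields are assumptions chosen for illustration.

```python
def import_record(source):
    """Convert a foreign record into a document, preserving leftovers."""
    supported = {"title", "body"}  # fields with a direct equivalent (assumed)
    document = {k: v for k, v in source.items() if k in supported}
    leftovers = {k: v for k, v in source.items() if k not in supported}
    document["metadata"] = {"roundtrip": leftovers}
    return document


def export_record(document):
    """Rebuild the foreign record, restoring fields kept in metadata."""
    record = {k: v for k, v in document.items() if k != "metadata"}
    record.update(document["metadata"]["roundtrip"])
    return record


# Made-up foreign record with two fields the document model cannot express.
original = {"title": "Report", "body": "Text", "reviewCycle": "Q2", "docClass": "internal"}
restored = export_record(import_record(original))
```

An application that does not understand the preserved fields only has to carry the metadata along unchanged for the roundtrip to succeed.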

## Extrinsic metadata


## Asymmetric metadata


## Automatically generated metadata


## Metadata templates


## Security metadata

### Overview

Users often have permissions to see only parts of documents. If a
document is stored on a secure network server, metadata attached to
portions of a document could be used by an application to simply not
render those portions for a user without the required authorizations.
If saving other than to the server is disabled, users with varying
permissions can work on parts of a document they are authorized to
view while the remainder of the document is concealed.
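
A simple sketch of this kind of selective rendering is shown below. The level names, their ordering, and the `render_for` helper are hypothetical; as the scenarios note, real security schemes would be defined outside ODF.

```python
# Hypothetical ordering of security levels.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}

# Document portions, each carrying security metadata (made-up sample data).
portions = [
    {"text": "Quarterly summary.", "level": "public"},
    {"text": "Salary details.", "level": "restricted"},
    {"text": "Project schedule.", "level": "internal"},
]


def render_for(portions, clearance):
    """Return only the portions whose level does not exceed the clearance."""
    allowed = LEVELS[clearance]
    return [p["text"] for p in portions if LEVELS[p["level"]] <= allowed]


visible = render_for(portions, "internal")
```

Concealed portions remain in the stored document; only the rendering is filtered, which is why the overview assumes that saving anywhere other than the server is disabled.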

### Scenario

Classification officers, or others charged with such responsibilities
in military and governmental offices, must often decide which parts of
documents can be released. That decision varies according to a complex
set of conditions, and those conditions can change. Metadata affixed to
a document according to security levels (developed outside of ODF)
would fit the current needs of such classification activities.
(Military/governmental)

### Scenario 2

Commercial enterprises often have documents that contain sensitive
personnel, marketing, or legal information, while portions of those
documents need to be processed by staff without the required
permissions. Metadata-based security for ODF would enable the
construction of applications that can use ODF in its native format
(no additional features required) to meet the security needs of
commercial enterprises as well. (Commercial)

### Scenario 3

Consumers may have similar issues but, absent proper network and
server management, will need different capabilities to secure
portions of documents. The same security metadata could support
applications that selectively encrypt portions of an ODF document
(PCDATA). The encryption aspects are beyond the scope of ODF, but the
availability of security metadata would support the development of
such applications. (Consumers)