office-metadata message

Subject: Re: [office-metadata] use case plan?
From: Patrick Durusau <patrick@durusau.net>
To: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
Date: Mon, 12 Jun 2006 06:32:00 -0400
Bruce,

Looks like a very good start to me!

+1 on an HTML version, but I would also like to have an ODF version.

Hope you are at the start of a great week!

Patrick

Bruce D'Arcus wrote:

>
> It's been a month since I left for South America, with no visible 
> progress since. So what's our plan to finish the use cases? And when's 
> the next conference call?
>
> I've tried to assemble most of the wiki use cases into a single 
> document, which can then be converted to HTML.  See below ... it'll 
> need editing.
>
> Bruce
>
> # Introduction
>
>
>
> # Use Cases
>
> ## Enhanced Search
>
> ### Overview
>
> The most pressing problem in data mining, web searching, however you
> want to term the problem, is that data is basically very dumb. If I
> search for job on Google, I am going to get results that include job
> as in employment, as well as Job, as in the Book of Job.
>
> While enormous strides have been made in any number of automated
> techniques for mining data, the fact remains that current results are
> far from ideal. Surely the originators of data meant something when
> they originated the data. So why not give them the ability to say
> what they meant?
>
> ### Scenario
>
> A genetics researcher, for example, is writing a paper and wants to
> use a name that is common between the mouse and human genomes. But it
> is a lot of trouble to mark each term in the document. If they were
> able to declare a vocabulary for the document, that is, what a
> particular word or words mean in the document, any search engine
> could interpret those words to have particular meanings. If a user
> has the ability to declare what is meant by words in the document,
> without the labor of annotating individual words, the document could
> provide rich metadata for searching/indexing of the documents.
>
> With the vocabulary of the document specified by metadata, search
> engines can distinguish between terms based upon information
> specified by the creator of the document.
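
A minimal sketch of the vocabulary idea above, under stated assumptions: the concept URIs, the file names, and the per-document "vocabulary" mapping are all invented for illustration and are not defined by ODF. It only shows how a document-level declaration could let a search distinguish "job" (employment) from "Job" (the Book of Job) without per-word markup.

```python
# Hypothetical concept identifiers; nothing here is part of ODF.
EMPLOYMENT = "urn:example:concept/employment"
BOOK_OF_JOB = "urn:example:concept/book-of-job"

# Each document declares, once, what selected words mean in it.
documents = {
    "careers.odt": {"text": "How to find a job", "vocabulary": {"job": EMPLOYMENT}},
    "sermon.odt": {"text": "Lessons from Job", "vocabulary": {"job": BOOK_OF_JOB}},
    "notes.odt": {"text": "job list", "vocabulary": {}},  # no declaration
}


def search(term, concept=None):
    """Return names of documents containing `term`.

    If `concept` is given, keep only documents whose declared
    vocabulary maps `term` to that concept.
    """
    term = term.lower()
    hits = []
    for name, doc in documents.items():
        if term not in doc["text"].lower().split():
            continue
        if concept is not None and doc["vocabulary"].get(term) != concept:
            continue
        hits.append(name)
    return sorted(hits)
```

A plain `search("job")` matches all three documents, while `search("job", BOOK_OF_JOB)` narrows the result to the one document whose author declared that meaning.
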
>
> ## Bibliographies and Citations
>
> ### Overview
>
> Most textual documents include references to content from elsewhere.
> That referenced content might be quoted excerpts, data summaries, or
> paraphrased findings or conclusions. In fields where attribution of
> such referenced content is essential, such as law and academic
> research, citations and reference lists associate referenced document
> content with their source. And yet, formatted reference lists
> typically represent a subset of the source metadata, and may need to
> be reformatted for different audiences. In this sense, citations and
> reference list items can be understood as dynamic text fields whose
> content is generated from linked metadata descriptions.
>
> It would therefore significantly enhance the possibility for user
> collaboration and application interoperability to have a standard
> metadata infrastructure. Likewise, bibliographic metadata is more
> complex than the simple document metadata commonly found in
> productivity applications, which is often just a series of key/value
> pairs. Consider a simple example of a journal article, which
> involves relations between a document and a periodical, one or more
> people who author that document, and so forth. Beyond
> standardization, then, it is important to have a metadata approach
> that can support that sort of richer description.
>
> ### Scenario
>
> Three users collaborate on a paper, each using different OpenDocument-
> compatible applications.
>
> As they write the paper and add citations, the citations and
> bibliography are automatically generated from the embedded metadata.
> Because the metadata is embedded, it's also portable. When the users
> pass the document around, the logic is always there so that the
> formatting can be regenerated. And because the metadata is based on a
> standard model, it would also facilitate interoperability between
> different third-party bibliographic applications.
>
> When the authors finish the paper, they send it to a publisher, who can
> extract the metadata and make it available to search engines and
> journal providers. A standard metadata model also allows the
> publisher to regenerate the citations in a variety of standard styles
> (such as MLA, APA, Chicago).
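
A rough sketch of why the scenario needs more than flat key/value pairs, and how formatting can be regenerated from the same record. The class names, the sample article, and both output formats are made up for illustration; the two formatters are simplified stand-ins and not faithful renderings of MLA, APA, or Chicago.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    family: str
    given: str


@dataclass
class Periodical:
    title: str


@dataclass
class Article:
    # An article relates to people and to a periodical, which a flat
    # list of key/value pairs cannot express directly.
    title: str
    authors: List[Person]
    periodical: Periodical
    volume: int
    year: int


def author_year(a: Article) -> str:
    """In-text citation in a simplified author-year form."""
    names = " and ".join(p.family for p in a.authors)
    return f"({names} {a.year})"


def full_entry(a: Article) -> str:
    """Reference-list entry in a simplified, invented layout."""
    names = "; ".join(f"{p.family}, {p.given}" for p in a.authors)
    return f'{names}. {a.year}. "{a.title}." {a.periodical.title} {a.volume}.'


# Invented sample record, not a real publication.
article = Article(
    title="An Example Title",
    authors=[Person("Doe", "Jane"), Person("Roe", "Richard")],
    periodical=Periodical("Journal of Examples"),
    volume=12,
    year=2006,
)
```

Because both strings are derived from the one embedded record, switching styles means swapping the formatter, not retyping the citation.
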
>
>
> ## Intellectual Property
>
> ### Overview
>
> Published documents of all kinds often include content from
> elsewhere: images, data, and so forth. This content typically has
> rights information associated with it. Yet currently managing such
> information is a manual task. An author or production editor must
> obtain the file(s), and separately manage the rights information. In
> turn, they must manually add such information to the published text
> in the form of captions with copyright information and so forth.
> This can be both tedious and error prone. Allowing such metadata to
> be attached to such content would allow for more automated solutions.
>
> ### Scenario
>
> A government agency prepares a report that includes summary tables of
> data acquired from a third party. The document author embeds the
> table data in the document, and captions—including copyright and
> source information—are automatically generated.
>
> ### Scenario 2
>
> A student includes a Creative Commons-licensed photograph in their
> report. The license and attribution are automatically extracted from
> the image metadata by the application and appended to the image caption.
>
> ## Content Tagging
>
> ### Overview
>
> Allow the tagging of OpenDocument document objects, such as
> paragraphs, words, and figures, with metadata.
>
> ### Scenario
>
> For example consider an OpenDocument text document, where a paragraph
> is marked as important; or a figure, which is tagged with information
> about the copyright owner. Objects which should be able to serve as a
> tag anchor are:
>
>  * spans
>  * paragraphs
>  * figures
>  * tables
>
> ### Scenario 2
>
> In legal publishing (and presumably other domains) it is quite common
> to take an existing document (usually published legislation) and
> manually tag it with semantic information. It is generally critical
> that the presentation be preserved exactly for legal reasons.
>
> For example, a paragraph or series of paragraphs may constitute a
> legal definition of a term. A span of text may actually be a cross-
> reference within the same legislation, a reference to case law, or an
> amendment to another act.
>
> Tagging is often a precursor to transforming a document into a domain-
> specific format.
>
> In the more general sense, activities such as indexing and cross-
> referencing can be considered content tagging and should probably use
> the same mechanism.
>
> ## Realtime Collaborative Editing
>
> ### Overview
>
> The main idea is expressed in
> [http://en.wikipedia.org/wiki/Collaborative_real-time_editor Wikipedia's]
> writeup on the topic.
>
> For metadata, we need to keep the ramifications of this in mind. The
> same content (word, paragraph, page) may receive multiple instances
> of the same metadata element, each from a different author.
>
> One practical ramification of this may be that metadata will always
> need to be expressed as XML elements, not as XML attributes, since
> you cannot have multiple instances of the same attribute on the same
> element.
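
A small sketch of the attribute limitation, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`p`, `meta:status`, `author`) and the `urn:example:meta` namespace are hypothetical and not taken from any ODF schema. It shows that two authors' values fit as repeated child elements, while repeating an attribute makes the XML not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two authors attach the same kind of metadata
# to one paragraph, expressed as repeated child elements.
AS_ELEMENTS = """
<p xmlns:meta="urn:example:meta">
  <meta:status author="alice">reviewed</meta:status>
  <meta:status author="bob">needs-work</meta:status>
  Some paragraph text.
</p>
""".strip()

# The same information forced into attributes: "status" appears twice.
AS_ATTRIBUTES = '<p status="reviewed" status="needs-work">Some text.</p>'

NS = {"meta": "urn:example:meta"}


def statuses(xml_text):
    """Return (author, value) pairs for every meta:status child."""
    root = ET.fromstring(xml_text)
    return [(e.get("author"), e.text) for e in root.findall("meta:status", NS)]


def is_well_formed(xml_text):
    """True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

The element form parses and yields both authors' values; the attribute form is rejected by the parser before any application logic runs.
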
>
>
> ## Workflow Management
>
>
> ## Roundtrip improvement
>
> ### Overview
>
> Use the metadata mechanism to preserve "roundtrip information" from
> alien formats.
>
> ### Scenario
>
> Suppose you have a specialized XML format that should be converted
> to OpenDocument and back without loss of information, i.e.
> "roundtripping". Since not all information can be directly converted
> to OpenDocument objects, metadata could be used to store the
> additional information so that the roundtrip succeeds.
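
A toy sketch of the roundtrip idea, with plain dictionaries standing in for documents. The set of "native" fields and the `roundtrip` metadata key are assumptions made for illustration, and the sketch assumes the source format has no field named `metadata` of its own.

```python
# Hypothetical: fields the target format can represent directly.
NATIVE_FIELDS = {"title", "body"}


def to_odf(source):
    """Convert a source record, parking unmappable fields in metadata."""
    doc = {k: v for k, v in source.items() if k in NATIVE_FIELDS}
    extras = {k: v for k, v in source.items() if k not in NATIVE_FIELDS}
    doc["metadata"] = {"roundtrip": extras}
    return doc


def from_odf(doc):
    """Rebuild the source record from native fields plus parked extras."""
    source = {k: v for k, v in doc.items() if k != "metadata"}
    source.update(doc["metadata"]["roundtrip"])
    return source


# Invented record from an imagined specialized format.
original = {"title": "Report", "body": "Text", "clause-id": "s12", "jurisdiction": "NZ"}
```

Converting `original` with `to_odf` and back with `from_odf` returns an equal record, because the fields with no native counterpart travel along as metadata.
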
>
> ## Extrinsic metadata
>
>
> ## Asymmetric metadata
>
>
> ## Automatically generated metadata
>
>
> ## Metadata templates
>
>
> ## Security metadata
>
> ### Overview
>
> Users often have permissions to see only parts of documents. If a
> document is stored on a secure network server, metadata attached to
> portions of a document could be used by an application to simply not
> render those portions for a user without the required authorizations.
> If saving other than to the server is disabled, users with varying
> permissions can work on parts of a document they are authorized to
> view while the remainder of the document is concealed.
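
A minimal sketch of selective rendering, assuming an invented three-level scheme. The level names, their ordering, and the sample portions are hypothetical; as the scenarios below note, real security levels would be developed outside of ODF.

```python
# Hypothetical security levels, ordered from least to most restricted.
LEVELS = {"public": 0, "internal": 1, "secret": 2}

# Each portion of the document carries its required level as metadata.
portions = [
    ("Summary of the project.", "public"),
    ("Budget details.", "internal"),
    ("Personnel findings.", "secret"),
]


def render(portions, user_level):
    """Return only the portions the user is authorized to see."""
    allowed = LEVELS[user_level]
    return [text for text, required in portions if LEVELS[required] <= allowed]
```

For example, `render(portions, "internal")` returns the first two portions and simply omits the third, which is the "do not render" behavior described above.
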
>
> ### Scenario
>
> Classification officers, or those charged with such responsibilities
> in military and governmental offices, must often decide what parts of
> documents can be released, and that decision varies according to a
> complex set of conditions. Those conditions can also change. If
> metadata could be affixed to a document according to security levels
> (developed outside of ODF), that would fit the current needs of such
> classification activities. (Military/governmental)
>
> ### Scenario 2
>
> Commercial enterprises often have documents that may contain
> sensitive personnel, marketing or legal information, while portions
> of the document need to be processed by staff without the required
> permissions. Metadata-based security for ODF would enable the
> construction of applications that can use ODF in its native format
> (no additional features required) to meet the security needs of
> commercial enterprises as well. (commercial)
>
> ### Scenario 3
>
> Consumers may have similar issues but, absent proper network and
> server management, will need different capabilities to secure
> portions of documents. Even so, the same security metadata could support
> applications that selectively encrypt portions of an ODF document
> (PCDATA). The encryption aspects are beyond the scope of ODF, but the
> availability of metadata security information would support the
> development of such applications. (consumers)
>
>
>
>

-- 
Patrick Durusau
Patrick@Durusau.net
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005

Topic Maps: Human, not artificial, intelligence at work!