Subject: use case plan?
It's been a month since I left for South America, with no visible progress since. So what's our plan to finish the use cases? And when's the next conference call?

I've tried to assemble most of the wiki use cases into a single document, which can then be converted to HTML. See below ... it'll need editing.

Bruce

# Introduction

# Use Cases

## Enhanced Search

### Overview

The most pressing problem in data mining, web searching, or however you want to term the problem, is that data is basically very dumb. If I search for "job" on Google, I am going to get results that include "job" as in employment as well as "Job" as in the Book of Job. While enormous strides have been made in any number of automated techniques for mining data, the fact remains that current results are far from ideal. Surely the originators of data meant something when they originated the data. So why not give them the ability to say what they meant?

### Scenario

A genetics researcher, for example, is writing a paper and wants to use a name that is common to the mouse and human genomes. But it is a lot of trouble to mark each term in the document. If they were able to declare a vocabulary for the document, that is, what a particular word or words mean in the document, any search engine could interpret those words to have particular meanings.

If a user has the ability to declare what is meant by words in the document, without the labor of annotating individual words, the document could provide rich metadata for searching and indexing. With the vocabulary of the document specified by metadata, search engines can distinguish between terms based upon information specified by the creator of the document.

## Bibliographies and Citations

### Overview

Most textual documents include references to content from elsewhere. That referenced content might be quoted excerpts, data summaries, or paraphrased findings or conclusions.
In fields where attribution of such referenced content is essential, such as law and academic research, citations and reference lists associate referenced document content with its source. And yet, formatted reference lists typically represent a subset of the source metadata, and may need to be reformatted for different audiences. In this sense, citations and reference list items can be understood as dynamic text fields whose content is generated from linked metadata descriptions. It would therefore significantly enhance user collaboration and application interoperability to have a standard metadata infrastructure.

Likewise, bibliographic metadata is more complex than the simple document metadata commonly found in productivity applications, which is often just a series of key/value pairs. Consider a simple example of a journal article, which involves relations between a document and a periodical, one or more people who author that document, and so forth. Beyond standardization, then, it is important to have a metadata approach that can support that sort of richer description.

### Scenario

Three users collaborate on a paper, each using a different OpenDocument-compatible application. As they write the paper and add citations, the citations and bibliography are automatically generated from the embedded metadata. Because the metadata is embedded, it's also portable. When the users pass the document around, the logic is always there so that the formatting can be regenerated. And because the metadata is based on a standard model, it would also facilitate interoperability between different third-party bibliographic applications.

When the authors finish the paper, they send it to a publisher, who can extract the metadata and make it available to search engines and journal providers. A standard metadata model also allows the publisher to regenerate the citations in a variety of standard styles (such as MLA, APA, Chicago).
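
To make the "citations as generated fields" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Person`, `Periodical`, and `Article` classes, the made-up placeholder record, and the two formatter functions, which only loosely approximate APA and MLA conventions. It is not a proposed metadata model for OpenDocument; it only shows one structured record (an article related to a periodical and to its authors) being rendered in more than one style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    family: str
    given: str


@dataclass
class Periodical:
    title: str


@dataclass
class Article:
    # An article is related to a periodical and to one or more authors,
    # which is more than a flat list of key/value pairs can express.
    title: str
    authors: List[Person]
    periodical: Periodical
    year: int
    volume: int
    pages: str


def format_apa_like(a: Article) -> str:
    """Render the record in a simplified, APA-like style (approximation)."""
    names = ", ".join(f"{p.family}, {p.given[0]}." for p in a.authors)
    return f"{names} ({a.year}). {a.title}. {a.periodical.title}, {a.volume}, {a.pages}."


def format_mla_like(a: Article) -> str:
    """Render the same record in a simplified, MLA-like style (approximation)."""
    first = a.authors[0]
    names = f"{first.family}, {first.given}"
    if len(a.authors) > 1:
        names += ", et al"
    return f'{names}. "{a.title}." {a.periodical.title} {a.volume} ({a.year}): {a.pages}.'


# Made-up placeholder record, not a real publication.
article = Article(
    title="An Example Article",
    authors=[Person("Doe", "Jane"), Person("Roe", "Richard")],
    periodical=Periodical("Journal of Examples"),
    year=2006,
    volume=12,
    pages="1-10",
)

print(format_apa_like(article))
print(format_mla_like(article))
```

Because the formatted strings are derived from the record rather than typed by hand, switching styles means calling a different formatter; the embedded metadata itself does not change.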

## Intellectual Property

### Overview

Published documents of all kinds often include content from elsewhere: images, data, and so forth. This content typically has rights information associated with it. Yet managing such information is currently a manual task. An author or production editor must obtain the file(s) and separately manage the rights information. In turn, they must manually add such information to the published text in the form of captions with copyright information and so forth. This can be both tedious and error prone. Allowing rights metadata to be attached to the content itself would allow for more automated solutions.

### Scenario

A government agency prepares a report that includes summary tables of data acquired from a third party. The document author embeds the table data in the document, and captions, including copyright and source information, are automatically generated.

### Scenario 2

A student includes a Creative Commons-licensed photograph in their report. The license and attribution are automatically extracted from the image metadata by the application and appended to the image caption.

## Content Tagging

### Overview

Allow the tagging of OpenDocument document objects such as paragraphs, words, and figures with metadata.

### Scenario

For example, consider an OpenDocument text document in which a paragraph is marked as important, or a figure is tagged with information about the copyright owner.

Objects which should be able to serve as a tag anchor are:

* spans
* paragraphs
* figures
* tables

### Scenario 2

In legal publishing (and presumably other domains) it is quite common to take an existing document (usually published legislation) and manually tag it with semantic information. It is generally critical, for legal reasons, that the presentation be preserved exactly. For example, a paragraph or series of paragraphs may constitute a legal definition of a term.
A span of text may actually be a cross-reference within the same legislation, a reference to case law, or an amendment to another act. Tagging is often a precursor to transforming a document into a domain-specific format.

In the more general sense, activities such as indexing and cross-referencing can be considered content tagging and should probably use the same mechanism.

## Realtime Collaborative Editing

### Overview

The main idea is expressed in [Wikipedia's](http://en.wikipedia.org/wiki/Collaborative_real-time_editor) writeup on the topic. For metadata, we need to keep the ramifications of this in mind. The same content (word, paragraph, page) may receive multiple instances of the same metadata element, each from a different author. One practical ramification of this may be that metadata will always need to be expressed as XML elements, not as XML attributes, since you cannot have multiple instances of the same attribute on the same element.

## Workflow Management

## Roundtrip improvement

### Overview

Use the metadata mechanism to preserve "roundtrip information" from alien formats.

### Scenario

Suppose you have a specialized XML format which should be converted to OpenDocument and back without loss of information, i.e. "roundtripping". Since not all information can be directly converted to OpenDocument objects, metadata could be used to store the additional information so that the roundtrip succeeds.

## Extrinsic metadata

## Asymmetric metadata

## Automatically generated metadata

## Metadata templates

## Security metadata

### Overview

Users often have permissions to see only parts of documents. If a document is stored on a secure network server, metadata attached to portions of a document could be used by an application to simply not render those portions for a user without the required authorizations.
If saving anywhere other than to the server is disabled, users with varying permissions can work on the parts of a document they are authorized to view while the remainder of the document is concealed.

### Scenario

Classification officers, or those charged with such responsibilities in military and governmental offices, must often decide which parts of documents can be released, and that decision depends on a complex set of conditions. And those conditions can change. If metadata could be affixed to a document according to security levels (developed outside of ODF), that would fit the current needs of such classification activities. (Military/governmental)

### Scenario 2

Commercial enterprises often have documents that contain sensitive personnel, marketing, or legal information, while portions of those documents need to be processed by staff without the required permissions. Metadata-based security for ODF would enable the construction of applications that can use ODF in its native format (no additional features required) to meet the security needs of commercial enterprises as well. (commercial)

### Scenario 3

Consumers may have similar issues but, absent proper network and server management, will need different capabilities to secure portions of documents. But the same security metadata could support applications that selectively encrypt portions of an ODF document (PCDATA). The encryption aspects are beyond the scope of ODF, but the availability of security metadata would support the development of such applications. (consumers)
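
As a rough illustration of the "do not render those portions" behaviour described in the overview, the Python sketch below filters document portions by a security level carried as metadata. The level names, their ordering, the `Portion` class, and the `visible_portions` function are all hypothetical; the use case itself assumes that real security levels would be developed outside of ODF.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ordering of security levels, lowest to highest.
# Real levels would be defined outside of ODF.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


@dataclass
class Portion:
    text: str
    level: str  # security metadata attached to this portion of the document


def visible_portions(portions: List[Portion], clearance: str) -> List[str]:
    """Return only the text of portions at or below the user's clearance."""
    allowed = LEVELS[clearance]
    return [p.text for p in portions if LEVELS[p.level] <= allowed]


# Made-up document content for illustration.
document = [
    Portion("Quarterly summary.", "public"),
    Portion("Marketing plan draft.", "internal"),
    Portion("Personnel review notes.", "confidential"),
]

print(visible_portions(document, "internal"))
```

A real application would also have to control saving and copying, as the overview notes; this sketch covers only the rendering decision.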