Subject: Re: [office-metadata] use case plan?
Bruce,

Looks like a very good start to me! +1 on an HTML version, but I would
also like to have an ODF version.

Hope you are at the start of a great week!

Patrick

Bruce D'Arcus wrote:
>
> It's been a month since I left for South America, with no visible
> progress since. So what's our plan to finish the use cases? And when's
> the next conference call?
>
> I've tried to assemble most of the wiki use cases into a single
> document, which can then be converted to HTML. See below ... it'll
> need editing.
>
> Bruce
>
> # Introduction
>
>
>
> # Use Cases
>
> ## Enhanced Search
>
> ### Overview
>
> The most pressing problem in data mining, web searching, however you
> want to term the problem, is that data is basically very dumb. If I
> search for job on Google, I am going to get results that include job
> as in employment, as well as Job, as in the Book of Job.
>
> While enormous strides have been made in any number of automated
> techniques for mining data, the fact remains that current results are
> far from ideal. Surely the originators of data meant something when
> they originated the data. So why not give them the ability to say
> what they meant?
>
> ### Scenario
>
> A genetics researcher, for example, is writing a paper and wants to
> use a name that is common between the mouse and human genomes. But it
> is a lot of trouble to mark each term in the document. If they were
> able to declare a vocabulary for the document, that is, what a
> particular word or words mean in the document, any search engine
> could interpret those words to have particular meanings. If a user
> has the ability to declare what is meant by words in the document,
> without the labor of annotating individual words, the document could
> provide rich metadata for searching and indexing.
>
> With the vocabulary of the document specified by metadata, search
> engines can distinguish between terms based upon information
> specified by the creator of the document.
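
A minimal sketch of how an indexer might use a document-level vocabulary
of the kind described in the Enhanced Search scenario. Everything named
here is an assumption made for illustration: the `Document` type, its
`vocabulary` mapping, and the `urn:example:` identifiers are hypothetical
and are not part of ODF or of any proposal in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Document:
    """A document plus a hypothetical author-declared vocabulary."""
    text: str
    # Maps a single-word term to the concept the author says it means.
    vocabulary: Dict[str, str] = field(default_factory=dict)


def index_terms(doc: Document) -> List[Tuple[str, str]]:
    """Return (term, concept) pairs for declared terms that occur in the text."""
    words = {w.strip(".,;:!?()").lower() for w in doc.text.split()}
    return sorted(
        (term.lower(), concept)
        for term, concept in doc.vocabulary.items()
        if term.lower() in words
    )


def search(docs: List[Document], term: str, concept: str) -> List[Document]:
    """Return only the documents that use `term` in the declared sense."""
    key = (term.lower(), concept)
    return [d for d in docs if key in index_terms(d)]


# Two documents that both contain the word "job", declared once per document.
postings = Document(
    text="We have a job opening for an editor.",
    vocabulary={"job": "urn:example:employment"},
)
commentary = Document(
    text="The Book of Job raises hard questions.",
    vocabulary={"job": "urn:example:book-of-job"},
)

hits = search([postings, commentary], "job", "urn:example:employment")
print(len(hits))
```

The declaration is made once per document rather than on each occurrence
of the word, which is the point of the scenario: the author does no
per-word annotation, yet a search for the employment sense of "job"
can skip the second document.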
>
> ## Bibliographies and Citations
>
> ### Overview
>
> Most textual documents include references to content from elsewhere.
> That referenced content might be quoted excerpts, data summaries, or
> paraphrased findings or conclusions. In fields where attribution of
> such referenced content is essential, such as law and academic
> research, citations and reference lists associate referenced document
> content with its source. And yet, formatted reference lists
> typically represent a subset of the source metadata, and may need to
> be reformatted for different audiences. In this sense, citations and
> reference list items can be understood as dynamic text fields whose
> content is generated from linked metadata descriptions.
>
> It would therefore significantly enhance the possibility for user
> collaboration and application interoperability to have a standard
> metadata infrastructure. Likewise, bibliographic metadata is more
> complex than the simple document metadata commonly found in
> productivity applications, which is often just a series of
> key/value pairs. Consider a simple example of a journal article, which
> involves relations between a document and a periodical, one or more
> people who author that document, and so forth. Beyond
> standardization, then, it is important to have a metadata approach
> that can support that sort of richer description.
>
> ### Scenario
>
> Three users collaborate on a paper, each using a different
> OpenDocument-compatible application.
>
> As they write the paper and add citations, the citations and
> bibliography are automatically generated from the embedded metadata.
> Because the metadata is embedded, it's also portable. When the users
> pass the document around, the logic is always there so that the
> formatting can be regenerated. And because the metadata is based on a
> standard model, it would also facilitate interoperability between
> different third-party bibliographic applications.
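
A minimal sketch of the idea that a citation is a text field generated
from linked metadata. The `Person`, `Periodical`, and `Article` types, the
sample record, and both formatting functions are assumptions made for
illustration. The two output styles are deliberately simplified and
made up; they are not faithful implementations of APA, MLA, Chicago, or
any other published style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    family: str
    given: str


@dataclass
class Periodical:
    title: str


@dataclass
class Article:
    """A journal article described by its relations, not flat key/values."""
    title: str
    authors: List[Person]    # relation: article -> people who wrote it
    periodical: Periodical   # relation: article -> journal it appeared in
    year: int
    volume: int
    pages: str


def format_author_date(a: Article) -> str:
    """Simplified author-date style (illustrative only)."""
    names = ", ".join(f"{p.family}, {p.given[0]}." for p in a.authors)
    return (f"{names} ({a.year}). {a.title}. "
            f"{a.periodical.title}, {a.volume}, {a.pages}.")


def format_author_title(a: Article) -> str:
    """Simplified author-title style (illustrative only)."""
    names = " and ".join(f"{p.given} {p.family}" for p in a.authors)
    return (f'{names}. "{a.title}." '
            f"{a.periodical.title} {a.volume} ({a.year}): {a.pages}.")


# A made-up record standing in for metadata embedded in the document.
article = Article(
    title="An Example Article",
    authors=[Person("Doe", "Jane"), Person("Roe", "Richard")],
    periodical=Periodical("Journal of Examples"),
    year=2006,
    volume=12,
    pages="34-56",
)

print(format_author_date(article))
print(format_author_title(article))
```

Because both strings are derived from the same structured record, any
application that can read the record can regenerate the formatted text
in whatever style it needs, without re-keying the reference.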
>
> When the authors finish the paper, they send it to a publisher, who can
> extract the metadata and make it available to search engines and
> journal providers. A standard metadata model also allows the
> publisher to regenerate the citations in a variety of standard styles
> (such as MLA, APA, Chicago).
>
>
> ## Intellectual Property
>
> ### Overview
>
> Published documents of all kinds often include content from
> elsewhere: images, data, and so forth. This content typically has
> rights information associated with it. Yet currently managing such
> information is a manual task. An author or production editor must
> obtain the file(s), and separately manage the rights information. In
> turn, they must manually add such information to the published text
> in the form of captions with copyright information and so forth.
> This can be both tedious and error prone. Allowing such metadata to
> be attached to such content would allow for more automated solutions.
>
> ### Scenario
>
> A government agency prepares a report that includes summary tables of
> data acquired from a third party. The document author embeds the
> table data in the document, and captions, including copyright and
> source information, are automatically generated.
>
> ### Scenario 2
>
> A student includes a Creative Commons-licensed photograph in their
> report. The license and attribution are automatically extracted from
> the image metadata by the application and appended to the image caption.
>
> ## Content Tagging
>
> ### Overview
>
> Allow the tagging of OpenDocument document objects, such as
> paragraphs, words, and figures, with metadata.
>
> ### Scenario
>
> For example, consider an OpenDocument text document where a paragraph
> is marked as important, or a figure that is tagged with information
> about the copyright owner.
> Objects which should be able to serve as a tag anchor are:
>
> * spans
> * paragraphs
> * figures
> * tables
>
> ### Scenario 2
>
> In legal publishing (and presumably other domains) it is quite common
> to take an existing document (usually published legislation) and
> manually tag it with semantic information. It is generally critical
> that the presentation be preserved exactly for legal reasons.
>
> For example, a paragraph or series of paragraphs may constitute a
> legal definition of a term. A span of text may actually be a
> cross-reference within the same legislation, a reference to case law,
> or an amendment to another act.
>
> Tagging is often a precursor to transforming a document into a
> domain-specific format.
>
> In the more general sense, activities such as indexing and
> cross-referencing can be considered content tagging and should
> probably use the same mechanism.
>
> ## Realtime Collaborative Editing
>
> ### Overview
>
> The main idea is expressed in
> [http://en.wikipedia.org/wiki/Collaborative_real-time_editor Wikipedia's]
> writeup on the topic.
>
> For metadata, we need to keep the ramifications of this in mind. The
> same content (word, paragraph, page) may receive multiple instances
> of the same metadata element, each from a different author.
>
> One practical ramification of this may be that metadata will always
> need to be expressed as XML elements, not as XML attributes, since
> you cannot have multiple instances of the same attribute on the same
> element.
>
>
> ## Workflow Management
>
>
> ## Roundtrip improvement
>
> ### Overview
>
> Use the metadata mechanism to preserve "roundtrip information" from
> alien formats.
>
> ### Scenario
>
> Suppose you have a specialized XML format which should be converted
> to OpenDocument and back without the loss of information, i.e.
> "roundtripping".
> Since not all information can be directly converted to OpenDocument
> objects, metadata could be used to store the additional information
> so that the roundtrip succeeds.
>
> ## Extrinsic metadata
>
>
> ## Asymmetric metadata
>
>
> ## Automatically generated metadata
>
>
> ## Metadata templates
>
>
> ## Security metadata
>
> ### Overview
>
> Users often have permissions to see only parts of documents. If a
> document is stored on a secure network server, metadata attached to
> portions of a document could be used by an application to simply not
> render those portions for a user without the required authorizations.
> If saving other than to the server is disabled, users with varying
> permissions can work on parts of a document they are authorized to
> view while the remainder of the document is concealed.
>
> ### Scenario
>
> Classification officers or those charged with such responsibilities
> in military and governmental offices must often decide what parts of
> documents can be released, and that varies according to a complex set
> of conditions. And those conditions can change. If metadata could be
> affixed to a document according to security levels (developed outside
> of ODF), that would fit into the current needs of such classification
> activities. (Military/governmental)
>
> ### Scenario 2
>
> Commercial enterprises often have documents that may contain
> sensitive personnel, marketing or legal information, while portions
> of the document need to be processed by staff without the required
> permissions. Metadata-based security for ODF would enable the
> construction of applications that can use ODF in its native format
> (no additional features required) to meet the security needs of
> commercial enterprises as well. (commercial)
>
> ### Scenario 3
>
> Consumers may have similar issues but, absent proper network and
> server management, will need different capabilities to secure
> portions of documents.
> But the same security metadata could support applications that
> selectively encrypt portions of an ODF document (PCDATA). The
> encryption aspects are beyond the scope of ODF, but the availability
> of metadata security information would support the development of
> such applications. (consumers)
>
>
>

-- 
Patrick Durusau
Patrick@Durusau.net
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005

Topic Maps: Human, not artificial, intelligence at work!