OASIS Mailing List Archives

 



dita-busdocs message



Subject: RE: [dita-busdocs] linguistics and DITA Business Docs


Hi Bruce,

 

I noticed that these emails came in today, but meetings have kept me from reading them. I am quite eager to read and think about them this evening.

 

 

Thank you,

Michael

 

From: Bruce Nevin (bnevin) [mailto:bnevin@cisco.com]
Sent: Wednesday, July 23, 2008 2:40 PM
To: Ann Rockley; dita-busdocs@lists.oasis-open.org
Cc: Michael Boses; rhanna@rim.com
Subject: RE: [dita-busdocs] linguistics and DITA Business Docs

 

I would add that labels for the equivalence classes for a sublanguage (column headings for the binary array of discourse analysis) seem typically to be classifier vocabulary for objects, events, and operations in the domain. 

 

This classifier vocabulary is part of the metalanguage for the domain. Sublanguage grammar has a metalanguage external to the sublanguage (in sublanguage sentences of an antecedent domain or domains), in which the definitions and presuppositions of the domain are specified. This is not the case for language as a whole, which necessarily contains its own metalanguage.

 

This may seem excessively recondite, but one consequence is that the elaborate tree-diagramming formalisms of some schools of linguistics have no innate or otherwise privileged status. They are no more than notational conventions for representing metalinguistic propositions that can be stated with simpler means that are part of the language itself. Non-linguists should not be snowed by them, nor expect that they are a necessary adjunct to applying methods of linguistic analysis to the problems that concern us.

 

    /BN 

 


From: Ann Rockley [mailto:rockley@rockley.com]
Sent: Wednesday, July 23, 2008 11:38 AM
To: Bruce Nevin (bnevin); dita-busdocs@lists.oasis-open.org
Cc: mboses@invisiondev.com; rhanna@rim.com
Subject: Re: [dita-busdocs] linguistics and DITA Business Docs

This is really good Bruce, thanks! We had a discussion today about what methodologies we could use to analyze content for potential specializations. Your pointers here will help with that task.

At 11:07 PM 7/22/2008, Bruce Nevin (bnevin) wrote:

You have looked into some of the literature of narrative analysis and discourse analysis, and have been unable to find anything that directly applies to the "business documents" that you have examined.
 
You are looking at the literature of typography and book design to identify accepted concepts and terminology.
 
Both of these literature surveys seem to me to have the implicit purpose of enlisting some recognized authority to underwrite any proposal that we make for DITA-based markup of the structure of "narrative" as found in the business documents of interest.
 
You have asked for my input as a linguist. First, I have to concur in your assessment of the recent literature of discourse analysis. I am not able to find much that is germane to our purposes. For example, consider the list of "topics of interest to discourse analysts" found in http://en.wikipedia.org/wiki/Discourse_analysis:

* The various levels or dimensions of discourse, such as sounds (intonation, etc.), gestures, syntax, the lexicon, style, rhetoric, meanings, speech acts, moves, strategies, turns and other aspects of interaction

* Genres of discourse (various types of discourse in politics, the media, education, science, business, etc.)

* The relations between discourse and the emergence of sentence syntax

* The relations between text (discourse) and context

* The relations between discourse and power

* The relations between discourse and interaction

* The relations between discourse and cognition and memory

None of these connect with our interest in examining "narrative" texts of "business documents" and identifying those parts of them which are semantically distinct and structurally identifiable. (To say "semantically distinct and structurally identifiable" is pleonastic, if not redundant, BTW--form and information are inextricable.)
 
If you read that Wikipedia article, you will see that the above list applies to relatively recent ideas about discourse analysis--since the 1970s and 1980s. The earlier form of discourse analysis is what I am most familiar with, as developed by Zellig Harris and his students beginning perhaps in 1938. His most famous student, Noam Chomsky, confesses that he never really understood it--and that goes far to account for the neglect of this approach. Harris's work culminated in 1989 in a demonstration of the form of information in science and in 1991 in a theory of language and information (refs in that wiki article).
 
The methodology is an extension of distributional analysis in linguistics. If two items (morphemes, words, phrases) each occur in the same context, there is to that degree a semantic and structural equivalence between them. This is the methodological basis for establishing grammatical categories for sentences in a language (verb, noun, etc.), and for establishing more fine-grained semantic subcategories. But beyond the grammar of sentences, local equivalence classes can be set up within a discourse, applying only within that discourse or within a set of like discourses. With the aid of paraphrastic transformations the successive periods of a discourse can be regularized so that the members of each equivalence class fall within columns of a table (binary array). Beyond that, discourses of a constrained subject matter have the same equivalence classes; changes of topic ("changing the subject") within a discourse correspond with changes of vocabulary and changes of the equivalence classes in which they fall; related subject-matter domains intersect in these particulars; terms are borrowed from one domain to another, necessarily with changes of the contexts in which they occur and hence of their equivalence classes and their meanings; the language of a restricted domain has a distinct sublanguage grammar and lexicon; "general usage" may be an envelope of sublanguages; and so on.
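To make the first step concrete, here is a rough Python sketch of distributional grouping and the resulting table. It is a deliberate simplification, not Harris's procedure: it assumes the sentences have already been regularized (the paraphrastic transformations are not modeled), it takes a word's "context" to be only its immediate left and right neighbours, and it puts two words in the same class if they share even one context. The function names and the toy router/switch sentences are hypothetical illustrations.

```python
from collections import defaultdict


def word_contexts(sentences):
    """Map each word to the set of (left, right) neighbour pairs it occurs between."""
    ctx = defaultdict(set)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            ctx[padded[i]].add((padded[i - 1], padded[i + 1]))
    return ctx


def equivalence_classes(sentences):
    """Merge words that share at least one context; merging is transitive (union-find).

    Simplifying assumption: a single shared context counts as full equivalence,
    whereas in the text equivalence holds only "to that degree".
    """
    ctx = word_contexts(sentences)
    parent = {w: w for w in ctx}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    # Invert the map: for each context, the words that occur in it.
    by_context = defaultdict(list)
    for w, cs in ctx.items():
        for c in cs:
            by_context[c].append(w)
    for words in by_context.values():
        for other in words[1:]:
            parent[find(other)] = find(words[0])

    groups = defaultdict(list)
    for w in ctx:
        groups[find(w)].append(w)

    # Order members and columns by first appearance in the text.
    order = {}
    for sent in sentences:
        for w in sent:
            order.setdefault(w, len(order))
    classes = [sorted(g, key=order.__getitem__) for g in groups.values()]
    return sorted(classes, key=lambda g: order[g[0]])


def binary_array(sentences, classes):
    """One row per sentence, one column per class; each cell holds the member(s) present."""
    return [
        [" ".join(w for w in sent if w in cls) for cls in classes]
        for sent in sentences
    ]


if __name__ == "__main__":
    sentences = [s.split() for s in [
        "the router drops packets",
        "the switch drops packets",
        "the router forwards packets",
    ]]
    classes = equivalence_classes(sentences)
    print(classes)  # [['the'], ['router', 'switch'], ['drops', 'forwards'], ['packets']]
    for row in binary_array(sentences, classes):
        print(row)
```

Each column of the resulting table is one equivalence class ("router/switch", "drops/forwards"). A real analysis would need much richer contexts than adjacent words, and would have to handle the graded nature of the equivalence that this all-or-nothing merge ignores.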
 
So deep a command of the semantics of discourse we do not require, and anyway it depends upon a degree of analysis that is impracticable for us or for users of DITA.
 
Nonetheless, the methods of linguistic analysis are relevant, I think, for sharpening and extending what we call "content analysis" in the development of a data model. I suppose what I need to do is look at the examples that you surveyed, with particular attention to the questions and problems that you identified. For example, in our meeting on July 5, someone said that there is no name for a paragraph preceding a subsection or a paragraph following a subsection, and that this lack of settled nomenclature made such paragraphs difficult to discuss. I would need to see examples, because I don't know what you mean.
 
The DITA base types are so loose and unrestricted that it seems possible to shoehorn almost anything into them. We should carefully examine the opposite path: when we try that, where are the gaps and where are the pinches? Do we want more semantic specificity?
 
Another potential resource has come to my attention. The WikiSlice project is looking to put Wikipedia articles into DITA topics.
http://wiki.laptop.org/go/Projects/Wikislice
It might be useful to see what they're running into with the "narrative" text of Wikipedia articles.
 
 /BN


_____________________________
Ann Rockley, President, The Rockley Group Inc.

The XML and Component Content Management Report is now available. This report provides insights into 5 XML authoring tools and 14 CCM tools. A must have if you are in the process of making tools decisions. For more information see http://www.cmswatch.com/CCM/Report/

Co-Chair DITA for Enterprise Business Documents Subcommittee http://wiki.oasis-open.org/dita/BusDocs

The Rockley Group Inc are experts in customer-centric enterprise content management and information architecture for component content management. Check out our blog www.rockleyblog.com for discussions on key aspects of content management.

www.rockley.com, 905-939-9298


