dita-busdocs message
Subject: linguistics and DITA Business Docs
- From: "Bruce Nevin (bnevin)" <bnevin@cisco.com>
- To: <dita-busdocs@lists.oasis-open.org>
- Date: Tue, 22 Jul 2008 23:07:34 -0400
You have looked into
some of the literature of narrative analysis and discourse analysis, and have
been unable to find anything that directly applies to the "business documents" that you have
examined.
You are looking at
the literature of typography and book design to identify accepted concepts and
terminology.
Both of these
literature surveys seem to me to have the implicit purpose of enlisting some
recognized authority to underwrite any
proposal that we make for DITA-based markup of the structure of
"narrative" as found in the business documents of interest.
You have asked
for my input as a linguist. First, I have to
concur in your assessment of the recent literature of discourse
analysis. I am not able to find much that
is germane to our purposes. For example,
consider the list of "topics
of interest to discourse analysts" found
in http://en.wikipedia.org/wiki/Discourse_analysis:
* The various levels or dimensions of discourse, such as sounds (intonation, etc.),
  gestures, syntax, the lexicon, style, rhetoric, meanings, speech acts, moves,
  strategies, turns and other aspects of interaction
* Genres of discourse (various types of discourse in politics, the media,
  education, science, business, etc.)
* The relations between discourse and the emergence of sentence syntax
* The relations between text (discourse) and context
* The relations between discourse and power
* The relations between discourse and interaction
* The relations between discourse and cognition and memory
None of these connects with our interest in examining the "narrative" texts of
"business documents" and identifying those parts of them which are semantically
distinct and structurally identifiable.
(To say "semantically distinct and
structurally identifiable" is pleonastic, if not redundant, BTW--form and
information are inextricable.)
If you read that Wikipedia article, you will see that
the above list applies to relatively recent ideas about discourse
analysis--since the 1970s and 1980s. The earlier form of discourse
analysis is what I am most familiar with, as developed by Zellig Harris and his
students beginning in perhaps 1938. His most
famous student, Noam Chomsky, confesses that he never really understood it--and
that goes far to account for the neglect of this approach. Harris's
work culminated in 1989 in a
demonstration of the form of information in science and in 1991 in a theory of
language and information (refs in that wiki article).
The methodology is an
extension of distributional analysis in linguistics. If two items (morphemes,
words, phrases) each occur in the same context, there is to that degree a semantic and structural equivalence between
them. This is the methodological basis for establishing grammatical categories
for sentences in a language (verb,
noun, etc.), and for establishing more
fine-grained semantic subcategories. But beyond the grammar of sentences, local
equivalence classes can be set up within a discourse, applying only within that discourse or within a set of
like discourses. With the aid of paraphrastic transformations the
successive periods of a discourse can be regularized so that the members of each
equivalence class fall within columns of a table (binary array). Beyond that, discourses of a constrained subject
matter have the same equivalence classes; changes of topic ("changing the
subject") within a discourse correspond with changes of vocabulary and changes
of the equivalence classes in which they fall; related subject-matter domains
intersect in these particulars; terms are borrowed from one domain to
another, necessarily with changes of the contexts in which they occur and
hence of their equivalence classes and their meanings; the language of a
restricted domain has a distinct sublanguage grammar and lexicon; "general
usage" may be an envelope of sublanguages; and so on.
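The first steps of this method can be illustrated with a toy sketch in Python. This is only my rough illustration under strong assumptions, not Harris's actual procedure: "context" is reduced to the immediately neighboring words, the three-sentence discourse is invented, the function names are hypothetical, and paraphrastic transformations are not modeled at all. Words that share a context are grouped into a local equivalence class, and each period is then laid out as a table row with one column per class.

```python
from collections import defaultdict

def contexts(sentences):
    """Map each word to the set of (left, right) neighbor pairs it occurs in."""
    ctx = defaultdict(set)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]  # boundary markers (an assumption)
        for i in range(1, len(padded) - 1):
            ctx[padded[i]].add((padded[i - 1], padded[i + 1]))
    return ctx

def equivalence_classes(sentences):
    """Group words that share at least one identical context (transitively)."""
    ctx = contexts(sentences)
    parent = {w: w for w in ctx}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    # Collect the words observed in each context, then merge them.
    by_context = defaultdict(list)
    for w, cs in ctx.items():
        for c in cs:
            by_context[c].append(w)
    for words in by_context.values():
        for w in words[1:]:
            parent[find(w)] = find(words[0])

    groups = defaultdict(set)
    for w in ctx:
        groups[find(w)].add(w)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

def tabulate(sentences, classes):
    """One row per period, one column per class; a cell holds the class
    member found in that period, or None if no member occurs."""
    return [
        [next((w for w in sent if w in cls), None) for cls in classes]
        for sent in sentences
    ]

# Invented toy discourse, for illustration only.
discourse = [
    "the author writes the summary".split(),
    "the author revises the summary".split(),
    "the editor revises the summary".split(),
]
classes = equivalence_classes(discourse)
table = tabulate(discourse, classes)
```

Here "author" and "editor" fall into one class because both occur between "the" and "revises", and "writes" and "revises" fall into another because both occur between "author" and "the". A real analysis would use much richer contexts and would first regularize the sentences.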
So deep a command of the semantics of
discourse we do not require,
and anyway it depends upon a degree of analysis that is impracticable for
us or for users of DITA.
Nonetheless, the
methods of linguistic analysis are relevant, I think, for sharpening and
extending what we call "content analysis" in the development of a data model. I
suppose what I need to do is look at the examples that you surveyed, with
particular attention to questions and problems that you identified. For example,
in our meeting on July
5, someone said that there is no name for a paragraph preceding a
subsection or a paragraph following a subsection, and that this lack of settled nomenclature
makes them difficult to discuss. I would need to see examples, because I don't know what you
mean.
The DITA base types are so loose and unrestricted that it
seems possible to shoehorn almost anything into them. We should carefully
examine the opposite path: when we try that, where are the gaps and where are
the pinches? Do we want more semantic specificity?
Another potential
resource has come to my attention. The WikiSlice project is looking to put
Wikipedia articles into DITA topics.
http://wiki.laptop.org/go/Projects/Wikislice
It
might be useful to see what they're running into with the "narrative" text of
Wikipedia articles.
/BN