dita-busdocs message

Subject: RE: [dita-busdocs] RE: Comments on the backgrounder
From: "Ann Rockley" <rockley@rockley.com>
To: "'Michael Boses'" <mboses@QUARK.com>,"'Bruce Nevin (bnevin)'" <bnevin@cisco.com>,<dita-busdocs@lists.oasis-open.org>
Date: Wed, 3 Mar 2010 22:52:35 -0500

I agree, good comments.

I think we should update it. We have done a lot of thinking since then. Keep
the original and create a 2.0 version as you suggested. 

Ann

-----Original Message-----
From: Michael Boses [mailto:mboses@QUARK.com] 
Sent: Wednesday, March 03, 2010 5:45 PM
To: Bruce Nevin (bnevin); dita-busdocs@lists.oasis-open.org
Subject: [dita-busdocs] RE: Comments on the backgrounder

Hi Bruce,

These are all very thoughtful suggestions that would clearly improve the
document. I think what we will have to determine as a subcommittee is
whether this document should remain as is, since it is a historical
background, or whether we should update it. I am for updating it (a
Background 2.0?), and perhaps expanding the scope where necessary to include
information that was not available when we first wrote it.

Thanks,
Michael

-----Original Message-----
From: Bruce Nevin (bnevin) [mailto:bnevin@cisco.com]
Sent: Wednesday, March 03, 2010 12:57 PM
To: dita-busdocs@lists.oasis-open.org
Subject: [dita-busdocs] Comments on the backgrounder

These are comments on the Backgrounder doc:
http://www.oasis-open.org/committees/download.php/25981/DITA%20for%20Enterprise%20Business%20Documents%20Sub-committee%20Background.pdf

My comments are preceded by ":". In general, text not so tagged is quoted
from the document.
    _______________________________________________
    ø¤º°`°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤º°`°º¤ø,¸¸,ø¤º°`°º¤ø
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
___________

Because of this, organizations are no longer satisfied to just manage
unstructured content; they now want to structure it. In fact, it may be that
the term unstructured content is a misnomer that served its purpose for a
decade or so. We would suggest that a more modern view of content would be
that there is some content that we have successfully structured, and other
content that we do not know how to structure yet.

: All content is structured. If it were not, there would be no information
there. The distinction is whether or not (some of) the structure of the
content is made explicit with machine-readable tags.
___________

information strategists needed a DTD or schema that represented a standard,
...

: Better to say it conforms to a standard.

___________

the absence of a sub-committee focus on narrative business documents

: the absence of specific guidance from OASIS

___________

Narrative business documents are generally authored and presented as
contiguous sections of content that are larger than the normal DITA Topic,
leading to questions as to how topics should be aggregated into a document
that can be validated against a DITA DTD.

: I don't think you mean that the sections are larger than a topic, but that
the document comprising them is so. But size is not the point. The issue is
the need for contiguity in a whole. The question of size relative to topics
might come up as an argument against embedded topics, but that's not the
thrust of discussion here. Something along these lines perhaps:

Narrative business documents are authored and presented as a whole. It is
very difficult to make a business case for authoring separate topics for
inclusion in a business document. Yet we do not want to lose the advantages
of separate topics for content management and reuse. We must therefore ask
how to aggregate topics into a document that can be edited and validated
against a DITA DTD yet stored and managed separately.

___________

Business documents are made up of sections and sub-sections that are roughly
analogous to DITA Topics, but the topic segmentation is not as clear as it
is for technical documents

: What do you mean by segmentation? That topics do not always have titles,
so it's not always obvious where the boundaries are between topics? Or do
you mean rather that the classification of different topic types is not as
clear?

___________

: An example would clarify what is meant by in-line content that is
difficult to harmonize with sections. I missed the discussion of that, so
I'm puzzled.

___________

: The metamodel doesn't seem to address the examples in the bulleted list on
pp. 2-3, that is, I don't see relevant terminological clarification
springing from the metamodel at this level of granularity in any obvious
way, because the metamodel (as I have seen it) is concerned with topic
types. The second example might require <title> to be optional in <topic>
but seems answerable by <section> and <sectiondiv> unless there's some
requirement for reuse in a map with <topicref> rather than in a topic with
@conref.
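
A rough sketch of the in-topic reuse path mentioned above, offered only as an
illustration and not taken from the backgrounder: an untitled <sectiondiv>
carries an id, and another topic pulls it in with @conref, so no separate
titled topic or <topicref> is involved. The file name shared.dita, the ids,
the sample text, and the helper resolve_conrefs are all hypothetical, and the
resolver handles only the "file#topicid/elementid" form; a real DITA
processor does considerably more checking.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source topic: an untitled <sectiondiv> inside a <section>
# carries an id so that other topics can reuse it.
SHARED = """
<topic id="shared">
  <title>Shared clauses</title>
  <body>
    <section id="terms">
      <sectiondiv id="liability"><p>Liability is limited to fees paid.</p></sectiondiv>
    </section>
  </body>
</topic>
""".strip()

# Hypothetical referencing topic: reuse happens inside the topic via @conref,
# so no separate titled topic (and no <topicref> in a map) is needed.
CONTRACT = """
<topic id="contract">
  <title>Service agreement</title>
  <body>
    <section>
      <sectiondiv conref="shared.dita#shared/liability"/>
    </section>
  </body>
</topic>
""".strip()


def resolve_conrefs(target, sources):
    """Fill every element that has @conref with a copy of the referenced content.

    `sources` maps a file name to its parsed root element. Only the
    "file#topicid/elementid" form is handled (an assumption of this sketch).
    """
    for elem in list(target.iter()):
        ref = elem.get("conref")
        if ref is None:
            continue
        filename, _, fragment = ref.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        source = sources[filename]
        if source.get("id") != topic_id:
            raise KeyError(f"topic {topic_id!r} not found in {filename}")
        found = next(e for e in source.iter() if e.get("id") == element_id)
        # Copy text and children from the referenced element; drop the pointer.
        elem.text = found.text
        for child in found:
            elem.append(copy.deepcopy(child))
        del elem.attrib["conref"]
    return target


resolved = resolve_conrefs(
    ET.fromstring(CONTRACT), {"shared.dita": ET.fromstring(SHARED)}
)
print(ET.tostring(resolved, encoding="unicode"))
```

Reuse through a map would instead require the reused block to be a topic of
its own, which is where the question of an optional <title> would arise.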
___________

Most people understand that an XML schema is a definition that states how a
document will be structured, the types of content a document may (or must)
contain, and the metadata that may (or must) be used to describe the content
in the document. The physical representation of this contract is one or more
.xsd files, which contain technical notation that defines the rules of the
contract.

: The term "contract" is used in transactional uses of well-formed XML, e.g.
web services. (An example:
http://msdn.microsoft.com/en-us/library/ms972326.aspx.) It's not appropriate
here. If you do use the contract metaphor, you need a sentence or phrase
prior to first use indicating the parties to the contract. Suggestion:

Most people understand that an XML schema defines a class of documents by
specifying how documents of that type can be structured, the types of
content a document may (or must) contain, and the metadata that may (or
must) be used to describe the content in the document. It is represented
physically by one or more files containing this definition in the form of
rules in technical XSD or DTD notation.

___________

It is an abstraction because it is not based on any particular instance of a
document type - instead it is abstracted from the sum of all known instances
of a document type and their particular variations.

: Suggestion:

It is abstract because it is not based on any particular instance of a
document type. Instead, by a process called content analysis, it is
abstracted from the full range of variation displayed by all known instances
of the document type.

___________

This particular type of abstract model is called a meta-model. A meta-model
attempts to describe the component parts of something, and the meaning each
component part has for a particular purpose.[2] So when applied to narrative
business documents, a meta-model would first describe the types of
components that occur (simple examples are a title or caption), and the
meaning each component conveys to the reader of the document.

: Suggestion:

This is what is called a metamodel. We can identify the parts of a document
(title, paragraph, list, and so on), and define the purpose of each part.
Although it would be based on that single document, the result would be a
model of all documents that are structured in exactly the same way (same
number of paragraphs etc. in the same order). A metamodel likewise describes
the component parts and the meaning that each has for a particular purpose,
but allows for all the variation that actually occurs in documents of that
type.[2] Applying this to narrative business documents, a metamodel first
describes the types of components that occur (title, paragraph, list, and so
on), and then indicates the purpose for which the writer uses each component
and the meaning that it conveys to the reader of the document. It also
indicates the meaning and purpose of the document type as a whole.
___________

Linguistics immediately comes to mind since it is the science of language.
However, linguistics is primarily concerned with the spoken word. When
linguists do study the written word, it is normally at the sentence, or
sub-sentence level.[3] For this reason the methods that linguists use to look
at language might be of general interest to business analysts, but few if
any direct writings of linguists address the structural semantics in
documents that are so important to understanding how to apply DITA to these
documents.

: Oh dear, I seem to have misrepresented matters. There is indeed extensive
work on discourse analysis, which perforce concerns structural constraints
across sentences and texts and the information borne by those constraints.
(I've just acquired a copy of H.G. Widdowson, Text, Context, Pretext:
Critical issues in discourse analysis, which seems quite worthwhile. I have
mentioned Zellig Harris's work, culminating in The Form of Information in
Science (Kluwer 1989) and A Theory of Language and Information (Oxford
1991). I mentioned the MLP system available on sourceforge.)

: The poor fit between that work and our work with DITA is not due to
limitations in linguistics so much as limitations of our interests in the
structure and meaning of language. We are concerned with just enough of
semantics to enable machines efficiently to store and deliver relevant
content at the user's request, and in addition we must enable rendering and
publishing software to format content appropriately without human
intervention. Linguists are concerned with much richer and deeper aspects of
the meaning and use of language (semantics and pragmatics) as intimately
connected with syntax (which concerns us not at all). On the other hand, not
many linguists are concerned with the automation of appropriate format. This
is the point of your citation in fn 3 (a bit munged btw: Richard Power,
Donia Scott, Nadjet Bouayad-Agha: Document Structure. Computational
Linguistics 29(2): 211-260 (2003) and
http://www.mitpressjournals.org/doi/pdfplus/10.1162/089120103322145315?cookieSet=1).
Another by Bouayad-Agha, Scott, & Power is "The influence of layout
on the interpretation of referring expressions" (download links are at
http://en.scientificcommons.org/16559885), which cites Bob Longacre's 1979
"The paragraph as a grammatical unit". It's hard to be categorical about what
is and is not being done in linguistics, especially computational
linguistics. The work on text generation is especially relevant, e.g. Dick
Kittredge's http://www.cogentex.com/, Mellish, Scott, Cahill, Paiva, Evans,
& Reape (2006), "A Reference Architecture for Natural Language Generation
Systems" http://en.scientificcommons.org/15985779,
Power, Scott, & Bouayad-Agha (2007) "Generating Texts with Style"
http://en.scientificcommons.org/42437671.

: Suggestion:
Linguistics immediately comes to mind since it is the science of language.
However, the scope of linguistics is so broad and deep that it is difficult
to find a fit to our requirements for DITA and business documents. We are
concerned with just enough of semantics to enable machines efficiently to
store and deliver relevant content at the user's request. Linguists are
concerned with much richer and deeper aspects of the meaning and use of
language (semantics and pragmatics) as intimately connected with syntax
(which concerns us not at all). The automation of appropriate format, an
important motivation for DITA, is a minor concern for computational
linguists working on generating text from data, who are mainly concerned
with discourse coherence, alternative linearizations of content, and
variations of style and tone, which we take to be the work of human writers.
___________

The goal of the sub-committee would be to start with the useful components
of the typographical document model and expand these to create a
light-weight narrative business document model that is sufficient to support
the remaining sub-committee activities.

: We've stated the goal already. This isn't the goal; it's a proposed means
for starting work toward the goal.

: Suggestion:

Despite its limitations, the typographical document model provides a
starting place which the subcommittee can expand to create a light-weight
narrative business document model that is sufficient to support the
remaining sub-committee activities.

___________

The term harmonizing is used carefully here to suggest that ...

: Suggestion:
The term harmonizing is used advisedly here to suggest that
___________

The goal is to implement DITA for narrative business documents with as
little business disruption as possible, while suggesting as few changes as
possible to the standard itself.

: Suggestion:
The intent is to implement ...

: We might want to acknowledge that the adoption of DITA will entail changes
to business processes, as described under Goal Four. The point here, as also
articulated with the last goal, is that we want those to be beneficial
process improvements; we want to avoid disruption that is not so motivated.

___________

Ideally, the narrative business document meta-model can be implemented
without requesting changes to the schema, but to launch the effort with this
requirement would seem to be self-defeating.

: I think you can omit this sentence. No one is expecting us to work under
that kind of restriction. We don't need to be so defensive.
___________

We anticipate that the sub-committee will focus primarily on structural
specializations, with some domain specializations that relate to the
meta-model itself. We do not expect to address the domain specializations
that may be required for narrative business documents in a specific
industry. For example, we would view task as a domain specialization for the
technical document meta-model, while assembly and disassembly might be
further specializations of task for a particular industry. It is our
intention to focus at the task level of granularity.

: We can frame this in the present rather than as a future prospect.

: Suggestion:
The primary focus of the subcommittee is on structural specializations, with
some domain specializations that relate to the metamodel itself.
Industry-specific domain specializations are out of scope.
___________

This may not be needed as a separate goal, since it could be viewed as part
of the previous goal. It is listed separately for three reasons.

: This stands well as a separate goal without special justification. The
rationale needs to state the business case of DITA adopters working with
documents of the types that we examine. This section needs a complete
rewrite. The first bullet under Goal One belongs here; it doesn't really
bear on the metamodel.

: If I'm overruled and you keep this language, I have some specific
comments, but in my present view my comments are irrelevant because the
content should be replaced.

___________

Goal Four: ...

: We should mention here the recent development of parallel subcommittees on
the Adoption TC.
___________

: I posted comments on the wording of the goals and deliverables as they
were stated on the wiki. Here are those suggestions again, so they're all in
one place.

Goals:

1. Develop and recommend a metamodel for enterprise business documents which
* Characterizes the range of documents addressed by this subcommittee.
* Identifies properties of these documents relevant to their treatment in
the DITA standard.

2. Develop and recommend an approach to harmonizing the enterprise business
document metamodel with the DITA standard.

3. Develop and recommend a standard approach for fully expanding DITA Map
references into an editable process instance that may be validated against
an approved DITA DTD or schema, without compromising content management and
reuse.

4. Long term, develop and recommend guidance for organizations that
intend to adopt DITA for enterprise business documents.

Deliverables:

1. Recommended baseline enterprise business document metamodel.

2. Recommended harmonization of enterprise business document metamodel with
the DITA Standard.

3. Recommended approach for fully expanding DITA Map references into a valid
editable process instance.

4. Long term, recommended guidance for implementing DITA for enterprise
business documents.
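
As a hedged illustration of what Goal 3 and Deliverable 3 describe, the
following minimal sketch expands the <topicref> references in a map into a
single nested instance that could be opened in an editor. Everything in it is
an assumption made for the example: the map, the file names, the sample
topics, the helper expand_map, the use of a <dita> container as the root, and
the choice to record each topic's source file in the @xtrf debugging
attribute so that it could later be stored separately again. The sketch does
not validate the result against a DTD or schema, and it is not a method
proposed by the subcommittee.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical map: the nesting of <topicref> elements gives the document outline.
MAP = """
<map>
  <title>Annual report</title>
  <topicref href="overview.dita">
    <topicref href="scope.dita"/>
  </topicref>
  <topicref href="results.dita"/>
</map>
""".strip()

# Hypothetical topic files, kept in memory so the sketch is self-contained.
TOPICS = {
    "overview.dita": "<topic id='overview'><title>Overview</title>"
                     "<body><p>Purpose of the report.</p></body></topic>",
    "scope.dita": "<topic id='scope'><title>Scope</title>"
                  "<body><p>What the report covers.</p></body></topic>",
    "results.dita": "<topic id='results'><title>Results</title>"
                    "<body><p>Findings for the year.</p></body></topic>",
}


def expand_map(map_root, topics):
    """Return one tree in which each <topicref> is replaced by its topic.

    Topics are nested to mirror the map hierarchy. `topics` maps an href
    to the parsed root element of that topic.
    """
    def expand(ref):
        href = ref.get("href")
        topic = copy.deepcopy(topics[href])
        # Assumption: remember the source file so the topic can be saved back.
        topic.set("xtrf", href)
        # Nested topics are appended last, after the topic's own body.
        for child_ref in ref.findall("topicref"):
            topic.append(expand(child_ref))
        return topic

    document = ET.Element("dita")
    for ref in map_root.findall("topicref"):
        document.append(expand(ref))
    return document


parsed = {name: ET.fromstring(text) for name, text in TOPICS.items()}
document = expand_map(ET.fromstring(MAP), parsed)
print(ET.tostring(document, encoding="unicode"))
```

Keeping a pointer back to each source file is one way the expanded instance
could remain editable as a whole without giving up separate storage and
reuse; other bookkeeping schemes would serve equally well.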

___________

: The term metamodel is usually not hyphenated these days. In general, there
has long been a drift in English from separate words through hyphenation to
single words. Base ball and basket ball are two familiar examples that were
separated at the turn of the century, hyphenated in the 1940s. Even these
are disanalogous in that meta- is not a separable word at all, but rather a
very familiar prefix that is rarely if ever hyphenated (metaphysics,
metalanguage, metamathematics, metadata, ...).

Likewise subcommittee rather than sub-committee; probably subsections rather
than sub-sections.

More detail than you probably want to know: Compound modifiers, on the other
hand, are hyphenated only when they precede the word they modify, to avoid
ambiguity -- "in (line content)" vs. "(in-line) content", for example. Even
this is relaxed (no hyphen) when there is no ambiguity, e.g. when the
compound is so familiar that the alternative does not occur to the reader.
But in those conditions compounds tend to become single words: e.g. "online
content" vs. "in-line content".
___________

---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php


References:
- March 1 2010 Regular Meeting Minutes
  - From: mboses@quark.com
- Comments on the backgrounder
  - From: "Bruce Nevin (bnevin)" <bnevin@cisco.com>
- RE: Comments on the backgrounder
  - From: Michael Boses <mboses@QUARK.com>