dita-busdocs message

Subject: RE: Comments on the backgrounder
From: Michael Boses <mboses@QUARK.com>
To: "Bruce Nevin (bnevin)" <bnevin@cisco.com>,"dita-busdocs@lists.oasis-open.org" <dita-busdocs@lists.oasis-open.org>
Date: Wed, 3 Mar 2010 15:45:28 -0700
Hi Bruce,

These are all very thoughtful suggestions that would clearly improve the document. I think what we will have to determine as a subcommittee is whether this document should remain as is, since it is a historical background, or whether we should update it. I am for updating it (a Background 2.0?), and perhaps expanding the scope where necessary to include information that was not available when we first wrote it.

Thanks,
Michael

-----Original Message-----
From: Bruce Nevin (bnevin) [mailto:bnevin@cisco.com]
Sent: Wednesday, March 03, 2010 12:57 PM
To: dita-busdocs@lists.oasis-open.org
Subject: [dita-busdocs] Comments on the backgrounder

These are comments on the Backgrounder doc:
http://www.oasis-open.org/committees/download.php/25981/DITA%20for%20Enterprise%20Business%20Documents%20Sub-committee%20Background.pdf

My comments are preceded by a colon (:); in general, text not so marked is quoted from the document.
___________

Because of this, organizations are no longer satisfied to just manage unstructured content; they now want to structure it. In fact, it may be that the term unstructured content is a misnomer that served its purpose for a decade or so. We would suggest that a more modern view of content would be that there is some content that we have successfully structured, and other content that we do not know how to structure yet.

: All content is structured. If it were not, there would be no information there. The distinction is whether or not (some of) the structure of the content is made explicit with machine-readable tags.
___________

information strategists needed a DTD or schema that represented a standard, ...

: Better to say it conforms to a standard.

___________

the absence of a sub-committee focus on narrative business documents

: the absence of specific guidance from OASIS

___________

Narrative business documents are generally authored and presented as contiguous sections of content that are larger than the normal DITA Topic, leading to questions as to how topics should be aggregated into a document that can be validated against a DITA DTD.

: I don't think you mean that the sections are larger than a topic, but that the document comprising them is so. But size is not the point. The issue is the need for contiguity in a whole. The question of size relative to topics might come up as an argument against embedded topics, but that's not the thrust of discussion here. Something along these lines perhaps:

Narrative business documents are authored and presented as a whole. It is very difficult to make a business case for authoring separate topics for inclusion in a business document. Yet we do not want to lose the advantages of separate topics for content management and reuse. We must therefore ask how to aggregate topics into a document that can be edited and validated against a DITA DTD yet stored and managed separately.

___________

Business documents are made up of sections and sub-sections that are roughly analogous to DITA Topics, but the topic segmentation is not as clear as it is for technical documents

: What do you mean by segmentation? That topics do not always have titles, so it's not always obvious where the boundaries are between topics? Or do you mean rather that the classification of different topic types is not as clear?

___________

: An example would clarify what is meant by in-line content that is difficult to harmonize with sections. I missed the discussion of that, so I'm puzzled.

___________

: The metamodel doesn't seem to address the examples in the bulleted list on pp. 2-3; that is, I don't see relevant terminological clarification springing from the metamodel at this level of granularity in any obvious way, because the metamodel (as I have seen it) is concerned with topic types. The second example might require <title> to be optional in <topic>, but it seems answerable by <section> and <sectiondiv> unless there's some requirement for reuse in a map with <topicref> rather than in a topic with @conref.
___________

Most people understand that an XML schema is a definition that states how a document will be structured, the types of content a document may (or must) contain, and the metadata that may (or must) be used to describe the content in the document. The physical representation of this contract is one or more .xsd files, which contain technical notation that defines the rules of the contract.

: The term "contract" is used in transactional uses of well-formed XML, e.g. web services. (An example: http://msdn.microsoft.com/en-us/library/ms972326.aspx.) It's not appropriate here. If you do use the contract metaphor, you need a sentence or phrase prior to first use indicating the parties to the contract. Suggestion:

Most people understand that an XML schema defines a class of documents by specifying how documents of that type can be structured, the types of content a document may (or must) contain, and the metadata that may (or must) be used to describe the content in the document. It is represented physically by one or more files containing this definition in the form of rules in technical XSD or DTD notation.

___________

It is an abstraction because it is not based on any particular instance of a document type-instead it is abstracted from the sum of all known instances of a document type and their particular variations.

: Suggestion:

It is abstract because it is not based on any particular instance of a document type. Instead, by a process called content analysis, it is abstracted from the full range of variation displayed by all known instances of the document type.

___________

This particular type of abstract model is called a meta-model. A meta-model attempts to describe the component parts of something, and the meaning each component part has for a particular purpose.2 So when applied to narrative business documents, a meta-model would first describe the types of components that occur (simple examples are a title or caption), and the meaning each component conveys to the reader of the document.

: Suggestion:

This is what is called a metamodel. Given a single document, we can identify its parts (title, paragraph, list, and so on) and define the purpose of each part. Although based on that single document, the result would be a model of all documents that are structured in exactly the same way (same number of paragraphs, etc., in the same order). A metamodel likewise describes the component parts and the meaning that each has for a particular purpose, but allows for all the variation that actually occurs in documents of that type.2 Applying this to narrative business documents, a metamodel first describes the types of components that occur (title, paragraph, list, and so on), and then indicates the purpose for which the writer uses each component and the meaning that it conveys to the reader of the document. It also indicates the meaning and purpose of the document type as a whole.
___________

Linguistics immediately comes to mind since it is the science of language. However, linguistics is primarily concerned with the spoken word. When linguists do study the written word, it is normally at the sentence, or sub-sentence level.3 For this reason the methods that linguists use to look at language might be of general interest to business analysts, but few if any direct writings of linguists address the structural semantics in documents that are so important to understanding how to apply DITA to these documents.

: Oh dear, I seem to have misrepresented matters. There is indeed extensive work on discourse analysis, which perforce concerns structural constraints across sentences and texts and the information borne by those constraints. (I've just acquired a copy of H.G. Widdowson, Text, Context, Pretext: Critical issues in discourse analysis, which seems quite worthwhile. I have mentioned Zellig Harris's work, culminating in The Form of Information in Science (Kluwer 1989) and A Theory of Language and Information (Oxford 1991). I mentioned the MLP system available on sourceforge.)

: The poor fit between that work and our work with DITA is due not so much to limitations in linguistics as to the limits of our interest in the structure and meaning of language. We are concerned with just enough of semantics to enable machines efficiently to store and deliver relevant content at the user's request, and in addition we must enable rendering and publishing software to format content appropriately without human intervention. Linguists are concerned with much richer and deeper aspects of the meaning and use of language (semantics and pragmatics) as intimately connected with syntax (which concerns us not at all). On the other hand, not many linguists are concerned with the automation of appropriate format. This is the point of your citation in fn 3 (a bit munged, btw; it should be: Richard Power, Donia Scott, Nadjet Bouayad-Agha: Document Structure. Computational Linguistics 29(2): 211-260 (2003), http://www.mitpressjournals.org/doi/pdfplus/10.1162/089120103322145315?cookieSet=1). Another by Bouayad-Agha, Scott, & Power is "The influence of layout on the interpretation of referring expressions" (download links are at http://en.scientificcommons.org/16559885), which cites Bob Longacre's 1979 "The paragraph as a grammatical unit".

: It's hard to be categorical about what is and is not being done in linguistics, especially computational linguistics. The work on text generation is especially relevant, e.g. Dick Kittredge's http://www.cogentex.com/; Mellish, Scott, Cahill, Paiva, Evans, & Reape (2006), "A Reference Architecture for Natural Language Generation Systems", http://en.scientificcommons.org/15985779; and Power, Scott, & Bouayad-Agha (2007), "Generating Texts with Style", http://en.scientificcommons.org/42437671.

: Suggestion:
Linguistics immediately comes to mind since it is the science of language. However, the scope of linguistics is so broad and deep that it is difficult to find a fit to our requirements for DITA and business documents. We are concerned with just enough of semantics to enable machines efficiently to store and deliver relevant content at the user's request. Linguists are concerned with much richer and deeper aspects of the meaning and use of language (semantics and pragmatics) as intimately connected with syntax (which concerns us not at all). The automation of appropriate format, an important motivation for DITA, is a minor concern for computational linguists working on generating text from data, who are mainly concerned with discourse coherence, alternative linearizations of content, and variations of style and tone, which we take to be the work of human writers.
___________

The goal of the sub-committee would be to start with the useful components of the typographical document model and expand these to create a light-weight narrative business document model that is sufficient to support the remaining sub-committee activities.

: We've stated the goal already. This isn't the goal, it's a proposed means for starting work toward the goal.

: Suggestion:

Despite its limitations, the typographical document model provides a starting place which the subcommittee can expand to create a light-weight narrative business document model that is sufficient to support the remaining sub-committee activities.

___________

The term harmonizing is used carefully here to suggest that ...

: Suggestion:
The term harmonizing is used advisedly here to suggest that
___________

The goal is to implement DITA for narrative business documents with as little business disruption as possible, while suggesting as few changes as possible to the standard itself.

: Suggestion:
The intent is to implement ...

: We might want to acknowledge that the adoption of DITA will entail changes to business processes, as described under Goal Four. The point here, as also articulated with the last goal, is that we want those changes to be beneficial process improvements; we want to avoid disruption that is not so motivated.

___________

Ideally, the narrative business document meta-model can be implemented without requesting changes to the schema, but to launch the effort with this requirement would seem to be self-defeating.

: I think you can omit this sentence. No one is expecting us to work under that kind of restriction. We don't need to be so defensive.
___________

We anticipate that the sub-committee will focus primarily on structural specializations, with some domain specializations that relate to the meta-model itself. We do not expect to address the domain specializations that may be required for narrative business documents in a specific industry. For example, we would view task as a domain specialization for the technical document meta-model, while assembly and disassembly might be further specializations of task for a particular industry. It is our intention to focus at the task level of granularity.

: We can frame this in the present rather than as a future prospect.

: Suggestion:
The primary focus of the subcommittee is on structural specializations, with some domain specializations that relate to the metamodel itself. Industry-specific domain specializations are out of scope.
___________

This may not be needed as a separate goal, since it could be viewed as part of the previous goal. It is listed separately for three reasons.

: This stands well as a separate goal without special justification. The rationale needs to state the business case of DITA adopters working with documents of the types that we examine. This section needs a complete rewrite. The first bullet under Goal One belongs here; it doesn't really bear on the metamodel.

: If I'm overruled and you keep this language, I have some specific comments, but in my present view my comments are irrelevant because the content should be replaced.

___________

Goal Four: ...

: We should mention here the recent development of parallel subcommittees on the Adoption TC.
___________

: I posted comments on the wording of the goals and deliverables as they were stated on the wiki. Here are those suggestions again, so they're all in one place.

Goals:

1. Develop and recommend a metamodel for enterprise business documents which
* Characterizes the range of documents addressed by this subcommittee.
* Identifies properties of these documents relevant to their treatment in the DITA standard.

2. Develop and recommend an approach to harmonizing the enterprise business document metamodel with the DITA standard.

3. Develop and recommend a standard approach for fully expanding DITA Map references into an editable process instance that may be validated against an approved DITA DTD or schema, without compromising content management and reuse.

4. Long term: develop and recommend guidance for organizations that intend to adopt DITA for enterprise business documents.

Deliverables:

1. Recommended baseline enterprise business document metamodel.

2. Recommended harmonization of enterprise business document metamodel with the DITA Standard.

3. Recommended approach for fully expanding DITA Map references into a valid editable process instance.

4. Long term, recommended guidance for implementing DITA for enterprise business documents.

___________

: The term metamodel is usually not hyphenated these days. In general, there has long been a drift in English from separate words through hyphenation to single words. Base ball and basket ball are two familiar examples: they were written as separate words at the turn of the twentieth century and hyphenated in the 1940s. Even these are disanalogous in that meta- is not a separable word at all, but rather a very familiar prefix that is rarely if ever hyphenated (metaphysics, metalanguage, metamathematics, metadata, ...).

Likewise subcommittee rather than sub-committee; probably subsections rather than sub-sections.

More detail than you probably want to know: Compound modifiers, on the other hand, are hyphenated only when they precede the word they modify, to avoid ambiguity -- "in (line content)" vs. "(in-line) content", for example. Even this is relaxed (no hyphen) when there is no ambiguity, e.g. when the compound is so familiar that the alternative does not occur to the reader. But in those conditions compounds tend to become single words: e.g. "online content" vs. "in-line content".
___________
