Subject: DITA "givens" to document

From: Don Day <dond@us.ibm.com>
To: <dita@lists.oasis-open.org>
Date: Wed, 25 Aug 2004 14:31:08 -0500

Within the scope of the specification, it's probably necessary to document some additional things that have not come up yet in our outlines or discussions. Here's a starter list of some things that tend to be overlooked or ancillary... please feel free to suggest revisions or additions.

Encodings: The normal DITA vocabularies presume a standard encoding of UTF-8.

Language: The normal (default?) language of the base DITA vocabularies is US English. Instances may be composed in any language; the xml:lang attribute on the topic element or its specializations allows specification of other languages for topic content.
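To make the language rule concrete, here is a minimal Python sketch of how a processor might read the declared language of a topic. The function name, the sample topics, and the "en-us" fallback are illustrative assumptions; whether US English is a true default is exactly the open question above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def topic_language(topic_xml, default="en-us"):
    """Return the language declared on the root element, or the default.

    The "en-us" fallback is an assumption for illustration only.
    """
    root = ET.fromstring(topic_xml)
    return root.get(XML_LANG, default)

german = '<topic id="t1" xml:lang="de-de"><title>Beispiel</title></topic>'
unmarked = '<topic id="t2"><title>Example</title></topic>'

print(topic_language(german))    # de-de
print(topic_language(unmarked))  # en-us
```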

PIs: The normal DITA vocabularies do not specify Processing Instructions, and any PIs in DITA instances are normally ignored in standard DITA processing.

Comments: This specification has no policy regarding use of XML comments in DITA source. Authors may insert XML comments (<!-- ... -->) at their own discretion; any comments in DITA instances are normally ignored in standard DITA processing.
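As a rough illustration of the PI and comment policies above, the following Python sketch shows that a typical parser configuration yields the same text content with or without them. The sample topic and the PI target "hypothetical-editor" are made up for the example; the only behavior relied on is that Python's ElementTree drops comments and PIs by default.

```python
import xml.etree.ElementTree as ET

def text_content(xml_source):
    """Concatenate all character data, as a simple stand-in for processing."""
    return "".join(ET.fromstring(xml_source).itertext())

plain = (
    '<topic id="t1"><title>Setup</title>'
    '<body><p>Install the tool.</p></body></topic>'
)
annotated = (
    '<topic id="t1"><!-- reviewer: check title -->'
    '<title>Setup</title><?hypothetical-editor cursor="here"?>'
    '<body><!-- which tool? --><p>Install the tool.</p></body></topic>'
)

# The default ElementTree parser discards comments and PIs while parsing.
print(text_content(plain) == text_content(annotated))  # True
```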

Whitespace: Whitespace in DITA instances follows the usual XML rules unless a processor is otherwise advised by the use of fixed xml:space attributes in the language declarations. For the basic topic, this attribute defines "preserve" specifically for the elements "pre" and "lines". The preservation of non-significant whitespace (i.e., tabs or linefeeds between elements in non-mixed-content models) is not required by DITA processing. However, content creation tools should respect the policies of content owners for the preservation of embedded non-significant whitespace (such as intentional pretty-printing offsets).
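The whitespace rule can be sketched in Python roughly as follows. This is a simplification under stated assumptions: ElementTree does not read DTD attribute defaults, so the preserved elements are hard-coded by name here, and specializations of "pre" or "lines" would need a class-based check that this sketch does not attempt.

```python
import re
import xml.etree.ElementTree as ET

# Elements for which the basic topic fixes xml:space="preserve".
# Hard-coded by name as an assumption; a DTD-aware processor would
# read the defaulted attribute instead.
PRESERVE = {"pre", "lines"}

def collapse(text):
    """Collapse each run of whitespace to one space; pass None through."""
    return None if text is None else re.sub(r"\s+", " ", text)

def normalize(elem, preserving=False):
    """Collapse whitespace in place, except inside preserved elements."""
    preserving = preserving or elem.tag in PRESERVE
    if not preserving:
        elem.text = collapse(elem.text)
    for child in elem:
        normalize(child, preserving)
        # A child's tail belongs to this element's content, not the child's.
        if not preserving:
            child.tail = collapse(child.tail)

body = ET.fromstring(
    "<body><p>one\n   two</p><pre>a\n   b</pre><lines>x\n  y</lines></body>"
)
normalize(body)
print(repr(body.find("p").text))    # 'one two'
print(repr(body.find("pre").text))  # 'a\n   b'
```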

Metadata: The normal DITA vocabularies provide for rich metadata in the prolog of each topic instance. However, the use of metadata depends on agreement among the communities that use the content. For example, the DITA specification suggests mappings for the Dublin Core set of properties but does not enforce their usage or interpretation.

Infotypes or Information types: The normal DITA vocabularies reify the principles of "information typing" as defined and commonly practiced for User Assistance. The basic infotypes posited in these vocabularies are Concept, Task, and Reference. DITA specialization allows for rich extension of these basic infotypes, but DITA itself does not define the possible universe of information types or taxonomies that might define a progression of specializations.

Preservation of <!DOCTYPE...> artifacts; use of un-doctyped instances in DTD-based systems, etc.

General entity declarations and references: As an application of XML, DITA supports use of DTD-based mechanisms for defining and using general entities. Because this practice is not equally supportable in Schema-based systems, DITA provides a compensating mechanism that works in either system: the "conref" content referencing mechanism. While not fully equal to the power of general entities, DITA conref has design characteristics that promote more consistent use of content referencing. Therefore, even for implementations based today on DTDs, which permit entity declarations, the preferred DITA usage is to avoid general entities and to base your content reuse architecture on the strategic conref mechanism.
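For readers unfamiliar with conref, here is a deliberately simplified Python sketch of the idea. It assumes a same-file reference of the form "#topicid/elementid" and a hypothetical helper named resolve_conrefs; it is not the reference behavior of any DITA processor, and it ignores cross-file references, validity checks, and nested conrefs.

```python
import copy
import xml.etree.ElementTree as ET

def resolve_conrefs(root):
    """Pull referenced content into each element that carries a conref.

    Sketch only: handles "#topicid/elementid" within one document.
    """
    topics = {t.get("id"): t for t in root.iter("topic")}
    for elem in list(root.iter()):
        ref = elem.get("conref")
        if not ref or not ref.startswith("#"):
            continue
        topic_id, _, elem_id = ref[1:].partition("/")
        topic = topics.get(topic_id)
        target = None
        if topic is not None:
            # Only an element of the same type is accepted as a target.
            target = next(
                (e for e in topic.iter(elem.tag) if e.get("id") == elem_id),
                None,
            )
        if target is None:
            continue  # leave unresolved references untouched
        elem.text = target.text
        elem[:] = [copy.deepcopy(child) for child in target]
        del elem.attrib["conref"]

doc = ET.fromstring(
    '<topic id="warnings"><title>Warnings</title><body>'
    '<p id="hot">Surface may be <b>hot</b>.</p>'
    '<p conref="#warnings/hot"/>'
    '</body></topic>'
)
resolve_conrefs(doc)
reused = doc.findall("body/p")[1]
print(ET.tostring(reused, encoding="unicode"))
```

Requiring the target to be the same element type as the referencing element is one way to get the more consistent reuse mentioned above: unlike a general entity, the pulled-in content cannot be an arbitrary text fragment.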

Data entities: Unlike most other SGML/XML languages, DITA does not define notations as a mechanism for referencing non-XML content. Instead, the language follows the HTML model of direct pointers (URLs) to information, and conveys data attributes to processors via mimetype metadata if required. DITA's core topic vocabulary provides HTML's <object> element as a standard mechanism for encoding the use of other formats and their associated renderers.

Accessibility: The design of DITA's base vocabulary (topic) presumes conversion to other formats for information delivery. This important principle means that accessible markup need not be managed explicitly by a writer, but may be created during transformation to formats that support accessible navigation, etc. DITA provides alternate documentation mechanisms for its problematic content types: the object element provides a <desc> element for description; the image element provides an <alt> element. Good authoring practice requires considerate use of these features. [Conversely, although DITA content can be rendered directly in some browsers, such "source as deliverable" usage has no accessibility hooks for screenreaders. If a delivery system requires XML with accessible features, it is okay to add those features to the language by transform, as long as the XML data is well formed and pre-normalized to insert defaulted class attributes for rendering support.]
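As one possible reading of "considerate use", a tool could flag images and objects that lack alternate text. The Python sketch below is only an illustration: the function name and sample content are hypothetical, and the use of the href and data attributes to identify each item is an assumption not stated above.

```python
import xml.etree.ElementTree as ET

# Child element that carries the alternate text for each content type.
ALTERNATE = {"image": "alt", "object": "desc"}

def missing_alternates(root):
    """List the reference of each image or object lacking alternate text."""
    problems = []
    for elem in root.iter():
        child_name = ALTERNATE.get(elem.tag)
        if child_name is None:
            continue
        child = elem.find(child_name)
        # An empty or whitespace-only description counts as missing.
        if child is None or not (child.text or "").strip():
            # href (image) and data (object) are assumed attribute names.
            problems.append(
                elem.get("href") or elem.get("data") or "(no reference)"
            )
    return problems

body = ET.fromstring(
    '<body>'
    '<image href="panel.gif"><alt>Front panel with power switch</alt></image>'
    '<image href="logo.gif"/>'
    '<object data="demo.swf"><desc> </desc></object>'
    '</body>'
)
print(missing_alternates(body))  # ['logo.gif', 'demo.swf']
```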

Styles in content: The normal DITA vocabularies and processors provide no direct styling mechanisms. Elements that have layout-like behaviors may have attributes to support instance-specific layout policies (whether an image should cause a break in context, for example, or table column specifications). DITA provides the @outputclass attribute as a key for processors to associate particular style- or role-based processing for elements. In general, the use or non-use of this attribute in a topic instance should not compromise the usefulness of the published information.
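A small Python sketch of how a processor might treat @outputclass as a styling key follows. The mapping to an HTML-style class value and the function name are assumptions for illustration; the point is only that an element without the attribute still yields a usable result.

```python
import xml.etree.ElementTree as ET

def html_class(elem):
    """Build a class value from the element name plus any outputclass.

    The element-name-first convention is a hypothetical choice.
    """
    classes = [elem.tag]
    classes.extend((elem.get("outputclass") or "").split())
    return " ".join(classes)

styled = ET.fromstring('<p outputclass="legal fineprint">Terms apply.</p>')
plain = ET.fromstring('<p>Terms apply.</p>')

print(html_class(styled))  # p legal fineprint
print(html_class(plain))   # p
```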

In a slightly different category, what are best practices for interoperable use and interchange of DITA specializations and instances?

- Suggested packaging for exporting DITA specialized vocabularies, instances, and processors to other users, say for translation?

- Interchange/interoperability: do content creators/owners need normative guidelines to ensure that all DITA topics are maximally and equally able to be used by all tools?

- others?

Regards,
--
Don Day <dond@us.ibm.com>
Chair, OASIS DITA Technical Committee
IBM Lead DITA Architect
11501 Burnet Rd., MS 9037D018, Austin TX 78758
Ph. 512-838-8550 (T/L 678-8550)

"Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?"
--T.S. Eliot