
Subject: Conformance and Feature Optionality/Flexibility
From: Eliot Kimber <ekimber@reallysi.com>
To: dita@lists.oasis-open.org
Date: Tue, 15 Jan 2008 14:59:45 -0500
I'd like to see if I can define some categories and definitions that 
will let us clearly distinguish between invariant and variant processing 
and better characterize the conditions under which a given DITA-aware 
processor conforms to the DITA specification.

A general statement: Users are always allowed to limit features as a 
matter of local policy. For example, you might impose a policy that all 
conrefs must be to resources in a particular place in your storage 
system. That sort of constraint has no effect on the processing result;
it only constrains what authors are allowed to do. However, this type of 
constraint is a matter of local policy that users choose to impose on 
themselves. If a tool, inadvertently or by design, imposes a constraint 
that is not inherent in the spec, it must at least clearly document that 
constraint. That suggests there needs to be a conformance category such
as "conforming with limits", meaning that the processor can correctly 
process only a subset of all possible valid DITA documents. The most 
likely example would be constraints on what things can be addressed.

Thus, for the specific case of addressing, there is a fundamental 
difference between implementing addressing as defined in the spec but 
for only a subset of possible addresses, and implementing addressing in 
a way that is completely different from the spec. The first case is 
fine; the second case is bad. It also suggests that systems that 
implement such constraints should clearly distinguish errors that are 
violations of the spec from errors that are violations of policy. For 
example, say you bring in a new document that violates the conref 
location policy but whose conref address is otherwise correct. In that 
case it would be inappropriate to report the document as simply 
"invalid", but it would be appropriate to report that the conref target 
location policy had been violated.
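To make the spec-versus-policy distinction concrete, here is a minimal Python sketch of a conref checker that reports the two kinds of error separately. It is only an illustration under stated assumptions: `check_conref`, `ErrorKind`, and `ConrefReport` are hypothetical names, the `resolvable` set stands in for real address resolution, and the "targets must live under one directory" rule is the example local policy from above, not anything the spec defines.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    SPEC_VIOLATION = "spec"      # the address does not resolve as the spec requires
    POLICY_VIOLATION = "policy"  # the address is correct but breaks a local rule


@dataclass(frozen=True)
class ConrefReport:
    kind: ErrorKind
    message: str


def check_conref(value: str, resolvable: set[str], allowed_dir: str) -> list[ConrefReport]:
    """Check one conref value, keeping spec errors and policy errors apart.

    `resolvable` is a hypothetical stand-in for real address resolution;
    `allowed_dir` is the (hypothetical) local policy's storage location.
    """
    if value not in resolvable:
        # A broken address is a genuine error whatever the local policy says.
        return [ConrefReport(ErrorKind.SPEC_VIOLATION, f"conref {value!r} does not resolve")]
    resource = value.partition("#")[0]
    # Same-document conrefs ("#topicid/elementid") have no resource part to police.
    if resource and not posixpath.normpath(resource).startswith(allowed_dir):
        return [ConrefReport(ErrorKind.POLICY_VIOLATION,
                             f"conref target {resource!r} is outside {allowed_dir!r}")]
    return []
```

A document whose conref resolves but points outside the agreed location gets a POLICY_VIOLATION report, not a blanket "invalid", which is the distinction argued for above.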

---------------
Processor Types

While the universe of possible DITA processor types is unbounded, I 
think we can reasonably codify the following general classes of 
processor that have important distinguishing characteristics and 
conformance and flexibility implications:

- Source-to-Source Transformers: Tools that take DITA-based content as 
input and produce new data sets that are not intended as final 
deliverable renditions but as the source for additional processing, such 
as further rendition processing or authoring workflows. Such transforms 
would include both DITA-to-DITA transforms, where the result of the 
transform is a new set of DITA-based documents, and DITA-to-X 
transforms, where the result is not DITA based (e.g., DocBook documents, 
FrameMaker MIF, etc.) but is not a final form either. Source-to-source 
transformers may be standalone tools or may be inseparable components of 
tools in other categories.

- Renderers: Tools that take DITA-based content as input and produce 
some final-form consumable rendition of it (visual, aural, or tactile) 
intended to enable human consumption of the information, e.g., a printed 
manual, a set of Web pages, a talking book. Renderers may be standalone 
publishing tools or may be 
components of editors that provide a visual editing mode. A renderer 
always either incorporates a source-to-source transform within itself or 
uses the output of a standalone source-to-source transformer. The key is 
that rendering DITA content correctly in the general case always 
requires some amount of source-to-source transform, either literally or 
logically, in order to correctly implement the semantics of conref and 
maps, irrespective of the nature of the final rendition.

- Editors: Tools that enable the interactive creation and modification 
of DITA-based XML documents.  Editors may or may not incorporate 
renderers that produce a more-or-less WYSIWYG rendered view of the 
content for authoring (as distinct from simply enabling processing of 
the content through a separate renderer, e.g., the DITA Open Toolkit).

- Information Management Systems: Tools that store and manage DITA-based 
content in a way that takes advantage of DITA-specific aspects of the 
data, such as providing features for manipulating DITA maps or searching 
based on specialization hierarchies or map context. Such systems may be 
content management systems that support authoring workflows (e.g., 
Astoria or XHive Docato) or retrieval systems that support delivery 
workflows (e.g., MarkLogic or eXist).

------------------------------------
DITA-Defined Processing Requirements

I think that DITA processing indicated by the DITA specification can be 
classified into the following categories:

- Required and invariant: The processing result must be exactly as 
specified in the spec and there is no useful deviation from the spec. 
This category covers operations that are purely mechanical, such as 
resolving address pointers to resources.

- Required but variable: The processing result must be "consistent with" 
the specification but there are different ways in which the processing 
could legitimately be expressed. Conref is the obvious member of this 
category, where the "effective result" of applying conref is clearly 
defined but there are many ways that that result could look in practice. 
Different types of processors may have different expectations. For 
example, a renderer must render the conref as resolved in some way, 
while an editor needs to enable the creation of conrefs and their 
navigation but may or may not be obligated to provide a "resolved view". 
An information management system needs to maintain knowledge of the 
conrefs but won't do anything to the data itself.

- Rendition-defined with defaults: Any rendition may do whatever it 
wants, but the specification defines default rendition effects for 
well-known rendition targets that renderers should produce in the 
absence of explicit, user-requested overrides. The intent here is to 
ensure that different renderers give consistent results for the same 
elements rendered to the same rendition type. For example, the "b" 
element should be rendered as bold text by default in visual renderings. 
The default behavior is normative in that every conforming DITA 
processor that produces a given rendition type must provide at least one 
style configuration that produces the default result. This does not 
require that the default rendition produced by a given tool be the 
DITA-defined default, only that the DITA-defined default be an available 
option.

- Rendition-defined without defaults: The processing result is entirely 
rendition-specific and the specification defines no specific default 
behavior, although it may indicate non-normative possible renderings, 
e.g. "This could be presented as a table or a list or a graphic or ...".
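One way to read the four categories is as rules about which kinds of variance each one tolerates. The Python sketch below encodes that reading; the `Variance` fields and `variance_allowed` are hypothetical names, and the rules are an interpretation of the definitions above, not text from the spec.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Requirement(Enum):
    REQUIRED_INVARIANT = auto()          # e.g., resolving address pointers
    REQUIRED_VARIABLE = auto()           # e.g., conref: one effective result, many forms
    RENDITION_WITH_DEFAULTS = auto()     # e.g., "b" is bold by default in visual renditions
    RENDITION_WITHOUT_DEFAULTS = auto()  # entirely rendition-specific


@dataclass(frozen=True)
class Variance:
    """A proposed deviation from what the spec describes for some processing."""
    changes_effective_result: bool = False  # different resource or resolved content
    changes_presentation: bool = False      # same result, expressed differently
    drops_default: bool = False             # DITA-defined default no longer available


def variance_allowed(category: Requirement, v: Variance) -> bool:
    if category is Requirement.REQUIRED_INVARIANT:
        # No useful deviation: neither the result nor its form may change.
        return not (v.changes_effective_result or v.changes_presentation)
    if category is Requirement.REQUIRED_VARIABLE:
        # The form may vary as long as the effective result is the same.
        return not v.changes_effective_result
    if category is Requirement.RENDITION_WITH_DEFAULTS:
        # Anything goes, provided the default remains an available option.
        return not v.drops_default
    return True  # RENDITION_WITHOUT_DEFAULTS: the rendition decides
```

For instance, an editor that shows conrefs unresolved is a presentation change to a REQUIRED_VARIABLE feature and passes, whereas a processor that resolves an address to a different resource fails under REQUIRED_INVARIANT.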

"Well known" rendition types must include:

- Paged media (printed pages)

- HTML-based interactive media (Web browsers, HTML-based help systems)

- Digital talking books (e.g., DAISY/NIMAS)

- Embedded "constrained format" help (e.g., phone help, printer help)

- Interactive electronic technical manuals (IETMs)

That is, less formally, visual, aural, and interactive renditions, with 
the visual renditions having more or less typographic capability.
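The "at least one style configuration produces the default" rule for well-known rendition types is easy to state as a check. A minimal Python sketch, assuming a hypothetical `SPEC_DEFAULTS` table in which only the "b is bold in visual renditions" entry comes from the text above; the `Renderer` shape and `offers_spec_defaults` are likewise illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical table of DITA-defined defaults: rendition type -> element type -> effect.
# Only "b is bold in visual renditions" comes from the discussion above.
SPEC_DEFAULTS = {
    "paged": {"b": "bold"},
    "html": {"b": "bold"},
}


@dataclass
class Renderer:
    rendition_type: str                 # e.g., "paged" or "html"
    default_config: str                 # what the tool uses out of the box
    configs: dict[str, dict[str, str]]  # named style configurations


def offers_spec_defaults(r: Renderer) -> bool:
    """True if at least one configuration reproduces every DITA-defined default."""
    required = SPEC_DEFAULTS.get(r.rendition_type, {})
    return any(
        all(cfg.get(element) == effect for element, effect in required.items())
        for cfg in r.configs.values()
    )
```

Note that the check looks at all configurations, not at `default_config`: a tool whose house style renders "b" as small caps still conforms, as long as a configuration producing the DITA default is available.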

-----------------------
DITA Feature Categories

I think that DITA features can be sorted into the following categories:

- Element type and attribute definitions (the "DITA document types")

   This covers the core types and all the DITA TC-defined specializations.

   The syntax rules represented by these parts of the standard are clear 
and the rules for what you can and can't control are well defined in the 
architecture spec. That is, the rules for specialization of DITA markup 
are well defined. For conformance, this is primarily applicable to 
specialization declaration implementations and to editors, to the degree 
that they allow everything DITA requires and disallow everything it 
prohibits. In general, an editor would have to go out of its way not to 
conform in this area, assuming its out-of-the-box configuration is 
otherwise correct (uses the official DTDs and so on). There might be 
areas where a given editor has hard-to-correct limitations, such as in 
table processing, but any XML editor should always be able to allow any 
valid DITA document to be created.

   Support for specialization at all beyond the core types might be a 
conformance question, I suppose. But I would assert that support for 
specialization (that is, recognition and processing of DITA element 
types based on their class hierarchy) is a non-optional feature of DITA.

   That is, a tool that only supports use of the base DITA types using 
the TC-provided shells cannot be a conforming DITA processor, unless we 
want to define a class of conformance that is exactly this.

- Addressing

   This covers the mechanical pointers from one DITA construct to 
another, including:

   - href= and conref= values

   - keyref (as it is being defined in 1.2)

   Addressing is formally defined in terms of two aspects:

   - The syntax by which addresses (pointers) are written as strings 
within documents

   - The processing by which pointers are resolved to resources

   Both of these are invariant, such that for a given address string 
used in a given context against a given set of data, the resulting 
resource must always be the same.  Any processor that needs to resolve 
pointers must implement the pointer resolution as required by the spec.

NOTE: What it does with the resolved result is an entirely different 
question.

- Linking

   This covers all the features that establish relationships among 
abstract components, including:

   - topicref

   - xref

   - reltable

   - data-about

   For a given relationship the set of things related and their 
DITA-defined roles within the relationship are invariant. However, the 
rendition result for a given relationship instance or type would be 
rendition-defined with defaults for most or all of the DITA-defined link 
types.

- Conref

   This is a special case of linking where there is less useful room 
for variance. In particular, the effective value of resolving a conref 
must be invariant for a given pair of elements. However, the rendition 
result for a conref could vary, in that you might have a rendition that 
reports both the conref source and target in some useful way (for 
example, showing both the source and effective values for the attributes 
shared between the two elements involved). But the elements involved 
must be the same in all cases.

- Rendition behavior

   This covers all the features that relate to how a given element 
looks or behaves interactively in the context of a particular rendition 
type. It is mostly bound to element types, e.g., lists must have a list 
nature, tables should have a tabular nature, etc. Rendition behavior is 
either rendition-defined with defaults or rendition-defined without 
defaults. However, there is an intent, essentially impossible to 
enforce, that the rendition of a given element type be consistent with 
its core semantic, meaning that it would be "wrong" to render "pre" 
elements as flowing text unless you could show how that particular 
rendering is in fact consistent with the basic semantic of "pre".
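The claim under Addressing that a given address string, used in a given context, must always resolve to the same resource can be illustrated with a small parser for the href/conref form "path#topicid/elementid". This Python sketch assumes plain relative file paths only (no URI schemes, no catalogs, and no keyref); `Address` and `parse_address` are hypothetical names.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    resource: str              # normalized path of the target document
    topic_id: Optional[str]    # part of the fragment before "/", if any
    element_id: Optional[str]  # part of the fragment after "/", if any


def parse_address(value: str, base: str) -> Address:
    """Parse an href/conref value such as "../shared/warn.dita#warnings/caution".

    `base` is the path of the document containing the address. The result
    depends only on (value, base): the same string in the same context
    always yields the same Address.
    """
    resource, _, fragment = value.partition("#")
    if resource:
        # Relative paths resolve against the referencing document's directory.
        resource = posixpath.normpath(posixpath.join(posixpath.dirname(base), resource))
    else:
        resource = base  # "#topicid/elementid" points into the same document
    topic_id, _, element_id = fragment.partition("/")
    return Address(resource, topic_id or None, element_id or None)
```

Because `parse_address` is a pure function of its inputs, any two processors implementing it identically agree on the target, which is the invariant part; what each processor then does with the resolved resource is, as noted above, a separate question.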

In these cases the intent of the spec can best be expressed through 
definition of normative default rendering effects where there is a 
relevant distinct rendering effect. This would require at least 
tightening up the existing language to make it more precise and/or 
adding something for each relevant rendition type. This could be aided 
to some degree by making some general statements that serve as "default 
defaults", just to avoid continually restating the obvious.
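Normative defaults combine naturally with class-based processing: a specialized element with no default of its own can inherit its ancestor type's default, and a general "default default" covers everything else. The sketch below assumes DITA's @class convention (for example, "- topic/ol task/steps " for steps and "+ topic/ph hi-d/b " for b); the entries in `VISUAL_DEFAULTS` and the fallback string are hypothetical illustrations, not spec text.

```python
from __future__ import annotations


def class_ancestry(class_attr: str) -> list[str]:
    """Type tokens from a DITA @class value, most specific first.

    "- topic/ol task/steps " -> ["task/steps", "topic/ol"]
    """
    tokens = class_attr.split()[1:]  # drop the leading "-" or "+" marker
    return tokens[::-1]


def default_rendering(class_attr: str, defaults: dict[str, str], fallback: str) -> str:
    # Use the most specific type in the hierarchy that has a defined default...
    for token in class_ancestry(class_attr):
        if token in defaults:
            return defaults[token]
    # ...otherwise fall back to a general "default default".
    return fallback


# Hypothetical defaults for visual renditions; not quoted from the spec.
VISUAL_DEFAULTS = {
    "topic/ol": "numbered list",
    "topic/pre": "monospaced, line breaks preserved",
    "hi-d/b": "bold",
}
```

With this lookup, the spec would only need to state defaults for base types and the few specializations that differ, and every other specialization would get a rendering consistent with its ancestor's semantic.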

It might be useful to take some of the various forms of processing 
variance that have been discussed, or that people think they might want, 
and try to see where they fall within this category matrix: if this set 
of categories is useful, then we should be able to quickly distinguish 
allowable variances from disallowed ones, or at least be able to focus 
our discussion when there is no consensus about whether a given variance 
should be allowed.

Cheers,

Eliot

-- 
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 610.631.6770
www.reallysi.com
www.rsuitecms.com