Subject: Re: [dita] Foreign Generalization: Should be moved to a non-normative appendix

From: Michael Priestley <mpriestl@ca.ibm.com>
To: Eliot Kimber <ekimber@reallysi.com>
Date: Wed, 2 Dec 2009 14:09:37 -0500

Eliot Kimber <ekimber@reallysi.com> wrote on 12/01/2009 10:57:48 PM:

> > Can respecialization be done on the validatable, generalized content
> > without knowledge of which system did the generalization? Or does the
> > wording below allow different groups to generalize in different ways,
> > allowing generalized content to be tied to the source group's
> > implementation?
>
> I don't understand your question.
>
> Generalization is reversible if and only if the class attributes are
> preserved. That is independent of how the generalization was created and
> whether or not there are foreign elements and whether or not they contain
> non-DITA markup.

One of the possible outcomes of generalization is validatable DITA, with a doctype and preserved class attributes that allow reversion to the previous doctype. This is per the existing generalization spec. I want to understand whether this scenario is still supported if we don't say how to generalize foreign content.
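
To make the scenario concrete, here is a minimal sketch (not spec text) of a round trip that relies only on preserved class attributes. It assumes class attributes are present explicitly in the instance, and the module and element names `math-d` and `mathwrap` are invented for illustration.

```python
import xml.etree.ElementTree as ET


def generalize(root, target_module="topic"):
    """Rename each element to its ancestor in target_module; @class is untouched."""
    for node in root.iter():
        # A class value looks like "- topic/body concept/conbody ":
        # a leading "-" or "+", then module/element tokens, general to specific.
        for token in node.get("class", "").split()[1:]:
            module, _, name = token.partition("/")
            if module == target_module:
                node.tag = name
                break
    return root


def respecialize(root):
    """Restore each element to the most specialized name recorded in @class."""
    for node in root.iter():
        tokens = node.get("class", "").split()[1:]
        if tokens:
            node.tag = tokens[-1].partition("/")[2]
    return root


# Hypothetical sample: "math-d" and "mathwrap" are invented names.
SAMPLE = (
    '<conbody class="- topic/body concept/conbody ">'
    '<mathwrap class="+ topic/foreign math-d/mathwrap "/>'
    "</conbody>"
)

doc = generalize(ET.fromstring(SAMPLE))
print(doc.tag, doc[0].tag)  # body foreign
doc = respecialize(doc)
print(doc.tag, doc[0].tag)  # conbody mathwrap
```

The sketch never looks at how the generalized form was produced, only at the class attributes. What it does not cover is any non-DITA markup inside the foreign element, which is where the question of how generalizers handle that content comes in.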

> I think it must always be the case that for specializations of foreign the
> module that defines the specialization must also include the DTD
> declarations for any allowed non-DITA content.
>
> Therefore, if you are respecializing to that module, you can integrate the
> module into the effective DITA doctype and therefore will have the required
> declarations to restore any protected markup as elements of the
> respecialized document instance. I don't see how that can depend on the
> details of how the generalization was performed.

If generalizers are not required to split out foreign markup from the document when creating a validatable generalization, then they may use some other technique, for example namespaces, which could work for them if they're using schemas but will break if the recipient is using DTDs.

If generalizers do split out the content, but choose not to set the DITA-foreign type on the object element, then a respecializing process will not be able to distinguish which objects should be resolved to inline during respecialization.
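
A rough sketch of the split-out and restore steps, to show why the type value matters. Assumptions: class attributes are explicit in the instance, non-DITA children carry no class attribute, side files are modelled as an in-memory dict rather than written to disk, and the file-naming pattern and the `math-d/mathwrap` names are invented for illustration.

```python
import xml.etree.ElementTree as ET

FOREIGN_TYPE = "DITA-foreign"


def externalize_foreign(root, basename):
    """Replace non-DITA children of foreign elements with <object> references.

    Returns {filename: serialized markup}; the naming pattern is hypothetical.
    """
    sidefiles = {}
    foreign_elems = [
        e for e in root.iter() if "topic/foreign" in e.get("class", "").split()
    ]
    for i, foreign in enumerate(foreign_elems, start=1):
        for j, child in enumerate(list(foreign)):
            if child.get("class") is not None:
                continue  # assumed to be a DITA element such as desc; leave inline
            filename = f"{basename}-foreign-{i}-{j + 1}.xml"
            tail, child.tail = child.tail, None
            sidefiles[filename] = ET.tostring(child, encoding="unicode")
            obj = ET.Element(
                "object",
                {"class": "- topic/object ", "data": filename, "type": FOREIGN_TYPE},
            )
            obj.tail = tail
            foreign[j] = obj  # same position, so the child count is unchanged
    return sidefiles


def inline_foreign(root, sidefiles):
    """Resolve only objects marked DITA-foreign back to inline markup."""
    for parent in list(root.iter()):
        for j, child in enumerate(list(parent)):
            if child.tag == "object" and child.get("type") == FOREIGN_TYPE:
                restored = ET.fromstring(sidefiles[child.get("data")])
                restored.tail = child.tail
                parent[j] = restored
```

Without the `type` check, `inline_foreign` would have no way to tell a generated reference from an ordinary `<object>` the author put there deliberately.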

The guidelines for naming the generated sidefile were intended to limit the possibility of a generated file overwriting another existing file, and to make it easy to associate the generated file with the source file and topic from which it was extracted. This part is already just a recommendation.

> Here's a question: if <foreign> is not specialized but you have included, in
> your shell, declarations for non-DITA elements, the problem still exists
> even though foreign itself isn't specialized (and therefore doesn't need to
> be generalized).
>
> Therefore, the issue isn't really an issue of generalization, but an issue
> of *any transformation* from one DITA document type to a different DITA
> document type: there is *always* the potential that non-DITA markup in
> <foreign> cannot be validated in the transformation target.

What you describe is a non-standard use of <foreign>. If someone does this, then they've no longer got valid DITA content according to the spec.

The intent of <foreign> specializations is to provide a hook and name for the module, which can then provide a specific content model. The foreign markup can then be included and assembled into a doctype like any other domain module. The specialized <foreign> element provides a signal to processors about where that foreign markup has been included (through the specialized element's class attribute).

If you include random markup in your DITA document without a specialized element, then there is no way for a processor to tell whether a specific document contains markup from that domain. That means it cannot be reliably generalized to any valid ancestor.

> Thus, the focus on generalization with respect to <foreign> is a red
> herring, or rather, an instance of a more general problem.
>
> The solution could be made clearer by having a separate "foreign markup
> domain" module that provides the declarations for non-DITA elements and is
> required to be declared in @domains. That would provide both a DITA-defined
> place to declare foreign elements when <foreign> is not specialized and
> enable determination if the target doctype supports the same set of modules
> [you still wouldn't be able to tell what module a given non-DITA element
> type belonged to unless we provided way to define the mapping somewhere].

What you're suggesting is already part of the spec, and is already required. This is how <foreign> is meant to be used. And you would be able to tell what module a non-DITA element type belonged to, by inspecting the class attribute of its specialized container.
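
As an illustration of that lookup (a sketch only, with class attributes assumed explicit and `math-d/mathwrap` an invented specialization), a processor could walk up from a non-DITA element to its nearest foreign container and read the module from the last class token:

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    """ElementTree has no parent pointers, so record child -> parent once."""
    return {child: parent for parent in root.iter() for child in parent}


def owning_module(elem, parent_map):
    """Return the module of the nearest specialized foreign container, else None."""
    node = elem
    while node is not None:
        tokens = node.get("class", "").split()
        if "topic/foreign" in tokens:
            module = tokens[-1].partition("/")[0]
            # Unspecialized <foreign> ends in topic/foreign: no module is named.
            return module if module != "topic" else None
        node = parent_map.get(node)
    return None
```

For markup sitting in an unspecialized `<foreign>`, or outside any foreign container, the function returns `None`: the instance offers nothing a processor could use to tell which domain the markup came from.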

> But without that, you can't know by DITA-defined means that a target
> document supports any given non-DITA elements, so your default behavior has
> to be to protect such markup.

Without that you don't have valid DITA, according to the spec. So you are protecting a broad range of behavior that is already disallowed. We don't allow people to randomly add markup to DITA and still call it DITA. There are procedures, to ensure interoperability and interchangeability. Specializing <foreign> is one of those procedures.

> We should definitely make the isomorphic relationship between content and
> <object> within <foreign> clearer in the spec--given an explanation of how
> the two are functionally equivalent, it becomes clearer that doing that
> transform is one way to solve the unvalidatable foreign content problem.
>
> Note also that in XSLT 2 it is trivial to parse the content of a CDATA
> marked section. That was not the case with XSLT 1. Since the Toolkit now
> supports XSLT 2 we could, for example, implement the processing of
> CDATA-encapsulated <foreign> content in the 1.5 Toolkit.
>
> In fact, now that I think about it, you could even have a complete XML
> document with its own DOCTYPE decl in a CDATA marked section, e.g.:
>
> <foreign>
> <![CDATA[
> <?xml version="1.0"?>
> <!DOCTYPE mathml PUBLIC "whatever" "mathml.dtd">
> <mathml>
> ...
> </mathml>
> ]]>
> </foreign>
>
> The current standard doesn't disallow that (it couldn't) but it also doesn't
> indicate that processors should handle that case as though the content were
> a separate document entity. But that behavior is definitely implicit in the
> "it can be replaced with an <object> element" statement.
>
> That is, if the content of <foreign> can be replaced by an <object> element
> that references the content, it follows that an object element can be
> replaced by a foreign element that contains the content referenced by the
> <object> element.

I'm not going to worry about this for 1.2. I just want to protect the behavior we defined as valid for 1.1.

> Cheers,
>
> Eliot
>
> --
> Eliot Kimber
> Senior Solutions Architect
> "Bringing Strategy, Content, and Technology Together"
> Main: 610.631.6770
> www.reallysi.com
> www.rsuitecms.com

Follow-Ups:
- Re: [dita] Foreign Generalization: Should be moved to a non-normative appendix
  - From: Eliot Kimber <ekimber@reallysi.com>

References:
- RE: [dita] Foreign Generalization: Should be moved to a non-normative appendix
  - From: Michael Priestley <mpriestl@ca.ibm.com>
- Re: [dita] Foreign Generalization: Should be moved to a non-normative appendix
  - From: Eliot Kimber <ekimber@reallysi.com>