dita message

Subject: Re: [dita] Foreign Generalization: Should be moved to a non-normative appendix

From: Michael Priestley <mpriestl@ca.ibm.com>
To: Eliot Kimber <ekimber@reallysi.com>
Date: Wed, 2 Dec 2009 15:16:57 -0500

Eliot Kimber <ekimber@reallysi.com> wrote on 12/02/2009 03:04:03 PM:

> There are three ways content can be protected: move it to a side file,
> escape all markup characters, or encapsulate it in a CDATA marked section.
> The second two are obvious and would be to any implementor. The first is not
> obvious and therefore we definitely need to say that it is *allowed* by the
> standard.

If we allow three ways in the standard, then all three ways are required to be supported by respecialization processes. And I do think we would need to indicate how a process should interpret the content. For example - if a respecializer encounters inline content that is neither CDATA, nor sidefiled, assume markup has been escaped, and unescape it? That doesn't seem a terribly safe assumption to me, but if we want to open that door we will need to explore all the places it leads.
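
To make the ambiguity concrete, here is a minimal Python sketch (not taken from the spec or from any toolkit). It shows that escaped markup and CDATA-wrapped markup become identical character data once parsed, so a respecializer has to guess whether inline text is protected markup. The `FRAGMENT` value and the `looks_like_markup` helper are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical non-DITA fragment that needs protecting.
FRAGMENT = "<math><mi>x</mi></math>"

# Protection by escaping every markup character.
escaped = "<foreign>" + escape(FRAGMENT) + "</foreign>"
# Protection by wrapping the content in a CDATA marked section.
cdata = "<foreign><![CDATA[" + FRAGMENT + "]]></foreign>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# After parsing, the two forms carry the same plain character data.
print(escaped_text == cdata_text == FRAGMENT)


def looks_like_markup(text):
    """Heuristic guess: does this character data parse as one XML element?"""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The guess cannot tell protected markup from text that merely resembles it.
print(looks_like_markup(escaped_text))  # protected markup
print(looks_like_markup("a < b"))       # ordinary text
print(looks_like_markup("<b/>"))        # literal sample text or markup? Unknown.
```

The last call is the unsafe case: the heuristic reports markup even if the author meant the characters literally.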

I wanted to require support for only one of those. If we want to broaden the options, I think it creates more cost for implementers, not less.

> I think if we refocused the current Foreign generalization topic on
> "protecting foreign content" when transforming to DITA documents that use
> DTDs and then in the context of the "foreign-to-object" technique mention
> that if you are doing generalization as part of the transform and
> respecialization is a requirement you should set the @data attribute as
> indicated in the current spec. Also, this topic should be in the DITA
> Processing section, not in the specialization section, since it's not really
> about specialization or generalization specifically.
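
As a rough illustration of the "foreign-to-object" technique mentioned in the quoted text, the following Python sketch moves the content of a foreign-derived element into side-file text and leaves an `<object>` with `@data` in its place. It is a simplified sketch under assumptions: the function name, the side-file name, and the `math-d/mathwrap` class value are hypothetical, no file is actually written, and any other attributes or DITA child elements the spec may call for are not handled.

```python
import xml.etree.ElementTree as ET


def foreign_to_object(foreign, side_file_name):
    """Replace the content of `foreign` with <object data=...>.

    Returns the text that would be written to the side file.
    """
    # Collect the non-DITA content as text destined for the side file.
    parts = [foreign.text or ""]
    parts += [ET.tostring(child, encoding="unicode") for child in foreign]
    side_file_text = "".join(parts).strip()

    # Drop the original content and leave a reference in its place.
    for child in list(foreign):
        foreign.remove(child)
    foreign.text = None
    ET.SubElement(foreign, "object", {"data": side_file_name})
    return side_file_text


# Hypothetical specialization of <foreign> holding non-DITA markup.
el = ET.fromstring(
    '<mathwrap class="+ topic/foreign math-d/mathwrap ">'
    "<math><mi>x</mi></math>"
    "</mathwrap>"
)
side = foreign_to_object(el, "foreign-1.xml")
print(ET.tostring(el, encoding="unicode"))
print(side)
```

Respecialization would run the reverse step: read the file named by `@data` and restore its content inside the specialized element.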

The only case in which there is foreign content in DITA is in the context of a foreign specialization. The only case in which that content requires protection is during transformation to another DITA format. The only case of DITA-to-DITA transformation that we prescribe behavior for is generalization and respecialization.
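
The quoted discussion further down relies on the class attribute to signal where foreign markup has been included. A minimal Python sketch of that check follows, assuming hypothetical element names and class values (`mathwrap`, `math-d/mathwrap`); it is an illustration, not the behavior of any particular processor.

```python
import xml.etree.ElementTree as ET


def foreign_specializations(root):
    """Yield (element, most specific class token) for specializations of foreign."""
    for el in root.iter():
        tokens = (el.get("class") or "").split()
        if "topic/foreign" in tokens:
            # Tokens after topic/foreign name the specializing module/type.
            more_specific = tokens[tokens.index("topic/foreign") + 1:]
            if more_specific:
                yield el, more_specific[-1]


doc = ET.fromstring(
    '<p class="- topic/p ">'
    '<mathwrap class="+ topic/foreign math-d/mathwrap "><math/></mathwrap>'
    '<foreign class="- topic/foreign "><math/></foreign>'
    "</p>"
)

found = [(el.tag, token) for el, token in foreign_specializations(doc)]
print(found)  # [('mathwrap', 'math-d/mathwrap')]
```

The unspecialized `<foreign>` in the sample is not reported: nothing in its class attribute tells a processor which module its content came from.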

That's why I continue to say this is a generalization problem, and belongs where it is, with the current requirements.

> But that is the only part of this whole issue that specifically involves
> generalization. Otherwise this is a more general DITA-to-DITA transform
> issue.
>
> I'm not sure I understand MP's concern about the scenario "being supported".
> This case *must* be handled by any transformation processor that produces
> validatable result documents. It doesn't matter whether we say anything
> about it or not. So there's no way that the scenario *can't* be
> supported--even if the spec was silent there would still be ways to solve
> the problem.

Not automatically. A human could inspect the problem, design a workaround, and write some code. That's not how generalization is supposed to work. The goal is interoperable systems by default, not by extra effort.
>
> With respect to how one may validly include non-DITA markup declarations,
> see my comments below.

Gotcha. It does at least say that it "hinders interoperability", which is a mild understatement. At the very least, I'd suggest that we make that an option which we strongly recommend against. I'm not sure we can remove it without being backwards incompatible with 1.1, but we can make the existing warning stronger.
>
> Cheers,
>
> E.
>
>
> On 12/2/09 1:09 PM, "Michael Priestley" <mpriestl@ca.ibm.com> wrote:
>
> > Eliot Kimber <ekimber@reallysi.com> wrote on 12/01/2009 10:57:48 PM:
> > [...]
> >
> >> I think it must always be the case that for specializations of foreign
> >> the module that defines the specialization must also include the DTD
> >> declarations for any allowed non-DITA content.
> >>
> >> Therefore, if you are respecializing to that module, you can integrate
> >> the module into the effective DITA doctype and therefore will have the
> >> required declarations to restore any protected markup as elements of
> >> the respecialized document instance. I don't see how that can depend on
> >> the details of how the generalization was performed.
> >
> > If generalizers are not required to split out foreign markup from the
> > document when creating a validatable generalization, then they may use
> > some other technique, for example namespaces, which could work for them if
> > they're using schemas but will break if the recipient is using DTDs.
>
> The issue only exists for DTD-based validation: the XSDs already specify
> "skip" for the content of <foreign>, so you *never* have to protect it when
> the output target uses XSDs for validation.
>
> For DTDS, namespaces don't help. There are only three ways to protect the
> content and it is sufficient to enumerate them.
>
> > [...]
> >
> >> Here's a question: if <foreign> is not specialized but you have
> >> included, in your shell, declarations for non-DITA elements, the
> >> problem still exists even though foreign itself isn't specialized (and
> >> therefore doesn't need to be generalized).
> >>
> >> Therefore, the issue isn't really an issue of generalization, but an
> >> issue of *any transformation* from one DITA document type to a
> >> different DITA document type: there is *always* the potential that
> >> non-DITA markup in <foreign> cannot be validated in the transformation
> >> target.
> >
> > What you describe is a non-standard use of <foreign>. If someone does
> > this, then they've no longer got valid DITA content according to the spec.
>
> I'm not sure the current spec actually disallows this case. From the 2nd
> review draft under "Specializing foreign or unknown content":
>
> "There are three methods of incorporating foreign content into DITA.
>
> - A domain specialization of the <foreign> or <unknown> element. This is the
>   usual implementation.
> - A structural specialization using the <foreign> or <unknown> element. This
>   affords more control over the content.
> - Do nothing: simply embed the foreign content within <foreign> or
>   <unknown>. Because of the ANY content model of these elements, this method
>   offers the least amount of control over the content and hinders
>   interoperability.
>
> Note item 3: that is exactly the case I am referring to below.
>
> However, I agree that it probably *should not* be allowed. That is,
> unspecialized uses of <foreign> should only allow DITA elements (e.g.,
> <desc>, <object>, etc.). This would require the creation of modules and
> remove the problem of knowing where to get declarations for non-DITA
> elements.
>
> Either the third bullet above is correct and therefore it's implicitly
> allowed to include non-DITA declarations into a shell (because there's no
> other way for you to do it) or including non-DITA declarations into a shell
> is not allowed and therefore the third bullet is nonsense because you can
> never have a valid case of no <foreign> specializations and valid non-DITA
> elements.
>
> MP clearly thought the latter was the case and I am agreeing that it
> *should* be the case but also asserting that the current spec as written
> doesn't say that.
>
> Cheers,
>
> E.
>
> > The intent of <foreign> specializations is to provide a hook and name for
> > the module, which can then provide a specific content model. The foreign
> > markup can then be included and assembled into a doctype like any other
> > domain module. The specialized <foreign> element provides a signal to
> > processors about where that foreign markup has been included (through the
> > specialized element's class attribute).
> >
> > If you include random markup in your DITA document without a specialized
> > element, then there is no way for a processor to tell whether a specific
> > document contains markup from that domain. That means it cannot be
> > reliably generalized to any valid ancestor.
> >
> >>
> >> Thus, the focus on generalization with respect to <foreign> is a red
> >> herring, or rather, an instance of a more general problem.
> >>
> >> The solution could be made clearer by having a separate "foreign markup
> >> domain" module that provides the declarations for non-DITA elements and
> >> is required to be declared in @domains. That would provide both a
> >> DITA-defined place to declare foreign elements when <foreign> is not
> >> specialized and enable determination if the target doctype supports the
> >> same set of modules [you still wouldn't be able to tell what module a
> >> given non-DITA element type belonged to unless we provided way to
> >> define the mapping somewhere].
> >
> > What you're suggesting is already part of the spec, and is already
> > required. This is how <foreign> is meant to be used. And you would be able
> > to tell what module a non-DITA element type belonged to, by inspecting the
> > class attribute of its specialized container.
> >
> >>
> >> But without that, you can't know by DITA-defined means that a target
> >> document supports any given non-DITA elements, so your default behavior
> >> has to be to protect such markup.
> >
> > Without that you don't have valid DITA, according to the spec. So you are
> > protecting a broad range of behavior that is already disallowed. We don't
> > allow people to randomly add markup to DITA and still call it DITA. There
> > are procedures, to ensure interoperability and interchangeability.
> > Specializing <foreign> is one of those procedures.
> >
> >>
> >> We should definitely make the isomorphic relationship between content
> >> and <object> within <foreign> clearer in the spec--given an explanation
> >> of how the two are functionally equivalent, it becomes clearer that
> >> doing that transform is one way to solve the unvalidatable foreign
> >> content problem.
> >>
> >> Note also that in XSLT 2 it is trivial to parse the content of a CDATA
> >> marked section. That was not the case with XSLT 1. Since the Toolkit now
> >> supports XSLT 2 we could, for example, implement the processing of
> >> CDATA-encapsulated <foreign> content in the 1.5 Toolkit.
> >>
> >> In fact, now that I think about it, you could even have a complete XML
> >> document with its own DOCTYPE decl in a CDATA marked section, e.g.:
> >>
> >> <foreign>
> >> <![CDATA[
> >> <?xml version="1.0"?>
> >> <!DOCTYPE mathml PUBLIC "whatever" "mathml.dtd">
> >> <mathml>
> >> ...
> >> </mathml>
> >> ]]>
> >> </foreign>
> >>
> >> The current standard doesn't disallow that (it couldn't) but it also
> >> doesn't indicate that processors should handle that case as though the
> >> content were a separate document entity. But that behavior is
> >> definitely implicit in the "it can be replaced with an <object>
> >> element" statement.
> >>
> >> That is, if the content of <foreign> can be replaced by an <object>
> >> element that references the content, it follows that an object element
> >> can be replaced by a foreign element that contains the content
> >> referenced by the <object> element.
> >
> > I'm not going to worry about this for 1.2. I just want to protect the
> > behavior we defined as valid for 1.1.
> >
> >>
> >> Cheers,
> >>
> >> Eliot
> >>
> >> --
> >> Eliot Kimber
> >> Senior Solutions Architect
> >> "Bringing Strategy, Content, and Technology Together"
> >> Main: 610.631.6770
> >> www.reallysi.com
> >> www.rsuitecms.com
> >>
>
> --
> Eliot Kimber
> Senior Solutions Architect
> "Bringing Strategy, Content, and Technology Together"
> Main: 610.631.6770
> www.reallysi.com
> www.rsuitecms.com
>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail. Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

Follow-Ups:
- Re: [dita] Foreign Generalization: Should be moved to a non-normative appendix
  - From: Eliot Kimber <ekimber@reallysi.com>

References:
- Re: [dita] Foreign Generalization: Should be moved to a non-normative appendix
  - From: Michael Priestley <mpriestl@ca.ibm.com>
- Re: [dita] Foreign Generalization: Should be moved to a non-normative appendix
  - From: Eliot Kimber <ekimber@reallysi.com>