
Subject: Re: [dita] Support for foreign content vocabularies such as MathML and SVG

From: Erik Hennum <ehennum@us.ibm.com>
To: Christopher Wong <cwong@idiominc.com>
Date: Thu, 14 Apr 2005 10:31:01 -0700

Hi, Chris:

Good points as usual. In response, I'd suggest that DITA should support three fundamental kinds of content:

Textual discourse -- The principal body elements of DITA 1.0 cover text intended primarily to be read by people. The base elements support the standard structure of rich text discourse (including sections, figures, lists, paragraphs, blocks, phrases, and so on). You can specialize for more precise discourse structure and semantics.

Data values -- The proposed <data> element for DITA 1.1 would cover values intended to be consumed primarily by automated processes, typically for metadata but also for hybrid form-like documents. For instance, you might need to provide additional metadata in the topic head or embedded within textual content, as you suggested on the user group list. The default processing could skip the <data> element when formatting text but harvest the values for machine-processable representations such as the HTML <meta> element, RDF, and so on. You might nest data elements to build structures, and specialize them for more precise semantics and for constraints on structures and values, including enumerations. Possibly, this <data> element might have attributes compatible with the RDF Annotation proposal (http://www.formsplayer.com/notes/rdf-a.html).
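To make the skip-but-harvest idea concrete, here is a rough sketch in Python. It is an illustration only, not a proposed implementation: the name and value attributes on <data>, the sample topic, and the function names harvest_meta and visible_text are all assumptions on my part, since the proposal doesn't fix any of them.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical topic fragment. The name/value attributes on <data> are
# assumed for illustration; the proposal does not define them.
TOPIC = """<topic id="t1">
  <title>Widget setup</title>
  <body>
    <p>Install the widget<data name="part-number" value="W-100"/> before use.</p>
  </body>
</topic>"""


def harvest_meta(root):
    """Collect every <data> element (nested ones included) as an HTML <meta> tag."""
    return [
        '<meta name="%s" content="%s"/>'
        % (escape(d.get("name", ""), quote=True),
           escape(d.get("value", ""), quote=True))
        for d in root.iter("data")
    ]


def visible_text(elem):
    """Return the readable text of elem, skipping <data> but keeping the text after it."""
    if elem.tag == "data":
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(visible_text(child))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


root = ET.fromstring(TOPIC)
print(harvest_meta(root))
print(visible_text(root.find("body/p")))
```

The same <data> element feeds the machine-readable output and leaves no trace in the formatted paragraph.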

Unknown content -- The proposed <unknown> element for DITA 1.1 would, in essence, cover inline objects -- a special kind of content with its own standard representation. The classic examples might include SVG, MathML, and XForms. The standard representation could be extended through specialization, if appropriate. The <unknown> element would need some mechanism for alternate textual content. The default processing for the <unknown> element might be to emit that alternate textual content. Specialized processing could override this default to provide a different representation of the unknown content for supported formats. For instance, a MathML specialization of the <unknown> element might provide override processing for XHTML that generates an external bitmap graphic and emits an image tag referencing that graphic, populating the alt attribute of the image tag from the alternate textual content.
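Here is an equally rough sketch of the default-plus-override processing, again in Python and again under stated assumptions: the child <alt> element as the carrier of the alternate text, the OVERRIDES table keyed by element name, the math-<id>.png file naming, and all function names are hypothetical. Namespaces and DITA class attributes are left out to keep it short, and the bitmap generation itself is not shown.

```python
import xml.etree.ElementTree as ET
from html import escape


def alt_text(elem):
    """Return the alternate textual content, assumed here to live in a child <alt>."""
    alt = elem.find("alt")
    return "".join(alt.itertext()).strip() if alt is not None else ""


def render_default(elem):
    """Default processing for <unknown> and its specializations: emit only the alternate text."""
    return escape(alt_text(elem))


def render_mathml_xhtml(elem):
    """Override for a hypothetical MathML specialization.

    Emits an XHTML image tag that points at an external bitmap and fills
    alt from the alternate text. Producing the bitmap is out of scope here.
    """
    src = "math-%s.png" % elem.get("id", "unnamed")  # assumed naming scheme
    return '<img src="%s" alt="%s"/>' % (
        escape(src, quote=True),
        escape(alt_text(elem), quote=True),
    )


# Hypothetical registry: element name -> override renderer.
OVERRIDES = {"math": render_mathml_xhtml}


def render(elem, overrides=OVERRIDES):
    """Use the override registered for this element, else fall back to the default."""
    return overrides.get(elem.tag, render_default)(elem)


sample = ET.fromstring('<math id="eq1"><alt>x squared plus 1</alt><mrow/></math>')
print(render(sample))                # processor that supports the MathML specialization
print(render(sample, overrides={}))  # processor that does not: falls back to the text
```

A DITA adopter with no MathML support still gets an intelligible topic, because the fallback path never looks at the foreign markup.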

That said, I share your concern for potential abuse of the <unknown> element. People might be tempted to plug in existing textual markup by specializing from <unknown> instead of doing the work to specialize correctly from the appropriate rich text elements of the body. We should assert firmly that specializing textual discourse from <unknown> is just as much an abuse of the architecture as specializing markup from <p> that has no basic paragraph semantic. We can also point out that the <unknown> element has no standard processing (other than the alternate textual fallback, which would be standard DITA body elements and their specializations).

Finally, I agree that the <unknown> element doesn't address the attribute extension problem -- an important and difficult problem. The <data> element provides some relief there, though maybe not a complete solution. Anyway, <unknown> would give us the ability to incorporate standard existing vocabularies for special content -- something that we need, too.

What do you think?

Erik Hennum
ehennum@us.ibm.com

Christopher Wong <cwong@idiominc.com> wrote on 04/14/2005 07:58:13 AM:

> Any content? DITA's main limitation in specialization is the inability
> to add new attributes. People will see the <unknown> mechanism as a way
> to accomplish specializations that would need compromises otherwise.
> What I fear is that this loophole becomes the de facto specialization
> mechanism because it's easier, bypassing the conventional but more
> restrictive mechanism, hindering interoperability.
>
> Chris
>
> Erik Hennum wrote:
>
> > * Support for foreign content vocabularies such as MathML and SVG
> >
> > Some content types have generally accepted vocabularies such as MathML
> > (Mathematics Markup Language) and SVG (Scalable Vector Graphics).
> > Instead of reinventing the wheel, DITA might allow optional use of
> > these well-known vocabularies through specialization. That way, people
> > who don't need them don't have to include them while people who do
> > need them can have broad interoperability on the basis of the
> > established vocabulary.
> >
> > The specific proposal is to introduce a DITA <unknown> element with a
> > content model that allows any content. Specializers can then
> > specialize the root elements of foreign vocabularies from this
> > <unknown> element. The default behavior of the <unknown> element might
> > be to process the content of any child <section> element. That way,
> > the specializer can provide a specialized <section> element that
> > provides textual content to be used in place of the foreign content,
> > thereby preserving the intelligibility of the topic when sent to a
> > DITA adopter who isn't using the foreign vocabulary.

Follow-Ups:
- Re: [dita] Support for foreign content vocabularies such as MathML and SVG
  - From: Christopher Wong <cwong@idiominc.com>

References:
- Support for foreign content vocabularies such as MathML and SVG
  - From: Christopher Wong <cwong@idiominc.com>