
Subject: RE: [dita] Proposed revision for the keyword definition (was Keywords in DITA)

From: Michael Priestley <mpriestl@ca.ibm.com>
To: "Paul Prescod" <paul.prescod@blastradius.com>
Date: Tue, 15 Mar 2005 09:50:03 -0500

Compare with <ph>, which also has almost no semantics, and yet is used extensively for specialization. In fact, most of the elements in base topic are pretty free of what I would call semantics (e.g. <ul>, <p>, <section>): these are structures, in my mind, rather than semantics. But <ph> is perhaps the clearest parallel, since it also operates at a low level (word/phrase rather than block/section).

With respect to using <ph> instead: <ph> is not available everywhere <keyword> is. The intent of the separation was to ensure that we could have places in the markup that would only allow word markup without nesting (i.e. allow keyword but not ph). One example of this is the keywords list in the prolog, whose purpose would be somewhat undermined if the list allowed complex nesting structures. It will be interesting to see how this plays out in future (post-1.0) discussion of the requirement to allow <keyword> to nest :-)

With regard to associative linking for <keyword> elements: we are now speculating about a function that hasn't been designed yet, but I don't see a problem with a closer binding than you seem to be implying. That is, rather than saying "all keywords bind to all documents that define them", I should be able to scope it by saying "keywords in this set of documents have accompanying explanations in this other set". If others reuse my docs they can choose to reuse the explanations or not, but the consistency of linking would depend on the availability of explanations. Their binding would be determined by whoever is assembling the information set, and by default would not extend beyond its original scope. Again, a bit of a tangent.

With respect to your sound-bite, "Special processing" should always go along with "specialization": this is counter to a basic principle of specialization, which is that processing behavior is inherited. It is also counter to the actual implementation of our core specializations at OASIS, and of specializations elsewhere, where much of the processing behavior is in fact inherited, and only a subset of specialized elements take advantage of override processing.
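
To make the inheritance point concrete, here is a minimal Python sketch of fallback processing driven by the DITA class attribute. The handler table, the function names, and the HTML-ish output are assumptions made up for illustration; this is not how any particular DITA processor is implemented.

```python
# Hypothetical handler table: only some element types override processing.
HANDLERS = {
    # Generic behavior for the base element: pass the text through.
    "topic/keyword": lambda text: text,
    # Override for one specialization only (illustrative output format).
    "pr-d/apiname": lambda text: f"<code>{text}</code>",
}


def resolve_handler(class_attr):
    """Return the handler of the most specific ancestor that defines one."""
    # A class attribute lists ancestry from general to specific,
    # e.g. "+ topic/keyword pr-d/apiname ".
    tokens = [token for token in class_attr.split() if "/" in token]
    for token in reversed(tokens):
        if token in HANDLERS:
            return HANDLERS[token]
    raise LookupError(f"no handler for {class_attr!r}")


def render(class_attr, text):
    """Process element text with whatever behavior the element inherits."""
    return resolve_handler(class_attr)(text)


# apiname has its own override; cmdname has none and inherits from keyword.
print(render("+ topic/keyword pr-d/apiname ", "open"))
print(render("+ topic/keyword sw-d/cmdname ", "ls"))
```

In this sketch a new specialization needs no entry in the table at all: it is processed as its nearest ancestor until someone decides an override is worth writing.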

With respect to using outputclass instead of specializing: this again is counter to a basic principle of specialization. Lots of languages have class attributes that let you subclass an element; DITA is a language that has default superclasses instead, so you can use an actual new element as a subclass rather than doing subclassing in an attribute layer. Using actual elements allows us to enforce subclass rules and structures using standard DTD and Schema declarations, instead of inventing a whole new schema language just to talk about subclass constraints. That said, outputclass does provide some of the benefits of specialization as an interim step while a specialization is being developed, and its values can then be used as fodder for the migration process that moves content into the new specialization.
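
As a rough illustration of that interim step and the later migration, the following Python sketch rewrites keywords tagged with an interim outputclass value into a specialized element. The mapping table and the sample content are hypothetical; apiname (a keyword specialization in the programming domain) is used only as a familiar example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: interim outputclass value -> (element name, class ancestry).
MIGRATION = {
    "apiname": ("apiname", "+ topic/keyword pr-d/apiname "),
}


def migrate(root):
    """Rewrite <keyword outputclass="..."> in place; return how many changed."""
    migrated = 0
    for el in list(root.iter("keyword")):
        target = MIGRATION.get(el.get("outputclass", ""))
        if target is None:
            continue  # plain keywords and unknown values are left untouched
        el.tag, el.attrib["class"] = target
        del el.attrib["outputclass"]
        migrated += 1
    return migrated


sample = ET.fromstring(
    '<p>Call <keyword outputclass="apiname">open</keyword> on the '
    "<keyword>widget</keyword>.</p>"
)
migrate(sample)
print(ET.tostring(sample, encoding="unicode"))
```

Once the content uses the real element, the DTD or Schema can enforce where it is allowed, which an attribute value alone cannot do.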

I think in a future version of DITA we will be looking at creating more general base class elements, which hopefully would allow us to clear up some of the controversy around keyword once and for all. In the meantime we are still stuck with a word-level generic element that has some inappropriate connotations for some of its intended uses.

Michael Priestley
mpriestl@ca.ibm.com

"Paul Prescod" <paul.prescod@blastradius.com>

03/15/2005 05:52 AM

To	Michael Priestley/Toronto/IBM@IBMCA
cc	<dita@lists.oasis-open.org>
Subject	RE: [dita] Proposed revision for the keyword definition (was Keywords in DITA)

You may be able to convince me, but you haven't yet, Michael. I think that your claims that "<keyword> has almost no semantics" and "<keyword> could be used for specialized processing" are somewhat at odds. We write standards so that we can exchange documents and get reliable behaviour. If you start linking unspecialized keywords to topics in your processing, then when I send you my documents your process will do that to my documents as well. So we should either all decide that the <keyword> element implies a reference to a similarly-named topic or we should all decide it doesn't. If you want your keywords to have special behaviour then you should specialize.

Sound-bite: "Special processing" should always go along with "specialization". (As an aside, if specialization is too heavy-weight for some special processing, then maybe we need ways of making it lighter-weight. I've had some ideas about how to use outputclass as a lightweight way of specializing without changing DTDs.)

As far as reuse, why not use <ph>?

Nevertheless, I could agree to the text below if you are not convinced by my argument.

From: Michael Priestley [mailto:mpriestl@ca.ibm.com]
Sent: Monday, March 14, 2005 7:05 PM
To: Paul Prescod
Cc: Dana Spradley; dita@lists.oasis-open.org; Don Day; Erik Hennum; JoAnn Hackos; Rob Frankland
Subject: RE: [dita] Proposed revision for the keyword definition (was Keywords in DITA)

One reason to use keyword in content is when a specialized element is not available, but some semantic significance is still there that may provide fodder for processing. For example, the source for the DITA language reference marks up XML element names with <keyword>. That info could be used to turn the keywords into links to their equivalent reference topics.
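
A minimal Python sketch of that idea, assuming a hypothetical lookup table from element names to reference topic files (the file names and sample content are invented for illustration, and turning the keyword into an xref in the output is just one possible rendering):

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: element name -> reference topic that documents it.
REFERENCE_TOPICS = {"ph": "ph.dita", "keyword": "keyword.dita"}


def link_keywords(root):
    """Turn keywords that name a documented element into xref links."""
    linked = 0
    for el in list(root.iter("keyword")):
        href = REFERENCE_TOPICS.get((el.text or "").strip())
        if href is None:
            continue  # no matching reference topic: leave the keyword alone
        el.tag = "xref"
        el.set("href", href)
        linked += 1
    return linked


sample = ET.fromstring(
    "<p>Use <keyword>ph</keyword> rather than <keyword>unknown</keyword>.</p>"
)
link_keywords(sample)
print(ET.tostring(sample, encoding="unicode"))
```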

Another reason to use keyword is when you need reuse of a specific word or phrase, again for which a specialized element is not available. For example, it's a standard practice not to enter the product name directly in content, but to reuse it from a common elements repository, so it can be updated in one place when the product name changes.
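
For readers unfamiliar with this kind of reuse, here is a small Python sketch that simulates resolving a conref on a keyword. The file name common.dita, the ids, and the product name are all hypothetical, and the resolver handles only this one simple case (a keyword inside the root topic of the referenced file).

```python
import xml.etree.ElementTree as ET

# Hypothetical common elements repository, keyed by file name.
LIBRARY = {
    "common.dita": ET.fromstring(
        '<topic id="common"><title>Common elements</title><body>'
        '<p><keyword id="productname">ExampleProduct</keyword></p>'
        "</body></topic>"
    )
}


def resolve_conrefs(root, library):
    """Replace each conref'd keyword's content with the referenced text."""
    for el in root.iter("keyword"):
        conref = el.attrib.pop("conref", None)
        if conref is None:
            continue
        # Address form: file.dita#topicid/elementid
        filename, _, fragment = conref.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        source = library[filename]
        target = source.find(f".//keyword[@id='{element_id}']")
        if source.get("id") != topic_id or target is None:
            raise LookupError(f"cannot resolve {conref!r}")
        el.text = target.text


content = ET.fromstring(
    '<p>Install <keyword conref="common.dita#common/productname"/> first.</p>'
)
resolve_conrefs(content, LIBRARY)
print(ET.tostring(content, encoding="unicode"))
```

Changing the text in the repository entry changes every place that references it, which is the point of keeping the product name out of the content itself.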

I like your description, but would want to modify it to allow for some of these alternate uses. How about:

<keyword> represents a word or phrase with special significance in a particular domain. In the general case, <keyword> elements typically do not have any special semantics and processing associated with them, but can still be useful for organizing content for reuse or special processing. <keyword> specializations are more meaningful and are therefore preferable. <keyword> in the <keywords> element distinguishes a word or phrase that describes the content of a topic (a topic description keyword). Topic description keywords are typically used for searching, retrieval and classification purposes.

Michael Priestley
mpriestl@ca.ibm.com
