dita message

Subject: Re: [dita] When does DITA Document Type Not Meet Requirements?

From: Michael Priestley <mpriestl@ca.ibm.com>
To: "W. Eliot Kimber" <ekimber@innodata-isogen.com>
Date: Wed, 8 Dec 2004 11:49:20 -0500

Eliot writes:
>I think we're talking about two different things.
>
>I don't recognize "creating a new base class for the existing DITA tag
>set" as a meaningful action.

If you are using specialization but not basing it off of the DITA base topic, you have a new base class. This is a very meaningful action. It's exactly what we did when we developed maps.

>Either you are using DITA as defined or you're defining a purpose-built
>document type for your own business process. That document type might be
>"DITA based" or "DITA like" or "informed by DITA" but it is not DITA
>because it can't be.
>
>That is, if you need a base class that DITA doesn't provide then you've
>established *that you can't use DITA* in the sense that any documents
>you create *cannot be interchanged* within a DITA context, by definition.

This is a side discussion - there are in fact things you can do to control and limit breakage, as has been discussed on the dita users list. But I want to focus back on your assertion that you cannot create new structures in DITA without an exact equivalent in the base.

>That is, DITA itself *cannot be extended* with new base classes--it's
>not a meaningful thing to do within the context of what the DITA
>specification allows you to do. The only thing you can do is specialize
>from the DITA-provided base classes or restrict the content rules, but you
>can't add new base classes. Nor would you expect to be able to, since
>DITA-level interchange depends entirely on a fixed set of base types
>(just as any API depends on an invariant set of base classes and methods).

I think of sections, paragraphs, lists, tables, and phrases as being the most basic types you can create without becoming abstract. Certainly for DITA 1.0 we concentrated on getting the useful markup up and running - abstractions can follow.

>But at least for my clients, that's usually not a big deal because
>interchange *outside the enterprise* at the DITA level is usually not a
>requirement at all or is a very minor requirement relative to the
>requirement to satisfy the local business requirements. Interchange
>within the enterprise is of course in terms of the enterprise-specific
>document type, so that interchange is not impaired.

Understood. Not the case for IBM, obviously, or anyone else that has business partners and OEM agreements.

>That's one of the points I'm trying to make: for most of my clients,
>cross-enterprise interchange of DITA-based data is not a requirement, or
>if it is, it has much lower weight than other requirements. Therefore,
>solutions that impair the ability to directly interchange documents in a
>DITA context *are not a problem*. If there is a requirement to
>interchange data in DITA form that requirement can be satisfied by
>providing a non-DITA-to-DITA transform, which would be relatively simple,
>especially if the custom DTD is closely based on DITA.

I understand that your customer requirements don't include content integration with third parties, although I honestly find it surprising if they are indeed enterprise businesses. We have this issue at IBM even within the company, since we're large enough to engage in multiple businesses whose products still end up installed at a single customer site.

>I have to stress: for my clients the primary value of DITA at the moment
>is in the methodology of modularization and aggregation and the notion and
>practice of specialization. But that methodology can be, and often must
>be, applied to non-DITA applications for the simple reason that DITA
>itself as currently defined cannot be made to satisfy the requirements.

That's where we disagree. I think you are limiting yourself in what you do with DITA out of an admirably principled but unpragmatic attachment to semantic purity.

>
>But note that we can almost certainly refine the DITA architecture in
>2.0 so that it can be used as the base for most, if not all, modular
>technical documentation applications.

Agreed, we can and should make enhancements.

>> Would using plain nested ph's address your concern? Admittedly nested
>> phrases are generic and without semantic meaning - but that's what
>> specialization adds, after all.
>
>But "ph" has a semantic: phrase. If I use it to create something that is
>not, fundamentally, a phrase, then I've created an invalid and
>inappropriate specialization.

"phrase" has almost no discernible semantic whatsoever, and it doesn't bother me in the least to specialize from it.

>
>For example, given a body of information, if I say "find all "ph"
>elements" (that is, all elements based on "ph"), and I find something
>that doesn't appear to be a phrase semantically, then I've broken my
>system because I've encoded a lie into it.

Why would a user ever search on something as general as a "ph"? They might as well search for things that occur in paragraphs or lists - it has absolutely no semantic significance. This is an unrealistic example.
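
For readers less familiar with the mechanics, here is a minimal sketch of what "find all elements based on ph" means in practice: the match is made on the class ancestry, not on the element name. It assumes the usual DITA class attribute form (a leading "-" or "+", then space-delimited module/element pairs). The <partnum> element and the "acme-d" domain are hypothetical, and the class defaults that a DTD would normally supply are stood in for by a plain dictionary.

```python
import xml.etree.ElementTree as ET

# Stand-in for DTD-supplied class defaults. <partnum> and the "acme-d"
# domain are invented for illustration only.
CLASS_DEFAULTS = {
    "p": "- topic/p ",
    "ph": "- topic/ph ",
    "b": "+ topic/ph hi-d/b ",
    "partnum": "+ topic/ph acme-d/partnum ",
}

def based_on(root, ancestor):
    """Return tag names of all elements whose class ancestry includes `ancestor`."""
    token = f" {ancestor} "  # surrounding spaces avoid partial matches (topic/p vs topic/ph)
    return [el.tag for el in root.iter()
            if token in CLASS_DEFAULTS.get(el.tag, "")]

doc = ET.fromstring(
    "<p>Order <partnum>X-100</partnum> <b>today</b> or <ph>later</ph>.</p>"
)
print(based_on(doc, "topic/ph"))  # ['partnum', 'b', 'ph']
```

Whether <partnum> "is" a phrase in any stronger sense is exactly the point under dispute; the sketch only shows that such a query returns every specialization along with <ph> itself.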

>That is, as far as I'm concerned, lying about the essential nature of
>your data is wrong and you should never do it. Ever. Under any
>circumstances.

1) I don't think it's lying
2) All values are relative. There are cases where it will make sense to break DITA, although I don't believe this is one, and there are cases where it will make sense to break semantics, although I don't believe this is one.

>> Ultimately, longer term, we can probably come up with some generic
>> ancestors for each level of content - like <block> as the ancestor for all
>> tables, lists, paragraphs, and figures.
>
>Here we agree: what DITA needs is probably another layer of
>generalization above the current topic level plus additional generic
>elements in the current topic content models to provide more structural
>flexibility and more generic bases for specialization.

I suspect we'll have much more fun working on DITA 2.0 :-)

>
>At the same time, there are definitely additional specific types that
>should probably be codified at the topic level so that there is
>agreement and consistency on their use. I would consider having "see"
>and "see-also" elements in the index item markup as a typical example:
>we can probably agree that these are universal semantics that should be
>done consistently across all DITA documents.

I'd be tempted to go the other direction, and factor out all the index term stuff as a domain specialization of keyword, so indexing can be easily removed by authoring groups that don't index (they do exist).

>> But pragmatically, you can get
>> what you want today with a specialization off of <ph> and an override.
>> That's certainly both more cost-effective and more standards-friendly than
>> duplicating the entire set of all existing markup every time you lack a
>> more specific ancestor for that one new element. And when the generic
>> <list> or <block> ancestor materializes, you can repoint your class
>> attributes without affecting any document instances or customized
>> processing you created in the meantime.
>
>This analysis would be correct *if* re-use of the existing DITA code
>base was a key requirement or if DITA-based interchange was a key
>requirement (and you didn't mind lying about your data). For my clients,
>neither of these are requirements (and I refuse to lie), therefore I
>will not create what I consider to be an inappropriate specialization of
>DITA. If either of these *were* top requirements then of course I
>might consider it, although I would resist it.

1) I don't consider it lying to specialize off of elements as generic as <ph>
2) Do you accept that for many people on this Technical Committee DITA-based interchange and code reuse are a requirement?
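
To illustrate the earlier point about repointing class attributes, here is a rough sketch under stated assumptions: document instances carry only element names, while the class ancestry is a default supplied by the DTD. The <checklist> element, the "acme-d" domain, and the generic <block> ancestor are all hypothetical (no such <block> exists in DITA 1.0), and the dictionaries merely stand in for DTD defaults.

```python
# Stand-in for an unchanged document instance: only element names appear in it.
INSTANCE_TAGS = ["p", "checklist", "ph"]

# Today: the hypothetical <checklist> is specialized from <ph>.
classes_today = {
    "p": "- topic/p ",
    "ph": "- topic/ph ",
    "checklist": "+ topic/ph acme-d/checklist ",
}

# Later: a generic <block> ancestor (hypothetical) exists, so only the
# DTD default for <checklist> is repointed. The instance is not touched.
classes_later = dict(classes_today, checklist="+ topic/block acme-d/checklist ")

def generalize(tag, class_defaults):
    """Map an element name to the element name of its most general ancestor."""
    first_pair = class_defaults[tag].split()[1]  # e.g. "topic/ph"
    return first_pair.split("/")[1]              # e.g. "ph"

print([generalize(t, classes_today) for t in INSTANCE_TAGS])  # ['p', 'ph', 'ph']
print([generalize(t, classes_later) for t in INSTANCE_TAGS])  # ['p', 'block', 'ph']
```

The instance list is identical in both runs; only the declared ancestry differs, which is why the change would not ripple into existing documents.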

I accept that you have performed a cost-benefit analysis for your clients, in which the benefits (interchangeability of content, and sharing of infrastructure costs) do not justify the cost (what you see as a loss of semantic purity).

But I might ask why they are looking at DITA at all - what benefits does a standard bring for them, if it's not content interchange and infrastructure reuse? That's why we all chose XML, after all - to ease content exchange and enable standard tools. And people use XML all the time, even with its quirky limitations (like the inability to specify the order of elements in a mixed content model).

Michael Priestley
mpriestl@ca.ibm.com

Follow-Ups:
- Re: [dita] When does DITA Document Type Not Meet Requirements?
  - From: "W. Eliot Kimber" <ekimber@innodata-isogen.com>

References:
- Re: [dita] When does DITA Document Type Not Meet Requirements?
  - From: "W. Eliot Kimber" <ekimber@innodata-isogen.com>