
Subject: Re: [dita] When does DITA Document Type Not Meet Requirements?
From: "W. Eliot Kimber" <ekimber@innodata-isogen.com>
To: DITA TC list <dita@lists.oasis-open.org>
Date: Wed, 08 Dec 2004 12:23:04 -0600
Michael Priestley wrote:

> Eliot writes:
> 
>> I think we're talking about two different things.
>> 
>> I don't recognize "creating a new base class for the existing DITA
>> tag set" as a meaningful action.
> 
> 
> If you are using specialization but not basing it off of the DITA
> base topic, you have a new base class. This is a very meaningful
> action. It's exactly what we did when we developed maps.

You're missing my point: of course you can create a new class, but when 
you do so, the result is not a DITA-defined document type, it's a local, 
task-specific document type that, by definition, cannot directly 
participate in DITA-based interchange.

That is, there is no normative sense in which such a document type is a
"conforming DITA application".

>> Either you are using DITA as defined or you're defining a
>> purpose-built document type for your own business process. That
>> document type might be "DITA based" or "DITA like" or "informed by
>> DITA" but it is not DITA because it can't be.
>> 
>> That is, if you need a base class that DITA doesn't provide then
>> you've established *that you can't use DITA* in the sense that any
>> documents you create *cannot be interchanged* within a DITA
>> context, by definition.
> 
> 
> This is a side discussion - there are in fact things you can do to
> control and limit breakage, as has been discussed on the dita users
> list. But I want to focus back on your assertion that you cannot
> create new structures in DITA without an exact equivalent in the
> base.

It's not a side discussion: it's the very heart of what we're discussing: 
the current limitations in the concrete DITA document types force the 
creation of non-DITA-conforming document types in order to satisfy some 
requirements.

But I will point out that we are in agreement that this problem with the 
current DITA specification can be addressed in 2.0 by refining and 
extending the concrete DITA document types.

>> That is, DITA itself *cannot be extended* with new base
>> classes--it's not a meaningful thing to do within the context of
>> what the DITA specification allows you to do. The only thing you
>> can do is specialize from the DITA-provided base classes or restrict
>> the content rules, but you can't add new base classes. Nor would you
>> expect to be able to, since DITA-level interchange depends entirely
>> on a fixed set of base types (just as any API depends on an
>> invariant set of base classes and methods).
> 
> I think of sections, paragraphs, lists, tables, and phrases as being
> the most basic types you can create without becoming abstract.
> Certainly for DITA 1.0 we concentrated on getting the useful markup
> up and running - abstractions can follow.

Of course, but the side effect is that current DITA limits one's ability 
to specialize because it lacks some needed purely generic abstractions.

That's really all I've been trying to get across--there's some stuff I 
need from DITA that it currently does not provide.

>> But at least for my clients, that's usually not a big deal because
>> interchange *outside the enterprise* at the DITA level is usually
>> not a requirement at all or is a very minor requirement relative to
>> the requirement to satisfy the local business requirements.
>> Interchange within the enterprise is of course in terms of the
>> enterprise-specific document type, so that interchange is not
>> impaired.
> 
> 
> Understood. Not the case for IBM, obviously, or anyone else that has
>  business partners and OEM agreements.

Again, I assert that you are significantly oversimplifying issues of 
cross-enterprise interchange. But clearly we have different experience 
so there's no point in arguing about it.

Of course the ideal is for DITA to be sufficiently flexible that one can 
have both DITA-based cross-enterprise interchange and satisfy local 
requirements using a pure DITA base. I think that is achievable in 2.0. 
I assert that it is not achievable in all cases today, especially for 
enterprises like my typical clients.

>> That's one of the points I'm trying to make: for most of my
>> clients, cross-enterprise interchange of DITA-based data is not a
>> requirement, or if it is, it has much lower weight than other
>> requirements. Therefore, solutions that impair the ability to
>> directly interchange documents in a DITA context *are not a
>> problem*. If there is a requirement to interchange data in DITA
>> form, that requirement can be satisfied by providing a
>> non-DITA-to-DITA transform, which can be relatively simple, especially
>> if the custom DTD is closely based on DITA.
> 
> 
> I understand that your customer requirements don't include content 
> integration with third parties, although I honestly find it
> surprising if they are indeed enterprise businesses. We have this
> issue at IBM even within the company, since we're large enough to
> engage in multiple businesses that still end up installed at a single
> customer site.

I didn't say they don't have that requirement (although my current 
clients in fact don't as they write all their own content) but that even 
when they do, those requirements have much lower weight than other 
requirements.

>> I have to stress: for my clients the primary value of DITA at the
>> moment is in the methodology of modularization and aggregation and
>> the notion and practice of specialization. But that methodology can
>> be, and often must be, applied to non-DITA applications for the
>> simple reason that DITA itself as currently defined cannot be made
>> to satisfy the requirements.
> 
> 
> That's where we disagree. I think you are limiting yourself in what
> you do with DITA out of an admirably principled but unpragmatic
> attachment to semantic purity.

Then we have a fundamental disagreement about what good practice is.

>>> Would using plain nested ph's address your concern? Admittedly
>>> nested phrases are generic and without semantic meaning - but
>>> that's what specialization adds, after all.
>> 
>> But "ph" has a semantic: phrase. If I use it to create something
>> that is not, fundamentally, a phrase, then I've created an invalid
>> and inappropriate specialization.
> 
> 
> "phrase" has almost no discernible semantic whatsoever, and it
> doesn't bother me in the least to specialize from it.

It does have a semantic that is clearly different from, for example, fig.

>> For example, given a body of information, if I say "find all "ph" 
>> elements" (that is, all elements based on "ph"), and I find
>> something that doesn't appear to be a phrase semantically, then
>> I've broken my system because I've encoded a lie into it.
> 
> 
> Why would a user ever search on something as general as a "ph"? They
> might as well search in things that occur in paragraphs or lists - it
> has absolutely no semantic significance. This is an unrealistic
> example.

I might want to examine all the phrase instances in my information set 
to ensure that the authors have marked things correctly or I might be 
doing some sort of automatic indexing or classification. The point is not 
whether it's likely, the point is that *if you do do it you will get a 
bad result if you have lied*.

But consider your earlier suggestion to specialize from "fig" for 
something that is clearly not a figure. It's much more likely that I 
might want to find all figures in my information set, and if I get back 
something that is clearly not semantically a figure, that's a problem.

The other point here is that one value of specialization is that it 
enables retrieval based on base types. It's not just about getting a 
particular formatting result, for example. That's another reason that I 
put less weight on re-use of rendering code--rendering is just one of 
many things that will be done with the data and, in many respects, one 
of the least interesting. A large part of the real business value of 
sophisticated content management systems is that they enable 
sophisticated retrieval and classification. The DITA type hierarchy that 
a given element participates in is a key classification that I expect my 
repository to know and use. In that context, rendition systems and the 
code needed to implement them are almost an afterthought.
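To make "retrieval based on base types" concrete: DITA records each element's specialization ancestry in its `class` attribute (for example `"+ topic/ph hi-d/b "` for the highlighting domain's `b`), so a query for everything based on `ph` also returns every specialization of `ph`. The Python sketch below illustrates only that matching rule. The sample topic and the helper `elements_based_on` are hypothetical, and the `class` values are written out explicitly because ElementTree does not apply DTD attribute defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic. In real DITA the class values normally come
# from DTD/schema defaults; ElementTree does not read those defaults, so
# they are written out explicitly here.
SAMPLE = """<topic class="- topic/topic " id="sample">
  <title class="- topic/title ">Sample topic</title>
  <body class="- topic/body ">
    <p class="- topic/p ">A <ph class="- topic/ph ">plain phrase</ph>,
      a <b class="+ topic/ph hi-d/b ">bold phrase</b>, and a
      <uicontrol class="+ topic/ph ui-d/uicontrol ">Save</uicontrol> button.</p>
    <fig class="- topic/fig ">
      <title class="- topic/title ">A figure</title>
    </fig>
  </body>
</topic>"""


def elements_based_on(root, base_type):
    """Return every element whose class attribute lists base_type
    (e.g. "topic/ph") as one of its ancestry tokens, in document order."""
    # Class values start with "- " or "+ " and end with a space, so
    # padding the token with spaces prevents partial matches such as
    # "topic/p" matching inside "topic/ph".
    token = f" {base_type} "
    return [el for el in root.iter() if token in el.get("class", "")]


root = ET.fromstring(SAMPLE)
print([el.tag for el in elements_based_on(root, "topic/ph")])
print([el.tag for el in elements_based_on(root, "topic/p")])
```

Under this rule, an element specialized from `fig` that is not really a figure would come back from a `topic/fig` query alongside the genuine figures, which is exactly the kind of bad result described above.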

> I'd be tempted to go the other direction, and factor out all the
> index term stuff as a domain specialization of keyword, so indexing
> can be easily removed by authoring groups that don't index (they do
> exist).

But I would expect the DITA spec to then provide an "indexing domain" 
module. The point is that things that are clearly fundamental need to be 
codified at a fairly low level.

Also, since I would never have a customer author directly in the 
DITA-provided DTDs there is no issue about what types are allowed or not 
allowed. That is, for authoring I would always create a specialized DTD 
that is tailored to each group's needs. As part of that process you can 
include or exclude indexing as needed, for example.

I realize that some people author directly in DITA-provided DTDs, just 
as some people author directly in DocBook. I would never suggest to a 
client that they do either and would refuse to implement a system that 
did, for the simple fact that, because of their necessary generality, 
neither the DITA-defined DTDs nor DocBook are, out of the box, 
particularly well suited for authoring.

Given that in the types of systems I build you have to create a 
customized, task-focused, business-process-specific authoring 
environment anyway, it doesn't make sense to *not* specialize in order to 
optimize authoring as well. So I take it as a given that the authoring 
DTDs will always be specialized to some degree, even if it's only to 
leave out those things you don't need.

> But I might ask why they are looking at DITA at all - what benefits
> does a standard bring for them, if it's not content interchange and 
> infrastructure reuse? 

They're looking at DITA because there is fundamental value in using a 
standard, so they have to ask the question, "does DITA satisfy my 
requirements?". In some cases it does, in other cases it doesn't. When 
it does then you get 100% of the value. When it doesn't you still get a 
lot of value because the standard serves to document and codify good 
practice, meaning that while I might need a custom document type I don't 
have to invent the whole thing from scratch--I can use DITA as a 
starting point and go from there, making an informed business decision 
about when to be strictly conforming and when not to be.

Understand that standards, especially application standards like DITA 
(rather than infrastructure standards like XML or XSLT), offer more than 
just content interchange and infrastructure re-use. They also offer 
design and knowledge re-use, irrespective of any interchange or code 
re-use that might happen.

When you hire a consultant to design and build a system, most of what 
you're paying for is design and knowledge. The implementation details 
are essentially a commodity that can be bought on a pure cost basis (do 
I hire programmers in India or China for this job?). But the knowledge 
is not a commodity *until* it is written down and codified. One way to do 
that is to write a book or develop a trademarked process (Information 
Mapping comes to mind). Another way to do it is to develop an 
application standard, which makes the knowledge and the design available 
to the whole community.

The value to me, as a system designer and integrator, of a standard like 
DITA is largely that I can say to clients "here's an existing design and 
body of practice that reflects decades of technical documentation 
practice. You can be confident that the basic design and approach is 
fundamentally sound." The alternative is for me to say "I'm a really 
smart guy and I know what you need and I'll solve all your problems", 
which while it may or may not be true, is a much harder sell.

Part of what is driving this discussion is that anyone for whom DITA out 
of the box is clearly a good fit would never talk to me--they don't need 
to. So my clients are those that have already established that DITA (or 
DocBook or your favorite packaged application) won't entirely meet their 
needs. But they would much rather pay for a delta off of an established 
standard than a ground-up solution and I'd rather give them that because 
it offers my clients the best value--I can focus on just the parts of 
the problem that *aren't* solved, making it possible to produce a better 
result faster and at a lower overall cost to the client. It also means 
that I can get them to a point where even harder problems become the new 
unsolved problems.

Or said another way: I want to always be working on the hardest problem 
my clients have with respect to technical documentation authoring, 
management, and delivery. So the faster I get them past the basic 
infrastructure issues that standards like DITA address, the faster I 
can work on really interesting stuff.

Please understand: my frustration with DITA is short term, in that it 
doesn't do everything I need it to do *today*. But I have every 
confidence that it will. If I didn't think DITA was useful I wouldn't be 
here.

> That's why we all chose XML, after all - to
> ease content exchange and enable standard tools.

These are only some of the reasons people use XML and not necessarily 
the most compelling reasons for using XML for some users.

Cheers,

E.
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com
References:
- Re: [dita] When does DITA Document Type Not Meet Requirements?
  - From: Michael Priestley <mpriestl@ca.ibm.com>