

Subject: The value of re-use and interchange


In our discussion we've talked a lot about re-use and interchange.

I want to make sure we're being realistic about the relative value of 
different types of re-use and interchange, because I think some of 
those values are being, or may be, overstated. But that could just be 
my jaded view of the world.

As background: I've been working with SGML and XML for almost 20 years 
now in the context of industrial scale technical documentation 
authoring, management, production, and delivery. For all of that time my 
primary focus has been on satisfying the following requirements:

- Enable the creation, management, and delivery of large systems of 
interlinked documents

- Enable interchange of structured content horizontally across 
organizations and enterprises and vertically from the past to the future 
within a single organization.

- Enable optimization of XML applications by allowing controlled local 
specialization to meet both local and global business requirements at 
the lowest overall cost (both implementation cost and opportunity cost).

In this time I've been exposed to, worked with, or been involved with a 
number of industry groups trying to promote interchange of documents 
between suppliers and customers, among government entities, and so on. 
I've worked with companies trying to create bodies of modular, re-usable 
content in order to reduce authoring costs and reduce time-to-market for 
product documentation. I've worked with enterprises for whom link 
management is a key business driver. I've worked with enterprises for 
which localization of content is a key business driver (which is my 
current professional focus in the context of using XSL-FO to render 
documents in most of the world's modern national languages).

Out of this experience I've learned:

- It has been almost impossible to realize any great benefit from 
cross-enterprise interchange of content. I think this is for several 
reasons:

   1. Early SGML systems were frightfully expensive and the 
infrastructure was not sufficiently mature. This has changed to some 
degree, but not entirely (for example, the Arbortext tools, which I 
consider to be essentially required for any large-scale, productive, 
technical documentation system, are priced to reflect that value).

   2. The SGML applications were not particularly well designed and 
tended to add unnecessary impedance to the information creation and 
management tasks. Unfortunately, I haven't seen much improvement here.

   3. The nature of any wide-scope application is that it will be 
suboptimal for most local requirements and provide few mechanisms for 
local, controlled specialization. This is just a fact. The only hope of 
alleviating this is a controlled specialization mechanism such as the 
one in DITA.

   4. There are always myriad practical details that tend to make it 
harder than it originally seemed. This is also unavoidable.

   5. The rate of technology change has increased so quickly that by the 
time you define and deploy a system that truly enables interchange, the 
world has changed underneath you, at best eroding the value of the 
system, at worst making it obsolete or irrelevant. At the same time, 
enterprises have, for good or ill, shifted toward a much narrower 
near-term focus, making it harder to fund and justify long-range 
projects that cannot be immediately justified by cost savings.

   Certainly interchange has been made to work in some cases, but I think 
that the business value realized has been much lower than was originally 
promised or hoped. There's still a significant start-up cost in time and 
money that is hard to get over.

- Doing content re-use is much harder than people usually think, for a 
number of reasons. One reason is that it makes authoring harder. Another 
is that it makes link management and versioning harder because of the 
dependencies. It also makes quality assurance harder. In essence, the 
human issues of configuration management and communication among authors 
add significant cost and may (though not always) offset the value of 
re-use. Whether there is truly a benefit to modular re-use for a given 
business depends on many variables, including the nature of the things 
documented, the requirement for correctness (is this a mobile phone with 
a 2-year product life cycle or a commercial aircraft with a 50-year 
product life cycle and critical safety requirements?), the 
sophistication of the authors involved, and so on.

For example, consider the effort involved in creating a DITA map over a 
repository of several thousand content objects. Even with sophisticated 
authoring tools it's a significant conceptual challenge that many 
technical writers are simply not prepared for or willing to take on. 
This means that you likely have to hire, train, support, and retain 
highly skilled information developers to create and maintain your maps. 
Good for the skilled writers, but an additional cost to the organization 
compared with accepting less direct re-use from less skilled (but 
equally effective) writers. That is, sometimes the less sophisticated, 
brute-force approach to document creation and management is the better 
business decision even though it's less elegant technologically.
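
To make the scale of the problem concrete, here is what a trivial map 
looks like (the topic file names are invented for illustration):

  <map>
    <topicref href="install/overview.dita" type="concept">
      <topicref href="install/prereqs.dita" type="task"/>
      <topicref href="install/configure.dita" type="task"/>
    </topicref>
    <topicref href="reference/error-codes.dita" type="reference"/>
  </map>

Now imagine maintaining that structure over several thousand topics, 
with relationship tables and multiple deliverables layered on top, and 
the conceptual burden on the map author becomes obvious.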

- Link management in the context of modular information systems is a 
challenge that requires significant investment in information management 
infrastructure. There are, to date, no commercial tools that, in my 
opinion, satisfy this requirement, especially in the context of long 
product life cycles with lots of revision. I think I know *how* to solve 
this problem, and we (ISOGEN) have published our ideas and urged anyone 
who wants to implement them to do so, but to date nobody has (we did, 
but for business reasons have been unable to market that code).

- Content interchange *within* enterprises is generally much more 
valuable than interchange *across* enterprises. That is, I can get a lot 
of value interchanging content between the product group, the training 
group, and the sales group, but much less value interchanging between 
myself and my print engine supplier, for the simple reasons that the 
cost of enabling that cross-enterprise interchange is high, the actual 
volume of data interchanged is low, and the interchanged data will 
likely need local re-authoring anyway. In practice it's easier to do 
interchange via transformation than by standardization across enterprise 
boundaries, except where volumes are high or there's some other 
non-typical requirement that demands standardization. Implementing 
transforms is cheap relative to the cost of defining, implementing, and 
enforcing interchange standards.
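
As a sketch of what interchange via transformation looks like, a 
stylesheet along these lines (all element names invented for 
illustration) remaps local markup to a partner's vocabulary and copies 
everything else through unchanged:

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Remap our local "procedure" markup to the partner's "task". -->
    <xsl:template match="procedure">
      <task><xsl:apply-templates/></task>
    </xsl:template>

    <!-- Identity rule: copy through anything not explicitly remapped. -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>

  </xsl:stylesheet>

Each remapping rule is small and independently testable, which is a 
large part of why the transform approach is cheap.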

- The cost of implementing production (rendering) systems is much, much 
lower than the cost of creating and maintaining the data. That is, the 
cost of authoring and maintenance is high, and the value of having well 
structured data is high, but the cost of implementing transforms to do 
stuff with that data is low. Now that we have technologies like XSLT, 
XSL-FO, SAX, and DOM and no shortage of people who can apply them well, 
transforms and the like are essentially commodities, no different from 
any other code you might have written. They also have relatively low 
long-term maintenance costs--we can reasonably expect that XSLT skills 
will be widely available 10 or 20 years from now.
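
For example, a rendering template like the following (the element name 
and styling values are invented) is a few minutes' work for anyone who 
knows XSLT and XSL-FO:

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:fo="http://www.w3.org/1999/XSL/Format">

    <!-- Render a generic "para" element as a formatted block. -->
    <xsl:template match="para">
      <fo:block space-after="6pt">
        <xsl:apply-templates/>
      </fo:block>
    </xsl:template>

  </xsl:stylesheet>

Multiply that by the few hundred templates in a typical rendering 
system and you still have a small fraction of what the content itself 
cost to create.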

Therefore, the value of being able to re-use existing code is relatively 
low, especially compared to the overall cost of a total information 
support system. So while using the DITA-provided XSLTs is useful for 
getting something working quickly, you'd be much less likely to depend 
on them for a production system because that system probably requires 
lots of things that the DITA-provided code simply wouldn't have, from 
conforming to your local engineering practices to providing 
business-specific functionality.

So while code re-use is always valuable, I find its value relative to 
other values and costs generally non-compelling, simply because it 
tends to have diminishing value as a given system becomes more 
sophisticated and more specialized. The place where I find code re-use 
most valuable is in the implementation of core generic semantics, like 
link address resolution, transclusion resolution, and so on, all of 
which are (or can be) completely generic and independent of specific 
content semantics. For example, I've only ever written XPath resolution 
in XSLT once, but I've written templates to format chapters dozens of 
different ways.
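
For example, a single generic template along these lines handles 
transclusion for any vocabulary (this sketch assumes a simple 
"file.xml#element-id" reference form; real DITA conref values and 
processing are richer than this):

  <!-- Replace any element carrying a conref attribute with the
       element it points to in the referenced document. -->
  <xsl:template match="*[@conref]">
    <xsl:variable name="uri" select="substring-before(@conref, '#')"/>
    <xsl:variable name="id" select="substring-after(@conref, '#')"/>
    <xsl:apply-templates select="document($uri)//*[@id = $id]"/>
  </xsl:template>

Nothing in it knows or cares what the content semantics are, which is 
exactly why it's worth re-using.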

If the question is "create an enterprise-specific document type or 
re-use the production tools for standard doctype X," it's not even an 
issue--the enterprise-specific document type wins every time, because 
satisfying the enterprise's information capture and representation 
requirements is almost always the most important thing (and always is 
if the time scope of the system is anything more than a year or two).

So to summarize, I think that it is easy to oversell and overvalue the 
following:

- cross-enterprise standards-based interchange of content (that is, 
interchange in terms of a standardized document type, rather than by 
transformation).

- wide-scope re-use of content modules.

- re-use of existing code, especially for rendition.

It is this experience and analysis that causes me to focus much more on 
the core infrastructure aspects of something like DITA, i.e., the 
specialization mechanism and the general shape of the base types, than 
on code of the moment or issues of cross-enterprise interchange.

Cheers,

Eliot
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com
