Subject: Re: Future DocBook: goals/requirements?

/ Michael Smith <smith@xml-doc.org> was heard to say:
| Could you say something about the main goals or requirements behind the
| changes you've outlined in your 'Ruminations' articles?

Here are some further thoughts on why I think now is the time to
refactor DocBook. Apologies, in advance, if some of these issues have
already been touched on in the thread. I haven't caught up yet, but I
had noticed that Michael asked this question so I wrote these thoughts
while I was disconnected on the plane ride home.

1. The single most compelling reason, the reason that I think would be
   sufficient if it was the only reason, is that DocBook has become
   brittle. It has grown, slowly and reasonably conservatively but
   continuously, for many years. Changes that were each individually
   small and well conceived form quite a tenuous pile when taken all
   together. Look at the number of class and mixture parameter
   entities we now have. Many are very similar but not the same. Can
   you tell from inspection why they aren't the same? Is the
   organizing principle that created them discernible? I don't think
   so. As the current maintainer, I'm aware that this is my fault to
   one degree or another.

   Whatever the cause, and irrespective of whether or not it was
   avoidable, we've reached the point where my software engineering
   experience suggests that continuing on a path of accumulating
   patches is not practical.

2. DocBook was conceived, designed, and built within the limiting
   framework of SGML and then XML DTDs. In some ways it stands as a
   testament to just how much you could do with those technologies.
   But they are hardly modern.

   For a project as large and important (if one measures importance in
   terms of number of users or amount of legacy, at least) as DocBook,
   I think novelty for novelty's sake would be a very bad idea indeed.
   In fact, if all things were equal, I don't think it would be
   inappropriate for DocBook to lag behind the technology curve. It
   needs to be stable and reliable.

   But all things are not equal. I think we've passed a complexity
   threshold beyond which the parameter entity mechanisms available in
   DTDs are simply not up to the task of supporting further
   development. I am not suggesting, and have never intended to
   suggest, that DocBook shouldn't be available as a DTD for many
   years to come. I just don't think that the DTD should be the
   "source format", the format upon which further development and
   customization is based.

3. Engineering advances do not proceed smoothly and uniformly over
   time. Instead, they proceed in fits and starts, with watershed
   events spurring periods of rapid development. I think RELAX NG is a
   watershed event in markup languages.

   DocBook hasn't suddenly become unmanageable because we added one more
   tag. The development of DocBook has been straining the bounds of
   DTD development for some time. I have been thinking about how to
   make progress, about how to perform a refactoring (although I'm not
   sure I was consciously aware that that was what I was considering)
   for several years. The famous "PE reorganization" RFE has existed
   for at least five years. I've considered, and even prototyped,
   several possible approaches.

   RELAX NG is a watershed event because it changes the validation
   model just a little bit. It removes some restrictions and allows us
   to think about validation in a different way. Suddenly I see a
   clear path forward, a way to build a much simpler, more coherent,
   more easily customizable DocBook framework.

   Now, at the moment, I have only a vision, and a few sketchy
   prototypes. I don't have enough running code to be certain my ideas
   will work. But I feel pretty confident.

4. Tools exist (thank you again, James) that will allow us to continue
   to support existing tools and applications even as we move forward.
   If moving to RELAX NG required us to turn our back on every
   DTD-based XML tool that processes DocBook, the very idea of doing
   it would be very much D.O.A.

   My vision for the intermediate future is one where DocBook is
   maintained in RELAX NG and where customization layers (both
   extensions and subsets) are devised at the RELAX NG level. But DTDs
   are still provided by translating the RELAX NG grammars with Trang.

   It is likely to be the case that the DTDs will not validate
   precisely the same documents as the RELAX NG grammar. The extent to
   which there is variation will depend in part upon how we design
   DocBook, but I don't think perfect fidelity should be a goal.

   If perfect fidelity isn't possible, why bother? Because even a
   slightly less constrained schema can still be used to drive editing
   tools like Emacs and Epic. And it will allow all the existing
   DTD-based tools to continue to offer some level of validation.
   (They'll be able to find simple typos, for example, even if they
   can't enforce every constraint.)

5. DocBook needs to be able to adapt to a changing world. I've already
   found several occasions, for example, in which it would have been
   convenient for DocBook to have been in a namespace. I can imagine
   scenarios where it would be almost necessary. No matter what you
   think about namespaces, I think they're here to stay. I don't see
   any long term viability to an attitude of refusing to use them, at
   least judiciously.

6. I think similar arguments can be made for the judicious use of
   simple data types, although I'm by no means certain of that. I can
   imagine, for example, that there might be value in validating that
   the content of the <date> element is, in fact, a date. And
   even more potential value in being able to sort dates and other
   simple values "correctly".

7. I think DocBook is a world leader in its class. I think there's an
   opportunity here to continue that leadership role and I think we
   should take that opportunity. We should reinvent DocBook for the
   modern markup world.

   I don't think anything I'm suggesting is radical. I don't propose
   that we invent something that's going to be maliciously (or
   capriciously) incompatible with the current needs or even the
   current markup of existing users.

   It's just time to refactor. I think that's a natural part of the
   life cycle of a software system that's in the middle of its
   productive lifespan.
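
To make point 6 above a little more concrete, here is a minimal sketch
of what typed dates could buy, written in Python rather than in any
schema language. It assumes, purely for illustration, that <date>
content would be restricted to ISO 8601 calendar dates; the function
name and the sample values are hypothetical and nothing here is a
decided part of DocBook.

```python
from datetime import date


def parse_date(text):
    """Return a date if text is an ISO 8601 calendar date, else None."""
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None


# Hypothetical contents of three <date> elements.
raw = ["2003-06-11", "2002-12-01", "11 June 2003"]

# Typed values can be checked and then ordered as dates, not as text.
valid = sorted(d for d in map(parse_date, raw) if d is not None)
invalid = [t for t in raw if parse_date(t) is None]
```

With these sample values, `invalid` holds only the free-form string
"11 June 2003", and `valid` holds the two typed dates in chronological
order. A stylesheet or index builder could rely on that ordering
instead of guessing at the format.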

| Is the aim mainly to make the vocabulary easier to maintain, or is it to
| make it easier to use? Or just to bring some order and consistency to
| the content models?

Yes, yes, and yes.

| Looking at the classes of changes you outline in the articles
| (rationalizing inlines, normalizing metadata, discarding cruft,
| miscellaneous changes to simplify things) and in your prototype, it seems
| like it's more of a "cleaning up" and not really anything like the kinds
| of more extensive refactorings that others have mentioned on the list
| (e.g., splitting DocBook into a 'core' set of elements + modules for
| different types of user needs).

I've argued[1] (hmm, for consistency I should put the text above in
the blog thing as well, will do) that multiple namespaces shouldn't be
used to make extension modules. But I'm not opposed, at least right
this moment, to making a smaller core with additional modules.

| That is, it's still one big schema of 300+ elements, with most of the
| attribute values on those elements being the same as what they are
| currently.
| And when you say that your prototype is three-quarters finished, what's
| the nature of the other one-quarter you'd do if you were to finish it?

One large part is making class/mixture equivalents for the block
elements. It's in rough shape for the inlines, but not so much for the
blocks.

Basically, it's done enough to make me feel comfortable that the basic
ideas work. But there's gobs of T's to cross and I's to dot.

| You mention that the TC has talked many times about 'reworking the
| parameter entities', but your current prototype isn't meant to be a
| complete solution to that, right? In your Relax NG grammar, I see named
| patterns for classes of inlines, but none yet for classes of divisions/
| components/blocks -- and also not yet any definition-replacement hooks
| that would facilitate customization of the schema.

RELAX NG provides some of the customization facilities directly. But
you're right about the blocks.
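
As a rough illustration of what "directly" means, here is a toy model
in Python, not RELAX NG itself. RELAX NG lets a customization layer
replace a named definition when it includes a grammar, or add
alternatives to one by combining definitions by choice. The sketch
below models each named pattern as a set of element names; the pattern
names, the element names, and the customize() helper are all
assumptions made up for the example, not the contents of the prototype.

```python
# Toy model: each named pattern is the set of element names it allows.
# Pattern and element names below are illustrative assumptions.
BASE_GRAMMAR = {
    "inline.class": frozenset({"emphasis", "literal", "link"}),
    "block.class": frozenset({"para", "programlisting"}),
}


def customize(base, replace=None, extend=None):
    """Return a new grammar; the base grammar is left untouched.

    replace: definitions that override the base outright, in the spirit
             of a definition placed inside a RELAX NG include.
    extend:  alternatives added to an existing definition, in the spirit
             of combining definitions by choice.
    """
    grammar = dict(base)
    for name, names in (replace or {}).items():
        grammar[name] = frozenset(names)
    for name, names in (extend or {}).items():
        grammar[name] = grammar.get(name, frozenset()) | frozenset(names)
    return grammar


# A subset layer narrows the inlines; an extension layer adds a block.
subset = customize(BASE_GRAMMAR, replace={"inline.class": {"emphasis"}})
extension = customize(BASE_GRAMMAR, extend={"block.class": {"mynote"}})
```

The point of the model is only that both kinds of layer are expressed
against named definitions and leave the base grammar unmodified, which
is the role parameter entities play, far less gracefully, in the DTD.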

                                        Be seeing you,

[1] http://norman.walsh.name/2003/06/11/oneNSorMany
-- 
Norman Walsh <ndw@nwalsh.com>      | Mankind are always happy for
http://www.oasis-open.org/docbook/ | having been happy; so that if you
Chair, DocBook Technical Committee | make them happy now, you make them
                                   | happy twenty years hence by the
                                   | memory of it.--Sydney Smith