Subject: Re: Future DocBook: goals/requirements?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

/ Michael Smith <smith@xml-doc.org> was heard to say:
| Could you say something about the main goals or requirements behind the
| changes you've in outlined in your 'Ruminations' articles ?

Here are some further thoughts on why I think now is the time to
refactor DocBook. Apologies in advance if some of these issues have
already been touched on in the thread. I haven't caught up yet, but I
had noticed that Michael asked this question, so I wrote these thoughts
while I was disconnected on the plane ride home.

1. The single most compelling reason, the reason that I think would be
sufficient if it were the only reason, is that DocBook has become
brittle. It has grown, slowly and reasonably conservatively but
continuously, for many years. Changes that were each individually small
and well conceived form quite a tenuous pile when taken all together.

Look at the number of class and mixture parameter entities we now have.
Many are very similar but not the same. Can you tell from inspection why
they aren't the same? Is the organizing principle that created them
discernible? I don't think so.

As the current maintainer, I'm aware that this is my fault to one degree
or another. Whatever the cause, and irrespective of whether or not it
was avoidable, we've reached the point where my software engineering
experience suggests that continuing on a path of accumulating patches is
not practical.

2. DocBook was conceived, designed, and built within the limiting
framework of SGML and then XML DTDs. In some ways it stands as a
testament to just how much you could do with those technologies. But
they are hardly modern.

For a project as large and important (if one measures importance in
terms of number of users or amount of legacy, at least) as DocBook, I
think novelty for novelty's sake would be a very bad idea indeed.
In fact, if all things were equal, I don't think it would be
inappropriate for DocBook to lag behind the technology curve. It needs
to be stable and reliable.

But all things are not equal. I think we've passed a complexity
threshold beyond which the parameter entity mechanisms available in DTDs
are simply not up to the task of supporting further development.

I am not suggesting, and have never intended to suggest, that DocBook
shouldn't be available as a DTD for many years to come. I just don't
think that the DTD should be the "source format", the format upon which
further development and customization is based.

3. Engineering advances do not proceed smoothly and uniformly over time.
Instead, they proceed in fits and starts, with watershed events spurring
periods of rapid development. I think RELAX NG is a watershed event in
markup languages.

DocBook hasn't suddenly become unmanageable because we added one more
tag. The development of DocBook has been straining the bounds of DTD
development for some time. I have been thinking about how to make
progress, about how to perform a refactoring (although I'm not sure I
was consciously aware that that was what I was considering), for several
years. The famous "PE reorganization" RFE has existed for at least five
years. I've considered, and even prototyped, several possible
approaches.

RELAX NG is a watershed event because it changes the validation model
just a little bit. It removes some restrictions and allows us to think
about validation in a different way. Suddenly I see a clear path
forward, a way to build a much simpler, more coherent, more easily
customizable DocBook framework.

Now, at the moment, I have only a vision and a few sketchy prototypes. I
don't have enough running code to be certain my ideas will work. But I
feel pretty confident.

4. Tools exist (thank you again, James) that will allow us to continue
to support existing tools and applications even as we move forward.
If moving to RELAX NG required us to turn our back on every DTD-based
XML tool that processes DocBook, the very idea of doing it would be very
much D.O.A.

My vision for the intermediate future is one where DocBook is maintained
in RELAX NG and where customization layers (both extensions and subsets)
are devised at the RELAX NG level. But DTDs are still provided by
translating the RELAX NG grammars with Trang.

It is likely that the DTDs will not validate precisely the same
documents as the RELAX NG grammar. The extent of the variation will
depend in part upon how we design DocBook, but I don't think perfect
fidelity should be a goal.

If perfect fidelity isn't possible, why bother? Because even a slightly
less constrained schema can still be used to drive editing tools like
Emacs and Epic. And it will allow all the existing DTD-based tools to
continue to offer some level of validation. (They'll be able to find
simple typos, for example, even if they can't enforce every constraint.)

5. DocBook needs to be able to adapt to a changing world. I've already
found several occasions, for example, in which it would have been
convenient for DocBook to have been in a namespace. I can imagine
scenarios where it would be almost necessary.

No matter what you think about namespaces, I think they're here to stay.
I don't see any long-term viability in an attitude of refusing to use
them, at least judiciously.

6. I think similar arguments can be made for the judicious use of simple
data types, although I'm by no means certain of that. I can imagine, for
example, that there might be value in validating that the content of the
<date> element is, in fact, a date. And even more potential value in
being able to sort dates and other simple values "correctly".

7. I think DocBook is a world leader in its class. I think there's an
opportunity here to continue that leadership role and I think we should
take that opportunity.
We should reinvent DocBook for the modern markup world.

I don't think anything I'm suggesting is radical. I don't propose that
we invent something that's going to be maliciously (or capriciously)
incompatible with the current needs or even the current markup of
existing users.

It's just time to refactor. I think that's a natural part of the life
cycle of a software system that's in the middle of its productive
lifespan.

| Is the aim mainly to make the vocabulary easier to maintain, or is it to
| make it easier to use? Or just to bring some order and consistency to
| the content models?

Yes, yes, and yes.

| Looking at the classes of changes you outline in the articles
| (rationalizing inlines, normalizing metadata, discarding cruft,
| miscellaneous changes to simplify thing) and in your protoype, it seems
| like it's more of a "cleaning up" and not really anything like the kinds
| of more extensive refactorings that others have mentioned on the list
| (e.g., splitting DocBook into a 'core' set of elements + modules for
| different types of user needs).

I've argued[1] (hmm, for consistency I should put the text above in the
blog thing as well, will do) that multiple namespaces shouldn't be used
to make extension modules. But I'm not opposed, at least right this
moment, to making a smaller core with additional modules.

| That is, it's still one big schema of 300+ elements, with most of the
| attribute values on those elements being the same as what they are
| currently.
|
| And when you say that your prototype is three-quarters finished, what's
| the nature of the other one-quarter you'd do if you were to finish it?

One large part is making class/mixture equivalents for the block
elements. It's in rough shape for the inlines, but not so much for the
blocks. Basically, it's done enough to make me feel comfortable that the
basic ideas work. But there are gobs of T's to cross and I's to dot.

| You mention that the TC has talked many times about 'reworking the
| parameter entities', but your current prototype isn't meant to be a
| complete solution to that, right? In your Relax NG grammar, I see named
| patterns for classes of inlines, but none yet for classes of divisions/
| components/blocks -- and also not yet any definition-replacement hooks
| that would facilitate customization of the schema.

RELAX NG provides some of the customization facilities directly. But
you're right about the blocks.

                                        Be seeing you,
                                          norm

[1] http://norman.walsh.name/2003/06/11/oneNSorMany

- -- 
Norman Walsh <ndw@nwalsh.com>      | Mankind are always happy for
http://www.oasis-open.org/docbook/ | having been happy; so that if you
Chair, DocBook Technical Committee | make them happy now, you make them
                                   | happy twenty years hence by the
                                   | memory of it.--Sydney Smith
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.7 <http://mailcrypt.sourceforge.net/>

iD8DBQE+7jZzOyltUcwYWjsRAlDxAJ4qEc3FIg8OHRRovwdAWWry35zB1gCdEUzh
ptG5aZKBE5gK68zExwFcAMs=
=jdgi
-----END PGP SIGNATURE-----