Subject: RE: [docbook] Why use DocBook when there is HTML?
Mauritz Jeanson wrote:
> Mark Pilgrim abandoned DocBook in favour of HTML...
> What are your thoughts about this?

We're sort of off in a little corner of our own, so our mileage probably differs from everyone else's. But we find DocBook indispensable.

Our corner is doing literate programming of grammars of natural languages. This means embedding formal grammar fragments into a prose (descriptive) grammar; the fragments can be extracted and turned into a complete (morphological and phonological, not syntactic) grammar, which can in turn be converted into a morphological parser. The prose grammar acts as the documentation of the formal grammar.

There are several factors that make DocBook seem like the right way to go.

First, it's content markup, not formatting markup. To be sure, we've had to add some content markup tags (for interlinear text, but also the literate programming tags--we used Norm Walsh's extensions). But the use of content markup allows us to do things like extracting all the words in the target language (but not, for example, individual suffixes appearing in the text) to run them through the parser for purposes of verification. We can also extend the DocBook XML to embed an entire lexicon for test purposes (the lexicon would of course have its own internal tags).

Second, the work we're doing (unlike Mark Pilgrim's Python book) is explicitly targeted at Forever. Grammars never get superseded: there's an entire industry of documenting and describing endangered languages, and of course there are useful grammars of languages which have been extinct for thousands of years. (Linguists never throw anything away :-).) So the content markup tags of DocBook XML provide what I believe is a better way for documentation which will be interpretable for the long term (hundreds or maybe even thousands of years).

Third, some of the things we're doing are very messy to typeset.
Our last grammar was of Urdu, which uses an almost calligraphic version of the Arabic script called Nasta'liq. Short of typesetting Mongolian vertically, I guess this is as far as you could get from ASCII. I don't think it would render well in HTML. To be honest, we didn't try to render it using the standard XSL-FO path either; we could only get what we wanted using XeTeX (a Unicode-aware TeX engine that can run LaTeX), for which our conversion process relies on an open source program called dblatex. The result is output as a PDF. Maybe there is a way to do the above in HTML, but when we were figuring out how to do it, we didn't run across such a method.

I'll take this opportunity to say that one of the things that seems odd to me about DocBook is that it is targeted so explicitly at computer documentation. Many of its tags make no sense outside that context. So we have modified the schema not just by adding elements for linguistics and literate programming, but by removing many of the tags that are blatantly irrelevant.

Computer documentation is the sort of thing that will, in most cases, go out of date soon; and for that purpose, maybe Pilgrim is right that HTML makes sense. But there are plenty of domains for which people write books that don't go out of date (ranging from poetry to archaeology), and for which DocBook might make more sense to people if it didn't seem so much like a geek's view of the world.

My 2/100 of a dollar...

Mike Maxwell
CASL/ U MD
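
A footnote to the first point above, on pulling target-language words out of content markup for verification. This is only a rough sketch in Python, not the tooling actually used for the grammars. The element names `langword` (a full word in the target language) and `morpheme` (an affix cited on its own) are hypothetical, as are the placeholder forms in the sample; the message does not say which tags the extended schema uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. <langword> and <morpheme> are invented
# element names for illustration; they are not stock DocBook elements
# and are not taken from the original message.
SAMPLE = """\
<section>
  <para>The plural of <langword>wug</langword> is
  <langword>wugs</langword>, formed with the suffix
  <morpheme>-s</morpheme>.</para>
  <para>Compare <langword>wug</langword> once more.</para>
</section>
"""


def extract_words(xml_text, word_tag="langword"):
    """Return the unique words marked with word_tag, in document order."""
    root = ET.fromstring(xml_text)
    seen = []
    for elem in root.iter(word_tag):
        # Join nested text in case the word contains inline markup.
        word = "".join(elem.itertext()).strip()
        if word and word not in seen:
            seen.append(word)
    return seen


if __name__ == "__main__":
    # One word per line, ready to feed to a parser for checking.
    for w in extract_words(SAMPLE):
        print(w)
```

Because the selection is by element name, the suffix tagged `morpheme` is skipped without any special handling. With purely presentational markup (say, italics for every cited form) the words and the suffixes would be indistinguishable.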