Subject: Re: DOCBOOK: objection to docbook.dcl
<!-- Resent because the first try didn't get through -->

Adam Di Carlo wrote at 21 Mar 2001 -0500:
> Shipped with the DocBook DTDs from 2.4.1 and up is 'docbook.dcl', an
> SGML declaration for use with DocBook documents.  However, this
> declaration is unnecessarily restrictive, to the level where it is
> rather cumbersome to implement.

This is a wonderful piece of mail!

Somewhere, in some archive, is a piece of mail dated several years ago
from Eve Maler explaining the changes that she'd made from the previous
DocBook SGML Declaration to arrive at the current Declaration.  The
changes would have ranged from prudent for the time to some, like
NAMELEN 45, that would have pushed the limit of what you could do with
SGML systems of the time.

Since then, of course, XML has standardised the removal of many of the
petty restrictions that we had to put up with (be thankful that you
don't need to worry about SGML's Capacities), Unicode has become
commonplace, and SP has become the default SGML parser by being both
better and cheaper than its now moribund competition (although I still
wish that SP had implemented CONCUR).

This mail, then, reflects the current reality, whereas the SGML
Declaration that it rails against pushed the envelope of the SGML
systems of its time.

> My argument is that the DocBook declaration should diverge from the SP
> (and OpenSP) implied declarations only where the divergence expresses
> a real necessity to diverge.  This is based on the principle that

The DocBook SGML Declaration predates current SP implementations.  I'm
a bit hazy about whether sgmls or nsgmls was current when the first
version of the current DocBook SGML Declaration was released, but
DocBook itself is definitely older than nsgmls.  I am certain, however,
that the DocBook SGML Declaration predates Unicode and multi-byte
support being built into the standard SP distribution.

> software (including SGML parsers) should be tolerant of what they
> accept.
> The unnecessarily broad divergence of the shipped Docbook
> declaration puts a burden on document engineers using DocBook.

ISO 8879 allows an SGML Declaration to be provided as part of the
document's prolog but, in the absence of a provided SGML Declaration,
an SGML parser can infer its own.  Parsers have to support the
Reference Concrete Syntax (RCS), but a concrete syntax only covers
about half the things that you can declare in an SGML Declaration, and
you really don't want to restrict yourself to the RCS since, among
other things, it has NAMELEN 8.  (Parsers need to support the RCS
since an SGML Declaration conforms to the RCS, including the NAMELEN
restriction.)  A concrete syntax doesn't cover, for example, the
CHARSET description or the OMITTAG parameter.

There is a complete SGML Declaration provided in ISO 8879 that you
could consider as THE standard SGML Declaration, but many people have
complained about that using much the same terms as you use to complain
about the DocBook SGML Declaration.

When I taught a tutorial on the SGML Declaration, one of the first
exercises was parsing a document sans SGML Declaration using multiple
SGML parsers to see what surprises you got, because different parsers
infer different SGML Declarations.  SP actually has a very permissive
inferred SGML Declaration, which of course is why you expect that
every SGML Declaration should be as permissive.

> I am considering here only the DocBook SGML DTD, since I presume the
> Declaration is rather irrelevant for XML files, since all XML files
> have the same XML declaration applied to them.
>
> I consider here 'docbook.dcl' as shipped with DocBook 4.1.
>
> Major problems:
>
>   OMITTAG is turned off (why?)

The conventional wisdom is or was that different SGML parsers were
likely to infer different combinations of tags if you left off too many.
In fact, in the bad old days, I had one project where I had to use a
specific parser (not sgmls or nsgmls) because that parser would infer
the tags that I wanted and sgmls/nsgmls would just complain.

>   NAMELEN is too short

It was permissive for the time.

>   Document Character set is too restrictive

Allowing more than ASCII was permissive for the time, and Unicode
wasn't even on many people's radar at the time.  Even getting the
right CHARSET identifier was a black art for a time since different
parsers recognised different CHARSET identifiers.

>   SUBDOC is turned off (why?)

Because not every SGML parser supported it and because the
conventional wisdom was that parsing another DTD for each SUBDOC was
an enormous overhead.

> Description:
>
> * OMITTAG is turned off
>
> 'OMITTAG' is turned off in 'docbook.dcl', disallowing markup
> minimization of any sort.  This is on in the implied declaration of
> both Jade and OpenJade.  This creates problems because documents using
> the default declaration for their parser will have a valid document,
> but if the user decides to be more fastidious and use the docbook SGML
> declaration, suddenly their document will not be valid.
>
> The major problem is that trying to turn this on will make a large
> number of existing SGML DocBook instances invalid.

There's always "spam" from the SP distribution for normalising SGML
documents.

> * NAMELEN is too short
>
> The NAMELEN quantity set in docbook.dcl is set to 45, rather than the
> default SP NAMELEN of 99999999.
>
> A number of users have complained of problems due to this limitation
> (do a google search on 'docbook namelen' to see what I mean) in any
> cases (such as the SUSE Linux distribution) where the declaration is
> enforced.
>
> Quoting <URL:http://xml.coverpages.org/wlw14.html>:
>
>   Care should be used when changing these since creating a variant
>   syntax may make it difficult for some SGML systems to process
>   documents created with that syntax.
>   The best means of guaranteeing
>   portability between different SGML systems and applications is to
>   use the reference concrete syntax as much as possible.
>
> One wonders why we need to diverge from the reference concrete syntax
> here.

Be careful what you wish for.

> * Document Character set is too restrictive
>
> As an example, to workaround limitations in the support of KOI-R SDATA
> entities in Jade and OpenJade, KOI-R users have to use unicode
> entities.  With the docbook.dcl file, these entities are disallowed,
> although they are perfectly valid with the implied SP declaration.
> Example of being disallowed:
>
> jade:/usr/share/sgml/entities/sgml-iso-entities-8879.1986/ISOcyr1.ent:1:16:E: \
>   "1072" is not a character number in the document character set

There's another workaround for KOI-R in my now-dated paper at [1].

Using Unicode wasn't an option even for SP at the time that the
current DocBook SGML Declaration was created.  Now, of course, it is
more of an option.

> * SUBDOC is turned off
>
> Why is it necessary to disallow SUBDOC in DocBook SGML documents?
> Seems like some authors may wish to use this, even if it's not fully
> supported by existing stylesheets.

The problem at the time wasn't stylesheets, since there wasn't a
standard stylesheet (perhaps except for an ArborText stylesheet), but
non-support among SGML parsers.

Regards,

Tony Graham
------------------------------------------------------------------------
Tony Graham                           mailto:tony.graham@ireland.sun.com
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708

[1] http://www.mulberrytech.com/papers/docchar.htm