OASIS Mailing List Archives

docbook message


Subject: Re: DOCBOOK: On the size of DocBook...

[this is a long posting -- I didn't have time to make it more concise]

Norman Walsh <ndw@nwalsh.com> writes:

> The recent thread about DocBook and LaTeX raised the issue of the size
> of DocBook (measured as the number of elements). (It's not the first
> thread to raise the issue, just the most recent.)


> Whenever I think of adding new elements to DocBook, I think about
> these content models and wonder if it's really worth it. Now, in a
> sense, this is completely unfair. It's quite possible that the
> proposed element is just as valuable, to someone certainly and to
> everyone maybe, as, say "errorcode". The fact that errorcode got there
> first doesn't seem like a very satisfying criteria on which to choose
> between them.

I absolutely agree with this. I don't think the "there are already too
many elements" argument should prevent the TC from giving very careful
and objective consideration to adding elements that really should be in
there. Take the proposed url element for example. I'm sure not everybody
would agree that should be added, but I think Elliotte Rusty Harold has
stated some good reasons for adding it -- and adding it as a new
element, not as a new class value on another element.

[I know I've already said this, but I'll take the opportunity to trot the
hobbyhorse back out...]

I think we also need to be careful about trying to solve the "there are
already too many elements" issue just by adding new class values on
existing elements -- systemitem or whatever -- rather than adding them
as new elements.

It seems like adding new class values increases the complexity of the
DTD just as much, but does it in a way that obscures the complexity
more.  What I mean is, when that's done, it's still adding to the
overall number of "logical units" or "semantic components" in the DTD.
But it's just adding them in a way that makes those logical units:

  * less intuitive to users
  * less versatile (you can't sub-class attributes)


> A few things occur to me.
> 1. The difference between 400 elements and 800 elements isn't
> significant, just add 'em all.

Sort of a straw man, I think :-)

> 2. 400 is just too many, we need to make DocBook smaller.

A straw man with a little less straw? Given the backward compatibility
issues and user-community needs, this seems like the
least-likely-to-happen solution -- and maybe the least desirable.

> 3. Some sort of "pizza cutter" a la TEI could be invented to allow
> selection of "just the right" elements. (But what will that do to
> interchange?!)
> 4. Refactoring the parameter entity structure in a more satisfying way
> might make it easier to customize which would offer some sort of a
> compromise between 1 and 3.

Definitely not straw men. I think we ought to consider these carefully.

(For anybody who doesn't know what the TEI "pizza cutter" is: basically,
it's a sort of "configurator" that lets you choose sets of elements that
you want to include or not include in the DTD you use for authoring your
documents, and then generates a custom DTD that includes just the
element sets you want and excludes the rest.)
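To make the idea concrete, here is a minimal sketch of such a configurator in Python. Everything in it is an assumption for illustration: the element-set names, their contents, and the `*.module` parameter-entity names are hypothetical and do not reflect how TEI or DocBook are actually organized. It only shows the general shape: choose element sets, then emit parameter-entity declarations that switch the corresponding marked sections to INCLUDE or IGNORE.

```python
# Hypothetical element sets; the names and groupings are illustrative
# only and are not the real DocBook or TEI modules.
ELEMENT_SETS = {
    "core": ["para", "section", "title"],
    "software": ["errorcode", "systemitem", "filename"],
    "math": ["equation", "informalequation"],
}


def cut_pizza(wanted):
    """Return DTD driver text marking each element set INCLUDE or IGNORE."""
    unknown = set(wanted) - set(ELEMENT_SETS)
    if unknown:
        raise ValueError(f"unknown element sets: {sorted(unknown)}")
    lines = []
    for name in ELEMENT_SETS:
        # A marked section such as <![%core.module;[ ... ]]> in the main
        # DTD would be kept or skipped depending on this entity's value.
        status = "INCLUDE" if name in wanted else "IGNORE"
        lines.append(f'<!ENTITY % {name}.module "{status}">')
    return "\n".join(lines)


def selected_elements(wanted):
    """List the element names the generated DTD would keep."""
    return sorted(e for name in wanted for e in ELEMENT_SETS[name])


if __name__ == "__main__":
    print(cut_pizza({"core", "software"}))
    print(selected_elements({"core", "software"}))
```

A sketch like this only works if every element belongs to exactly one switchable set, which is the reorganization question raised in number 4.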

First, I think implementing 3 might actually require that we do 4. I'm
not sure a really useful pizza cutter would even be practical with the
current parameter entity organization, at least not the parameter entity
organization at the information-pool level. I think TEI was actually
designed around the specific requirement to include/exclude element-sets
at the information-pool level, and DocBook wasn't nearly as much.

But all that said, I wonder whether that kind of parameter-entity reorg
is possible and/or prudent. There's a paragraph in Eve Maler and Jeanne
El Andaloussi's "Developing SGML DTDs" that reads:

  Some DTD implementors choose to store declarations for individual
  element types (particularly those in the information pool) in separate
  modules, building up a so-called "element library" that can be
  recombined in different ways for different DTDs. However, in our
  experience the complex interdependencies between information pool
  elements are easier to understand and maintain if the entire
  information pool is stored in a single module, with marked sections
  used to "modularize" individual element types.

Anyway, about the question at the end of number 3 above -- But what will
that do to interchange? -- It seems like interchange isn't an issue if

  * the customized DTDs are strict subsets of the complete DTD

  * and users/user communities treat their customized DTDs as "authoring
    DTDs" and continue to use the full DTD for validation (that is,
    don't expect that documents that others interchange with their
    community will validate against their custom authoring-DTD subset)
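The first condition can be roughly checked by machine. The Python sketch below assumes a deliberately simplified model in which each DTD is a mapping from element name to the set of child elements it allows; the element names and the `is_strict_subset` helper are hypothetical. Real content models also carry order and occurrence constraints (for example, a child the full DTD requires but the subset drops), which this check ignores, so passing it does not by itself prove that documents will validate against the full DTD.

```python
def is_strict_subset(subset, full):
    """True if every element and allowed child in subset is also in full."""
    for element, children in subset.items():
        if element not in full:
            return False  # subset declares an element the full DTD lacks
        if not children <= full[element]:
            return False  # subset allows a child the full DTD forbids
    return True


# Hypothetical, heavily simplified content models.
FULL = {
    "section": {"title", "para", "equation"},
    "para": {"errorcode", "systemitem"},
    "title": set(),
    "equation": set(),
    "errorcode": set(),
    "systemitem": set(),
}

# An "authoring DTD" that drops the math and system-item markup.
AUTHORING = {
    "section": {"title", "para"},
    "para": {"errorcode"},
    "title": set(),
    "errorcode": set(),
}

# A customization that adds a new element, so it is an extension
# rather than a subset.
EXTENDED = dict(AUTHORING, para={"errorcode", "url"}, url=set())

if __name__ == "__main__":
    print(is_strict_subset(AUTHORING, FULL))
    print(is_strict_subset(EXTENDED, FULL))
```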

Which makes me think of another possibility to add -- something that's
sort of already been discussed on this list:

5. The DocBook TC, with suggestions/feedback from the various DocBook
   user communities, produces a set of standard off-the-shelf
   strict-subset "authoring DTDs" designed to meet the needs of specific
   user groups (e.g., one for the "math markup" community, the "help
   authoring" community, the Java documentation community).

   And (sort of at the risk of stating the obvious), even if everybody
   used their own authoring DTDs, they could continue to use the same
   processing apps. For example, there would be no need for groups to
   use/import different sets of stylesheets as long as the DocBook
   XSL/DSSSL stylesheets continue to support "full" DocBook.  (That is,
   everyone could continue to build their stylesheet customization on top
   of the same support-for-full-DocBook XSL/DSSSL stylesheets.)

One of the values of having a set of standard strict-subset authoring
DTDs is that they would be carefully considered by the TC, potentially a
lot more carefully than possibly-not-compatible-with-one-another ad-hoc
custom authoring DTDs that users from the same community might end up
creating and propagating and using.

What I mean is, I think maybe there are some identifiable DocBook user
sub-communities within which users have the same basic markup needs --
their needs within their community are not that radically different from
one another. If the TC doesn't produce a subset that meets their needs,
and that community is not well-organized enough to produce a suitable
custom authoring DTD on its own, we risk having individual users within
those communities producing conflicting, sub-optimal customizations.

My experience is that users and user communities -- especially those
that might be considered "casual document authors" (for example,
individual open-source developers who write docs for their own
applications) -- really, really don't like to be told, "DocBook is highly
customizable -- go ahead and customize it to meet your needs".

It seems like what they typically want instead is something that
"just works right off the shelf".

That's it for now. But I really hope we can continue the discussion
about this and maybe arrive at some resolutions.

