Subject: DOCBOOK: Re: doc domain vs. problem domain semantics
From: "Matt G." <matt_g_@hotmail.com>
To: docbook@lists.oasis-open.org
Date: Thu, 03 Jan 2002 06:07:27 +0000

>From: Norman Walsh <ndw@nwalsh.com>
>To: docbook@lists.oasis-open.org
>Subject: DOCBOOK: Re: doc domain vs. problem domain semantics
>          (Re[2]: listitem)
>Date: Mon, 31 Dec 2001 16:43:41 -0500
>
>/ "Matt G." <matt_g_@hotmail.com> was heard to whine:
>| As a matter of fact, I'd guess that more often than not,
>| variablelist is used to list things other than variables.
>
>Ah, the "variable" in variablelist isn't really for programming
>language variables. It's really a description list. In fact, if
>HTML had come first, we probably would have called it a
>descriptionlist.  As it is, I don't really recall the etymology.

Okay, that sure had me confused.  Why not transition to descriptionlist?


>| This gets the subject of
>| my message, and the tangent the thread is getting off to, which
>| is that since there aren't semantics rich enough to describe the
>| types of formatting structures people use in documents, the more
>| domain-specific ones are fallen back upon, as a crutch.  This
>| has the effect of ruining the semantics of the domain-specific
>| markup, particularly if its uses are mixed, within a single
>| document.
>
>I'm not sure I understand what you're trying to say.

Sorry, I should have been consistent and said "problem domain-specific".  I 
should clarify that I'm referring to 3 domains:
	problem domain: what the author is trying to describe (e.g.
	classes, types, commands, processes, pipes, message queues,
	streams, groups, RPC calls, sockets, daemons, character
	devices, etc.)

	document domain: the document constructs (I mean structural,
	as in paragraphs and tables, but the line between document
	structures and presentation is fuzzy, in places)

	application domain (or solution domain): generically, the
	means by which the problem is solved (e.g. typeface, font
	style and size, margins, pagebreaks, indentation, etc.)


The richness of the structure and information decreases as you move toward
the application domain.  Clearly, DocBook is focused on the first two.  What
I was saying is that "tag abuse", as you called it, effectively ruins the
semantics of the abused tag.  So a deficiency in problem domain semantics,
without a suitable fallback in the document domain, invites abuse of some
other problem domain construct (which directly damages the most richly
structured information, the information with the greatest potential
longevity and utility).  I thought variablelist was an example of this.  On
the other hand, if you have a document construct on which to fall back,
your document may not be as richly structured as it could be, but at least
that's not as destructive as tag abuse.


>But I will point out that there's a constant tension between
>general markup and specific markup. DocBook tries to achieve a
>good balance for computer software and hardware documentation. But

Of course.  As any experienced schema designer can attest, formalizing 
concepts can be difficult.  I think DocBook generally does a good job.


>The entire design of DocBook is geared to make it possible for
>you to write customization layers that provide the exact markup
>that you need.

Right, but do you think DocBook is rich enough to serve as an intermediate 
format for most types of publications, without resorting to tag abuse?  In 
other words, are its document domain semantics sufficiently rich to provide 
all the structural constructs most documents need?

If not, do you consider this goal to be realistic?  If you do, then how far 
off the mark do you consider DocBook to be?

Where would you draw the line; for what types of publications could the 
document domain semantics of DocBook (or a spiffed up version) be used, as 
an intermediate format?  Textbooks?  Newspapers?  Magazines?  (The latter 
two are really collections of documents, of course.)  How would you 
characterize the dichotomy between document structures that are (or would 
be) supported and those that aren't?  For example, it's true that some 
magazines are awfully layout-oriented, but if DocBook (or some derivative 
format) isn't suitable for authoring them, why not?  Where does the real 
problem lie?


> >>| More importantly (in the
> >>| short-term) it doesn't even appear to be nested, at all, in
> >>| the DSSSL print style-sheets (version 1.74b - the latest).
>
>I've lost the beginning of this thread, what doesn't appear nested?

variablelists.  They don't nest properly, with DSSSL print style-sheets 
(version 1.74b), using the TeX backend & OpenJade v1.3.  I suppose I should 
whip up an example and submit a bug report.


>| So, is there really no desire to augment it to be better suited
>| for more general documentation tasks and more easily adaptable
>| to other sorts of problem domains than HW/SW?
>
>There are thousands of things that we could add that would
>ideally suit the needs of one community or another.  DocBook
>could be extended to provide structures suitable for medical
>publishing, for legal publishing, for automotive manufacturing
>publishing, etc. ad infinitum.

See, that's exactly *not* what I'm talking about.  I'm wondering how
suitable a *foundation* (for layering or augmentation) you think DocBook
is or could be, so that *others* could leverage much of the work done on
DocBook and many of the existing (and future) tools.


>But I'm not sure that's the best approach. People often complain
>that DocBook is too big. Making it 10 times bigger is probably not
>a good idea.

I agree (with you and those people), though I disagree with the approach of
Simplified DocBook (possibly because it's intended to solve some problems
I'm not concerned with).  I think a more appropriate solution would be to
partition the elements into a document domain group and a number of
problem domain-specific groups (e.g. publishing meta, program sourcecode
doc, program usage doc, hw/sw concepts, and misc.).  Put them in separate
schemas, and maybe even separate namespaces.  Also, document them in
separate groups.


...and now comes the really foolish thing:

>| IMO, the DocBook DTD (which, admittedly, I haven't really spent
>| much time dissecting) should be partitioned into document
>| construct and HW/SW constructs (in addition to the various other
>| classes of attribute and entity definitions).  Stylesheets, too.
>| This would make it easier for say a biotech publication or
>| physics department of a major university to use the core
>| documentation semantics as a foundation for their own
>| field-specific documentation vocabulary, without carrying extra
>| baggage or suffering with unnecessary name collisions with
>| semantics foreign to their domain.
>
>With respect, DocBook is designed *specifically* to make this
>possible.  Perhaps you ought to spend some time looking at it.

"with respect"?  I don't see why ;)  Okay, my apologies.  I sometimes get 
very idealistic and consumed with thinking about how things *should* be 
structured, while being a little slow to dig into the details (perpetually 
feeling like "I'm really too busy to spend much time on this, just now").


However, here's a suggestion: rather than simply structuring it that way
internally, why not do one or both of the following:
	* Document it that way, rather than just lumping all the
	  elements together.
	* Provide a release of the DTD and/or stylesheets without
	  any of the HW/SW-specific stuff.


>| Do you see that what I'm interested in is two things:
>| 1) Preserving the semantics of HW/SW-specific constructs, by
>|    providing suitable fall-backs
>| 2) Allowing DocBook to be more easily adapted to other domains,
>|    either through augmentation or as a richly structured
>|    intermediate format.
>
>I suppose. I think it would be helpful if you made some concrete
>proposals.

As for point #1, I can probably spend some time going through TDG w/ a 
fine-tooth comb and might come up w/ more examples like "variablelist".

On the topic of #2, I wish I could, but I'm no publishing guru, and that's 
the kind of person who I imagine could say whether "it's all there", or 
identify "what's missing", or decide that "DocBook is too far off the mark 
and there's not the will or resources among the TC to get there", or "it 
just doesn't make sense to carry all the legacy of DocBook, and it'd be 
easier to start from scratch".  To be honest, I'm definitely not that 
person, so I'm really just wondering aloud, and am interested in hearing 
your/others' views on this issue.


>| So, you don't have a tool to generate your dependencies
>| automatically, do you?  I'll soon whip one up, in Python.  I
>| probably won't bother to
>
>Generating dependencies for things like entities is easy. But the
>processing semantics of included fragments isn't self-evident so
>I'm not sure there's a way to make a tool for it.

Huh?  What do you mean by "included fragments"?  You mean like the 'fileref' 
attribute of <imagedata> instances?  That's an example of what I think it'd 
be nice to use a command-line XPath or XQuery tool to collect.  I'll 
probably just end up writing an XSLT script to do it, though (obviously, a 
separate means would be necessary to collect entity references, unless XSLT 
2.0 includes this info).
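
To make that concrete, here is a minimal Python sketch of the sort of
collector I have in mind.  It is only an illustration, not a finished tool:
the function name collect_filerefs and the sample document are made up for
the example, it assumes the document parses without unresolved entity
references, and (as noted above) it does not collect entity references at
all.

```python
import xml.etree.ElementTree as ET


def collect_filerefs(xml_text, element="imagedata", attribute="fileref"):
    """Return the given attribute of every matching element, in document order.

    Elements that lack the attribute are skipped. Entity references are
    NOT collected; this only looks at attribute values.
    """
    root = ET.fromstring(xml_text)
    refs = []
    for node in root.iter(element):  # iter() also visits the root itself
        value = node.get(attribute)
        if value is not None:
            refs.append(value)
    return refs


# Hypothetical sample document, invented for this example.
SAMPLE = """<article>
  <mediaobject>
    <imageobject><imagedata fileref="figures/overview.png"/></imageobject>
  </mediaobject>
  <para>Some text.</para>
  <mediaobject>
    <imageobject><imagedata entityref="logo"/></imageobject>
    <imageobject><imagedata fileref="figures/detail.png"/></imageobject>
  </mediaobject>
</article>"""

if __name__ == "__main__":
    # One dependency per line, suitable for feeding to a build tool.
    for ref in collect_filerefs(SAMPLE):
        print(ref)
```

The output is one path per line, which is about all a makefile dependency
rule would need.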

So, what are you saying has processing semantics such that it'd be unclear 
whether you'd want to rebuild the document, if it changed?


>| would support XSLT 1.1.  I also dearly wish it had a
>| command-line flag for specifying an SYSTEM id search path (for
>| external entities and DTD subsets), similar to the '-I' option
>| supported by most C/C++ compilers!!
>
>Check out the XML Catalogs specification and try using public
>identifiers.

I already use automatically generated catalog files to resolve the latest
(theoretically) compatible DTD, for a given document.  But I don't think
catalog files are an efficient way to manage entity resolution.  There's no
way I want to be forced to maintain a separate list of locations for each
entity I'm using in my document.  Furthermore, catalog files' inability to
provide more than one layer of indirection forces you to generate them
automatically, for use within a directory tree that's under source control
and used by multiple developers.  Finally, most(?) XSLT tools don't even
support catalog files.

Why is OpenJade's '-D' (which works like '-I', for most C preprocessors) a 
bad way to go?  I think it's the best tradeoff between control, ease of use, 
and low maintenance burden, for my purposes.  I just wish Xalan supported 
it.
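
For what it's worth, the behaviour I'm after is simple enough to sketch in
a few lines of Python.  This is only my idea of a generic '-I'-style lookup,
not a description of how OpenJade actually implements '-D'; the function
name resolve_system_id, its 'exists' parameter, and the file names below are
all invented for the example.

```python
import os


def resolve_system_id(system_id, search_path, exists=os.path.isfile):
    """Return the first existing candidate for system_id, or None.

    A relative system identifier is tried against each directory in
    search_path, in order, and the first hit wins. An absolute one is
    checked as-is. 'exists' is injectable so the lookup can be tried
    out without touching the filesystem.
    """
    if os.path.isabs(system_id):
        return system_id if exists(system_id) else None
    for directory in search_path:
        candidate = os.path.join(directory, system_id)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical set of files, standing in for a real directory tree.
    known_files = {
        os.path.join("shared", "entities", "chars.ent"),
        os.path.join("project", "entities", "chars.ent"),
        os.path.join("shared", "entities", "legal.ent"),
    }
    path = [os.path.join("project", "entities"),
            os.path.join("shared", "entities")]
    for name in ("chars.ent", "legal.ent", "missing.ent"):
        print(name, "->", resolve_system_id(name, path, known_files.__contains__))
```

The point is that the search path is one short list per build, rather than
one catalog entry per entity.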


>                                        Be seeing you,
>                                          norm

Yeah, I appreciate your good humor about these things.  I do tend to ramble.


Matt Gruenke


Follow-Ups:
- DOCBOOK: Re: doc domain vs. problem domain semantics
  - From: Norman Walsh <ndw@nwalsh.com>