docbook message

Subject: DOCBOOK: Re: doc domain vs. problem domain semantics
From: "Matt G." <matt_g_@hotmail.com>
To: ndw@nwalsh.com
Date: Mon, 11 Feb 2002 07:41:23 +0000
Sorry for letting so much time go by before responding to this.


>From: Norman Walsh <ndw@nwalsh.com>
>To: "Matt G." <matt_g_@hotmail.com>
>CC: docbook@lists.oasis-open.org
>Subject: Re: doc domain vs. problem domain semantics
>Date: Thu, 03 Jan 2002 12:23:12 -0500
>
> >>I'm not sure I understand what you're trying to say.
>[...]
>| the two former.  What I was saying is that "tag abuse", as you
>| called it, effectively ruins the semantics of the abused tag.
>| So, a
>
>A few observations:
>
>1. The semantics in question are only "ruined" for the document(s) in
>which tag abuse occurs.

Or the collection of documents with which the problematic document(s) are
indistinguishably lumped.  If the tag abuse is widespread enough, even a
mere 10% of bad documents could (under certain circumstances) lead tool
authors to do The Wrong Thing(tm), effectively ruining the semantics of the
tag for everyone.  As I'm sure you're aware, entropy is a powerful force.


>2. You can use the role attribute to distinguish your "abusive" uses
>from "real" uses and thereby avoid ruining anything irreparably.

That's a good suggestion for the well-intentioned and enlightened abuser.
But anyone lazy or ignorant enough to resort to tag abuse in the first
place is also the least likely to bother with (or even think of) the
"role" attribute.


>3. The DocBook Technical Committee (TC) is actively maintaining
>DocBook. If you have a construct for which there is no suitable tag,
>and the problem domain you are working in is not too far afield,
>chances are the TC will address the issue.

The problem is that this approach doesn't scale well.  That's what I'm 
trying to address.


>4. If you need a new element and either can't wait for the TC to
>consider it, or if the TC rejects your use case for some reason, you
>can always add it yourself.

I'd like to see people start maintaining sets of application-specific
DocBook customizations and stylesheets.  People could then assemble
packages that include these customization modules and their associated
stylesheet modules.  Of course, without namespaces, this approach won't
scale very well.


> >>The entire design of DocBook is geared to make it possible for
> >>you to write customization layers that provide the exact markup
> >>that you need.

I see this as the core competency of the TC.  If they do a good job with 
document structure & meta info (and they have), then there can be 
customizations for dozens of fields of every sort.


>On the one hand, there are publications for which vastly more
>presentational information is required (layout-driven magazine
>publication, for example). I don't think DocBook should go there.

Actually, it seems you could use DocBook even for layout-intensive
publications like magazines and newspapers.  You'd use DocBook for the
article sources, then import the text flows into a separate page layout
tool.  You wouldn't put images, or perhaps even sidebars, in your DocBook
source; all of that would be relegated to the layout tool (the text of each
sidebar could be a separate DocBook document).


>If you really wanted to keep structure and presentation separate,
>and let's say you wanted to use DocBook for the structural part, my
>"off the top of my head" solution would be to design a new
>vocabulary for describing highly detailed presentational semantics
>and then point from that document back into the "semantic" DocBook
>document.

Right, and the layout tool I described could be used to produce files of
that type.  Perhaps it could even be implemented on top of XSL, since you'd
want very declarative, general mechanisms for styling objects embedded in
the text flows (e.g. headings, various inline elements, etc.).


>On the other hand, there are publications that have highly detailed
>semantic constructs that aren't used in computer software and
>hardware documentation.

Right.  And I believe that up to about 250 of DocBook's elements would be 
useful in such a context.  So, all the architecture work that went into 
creating those, as well as the associated stylesheet components, 
documentation, etc. could be leveraged for potentially tens or hundreds of 
other fields.


>cleanly into DocBook. I think if we tried to make DocBook the kitchen
>sink of semantic markup, we'd end up with 2000 elements and the whole
>enterprise would collapse under its own weight.

I think two things are happening: DocBook is maturing, which is a good
thing, because it's also reaching the limits of the TC's ability to
maintain and extend it.  Of course, that's a very uninformed opinion, so I
could be completely wrong.  There's no doubt, though, that it's getting big
enough to intimidate new users.  Organizing the documentation in a more
logical and structured fashion could go some way toward addressing this.


>My recommendation, if you want to use DocBook in another community,
>would be that you find a few other people in that community that
>share your interest and design the semantic constructs that you
>need. Then make a customization layer of DocBook that discards the
>things you don't need and add the things you do. (I'd be happy to
>participate, at least as an observer, in such a process.)

What would be great is to update Ch. 5 of TDG (DocBook: The Definitive
Guide) with more guidelines for designing a new module and for architecting
the associated stylesheet customizations and documentation (keeping it
high-level, and assuming readers already know XSL).

Also, maybe I'm missing where this is addressed, but Ch. 5 of TDG seems like 
it could use a section "Alternatives to Customizing DocBook", which could 
describe use of the 'role' attribute.


>| I agree (with you and those people).  Though I disagree with the
>| approach of Simplified DocBook (possibly because it's intended to
>| solve some problems I'm not concerned with).  I think a more
>| appropriate solution would be to partition the elements into a
>| document domain group, and a number of different problem
>| domain-specific groups (e.g. publishing meta, program sourcecode
>| doc, program usage doc, hw/sw concepts, and misc.).  Put them in
>| separate schemas, and maybe even namespaces.  Also, document them
>| in separate groups.
>
>Looking at this pragmatically, I observe that what you're suggesting
>would be *a lot* of work and it wouldn't directly benefit DocBook's
>principal community in any direct way.

I disagree.  For one thing, software is often written to solve problems in
domains other than computer hardware and software.  Making it easier for
people to add customization modules specific to other fields should be seen
as in line with the TC's goals.  But you do have a point: the biggest
benefits of this effort would be felt just outside the TC's purview.

It would also benefit the core DocBook user community, since the overall
user community could quickly grow by an order of magnitude or more.  That
translates into better tools, better support, and better documentation for
all users.


>That isn't a good reason not to do it, but it does mean that I want
>to wait until there's at least one other community that would
>directly benefit from this exercise.

Are you certain that's not already the case?


>| However, here's a suggestion: rather than simply structuring it
>| that way, internally, why not do one or both of the following:
>| 	* Document it that way, rather than just lumping all the
>| 	  elements together
>| 	* provide a release of the DTD and/or stylesheets without
>| 	  any of the HW/SW-specific stuff.
>
>I tell you what. If you take the list of elements in DocBook and
>divide them into those two groups: foundational and HW/SW-specific,
>post your division to the list, and see if there's any disagreement,
>and if we (the readers and posters on the list) can reach a mutual
>understanding of where the dividing line is, I'll consider it.
>
>I think you'll find 100 elements in the former category, 100 in the
>latter, and about 100 that no one can agree on.

The disputed elements could always be placed in both groups.  I think 100
is a bit much, though; nearly all the elements seem to fit clearly into one
category or the other.  I'll send my list in a follow-up message.


>| Huh?  What do you mean by "included fragments"?  You mean like the
>| 'fileref' attribute of <imagedata> instances?  That's an example
>| of what I think it'd be nice to use a command-line XPath or XQuery
>| tool to collect.  I'll probably just end up writing an XSLT script
>| to do it, though (obviously, a separate means would be necessary
>| to collect entity references, unless XSLT 2.0 includes this info).
>
>I often use tools to extract bits of files or preprocess files to
>produce something I can include in my document. For example, this
>Makefile rule extracts a fragment of addrbook-old.xml and produces
>address.1 which I include in my source document.
>
>   address.1: addrbook-old.xml
>              xinclude -d -x "/*/address[1]" $< $@
>
>A tool that notices that mydoc.xml depends on address.1 isn't very
>useful (IMHO). And I can't think of any way to encapsulate the rule
>above in my document for an automatic tool to extract.

Actually, this is a perfect example of what I was talking about.  You have a 
naming convention such that 'address.<n>' is the nth entry of 
addrbook-old.xml.  So, you might rewrite your rule as:

    address.%: addrbook-old.xml
            xinclude -d -x "/*/address[$*]" $< $@

I think that syntax might be specific to GNU Make, but the '$*' expands to 
the text that the '%' matched.

Now suppose you have a tool that parses mydoc.xml into a makefile fragment
that gets included in your makefile.  Make then knows that mydoc.xml depends
on address.1, address.17, address.473, and address.94371, and the pattern
rule tells it how to build each of them.  That is the tool I still haven't
gotten around to writing.
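A rough sketch of such a dependency generator, in Python purely for illustration.  It assumes the document pulls in fragments through XInclude `href` attributes and images through the `fileref` attribute of `<imagedata>`, and that it can be parsed on its own; external entity references are not examined (they would need a separate pass, as mentioned earlier in this thread).  The function names `collect_deps` and `make_fragment` are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands namespace prefixes to "{uri}localname".
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def collect_deps(xml_source):
    """Return the files a DocBook document references, in document order,
    without duplicates.

    Assumption: dependencies appear only as <imagedata fileref="..."> or
    <xi:include href="...">; entity references are not handled.
    """
    root = ET.fromstring(xml_source)
    deps = []
    for elem in root.iter():
        if elem.tag == "imagedata":
            ref = elem.get("fileref")
        elif elem.tag == XI_INCLUDE:
            ref = elem.get("href")
        else:
            ref = None
        if ref and ref not in deps:
            deps.append(ref)
    return deps


def make_fragment(target, deps):
    """Format one make dependency line: 'target: dep1 dep2 ...'."""
    return f"{target}: {' '.join(deps)}\n"
```

With GNU Make, the output of `make_fragment("mydoc.xml", collect_deps(...))` could be written to something like `mydoc.d` and pulled in with an `include mydoc.d` line, leaving the pattern rule to build whichever `address.<n>` files the document actually asks for.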


>| resolution.  There's no way I want to be forced to maintain a
>| separate list of locations for each entity I'm using in my
>| document.
>
>If -I would find it for you, why do you have to maintain it by hand?
>
>Actually, I think I need an example, I'm not sure what you're looking
>for.

That's a good point.  I think I'll write a tool that takes '-I' options,
looks up the files for the PUBLIC identifiers of all the external entities
in those directories, and generates a catalog file that contains the
mapping.  I know that's not what PUBLIC identifiers were intended for, but
I can't put explicit relative or absolute paths in the entity declarations,
because automatically generated external entities may end up in different
directories, and at different depths, depending on certain build options.
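A minimal sketch of that catalog generator, again in Python and only as an illustration.  It assumes every external entity is declared with both a PUBLIC and a system identifier, and it searches the `-I` directories (first match wins) for the system identifier's base name.  The regular expression only approximates entity-declaration syntax; it is not a DTD parser.  The output uses the OASIS XML Catalogs `<public>` entry; an SGML Open (TR9401) catalog would serve equally well if your tools expect that format.  All function names are hypothetical.

```python
import argparse
import os
import pathlib
import re
from xml.sax.saxutils import quoteattr

# Approximate pattern for: <!ENTITY [%] name PUBLIC "pubid" "sysid">
ENTITY_DECL = re.compile(
    r"<!ENTITY\s+(?:%\s+)?[\w.:-]+\s+PUBLIC\s+"
    r"([\"'])(.*?)\1\s+([\"'])(.*?)\3\s*>",
    re.DOTALL,
)


def parse_args(argv):
    """Parse compiler-style options, e.g. ['-I', 'gen', '-Ibuild', 'doc.dtd']."""
    parser = argparse.ArgumentParser(description="Generate a catalog for PUBLIC entities.")
    parser.add_argument("-I", dest="include_dirs", action="append", default=[])
    parser.add_argument("dtd", help="file containing the entity declarations")
    return parser.parse_args(argv)


def find_public_entities(dtd_text):
    """Return (public_id, system_id) pairs for PUBLIC external entity declarations."""
    return [(m.group(2), m.group(4)) for m in ENTITY_DECL.finditer(dtd_text)]


def locate(system_id, include_dirs):
    """Search the -I directories in order; return the first matching path, or None."""
    name = os.path.basename(system_id)
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def make_catalog(dtd_text, include_dirs):
    """Build an XML catalog mapping each PUBLIC identifier to a located file."""
    lines = [
        '<?xml version="1.0"?>',
        '<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">',
    ]
    for public_id, system_id in find_public_entities(dtd_text):
        path = locate(system_id, include_dirs)
        if path is None:
            continue  # not found in any -I directory; a real tool should report this
        uri = pathlib.Path(path).resolve().as_uri()
        lines.append(f"  <public publicId={quoteattr(public_id)} uri={quoteattr(uri)}/>")
    lines.append("</catalog>")
    return "\n".join(lines) + "\n"
```

Wired together as `make_catalog(text_of(ns.dtd), ns.include_dirs)` after `ns = parse_args(sys.argv[1:])`, this would regenerate the catalog on every build, so the entity declarations themselves never have to carry build-specific paths.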


Matt Gruenke

