Subject: Re: [docbook] [db5] HTML tables vs. CALS tables
From: "Christian Roth" <roth@infinity-loop.de>
To: "DocBook Mailing List" <docbook@lists.oasis-open.org>
Date: Wed, 20 Sep 2006 02:03:50 +0200
Bob -

thank you for the detailed explanation.

>But they are not actually HTML tables, which would allow HTML elements 
>inside the content models of the td and th elements.

My view is different here. For me, the term "HTML tables" only pertains
to the elements building the table structure, not the content of cells.
This is similar to "CALS tables", where entry elements can contain
various DocBook-specific elements.

>Rather, they are 
>DocBook tables that borrow HTML element names. The content models of td and 
>th in the DocBook schemas are DocBook elements, not HTML elements.   So you 
>cannot cut and paste an HTML table into a DocBook document and have it 
>validate as DocBook because it would most likely contain in its table cells 
>some HTML elements that are not declared in DocBook.

The content model for HTML table cells (i.e., th and td) is actually a
parameter entity that is meant to be redefined. Otherwise, modularizing
XHTML would not be very useful when the aim is to re-use some of the
modules wherever you need those specific semantics in your own DTD.

This is the route we've taken with our own DTD: Since DTDs do not
support namespaces, we hard-coded the table element's namespace prefix
to "html" and added a FIXED "xmlns:html" namespace declaration to the
document root. This way, any processor (such as an XSLT engine) can
immediately determine whether the element it is currently looking at
belongs to an HTML table or a CALS table (our DTD supports both models,
even within the same document) and can act accordingly without
examining the element's surroundings.
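
To illustrate the point, here is a minimal sketch in Python (standard
library only). It is not taken from our tools: the sample document and
the table_model helper are made up for illustration, and the sample
declares the "html" prefix explicitly rather than relying on the #FIXED
default from the DTD. The namespace URI is the one used in the DTD
fragment in this message.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared by the #FIXED xmlns:html attribute in the DTD.
HTML_NS = "http://www.w3.org/HTML/1998/html4"

# Hypothetical instance mixing both table models in one document.
SAMPLE = """\
<doc xmlns:html="http://www.w3.org/HTML/1998/html4">
  <html:table>
    <html:tbody><html:tr><html:td>HTML cell</html:td></html:tr></html:tbody>
  </html:table>
  <table>
    <tgroup cols="1"><tbody><row><entry>CALS cell</entry></row></tbody></tgroup>
  </table>
</doc>
"""


def table_model(elem):
    """Classify an element by its qualified name alone, without context."""
    if elem.tag == f"{{{HTML_NS}}}table":
        return "html"
    if elem.tag == "table":
        return "cals"
    return None


root = ET.fromstring(SAMPLE)
models = [table_model(child) for child in root]
print(models)  # ['html', 'cals']
```

The decision is made from the element's own name; no parent or child
elements are inspected.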

Here's the relevant code from our own DTD (I'm not sure whether it's
bulletproof or even 100% correct, but it has worked fine for our users
for several years now):

<!-- 
  ..........................................
  Include the XHTML Tables Module.
  ..........................................
-->

<!ENTITY % XHTML.xmlns.attrib
  "xmlns:html CDATA #FIXED 'http://www.w3.org/HTML/1998/html4'">

<!ENTITY % xhtmldatatypes PUBLIC
  '-//W3C//ENTITIES XHTML Datatypes 1.0//EN'
  'http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-datatypes-1.mod'>
<!ENTITY % xhtmlattribs PUBLIC
  '-//W3C//ENTITIES XHTML Common Attributes 1.0//EN'
  'http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-attribs-1.mod'>
<!-- we do not want the global attributes modification -->
<!ENTITY % XHTML.global.attrs.prefixed "IGNORE">

<!ENTITY % XHTML.xmlns.attrib.prefixed "" >
%xhtmldatatypes;  <!-- instantiate -->
%xhtmlattribs;    <!-- instantiate -->

<!ENTITY % htmltables PUBLIC '-//W3C//ELEMENTS XHTML Tables 1.0//EN'
'http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-table-1.mod'>
<!--
  We redefine the qualified names to have 'html' namespace prefix.
  So, we declare qualified element type names:
-->
<!ENTITY % table.qname  "html:table" >
<!ENTITY % caption.qname  "html:caption" >
<!ENTITY % thead.qname  "html:thead" >
<!ENTITY % tfoot.qname  "html:tfoot" >
<!ENTITY % tbody.qname  "html:tbody" >
<!ENTITY % colgroup.qname  "html:colgroup" >
<!ENTITY % col.qname  "html:col" >
<!ENTITY % tr.qname  "html:tr" >
<!ENTITY % th.qname  "html:th" >
<!ENTITY % td.qname  "html:td" >

<!-- use our own content model "cell.mdl" here to allow our own
     upCast DTD elements -->
<!ENTITY % td.content "(%cell.mdl;)*" >
<!ENTITY % th.content "(%cell.mdl;)*" ><!-- see above -->
<!ENTITY % Flow.mix "" >               <!-- never actually used -->

%htmltables;     <!-- finally, instantiate -->


I see that DocBook 5 is RELAX NG based, so the above is most probably
impossible to do (especially since the XHTML Tables Module is only
available as a DTD, AFAIK). Also, I don't know RELAX NG well enough to
say whether it would even technically allow a similar mechanism for
overriding the content models of imported modules' leaf elements.

However, I still think it would really be useful to have the "borrowed"
HTML table elements in the (X)HTML namespace and not in the DocBook
namespace - if feasible - for the following reasons:

1. Clear semantics. The element brings the info with it, e.g. "I am an
HTML tbody element and work as described for those elements." Any
documentation for DocBook can rely on what is already described for the
HTML tbody element. Software tools know what semantics its attributes
have.

2. Immediate semantics. E.g. XSLT applications do not need to guess the
semantics of an element (like tbody or table) from ancestor, sibling or
descendant elements' existence.

3. Intended usage. My understanding is that it was actually the intent
of the XHTML modularization effort for the elements within one module
to be usable on their own. In this case, that means there is no
requirement for the td content model to allow *all* of the other XHTML
elements.
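
To make point 2 concrete, here is a hedged sketch in Python of the
context check that becomes necessary when both table models share the
unprefixed names. The sample document and the tbody_model helper are
made up for illustration; the rule it applies (a tbody whose parent is
tgroup is CALS, otherwise HTML) is a simplification for this sample,
not a complete classification.

```python
import xml.etree.ElementTree as ET

# Hypothetical document where HTML-style and CALS tables share element names.
SAMPLE = """\
<section>
  <table><tbody><tr><td>HTML-style</td></tr></tbody></table>
  <table><tgroup cols="1"><tbody><row><entry>CALS</entry></row></tbody></tgroup></table>
</section>
"""

root = ET.fromstring(SAMPLE)

# ElementTree has no parent pointers, so build a child-to-parent map first.
parent_of = {child: parent for parent in root.iter() for child in parent}


def tbody_model(tbody):
    """Guess the table model of a tbody from its parent element."""
    return "cals" if parent_of[tbody].tag == "tgroup" else "html"


tbody_models = [tbody_model(e) for e in root.iter("tbody")]
print(tbody_models)  # ['html', 'cals']
```

With the table elements in their own namespace, the parent map and the
lookup would not be needed; the element name alone would decide.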

Of course, I don't know what the backward compatibility requirements for
DocBook 5 are, so putting the HTML table elements into the HTML
namespace when they weren't in there in DocBook 4 or earlier may just be
impossible. 

Best regards
Christian Roth