OASIS Mailing List Archives

docbook-apps message


Subject: Getting the HTML encoding declaration in XML output

Hello. Back in 2009, Michael Leslie asked the list (but received no answer) the following: [1]

   «Does anyone have any experience generating UTF-8 XHTML
   that can be consistently rendered in both Firefox and IE?»

And, like him, I want to use Docbook to produce HTML-compatible XHTML. However, as a former member of the HTML working group (and co-editor of a spec for polyglot markup, that is, XHTML that is HTML as well), I can say that the question has since been answered by the HTML5.x specifications: HTML-compatible XHTML(5) documents MUST NOT include the XML declaration, they MUST be UTF-8 encoded, and the encoding must be declared using either the HTTP header, the byte-order mark, or the HTML encoding declaration. The last of these, the HTML encoding declaration, comes in two variants:

 1) <meta charset="UTF-8"/>
 2) <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>

Both work equally well in Web browsers, but occasionally there are some fringe, legacy implementations that support only the http-equiv variant.
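
As an illustration of the rules above, a minimal HTML-compatible XHTML5 document might look like the following sketch. The title and body text are placeholders, and the lang/xml:lang pair follows the polyglot markup guidelines rather than anything specific to Docbook XSL.

```xml
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <!-- HTML encoding declaration, variant 1; note that no XML declaration precedes the doctype -->
    <meta charset="UTF-8"/>
    <title>Placeholder title</title>
  </head>
  <body>
    <p>Placeholder text with non-ASCII characters: æøå</p>
  </body>
</html>
```

Saved as UTF-8, a file like this should be treated as UTF-8 whether it is served as text/html or as application/xhtml+xml.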

The Docbook XSL book also tries to explain the encoding issues of HTML and XHTML.[2] See the chapter on ’Special characters’, under the heading «HTML encoding». However, the book does not arrive at the solution that HTML5.x specifies.

Furthermore, it is (probably) well known that when the output method of Docbook XSL is set to 'xml', the HTML encoding declaration is, by default, not included. As a result, browsing a Docbook XSL-generated XHTML file as text/html (e.g. by giving it the extension .html instead of .xhtml) can fail: Web browsers receive no encoding declaration from the HTML document itself and may therefore guess the wrong encoding.

Hence, I propose that in the next version of Docbook XSL, you allow the HTML encoding declaration (both variants) to be used. In fact, it would be best if the HTML encoding declaration were always included by default.

To solve my own problem, I have created the following customization (which I use with XMLmind XML Editor); see below. If there is a better or more generic way to do it, I would be thankful for your help. (For instance, I am not sure why, in my implementation, I had to include the namespace declaration. I’m sure that could have been avoided. Anyway, it is excluded from the final output, so it does not matter.)

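<?xml version="1.0" encoding="UTF-8"?>
<?stylesheet-label Polyglott! ?>
<!-- The archived copy lost the stylesheet wrapper, the closing tags and the
     meta element; they are restored here on the basis of the text above. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href="docbook5-config:xsl/xhtml5/docbook.xsl"/>
  <xsl:output indent="yes" method="xml" omit-xml-declaration="yes"/>
  <!-- From Bob Stayton: -->
  <xsl:template name="root.attributes">
    <xsl:call-template name="xml.language.attribute"/>
    <xsl:call-template name="language.attribute"/>
  </xsl:template>
  <!-- Leif: Add the HTML encoding declaration -->
  <xsl:template name="user.head.content">
    <meta xmlns="http://www.w3.org/1999/xhtml" charset="UTF-8"/>
  </xsl:template>
</xsl:stylesheet>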
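
For readers who need the http-equiv variant for legacy consumers, the same user.head.content hook could perhaps be made switchable. The following is only a sketch under assumptions: the parameter name polyglot.meta.variant is hypothetical and not part of Docbook XSL, and the import path is copied from the XMLmind-specific customization in this message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Import path as used in this message; adjust it to your own installation. -->
  <xsl:import href="docbook5-config:xsl/xhtml5/docbook.xsl"/>

  <!-- XML serialization without the XML declaration, as polyglot markup requires. -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>

  <!-- Hypothetical parameter (an assumption, not a Docbook XSL parameter):
       'charset' selects variant 1, 'http-equiv' selects variant 2. -->
  <xsl:param name="polyglot.meta.variant">charset</xsl:param>

  <xsl:template name="user.head.content">
    <xsl:choose>
      <xsl:when test="$polyglot.meta.variant = 'http-equiv'">
        <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
      </xsl:when>
      <xsl:otherwise>
        <meta charset="UTF-8"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

One thing worth checking in the generated file: HTML requires the encoding declaration to appear within the first 1024 bytes of the document, so if the stylesheet emits a lot of head content before user.head.content is called, the meta element may need to be placed earlier.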

Btw, and not to step on too many toes: I had a look at how the TEI XSL stylesheets work, and they seem to have taken care of the issue. They output their HTML as XML but without the XML declaration.[3] And they include the HTML encoding declaration in both their HTML output and their Epub3 output, which seems very wise.[4] I hope that Docbook XSL follows their lead. In fact, to me Docbook XSL’s html output mode seems like a waste of time. Better to simply produce HTML-compatible XML output.

[1] https://lists.oasis-open.org/archives/docbook-apps/200902/msg00099.html
[2] http://www.sagehill.net/docbookxsl/SpecialChars.html
[3] http://www.tei-c.org/release/doc/tei-xsl/profiles/default/html/to0.html#bt_src_O_S_to.xsl
[4] http://www.tei-c.org/release/doc/tei-xsl/profiles/default/html/to4.html#bt_src_T_metaHTMLS_......htmlhtml_param.xsl
leif halvard silli
