
docbook message


Subject: Re: Fwd: Re: DOCBOOK: Invisible tech symbols

On Tue, Dec 10, 2002 at 07:10:01AM -0800, jonathon wrote:
> On Tue, 10 Dec 2002, Doug du Boulay wrote:
> ]>> The file is well-formed and valid, it produces nice .html and .pdf but
> ]>> some symbols ("forall" and "and") are invisible (pdf) or look like
> ]>> small squares (html). Why is it so and how can I fix it ?
> ]>
> ]>I don't know what the official answer is, but for html, what I do is
> ]>replace all the unknown character entities with inline gifs.
> 	a)	Change the character set to either UTF-8 or UTF-16.
> 	b)	Add a line to the head section indicating that fact.
> 	c)	Use SED or PERL to change all the docbook character
> 			entities to UTF-8 or UTF-16 character entities.
> 	d)	Trust that your visitors are using browsers that understand
> 			XML, XHTML and CSS.
> 	That only works with HTML.  I can't help with the pdf stuff.

By default, the stylesheets use the HTML output encoding of
iso-8859-1.  The <meta> tag at the top of each HTML file
indicates the encoding.  Your special characters fall
outside the range of that encoding, so they come out as
numerical character references:

&#8704; &#8743; &#8745; &#8734;

Many browsers support such numerical character references
(IE5 for example), but apparently not all.
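To see where those references come from, here is a small illustration in Python. It is not part of the DocBook toolchain, only a model of what an XSLT processor does when it serializes to iso-8859-1: characters the encoding cannot represent are written as decimal numeric character references, which Python's built-in "xmlcharrefreplace" error handler reproduces.

```python
# Illustration only: mimic serializing text to iso-8859-1 the way an
# XSLT processor does. Characters outside iso-8859-1 become decimal
# numeric character references; everything else is written as-is.
symbols = "\u2200 \u2227 \u2229 \u221e"  # forall, and, intersection, infinity

latin1_bytes = symbols.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1_bytes.decode("ascii"))

# Each reference is simply the character's Unicode code point in decimal.
for ch in symbols.split():
    print(f"U+{ord(ch):04X} -> &#{ord(ch)};")
```

The output is the same four references quoted above, which is why a browser that does not understand numerical character references (or lacks a font with those glyphs) cannot show the symbols.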

But the DocBook XSL stylesheets can output utf-8 instead of
iso-8859-1, without any post processing.  If you are using
the chunking HTML stylesheet, add the stylesheet parameter
"chunker.output.encoding=utf-8".  If you are doing single
HTML file output, you need to create a short customization
stylesheet that changes the encoding:

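<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<!-- Import the stock single-file HTML stylesheet (adjust the path) -->
<xsl:import href="../docbook-xsl-1.58.1/html/docbook.xsl"/>

<!-- Override only the output encoding -->
<xsl:output method="html" encoding="utf-8" indent="no"/>

</xsl:stylesheet>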

Use this instead of html/docbook.xsl in your process.
Then when you look in your HTML output file, instead of
numerical character references you will see the UTF-8
encoded characters themselves (∀ ∧ ∩ ∞).  Each one is
stored as a multi-byte sequence, so many text editors
show them as unreadable garbage, such as:

b~H~@ b~H' b~H) b~H~^
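A rough way to see why utf-8 output looks unreadable in an editor, sketched in Python under one assumption: that garbage like the sample above comes from a viewer that drops the high bit of each byte and shows control characters as "~" plus a letter. The function name seven_bit_view is hypothetical.

```python
def seven_bit_view(ch: str) -> str:
    """Render ch's UTF-8 bytes the way an 8-bit-unclean viewer might.

    Hypothetical model: the high bit of each byte is dropped, and control
    characters are shown as '~' followed by a letter (like ^H in caret
    notation, but with '~').
    """
    out = []
    for byte in ch.encode("utf-8"):
        low = byte & 0x7F                      # strip the 8th bit
        if low < 0x20:                         # control character: show as ~X
            out.append("~" + chr(low + 0x40))
        else:
            out.append(chr(low))
    return "".join(out)


symbols = ["\u2200", "\u2227", "\u2229", "\u221e"]  # forall, and, cap, infinity
for ch in symbols:
    raw = ch.encode("utf-8")                   # three bytes for each of these
    print(f"U+{ord(ch):04X}  utf-8 bytes: {raw.hex(' ')}  "
          f"7-bit view: {seven_bit_view(ch)}")
```

Under that assumption the function reproduces the garbage shown above, which suggests the viewer was not decoding UTF-8 at all; a UTF-8 aware editor displays the symbols themselves.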

The <meta> tag at the top will indicate the utf-8 encoding.
Most browsers these days can display utf-8 encoded documents
if they have the <meta> tag telling them that the file
is utf-8.
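The role of the <meta> tag can be modeled the same way. This Python sketch picks the decoding from the charset named in the tag and falls back to iso-8859-1 when there is none. Real browsers use a more elaborate detection algorithm; the helper name decode_html and the fallback choice are assumptions for illustration.

```python
import re

# Assumption: the page carries a meta element of the kind the XSLT html
# output method writes, e.g.
#   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
META_CHARSET = re.compile(rb"<meta[^>]*charset=([A-Za-z0-9_-]+)", re.IGNORECASE)


def decode_html(raw: bytes, default: str = "iso-8859-1") -> str:
    """Decode raw HTML bytes using the charset named in its <meta> tag."""
    match = META_CHARSET.search(raw)
    encoding = match.group(1).decode("ascii") if match else default
    return raw.decode(encoding)


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8"></head>'
        '<body>\u2200 \u2227 \u2229 \u221e</body></html>').encode("utf-8")

print(decode_html(page))
```

Without the tag, the same bytes would be decoded as iso-8859-1 and each symbol would turn into three unrelated characters.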


Bob Stayton                                 400 Encinal Street
Publications Architect                      Santa Cruz, CA  95060
Technical Publications                      voice: (831) 427-7796
The SCO Group                               fax:   (831) 429-1887
                                            email: bobs@sco.com

