OASIS Mailing List Archives
docbook message



Subject: Re: [docbook] How to get a proper UTF-8 HTML with umlaut


Hi,

> Further, I think the source is not read correctly.
> The mutation from ü to Ã¼ may be caused by
> reading the source as ASCII, not as UTF-8.

Indeed, the source is not being read correctly. It is being read not as ASCII but as ISO-8859-1 (Latin-1). The lowercase ü character is encoded in UTF-8 as the two-byte hex sequence C3 BC. The bytes C3 and BC do not exist in ASCII, which is a 7-bit encoding, but they do exist in ISO-8859-1, as A-tilde and the fraction 1/4. If the source is interpreted as ISO-8859-1, that byte sequence is read as those two characters, not one. In the ISO named entities, C3 is &Atilde; and BC is &frac14;, which is what you are seeing in your output for ü. (You can see these entity declarations in the files of the "ent" directory in the DocBook 4.5 DTD distribution.)
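The byte arithmetic above can be checked with a short Python sketch. It uses only the standard library; `html.entities.codepoint2name` holds the HTML 4 entity names, which for these two characters are the same ISO Latin-1 names (Atilde, frac14) declared in the DocBook "ent" files. The function names are illustrative only and are not part of any DocBook tool.

```python
import html.entities


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("iso-8859-1")


def to_named_entities(text: str) -> str:
    """Replace each non-ASCII character that has an HTML 4 named entity with &name;."""
    out = []
    for ch in text:
        name = html.entities.codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name and ord(ch) > 0x7F else ch)
    return "".join(out)


u_umlaut = "\u00fc"                       # lowercase u-umlaut
print(u_umlaut.encode("utf-8").hex(" "))  # c3 bc
garbled = misread_as_latin1(u_umlaut)
print(garbled)                            # A-tilde followed by 1/4
print(to_named_entities(garbled))         # &Atilde;&frac14;
```

One character in, two bytes out, two Latin-1 characters back in: that is the whole mechanism behind the output you are seeing.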

I'm not able to reproduce your output using xsltproc with any combination of encodings or xsltproc options. I did not think xsltproc could even output named entities like &Atilde;, but I could be wrong.

Something is going wrong with the parser reading your files. I would examine your xsltproc setup, try xsltproc on another system that is independent of the first, and try Saxon 6 as an alternative processor.
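One way a parser ends up reading UTF-8 bytes as Latin-1 is a source whose XML declaration (or an external setting) names ISO-8859-1. Whether that is what happens on your system is an assumption, but the effect is easy to show with a parser independent of xsltproc. This sketch uses Python's standard-library ElementTree (expat); the `<para>` fragment and the helper name are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same UTF-8 bytes for a small fragment containing a lowercase u-umlaut.
BODY = "<para>\u00fc</para>".encode("utf-8")


def parse_as(declared_encoding: str) -> str:
    """Prepend an XML declaration naming declared_encoding and return the <para> text."""
    declaration = f'<?xml version="1.0" encoding="{declared_encoding}"?>'.encode("ascii")
    return ET.fromstring(declaration + BODY).text


print(repr(parse_as("UTF-8")))       # one character: u-umlaut
print(repr(parse_as("ISO-8859-1")))  # two characters: A-tilde, 1/4
```

Identical bytes give one character or two depending only on which encoding the parser believes is in effect.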

Bob Stayton
Sagehill Enterprises
bobs@sagehill.net

--------------------------------------------------
From: <markus.sticker.epos@zf.com>
Sent: Monday, June 03, 2013 7:12 AM
To: <docbook@lists.oasis-open.org>
Subject: AW: AW: AW: AW: AW: AW: [docbook] How to get a proper UTF-8 HTML with umlaut

Hi Markus,

I have set this parameter before
(see my earlier mails).

So there must be some switch for
turning the entity translation off.

As you can see, all special characters are
translated to HTML entities.

Further, I think the source is not read correctly.
The mutation from ü to Ã¼ may be caused by
reading the source as ASCII, not as UTF-8.

BR
Markus





-----Original Message-----
From: Markus Hoenicka [mailto:markus.hoenicka@mhoenicka.de]
Sent: Monday, 3 June 2013 15:52
To: docbook@lists.oasis-open.org
Subject: Re: AW: AW: AW: AW: AW: [docbook] How to get a proper UTF-8 HTML with umlaut

On 2013-06-03 15:39, markus.sticker.epos@zf.com wrote:
> Hi Markus,
>
> This result is the same as in DocBook 5 ... your output is ISO-8859-1
> :-( That's the default in DocBook.
>
> BR
> Markus


I'm sorry, I was too quick with this test.

I've now processed the document with the following command line, using chunked output and UTF-8 as you requested:

xsltproc --output output/ --stringparam chunker.output.encoding UTF-8 /usr/share/sgml/docbook/xsl-stylesheets/html/chunk.xsl refdbtest.xml

The result is the same for me, except that everything is UTF-8 now. The umlauts are there in the HTML source and they're displayed correctly in a web browser. See the attached HTML output.
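To confirm that chunked output like this is really clean UTF-8, a quick scan of the generated files can help. This is a hypothetical helper, not part of xsltproc or the DocBook stylesheets; the `output` directory name is taken from the `--output output/` option in the command above, and the mojibake pattern is only a heuristic.

```python
import re
from pathlib import Path

# "A-tilde" followed by a character in U+0080..U+00BF is a typical trace of
# UTF-8 text that was decoded as ISO-8859-1 (u-umlaut -> A-tilde + 1/4).
MOJIBAKE = re.compile("\u00c3[\u0080-\u00bf]")
# The same damage after serialization as named entities, e.g. &Atilde;&frac14;.
ENTITY_MOJIBAKE = re.compile(r"&Atilde;&[A-Za-z0-9]+;")


def check_html(data: bytes) -> list:
    """Return a list of problem descriptions for one HTML file's raw bytes."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    problems = []
    if MOJIBAKE.search(text):
        problems.append("contains UTF-8 bytes misread as Latin-1")
    if ENTITY_MOJIBAKE.search(text):
        problems.append("contains &Atilde; entity sequences")
    return problems


if __name__ == "__main__":
    out_dir = Path("output")
    if out_dir.is_dir():
        for path in sorted(out_dir.glob("*.html")):
            for problem in check_html(path.read_bytes()):
                print(f"{path}: {problem}")
```

An empty report means every file decodes as UTF-8 and shows neither form of the A-tilde pattern.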

regards
Markus

--
Markus Hoenicka
http://www.mhoenicka.de
AQ score 38




