Subject: Re: [docbook] How to get a proper UTF-8 HTML with umlaut
Hi,
Further, I think the source is not read well. The mutation from ü to &Atilde;&frac14; is maybe caused by reading the source as ASCII, not as UTF-8.
Indeed, the source is not being read correctly. It is being read not as ASCII but as ISO-8859-1 (Latin-1). The lowercase ü character is encoded in UTF-8 as the two-byte hex sequence C3 BC. The byte values C3 and BC do not exist in ASCII, but in ISO-8859-1 they are A-tilde (Ã) and the fraction one-quarter (¼). If the source is interpreted as ISO-8859-1, that byte sequence is read as those two characters, not one. In the ISO named entities, C3 is &Atilde; and BC is &frac14;, which is what you are seeing in your output for ü. (You can see these entity declarations in the DocBook 4.5 DTD distribution in the "ent" directory files.)
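The byte-level effect described above can be reproduced outside of any XSLT processor. The following is a minimal Python sketch, not part of the original exchange; the variable names are illustrative only. It encodes ü as UTF-8, decodes the same bytes as ISO-8859-1, and looks up the HTML entity names for the resulting characters:

```python
from html.entities import codepoint2name

# Encode the single character ü as UTF-8: two bytes, C3 BC.
utf8_bytes = "ü".encode("utf-8")

# Decode those same two bytes as ISO-8859-1 (Latin-1): each byte
# becomes its own character, which is the misreading described above.
misread = utf8_bytes.decode("iso-8859-1")

# Look up the standard HTML entity name for each misread character.
entity_names = [codepoint2name[ord(ch)] for ch in misread]

print(utf8_bytes.hex(" "))   # the raw bytes in hex
print(misread)               # the two-character mojibake
print(entity_names)          # the entity names a serializer might emit
```

If the source file really is UTF-8, output like this suggests that the bytes were decoded with the wrong encoding at some point before serialization, rather than that the stylesheet changed the character.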
I'm not able to duplicate your output using xsltproc and any combination of encodings or xsltproc options. I did not think xsltproc could even output named entities like &Atilde; but I could be wrong.
Something is going wrong with the parser reading your files. I would examine your xsltproc setup, try xsltproc on another system that is independent of the first, and try Saxon 6 as an alternative processor.
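One thing that can be checked independently of the processor is whether the encoding declared in the source file matches the bytes actually stored in it. The sketch below is an assumption about one possible cause, not something confirmed in this thread; `check_xml_encoding` is a hypothetical helper, and the sample documents are made up for illustration:

```python
import codecs
import re

def check_xml_encoding(raw: bytes) -> dict:
    """Report the declared encoding and whether the bytes decode as UTF-8.

    Hypothetical helper for illustration; it only inspects the XML
    declaration and does not parse the document.
    """
    # Skip a UTF-8 byte order mark if the editor wrote one.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]

    # Find encoding="..." in the XML declaration at the start of the file.
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', raw
    )
    # Without a declaration (and without other hints), XML parsers assume UTF-8.
    declared = match.group(1).decode("ascii") if match else "UTF-8 (default)"

    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {"declared": declared, "valid_utf8": valid_utf8}

# A file saved as UTF-8 but declaring ISO-8859-1: a parser would honor the
# declaration and read ü as two Latin-1 characters.
mismatched = '<?xml version="1.0" encoding="ISO-8859-1"?><para>ü</para>'.encode("utf-8")
mismatch_report = check_xml_encoding(mismatched)

# A file saved as ISO-8859-1 with no declaration: not valid UTF-8.
latin1_report = check_xml_encoding("<para>ü</para>".encode("iso-8859-1"))

print(mismatch_report)
print(latin1_report)
```

To check a real file, pass the result of `open(path, "rb").read()` to the helper. A declared encoding other than UTF-8 combined with `valid_utf8` being true would be worth investigating before changing anything in the xsltproc setup.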
Bob Stayton
Sagehill Enterprises
bobs@sagehill.net

--------------------------------------------------
From: <markus.sticker.epos@zf.com>
Sent: Monday, June 03, 2013 7:12 AM
To: <docbook@lists.oasis-open.org>
Subject: AW: AW: AW: AW: AW: AW: [docbook] How to get a proper UTF-8 HTML with umlaut
Hi Markus,

I have set this parameter before (see the earlier mails). So there must be some switch for turning the entity translation off. As you can see, all special characters are translated to HTML entities. Further, I think the source is not read well. The mutation from ü to &Atilde;&frac14; is maybe caused by reading the source as ASCII, not as UTF-8.

BR
Markus

-----Original Message-----
From: Markus Hoenicka [mailto:markus.hoenicka@mhoenicka.de]
Sent: Monday, June 3, 2013 15:52
To: docbook@lists.oasis-open.org
Subject: Re: AW: AW: AW: AW: AW: [docbook] How to get a proper UTF-8 HTML with umlaut

On 2013-06-03 15:39, markus.sticker.epos@zf.com wrote:

Hi Markus,

This result is the same as in DocBook 5 ... your output is ISO-8859-1 :-( That's the default in DocBook.

BR
Markus

I'm sorry, I was too quick with this test.

I've now processed the document with the following command line, using chunked output and UTF-8 as you requested:

    xsltproc --output output/ --stringparam chunker.output.encoding UTF-8 /usr/share/sgml/docbook/xsl-stylesheets/html/chunk.xsl refdbtest.xml

The result is the same for me, except that everything is UTF-8 now. The umlauts are there in the HTML source and they're displayed OK in a web browser. See attached HTML output.

regards
Markus

--
Markus Hoenicka
http://www.mhoenicka.de
AQ score 38