docbook-apps message

Subject: Re: [docbook-apps] Japanese Docbook/XML to PDF conversion trouble


Did you try:

    <xsl:output encoding="UTF-8"/> or
    <xsl:output encoding="Shift_JIS"/>

This has worked for me when converting UTF-8 encoded Japanese DocBook documents to PDF: xsltproc produces the FO, and XSLCmd turns the FO into PDF.
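
To illustrate why the output encoding matters, here is a minimal Python sketch. It uses Python's own ElementTree serializer, not xsltproc, and the one-element `para` document is a hypothetical stand-in for the test file, so take it as an analogy rather than a description of what libxslt does internally. An encoding that cannot represent the Japanese characters forces the serializer to write numeric character references, while UTF-8 keeps the characters as they are.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the test document: one element with katakana text.
para = ET.Element("para")
para.text = "Japanese: \u30a6\u30a3\u30b6\u30fc\u30c9"

# ASCII cannot hold the katakana, so the serializer falls back to
# numeric character references such as &#12454;.
as_ascii = ET.tostring(para, encoding="us-ascii").decode("ascii")

# UTF-8 can represent the characters directly, so they are written as is.
as_utf8 = ET.tostring(para, encoding="utf-8").decode("utf-8")

print(as_ascii)
print(as_utf8)
```

The first line printed contains the same `&#12454;...` sequence seen in the broken PDF; the second contains the real characters.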

XSLCmd is a product of Antenna House: http://www.antennahouse.com/

I don't have any experience with dblatex.


Vincent Hennebert <vincent.hennebert@anyware-tech.com>

03/27/2007 09:07 PM

Re: [docbook-apps] Japanese Docbook/XML to PDF conversion trouble

Hi François,

Francois Gouget wrote:
> Hi,
> I'm having trouble converting Japanese Docbook/XML documents to PDF. I
> have attached a very simple test document as an example (you can also
> download it from http://fgouget.free.fr/tmp/docbook/foo.xml in case it
> gets mangled in the mail).
> I have tried many toolchains but the best I got is a document where the
> Japanese text looks like this:
>    Japanese: &#12454;&#12451;&#12470;&#12540;&#12489;

You get that with dblatex, see below my comments.

> I have also checked with a third party PDF document to make sure my xpdf
> was able to display Japanese characters and it is. I suspect the problem
> is in the TeX to PDF conversion, perhaps something to do with missing
> fonts, but I have no idea how to fix it. If someone could put me on the
> right track I would be grateful.
> Here are the commands I have tested (using Debian testing):
>  * with dblatex 0.2-2
>    dblatex -o foo-dblatex.pdf foo.xml
>    See: http://fgouget.free.fr/tmp/docbook/foo-dblatex.pdf

The problem is that the XSLT processor is converting the UTF-8
characters of your source file into numeric character references, which,
left as is, would result in an invalid LaTeX file. dblatex assumes you
want them to appear literally in the final document, so it escapes them.
One option is to convert those references into the right LaTeX commands,
as an intermediate step between the XSLT processing and the LaTeX run.
That is not easy.
The other option is to tell your XSLT processor to encode its output in
UTF-8, so that the Japanese characters appear as is instead of being
converted into references. dblatex should already be configured to
handle such characters nicely.
In any case you should find help on dblatex-related mailing lists, where
I'm sure the problem has already been raised (and solved).
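
As a rough idea of what such an intermediate step could look like, here is a small Python sketch. It does something simpler than producing LaTeX commands: it only turns numeric character references back into the characters they stand for. The function name `decode_char_refs` is made up for this example, and whether the LaTeX setup then accepts the resulting UTF-8 text is an assumption that would need checking.

```python
import re

# Matches decimal (&#12454;) and hexadecimal (&#x30A6;) character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def decode_char_refs(text):
    """Replace numeric character references with the characters they name.

    Named entities such as &amp; are left untouched.
    """
    def repl(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point > 0x10FFFF:
            # Not a valid Unicode code point: keep the reference as written.
            return match.group(0)
        return chr(code_point)

    return _CHAR_REF.sub(repl, text)


line = "Japanese: &#12454;&#12451;&#12470;&#12540;&#12489;"
print(decode_char_refs(line))
```

Applied to the line from the broken PDF, this prints the five katakana characters instead of the references.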

>  * with fop 0.20.5-8
>    $ xsltproc -o foo-fop1.fo
> /usr/share/xml/docbook/stylesheet/nwalsh/fo/docbook.xsl foo.xml
>    Making portrait pages on USletter paper (8.5inx11in)
>    $ fop -fo foo-fop-docbook.fo -pdf foo-fop-docbook.pdf
>    [INFO] Using org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser as
> SAX2 Parser
>    [INFO] FOP 0.20.5
>    [INFO] Using org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser as
> SAX2 Parser
>    [INFO] building formatting object tree
>    [INFO] setting up fonts
>    [ERROR] property - "background-position-horizontal" is not
> implemented yet.
>    [ERROR] property - "background-position-vertical" is not implemented
> yet.
>    [INFO] JAI support was not installed (read: not present at build
> time). Trying to use Jimi instead
>    Error creating background image: Error creating FopImage object
> (http://docbook.sourceforge.net/release/images/draft.png) : Jimi image
> library not available
>    [...lots more Jimi errors but that's ok...]
>    [ERROR] Unknown enumerated value for property 'relative-align': baseline
>    [ERROR] Error in relative-align property value 'baseline':
> org.apache.fop.fo.expr.PropertyException: No conversion defined
>    [...and a bunch more of these...]
>    [INFO] [1]
>    [INFO] Parsing of document complete, stopping renderer
>    But in the end I do get a PDF file, except the Japanese line looks
> like this:
>      Japanese: #####
>    See: http://fgouget.free.fr/tmp/docbook/foo-fop-docbook.fo
>         http://fgouget.free.fr/tmp/docbook/foo-fop-docbook.pdf

This is because you haven't configured FOP to use the right fonts. It
falls back to the default base-14 fonts for PDF, which don't contain
any glyphs for Japanese characters. Any TrueType font containing
Japanese glyphs should work; see the FOP font configuration
documentation for how to set them up.
If you have trouble you can get further help on the fop-users mailing list.
And BTW, you should really try the latest FOP version, 0.93, which
handles DocBook documents much better...
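
To get a feel for which characters are affected, here is a rough Python check. It assumes that the base-14 text fonts cover approximately the WinAnsi (Windows-1252) character set, which is a simplification and not a statement about FOP's actual font metrics. The function name `chars_outside_winansi` is made up for this example.

```python
def chars_outside_winansi(text):
    """Return the distinct characters of text that cp1252 cannot encode.

    This approximates "no glyph in the base-14 text fonts"; it does not
    inspect any real font file.
    """
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "Japanese: \u30a6\u30a3\u30b6\u30fc\u30c9"
print(chars_outside_winansi(sample))
```

For the test line, the Latin part passes and the five katakana characters are reported, which matches the five `#` signs in the PDF.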

>  * with db2latex 0.8pre1-5
>    $ xsltproc -o foo-db2latex.tex
> /usr/share/xml/docbook/stylesheet/db2latex/latex/docbook.xsl foo.xml
>    $ pdftex foo-db2latex.tex
>    This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
>    entering extended mode
>    (./foo-db2latex.tex
>    ! Undefined control sequence.
>    l.7 \documentclass
>                      [pdftex,,a4paper,10pt,twoside,openright,]{report}
>    And I get no pdf.

Well, you had better forget db2latex, which is no longer maintained and
has been superseded by dblatex. Anyway, you would face the same
entity issue as with dblatex.

> My LaTeX backend is TeTex:
> tetex-base                      3.0.dfsg.3-5
> tetex-bin                       3.0-30
> tetex-extra                     3.0.dfsg.3-5
> tex-common                      1.0.1
> Maybe I need to switch to another LaTeX package like jtex (how?) or
> Texlive?

The default LaTeX distribution from Debian should work.

