OASIS Mailing List Archives

docbook-apps message


Subject: Re: DOCBOOK-APPS: XEP for $80


> I wrote a better Portuguese hyphenation file to use in FOP
> (should be shipping with 0.25), but it would not work in XEP
> because in it the hyphenation points cannot be prioritized
> as they can in TeX and FOP

I can't actually see the point. A good hyphenation file should mark
permitted break points and disable undesirable ones.
Do you mean that your patterns produce spurious hyphenations
that don't appear in TeX only because TeX's line-breaking algorithm
always finds a better alternative?

> In fact, since the whole idea of Liang's algorithm is based on this,

Liang's algorithm uses priorities in hyphenation patterns, and these
priorities are certainly respected. XEP's hyphenator finds all
breaks permitted by Liang's algorithm plus the additional constraints
from hyphenation-{push|remain}-character-count, and only those breaks.
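In case it helps to make this concrete, here is a rough sketch of how
Liang's pattern matching works (the toy patterns and helper names below
are my own invention for illustration, not taken from any real pattern
file or from XEP's code):

```python
# Sketch of Liang's pattern hyphenation. Digits embedded in a pattern
# assign priorities to inter-letter positions; the maximum digit wins
# at each position, and an odd final value permits a break there.

def parse_pattern(pat):
    """Split a pattern like 'hy3ph' into its letters and priority digits."""
    letters, digits = "", [0]
    for ch in pat:
        if ch.isdigit():
            digits[-1] = int(ch)
        else:
            letters += ch
            digits.append(0)
    return letters, digits

def hyphenate(word, patterns, left=2, right=2):
    """Return break positions permitted by Liang's patterns, with
    push/remain-character-count style limits applied at word edges."""
    padded = "." + word.lower() + "."   # '.' anchors word boundaries
    values = [0] * (len(padded) + 1)
    for pat in patterns:
        letters, digits = parse_pattern(pat)
        for i in range(len(padded) - len(letters) + 1):
            if padded[i:i + len(letters)] == letters:
                for j, d in enumerate(digits):
                    values[i + j] = max(values[i + j], d)
    # A break before word[k] corresponds to values[k + 1]; odd = allowed.
    return [k for k in range(left, len(word) - right + 1)
            if values[k + 1] % 2 == 1]
```

Note how an even-valued pattern can inhibit a break that another pattern
would otherwise permit, which is exactly the prioritization mechanism
under discussion.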

What distinguishes XEP from TeX is the line-breaking algorithm that
triggers hyphenation. We don't use global optimization; our approach
considers only single lines. It is at this point that we drop all
priorities: all hyphenation points permitted by Liang's patterns are
considered equivalent. But my impression is that FOP does the same (I may be wrong).
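To illustrate the distinction (this is a toy sketch of my own, not
XEP's actual code): a line-at-a-time approach fills each line greedily
and commits to it, whereas TeX optimizes break points over the whole
paragraph:

```python
# Greedy, line-at-a-time breaking: each line is filled as far as it
# fits and then committed, with no look-ahead across lines. Widths are
# measured here in characters for simplicity.

def break_lines(words, width):
    """Break a word list into lines of at most `width` characters."""
    lines, line = [], ""
    for w in words:
        candidate = w if not line else line + " " + w
        if len(candidate) <= width:
            line = candidate
        else:
            if line:
                lines.append(line)
            line = w          # word starts the next line
    if line:
        lines.append(line)
    return lines
```

Because each line is decided in isolation, there is no place for
hyphenation-point priorities to influence a trade-off between lines,
which is why they can be dropped without changing the result.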

The phrase in the documentation that generated so much controversy
was meant to say exactly this. I realize that it is actually misstated:
line breaking is not Liang's part, and the pattern-processing algorithm
in XEP is exactly Liang's, with no omissions. Please accept my
apologies: we will correct the documentation in the next release.

Anyhow, XEP does not hyphenate until it actually has to: if it can get through
by slightly adjusting inter-character or inter-word spaces, it does. (You may
notice that XEP almost never hyphenates long lines, and when it has to, it
tends to split words in the middle.) So, for ordinary text, the penalty for
treating all hyphenation points as equivalent is actually negligible.
(I'd like to stress once again that we don't produce hyphenations that are
not permitted by the patterns.)
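For illustration only (again my own toy model, not XEP's actual fitness
test): a line can avoid hyphenation whenever its natural width is close
enough to the target, given how much each space may stretch or shrink:

```python
# Toy check: can a line reach the target width purely by adjusting its
# inter-word spaces, each by at most `tolerance` units, so that no
# hyphenation is needed?

def fits_with_adjustment(natural_width, target, n_spaces, tolerance=1.0):
    """True if spacing adjustment alone can make the line fit."""
    if n_spaces == 0:
        return natural_width == target
    return abs(target - natural_width) <= n_spaces * tolerance
```

Long lines contain many spaces, so the adjustable slack grows with line
length; that is consistent with the observation above that long lines
almost never need hyphenation.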

> I can't understand how such a simplified algorithm could perform well,
> unless it discards most of the valid hyphenation points. Have you tested
> XEP's hyphenation algorithm with a bunch of hard-to-hyphenate words?
> (For example, several words with tricky combinations of consonants
> and vowels around? ).

Certainly, there are some words that don't hyphenate well (e.g. "names-pace"
for English :-)). You have to put them into \hyphenation{}.

> I had an extra job for being too
> purist in orthographic matters, but with XEP I couldn't even start, since
> we cannot correct or change its algorithm.

I do suggest that you try it yourself. There's too much rumor
around one unhappy phrase in the docs. If your pattern file
fails to produce hyphenation results as good as FOP's,
we will be glad to investigate the reason and find a solution.

Best regards,
Nikolai Grigoriev

