docbook-apps message

Subject: Re: [docbook-apps] Writing mode, xsl-fo output

From: "Bob Stayton" <bobs@sagehill.net>
To: "maxwell" <maxwell@umiacs.umd.edu>
Date: Fri, 1 Apr 2011 16:51:19 -0700

Hi Mike,
Yes, my answer was the brief version, and you are right, it is not that simple.  In my 
experience with formatting Hebrew and Arabic texts, I have found that Antenna House 
and XEP handle the Unicode direction and mixing of punctuation quite well *except* 
with inline text that mixes direction, particularly when an English phrase or word is 
mixed in with rtl text and there is punctuation at the boundary between directions. 
In those cases where the formatter does not automatically get it right, you sometimes 
have to force the correct order.  The following custom template worked well in those 
cases, allowing the author to put the English text and its punctuation inside <phrase 
lang="en"> so the stylesheet could apply the template:

<xsl:template match="phrase[@lang = 'en']">
  <fo:bidi-override language="en"
                    unicode-bidi="embed"
                    direction="ltr">
    <xsl:apply-templates/>
  </fo:bidi-override>
</xsl:template>

This template forces whatever is in the phrase element to be formatted ltr, regardless 
of the Unicode range.
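
For a DocBook 5 document the same idea would need a slightly different match 
pattern.  The following is only a sketch under assumptions that are not part of the 
setup described above: it assumes the namespaced stylesheets, a d: prefix bound to the 
DocBook 5 namespace, xml:lang in place of lang, and a placeholder import path that 
you would replace with the location of your own copy of fo/docbook.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical customization layer for DocBook 5 (assumption, not from
     the original setup). Adjust the import path to your local install. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                xmlns:d="http://docbook.org/ns/docbook"
                exclude-result-prefixes="d">

  <!-- Placeholder path: point this at the namespaced fo/docbook.xsl -->
  <xsl:import href="docbook-xsl-ns/fo/docbook.xsl"/>

  <!-- A phrase explicitly marked as English is wrapped in an embedded
       ltr run, so punctuation inside the phrase stays with the English -->
  <xsl:template match="d:phrase[@xml:lang = 'en']">
    <fo:bidi-override language="en"
                      unicode-bidi="embed"
                      direction="ltr">
      <xsl:apply-templates/>
    </fo:bidi-override>
  </xsl:template>

</xsl:stylesheet>
```

The author would then mark up the English text and its punctuation as 
<phrase xml:lang="en">...</phrase> in the source.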

Bob Stayton
Sagehill Enterprises
bobs@sagehill.net

----- Original Message ----- 
From: "maxwell" <maxwell@umiacs.umd.edu>
To: "Bob Stayton" <bobs@sagehill.net>
Cc: "Dave Pawson" <davep@dpawson.co.uk>; <docbook-apps@lists.oasis-open.org>
Sent: Friday, April 01, 2011 11:08 AM
Subject: Re: [docbook-apps] Writing mode, xsl-fo output

On Fri, 1 Apr 2011 10:40:16 -0700, "Bob Stayton" <bobs@sagehill.net> wrote:
> But when you say "some rl-tb" text, do you mean a mixed language document?
> In that case, the writing mode value should be for the dominant language,
> since the document's writing mode determines the page layout.
> Any inline translated text should get the
> correct text direction based on its Unicode character range.

That last sentence--that the writing direction can be determined by
inspecting the characters--is a common intuition (it was once my own
intuition).  But it isn't quite that simple, since some symmetrical
punctuation marks belong sometimes to L2R text, and sometimes to R2L text.
For example, an ASCII period at the end of a run of R2L text might belong
at the left end of the R2L text, or--if the R2L text is at the end of an
L2R text--it might belong at the right end of the L2R text (and therefore
at the right end of the R2L text).

Asymmetrical punctuation marks sometimes exist as distinct L2R and R2L
code points in Unicode, like the ASCII comma vs. the Arabic comma U+060C.
But parentheses (which of course are asymmetrical) are also sometimes used
inside runs of R2L text--I've seen them in Urdu, for example.  Here I
believe the ASCII open parenthesis is used as an Urdu close paren, and vice
versa.

Space characters of course also fall into this category of ambiguous
direction, although that's generally handled correctly by algorithmic
methods.

There's been considerable discussion of this general issue (whether it's
possible to algorithmically determine the ends of an R2L run inside an L2R
run, or vice versa) over on the XeTeX mailing list.  The opinion of Those
Who Know seems to be that it is not 100% decidable.

   Mike Maxwell
