Subject: Re: [docbook-apps] Writing mode, xsl-fo output
Hi Mike,

Yes, my answer was the brief version, and you are right, it is not that simple. In my experience formatting Hebrew and Arabic texts, I have found that Antenna House and XEP handle Unicode direction and the mixing of punctuation quite well *except* for inline text that mixes direction, particularly when an English phrase or word is embedded in rtl text and there is punctuation at the boundary between directions. In those cases, where the formatter does not automatically get it right, you sometimes have to force the correct order.

The following custom template worked well in those cases. It lets the author put the English text and its punctuation inside <phrase lang="en"> so that the stylesheet can apply the template:

    <xsl:template match="phrase[@lang = 'en']">
      <fo:bidi-override language="en" unicode-bidi="embed" direction="ltr">
        <xsl:apply-templates/>
      </fo:bidi-override>
    </xsl:template>

This template forces whatever is in the phrase element to be formatted ltr, regardless of the Unicode range.

Bob Stayton
Sagehill Enterprises
bobs@sagehill.net

----- Original Message -----
From: "maxwell" <maxwell@umiacs.umd.edu>
To: "Bob Stayton" <bobs@sagehill.net>
Cc: "Dave Pawson" <davep@dpawson.co.uk>; <docbook-apps@lists.oasis-open.org>
Sent: Friday, April 01, 2011 11:08 AM
Subject: Re: [docbook-apps] Writing mode, xsl-fo output

On Fri, 1 Apr 2011 10:40:16 -0700, "Bob Stayton" <bobs@sagehill.net> wrote:
> But when you say "some rl-tb" text, do you mean a mixed language document?
> In that case, the writing mode value should be for the dominant language,
> since the document's writing mode determines the page layout.
> Any inline translated text should get the
> correct text direction based on its Unicode character range.

That last sentence--that the writing direction can be determined by inspecting the characters--is a common intuition (it was once my own intuition).
But it isn't quite that simple, since some symmetrical punctuation marks belong sometimes to L2R text and sometimes to R2L text. For example, an ASCII period at the end of a run of R2L text might belong at the left end of the R2L text, or--if the R2L text is at the end of an L2R text--it might belong at the right end of the L2R text (and therefore at the right end of the R2L text).

Unsymmetrical punctuation marks sometimes exist as distinct L2R and R2L code points in Unicode, like the ASCII comma vs. the Arabic comma U+060C. But parentheses (which of course are asymmetrical) are also sometimes used inside runs of R2L text--I've seen them in Urdu, for example. Here I believe the ASCII open parenthesis is used as an Urdu close paren, and vice versa.

Space characters of course also fall into this category of ambiguous direction, although that's generally handled correctly by algorithmic methods.

There's been considerable discussion of this general issue (whether it's possible to algorithmically determine the ends of an R2L run inside an L2R run, or vice versa) over on the XeTeX mailing list. The opinion of Those Who Know seems to be that it is not 100% decidable.

Mike Maxwell
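
The point about punctuation and spaces having no direction of their own can be checked against the Unicode character database. The following is only a minimal sketch using Python's standard `unicodedata` module. The sample list and the helper names (`SAMPLES`, `bidi_profile`, `has_strong_direction`) are illustrative assumptions and do not come from the thread or from any formatter.

```python
import unicodedata

# Sample characters drawn from the discussion above; labels are for display only.
SAMPLES = [
    ("Latin letter a", "a"),
    ("Hebrew letter alef", "\u05d0"),
    ("Arabic letter beh", "\u0628"),
    ("ASCII period", "."),
    ("ASCII comma", ","),
    ("Arabic comma U+060C", "\u060c"),
    ("ASCII open parenthesis", "("),
    ("space", " "),
]


def bidi_profile(ch):
    """Return (bidi class, mirrored flag) for one character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


def has_strong_direction(ch):
    """True only for the strong classes L, R and AL.

    Characters in any other class take their direction from context,
    which is where the ambiguity described in the thread comes from.
    """
    return unicodedata.bidirectional(ch) in {"L", "R", "AL"}


if __name__ == "__main__":
    for label, ch in SAMPLES:
        bidi_class, mirrored = bidi_profile(ch)
        print(f"{label:24} class={bidi_class:3} mirrored={mirrored!s:5} "
              f"strong={has_strong_direction(ch)}")
```

Letters report a strong class (L, R, or AL), while the ASCII period and comma report CS, the parenthesis reports ON with the mirrored flag set, and the space reports WS. None of those carries a direction by itself, so a formatter has to infer it from the neighbouring text, and an explicit override such as the `phrase` template is the fallback when that inference goes wrong.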