Subject: Re: [docbook-apps] Writing mode, xsl-fo output
On Fri, April 1, 2011 7:08 pm, maxwell wrote:
> On Fri, 1 Apr 2011 10:40:16 -0700, "Bob Stayton" <bobs@sagehill.net>
> wrote:
>> But when you say "some rl-tb" text, do you mean a mixed language
>> document?
>> In that case, the writing mode value should be for the dominant
>> language, since the document's writing mode determines the page
>> layout.
>> Any inline translated text should get the
>> correct text direction based on its Unicode character range.
>
> That last sentence--that the writing direction can be determined by
> inspecting the characters--is a common intuition (it was once my own
> intuition). But it isn't quite that simple, since some symmetrical
> punctuation marks belong sometimes to L2R text, and sometimes to R2L
> text.

The conventional approach is to implement the Unicode Bidirectional
Algorithm [1] (or use a library that already implements it). It may not
be perfect -- every so often you'll meet people who say it isn't good
enough -- but since it's up to revision 23 so far, you'll see they're
still trying to make it as perfect as possible.

> For example, an ASCII period at the end of a run of R2L text might
> belong at the left end of the R2L text, or--if the R2L text is at the
> end of an L2R text--it might belong at the right end of the L2R text
> (and therefore at the right end of the R2L text).

The BIDI algorithm has rules about resolving direction among characters
with strong, weak, and neutral directionality.

> Unsymmetrical punctuation marks sometimes exist as distinct L2R and R2L
> code points in Unicode, like the ASCII comma vs. the Arabic comma
> U+060C. But parentheses (which of course are asymmetrical) are also
> sometimes used inside runs of R2L text--I've seen them in Urdu, for
> example. Here I believe the ASCII open parenthesis is used as an Urdu
> close paren, and vice versa.

If you're using the BIDI algorithm, you'd always enter the open
parenthesis as the '(' character even when it will be shown with its
mirrored glyph ')'.
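
As a rough illustration of the directionality classes and the mirroring
property mentioned above, the sketch below looks them up with Python's
standard unicodedata module. It only inspects per-character properties;
it does not run the BIDI algorithm itself, and the sample list and the
helper name bidi_info are made up for this example.

```python
import unicodedata

# Sample characters; the labels are only for display.
SAMPLES = [
    ("Latin letter a", "a"),
    ("Hebrew letter alef", "\u05d0"),
    ("Arabic letter alef", "\u0627"),
    ("digit one", "1"),
    ("full stop", "."),
    ("Arabic comma", "\u060c"),
    ("space", " "),
    ("left parenthesis", "("),
]


def bidi_info(ch):
    """Return (bidi class, mirrored flag) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


if __name__ == "__main__":
    for label, ch in SAMPLES:
        bidi_class, mirrored = bidi_info(ch)
        print(f"U+{ord(ch):04X} {label}: class={bidi_class} mirrored={mirrored}")
```

The letters report strong classes (L, R, AL), while the digit, the full
stop, the space, and the parenthesis report weak or neutral classes, so
their direction has to be resolved from context. The parenthesis is also
flagged as mirrored, which is why the same '(' code point can be drawn
as ')' in a right-to-left run.
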
See http://www.unicode.org/reports/tr9/#Mirroring

> Space characters of course also fall into this category of ambiguous
> direction, although that's generally handled correctly by algorithmic
> methods.
>
> There's been considerable discussion of this general issue (whether
> it's possible to algorithmically determine the ends of an R2L run
> inside an L2R run, or vice versa) over on the XeTeX mailing list. The
> opinion of Those Who Know seems to be that it is not 100% decidable.

Which is why there are also characters for explicit overrides.

XML and other markup languages count as "higher level protocols" for the
purposes of the BIDI algorithm, and a 'dir' attribute or similar should
be used instead of the override characters.

See http://www.unicode.org/reports/tr20/#Bidi

Regards,

Tony Graham
Mentea.

[1] http://www.unicode.org/reports/tr9/