Subject: Re: [dita] [dita-translation] TC/DITA/Translation Subcommittee Proposals
Gershon L Joseph wrote:
> 1. The Unicode standard defines a default direction for each language. For
> example, for English this default direction is LTR and for Hebrew it's RTL.
To expand on Gershon's explanation a little bit:
The directionality is actually defined for each character, not for the
language. Unicode doesn't deal directly with languages but rather
with scripts (that is, when Unicode talks about "Arabic" they mean the
script named Arabic, not the language Arabic, which happens to use the
Arabic script). For most languages there is a direct mapping to a script
(but not always, although those cases usually fall outside the set of
languages used for technical publications). Thus, while we usually
informally talk about "languages", when dealing with things at the
character level we usually really mean "scripts".
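As a rough illustration of direction being a per-character property, here is
a minimal Python sketch using the standard unicodedata module. The sample
characters and the helper name bidi_classes are my own choices for
illustration, not anything from the proposal.

```python
import unicodedata

# Sample characters (chosen for illustration): a Latin letter, Hebrew alef,
# Arabic alef, a European digit, and an opening paren.
SAMPLES = ["a", "\u05D0", "\u0627", "1", "("]


def bidi_classes(chars):
    """Map each character to its Unicode bidirectional class."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


for ch, cls in bidi_classes(SAMPLES).items():
    # Print the code point and character name rather than the glyph itself,
    # so the output is readable in any terminal.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

The Latin letter reports class L, the Hebrew letter R, the Arabic letter AL,
the digit EN, and the paren ON (a neutral), which is why the paren takes its
direction from whatever surrounds it.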
With respect to controlling (or not controlling) directionality, in
addition to the cases where you have to adjust the default
directionality, there is the case of characters that are naturally
paired and that, by default, are rendered to reflect the current
directionality, e.g., parens "(" and square brackets "[".
For example, given this source data:
<p>(arabic characters)</p>
The rendered result would be:
(sretcarahc cibara)
not:
)sretcarahc cibara(
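The paired characters described above are the ones Unicode flags as
"mirrored". A small Python sketch, again using the standard unicodedata
module (the helper name is_mirrored is just for illustration), shows how to
check that flag:

```python
import unicodedata


def is_mirrored(ch):
    """Return True if Unicode marks the character as mirrored in RTL text."""
    return unicodedata.mirrored(ch) == 1


for ch in "()[]a":
    print(repr(ch), is_mirrored(ch))
```

Parens and square brackets report True and the letter reports False, so a
conforming renderer is expected to swap the paren glyphs when they end up in
a right-to-left run.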
However, when the parens contain or are adjacent to left-to-right
characters, things can get confused (and confusing). I've seen this
most when you have numbers or Latin-script words embedded in
right-to-left text within parens, e.g.:
<p>arabic characters (XSL-FO) arabic characters</p>
In this case the directionality of the enclosing Arabic characters
causes the parens to be "flipped", giving results like this:
sretcarahc cibara )XSL-FO( sretcarahc cibara
or
sretcarahc cibara (XSL-FO( sretcarahc cibara
This can be addressed by using LRO or RLO (left-to-right override and
right-to-left override) characters or by using LRE or RLE (left-to-right
embedding and right-to-left embedding) characters in the input data
stream.
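For reference, a minimal Python sketch of what adding these characters looks
like. The code points are the ones Unicode assigns; PDF (pop directional
formatting) is the character that closes an embedding or override. The
helper names, and the choice to wrap only the Latin-script run rather than
the parens as well, are assumptions for illustration. Which placement gives
the right result depends on the renderer.

```python
# Unicode directional formatting characters.
LRE = "\u202A"  # left-to-right embedding
RLE = "\u202B"  # right-to-left embedding
PDF = "\u202C"  # pop directional formatting (ends an embedding or override)
LRO = "\u202D"  # left-to-right override
RLO = "\u202E"  # right-to-left override

CONTROLS = {LRE, RLE, PDF, LRO, RLO}


def embed_ltr(run):
    """Wrap a run of text in a left-to-right embedding."""
    return LRE + run + PDF


def embed_rtl(run):
    """Wrap a run of text in a right-to-left embedding."""
    return RLE + run + PDF


def to_xml_refs(text):
    """Show the invisible controls as XML numeric character references."""
    return "".join(
        f"&#x{ord(c):04X};" if c in CONTROLS else c for c in text
    )


# Hypothetical placement: embed only the Latin-script run inside the parens.
sample = "(" + embed_ltr("XSL-FO") + ")"
print(to_xml_refs(sample))  # (&#x202A;XSL-FO&#x202C;)
```

Writing the controls as numeric character references, as to_xml_refs does,
at least makes them visible when reviewing the XML source.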
However, this is complicated by the fact that many tools do not
implement the Unicode bi-di rules correctly. In particular, we found that
tools that rely on the Windows libraries for doing right-to-left
rendering give different results from tools (mostly FO implementations)
that implement the algorithm themselves (and presumably better).
Moreover, the algorithm is difficult enough to understand that it's hard
to prove that a given implementation is or isn't correct. So you are
often faced with just hacking the source data until you get the result
you want.
In my work with rendering Hebrew and Arabic technical documents, where
the translators are creating the non-English text *and* they can't
modify the markup, their only option is to add the actual Unicode
characters to the data stream (which they can always do).
Cheers,
Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8841
ekimber@innodata-isogen.com
www.innodata-isogen.com