OASIS Mailing List Archives

dita-translation message


Subject: RE: [dita-translation] Draft proposal for dir attribute

Or it may have been me :)

The most obvious area where taking direction from the lang attribute breaks
down is with punctuation in numbers. For example, when lang="he", if you
type the following characters in the order shown:
The standard Unicode rules for Hebrew will result in:
Since Hebrew direction is RTL, the full stop (period) follows the text that
was typed before it; but with numbers the result is not what the author
expects! There are many such issues; this is the most obvious one.
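
As a rough illustration of why a full stop behaves differently next to Hebrew letters than inside a number, the Unicode bidirectional class of each character can be inspected. The sketch below uses Python's standard unicodedata module; the helper name bidi_kind and the sample characters are only illustrative, and the strong/weak/neutral grouping is assumed to follow the categories of the Unicode Bidirectional Algorithm (UAX #9).

```python
import unicodedata

# Grouping of bidi classes, assumed here to follow UAX #9.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}


def bidi_kind(ch):
    """Return 'strong', 'weak', 'neutral' or 'explicit' for one character."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        return "strong"
    if cls in WEAK:
        return "weak"
    if cls in NEUTRAL:
        return "neutral"
    return "explicit"  # embedding/override/isolate controls, or unassigned


def describe(text):
    """List (character, bidi class, kind) for every character in text."""
    return [(ch, unicodedata.bidirectional(ch), bidi_kind(ch)) for ch in text]


if __name__ == "__main__":
    # Hebrew alef, a digit, a full stop, a parenthesis, a Latin letter.
    for ch, cls, kind in describe("\u05d01.(a"):
        print(f"U+{ord(ch):04X} {cls:>3} {kind}")
```

The full stop comes out as class CS (a weak "common separator"), digits as EN, and Hebrew letters as R, so the same period has no direction of its own and is resolved from whatever surrounds it.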

Most technical manuals written in Hebrew contain English words or phrases.
If the English phrase appears at the beginning or end of the sentence, again
the result is often not what the author intends. Here's an example where the
author types:
<p xml:lang="he">HebrewText HebrewText <ph
If the phrase within the <p> wraps, what often happens is the Hebrew text
and the full stop appear on the first line, with the English text on the
second line. The full stop is not kept with the English phrase.

However if we specify:
<p xml:lang="he" dir="rtl">HebrewText HebrewText <ph
Then the full stop will correctly follow the English text phrase, even when
it wraps onto a new line.

You could argue that the latter example is a problem with
authoring/publishing tools, but it's encountered in most tools that support
Middle Eastern languages -- they simply have no idea what the user means
unless additional directionality is specified.

Does this help clarify the issue?

Best Regards,

-----Original Message-----
From: Farwell, Kevin [mailto:Kevin.Farwell@lionbridge.com] 
Sent: Saturday, March 04, 2006 1:37 AM
To: Robert D Anderson
Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A
Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; Felix
Sasaki; gershon@tech-tav.com; Richard Ishida; Jennifer Linton;
mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds,
Peter; rfletcher@sdl.com; Munshi, Sukumar;
tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: [dita-translation] Draft proposal for dir attribute


I think that might have been me talking about the tags. I was actually
saying that indicating direction in a wrapper was not enough, but the
language tag has the same limitation. 

Overriding defaults is indeed rare, but the number of weak or neutral
characters is higher than one might think, and it always seems like the
number of times they show up in a file is impossibly high. Those are the
ones that cause all the problems by bending to the will of the bullies
around them.

The approach we've taken is sort of what you describe. We have an automated
process that goes through and finds the problem characters, looks at what's
before or after them, and drops in direction codes according to what we
believe to be the right thing in that case. It works pretty well, but it
still needs refining. My belief is that there should be no case that can't
be solved, but we haven't achieved that yet. It's also my belief that we haven't
encountered all the cases yet, and may never, so this will never be totally
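
A process of the kind described above might look roughly like the following sketch. It is not the actual rule set used at Lionbridge, which is not given in this thread. The function name add_direction_marks and the single rule (attach each problem character to the direction of the last strong character before it, and leave separators inside numbers alone) are assumptions made purely for illustration.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Bidi classes treated as "problem characters" in this sketch (assumption).
PROBLEM_CLASSES = {"ON", "CS", "ES", "ET"}
DIGIT_CLASSES = {"EN", "AN"}


def add_direction_marks(text):
    """Insert LRM/RLM after weak or neutral punctuation.

    Hypothetical rule: the mark matches the last strong character seen
    before the punctuation. Separators between two digits are skipped so
    that numbers such as 12.5 are not split.
    """
    out = []
    last_mark = None  # mark matching the most recent strong character
    for i, ch in enumerate(text):
        cls = unicodedata.bidirectional(ch)
        out.append(ch)
        if cls == "L":
            last_mark = LRM
        elif cls in ("R", "AL"):
            last_mark = RLM
        elif cls in PROBLEM_CLASSES and last_mark is not None:
            prev_cls = unicodedata.bidirectional(text[i - 1]) if i > 0 else ""
            next_cls = (
                unicodedata.bidirectional(text[i + 1]) if i + 1 < len(text) else ""
            )
            if prev_cls in DIGIT_CLASSES and next_cls in DIGIT_CLASSES:
                continue  # leave separators inside numbers alone
            out.append(last_mark)
    return "".join(out)


if __name__ == "__main__":
    sample = "\u05d0\u05d1 (x)"  # two Hebrew letters, then "(x)"
    print([f"U+{ord(c):04X}" for c in add_direction_marks(sample)])
```

A single fixed rule like this will certainly get some cases wrong, which is consistent with the point that such a process needs continual refining.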

I'd like to answer the call for examples, and I've put out the word to a
colleague who happens to be preparing some Arabic text today, but it being
Friday afternoon, I think getting anything by Monday morning is not likely. I
can say that, generally, phone numbers, paired punctuation marks, and cases
where neutral punctuation abuts directional text (like an English word
directly after a Hebrew word that has parentheses around it, or a dash
between a letter and a number) are the kinds of things that break.
If concrete examples of these surface in the next couple of hours, I'll send
them along.


-----Original Message-----
From: Robert D Anderson [mailto:robander@us.ibm.com]
Sent: Friday, March 03, 2006 11:23 AM
To: Farwell, Kevin
Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A
Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; Felix
Sasaki; gershon@tech-tav.com; Richard Ishida; Jennifer Linton;
mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds,
Peter; rfletcher@sdl.com; Munshi, Sukumar;
tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: [dita-translation] Draft proposal for dir attribute

On the call this Monday, somebody mentioned that there are cases where
xml:lang is not enough to determine the direction of the text. In those
cases, my current understanding is that setting the attribute means
something like "The contents of this element do not follow the default
language rules for the currently specified xml:lang attribute." Thus, the
only time anybody needs to set the value is when they need to override the
current default, which is probably somewhat rare. Is that correct? In those
rare cases, is there any way we could identify the change and automatically
set the direction on output? My guess is no, otherwise the display
mechanisms would already display them correctly.

If I am correct about why this is used, would it be possible for someone to
give an example of where this override behavior is most often used? I think
examples were given on Monday relating to numbers or punctuation, but I am
not sure. Like Kevin, some of us were a little confused by the Hebrew
example, because both values showed up the same in our browsers.
Perhaps this is just an indication that browsers do not yet actually support
the dir attribute.

I would like to call for examples of use cases (particularly any that show
up often) to help us determine if this is critical for DITA 1.1, or whether
we can spend more time considering it for 1.2.


Robert D Anderson
Authoring Tools Development
Chief Architect, DITA Open Toolkit
(507) 253-8787, T/L 553-8787


From: Farwell, Kevin
Sent: 03/03/2006 02:47 AM
To: Robert D Anderson; gershon@tech-tav.com
Subject: RE: [dita-translation] Draft proposal for dir attribute

I alluded to this on the phone the other day, but after thinking about it
for a few days I'm getting kind of sour on directional tags of any kind.
The question of whether overrides are needed got me thinking that the basic
idea of directional tags is based on presentation only and is display-tool
specific. As such, any tagging method could either do no good or do harm in
all the various output tools there are. It seems to me that applying tagging
according to whether the content is going to Internet Explorer or Antenna
House or something else is very much against the notion of separating
content and format. Since the two consider bi-directional text differently,
tagging for one doesn't guarantee the text will work in the other.

It is true that the various tools will handle bi-directional text in almost
random ways, so some degree of directional control is needed, but it should
be applied at output time, not stored in the source XML. For example, in the
snippet included in the bi-directional model page, the first instance of the
Hebrew word "Hebrew" displays right-to-left in IE on Windows and Safari on
Mac with no tagging at all because of the Unicode range the characters are
in. The second one does too, but I can't figure out if that one is supposed
to run backwards to demonstrate the difference between logical and display
order and it's just typed in wrong. Otherwise it doesn't demonstrate much of

Still, punctuation and other characters must be handled, so control is
needed. The only thing that seems to work consistently is the use of the
Unicode directional characters. They don't necessarily rely on nesting,
which has a lot of advantages. Control can be applied to a set of characters
before the next neutral character or to a span, depending on what's needed.
Relying on spans can run into problems like a period displaying
right-to-left (which isn't so dramatic) at the right end of a Hebrew word
(which is). If the span is just around the period, nothing happens; if the
span is around the word and the period, you might get the same result
because a period is a neutral character and is ignored in directional
controls. Also, I suppose just because it's fun, most tools treat Arabic and
Hebrew differently, so the controls can't be the same for all languages.
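
For reference, the Unicode directional characters mentioned above come in two flavors: single invisible marks that act like a strong character at one position, and paired embedding or override controls that wrap a span. The sketch below names them and shows both styles. The helper names mark_after and embed are hypothetical and are not taken from any tool discussed in this thread.

```python
import unicodedata

# Single marks: behave like an invisible strong character.
LRM, RLM = "\u200e", "\u200f"
# Paired controls: open a span that is closed by PDF.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
LRO, RLO = "\u202d", "\u202e"


def mark_after(text, direction):
    """Append one invisible strong mark; no nesting is involved."""
    return text + (RLM if direction == "rtl" else LRM)


def embed(text, direction, override=False):
    """Wrap text in an embedding (or override) span closed by PDF."""
    if direction == "rtl":
        opener = RLO if override else RLE
    else:
        opener = LRO if override else LRE
    return opener + text + PDF


if __name__ == "__main__":
    for ch in (LRM, RLM, LRE, RLE, PDF, LRO, RLO):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The mark style corresponds to applying control up to the next neutral character, and the embed style corresponds to a span. The override variants force every character inside the span to one direction regardless of its own class.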

Anyway, since it's too late to make a long story short, I'll just repeat
that I think directional control is too messy to rely on tagging to manage.
Authors or translators would have to know what display tool the content is
destined for and also know all the specifics of the letters, numbers, and
punctuation of the language in question and English or any other
left-to-right language. If it's handled at output time, the specific tools
won't necessarily be known, but at least the target output will be. The
chances of getting it right get better in that case.


From: Robert D Anderson [mailto:robander@us.ibm.com]
Sent: Thu 3/2/2006 3:57 PM
To: gershon@tech-tav.com
Cc: bhertz@sdl.com; 'Bryan Schnabel'; Charles Pau; 'Lieske, Christian'; Dave
A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; 'Felix
Sasaki'; 'Richard Ishida'; 'Jennifer Linton'; mambrose@sdl.com;
patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter;
rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com;
'Yves Savourel'
Subject: Re: [dita-translation] Draft proposal for dir attribute

Hello again -- only two short comments on the dir attribute.

1. As the one currently responsible for maintaining and bug-fixing the DTDs,
I would strongly favor making it a universal attribute, rather than adding
it to almost everything.

2. One of the points in the write-up says:
"If the document element does not specify the dir attribute, assume left to
right (ltr)."
The previous bullet says that inline elements use the "specified language's
default text direction". Wouldn't that be the case for the document as well?
That is, if I indicate xml:lang="he-il" on my root topic element, then
everything in the topic (such as tables, notes, and lists) should default to
dir="rtl" unless otherwise specified.

I do not know about the lro and rlo values -- does anybody here have tool
experience that would indicate whether these are still needed? If tools
still require them, then we should probably add them.
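
The defaulting behavior suggested in point 2 could be sketched as follows. This is only one reading of the suggestion, not the wording of the draft proposal: the nearest explicit dir wins, otherwise the nearest xml:lang supplies its default direction, otherwise ltr. The names default_dir and effective_dir and the short list of right-to-left language codes are assumptions for illustration, and the list is not exhaustive.

```python
# Illustrative, non-exhaustive set of primary language subtags that are
# normally written right to left (assumption for this sketch).
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}


def default_dir(xml_lang):
    """Default direction implied by an xml:lang value such as 'he-il'."""
    primary = xml_lang.split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def effective_dir(ancestors):
    """Resolve the direction for an element.

    ancestors is a list of attribute dicts ordered from the root element
    down to the element itself. Walking from the innermost element outward,
    an explicit dir takes precedence, then the default implied by xml:lang.
    """
    for attrs in reversed(ancestors):
        if "dir" in attrs:
            return attrs["dir"]
        if "xml:lang" in attrs:
            return default_dir(attrs["xml:lang"])
    return "ltr"


if __name__ == "__main__":
    topic = {"xml:lang": "he-il"}
    print(effective_dir([topic, {}]))              # table inside the topic
    print(effective_dir([topic, {"dir": "ltr"}]))  # explicit override
```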

Robert D Anderson
IBM Authoring Tools Development
Chief Architect, DITA Open Toolkit

From: Gershon L Joseph
Sent: 03/02/2006 01:36 PM
Reply-To: gershon
Subject: [dita-translation] Draft proposal for dir attribute

Hi all,

Here's my draft proposal for the dir attribute. I'd appreciate review
feedback via email before Monday's SC meeting so we can try closing this
item on Monday to hand off to the DITA TC.

It's a working draft that I hope will invite input from the SC members.
Based on feedback I receive, I plan to prepare a closer to final draft
before Monday's meeting.

I think the main questions are:
1. Should dir be a universal attribute or not?
2. Should we support dir="ltr|rtl" or dir="ltr|rtl|lro|rlo" as per HTML 4.0?

Any and all feedback will be greatly appreciated.

Best Regards,

Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
[attachment "DirAttr.html" deleted by Robert D Anderson/Rochester/IBM]
