
dita-translation message



Subject: RE: Proposal - dir attribute


Hi Richard,

Thanks for this feedback. Essentially you're saying directionality is
independent of the language. I was trying to reduce the work of
authors/translators by inferring an initial direction from the xml:lang
attribute. If we remove the initial directionality defaulting based on
language, we have two choices:

* Assume a direction of LTR unless otherwise specified, regardless of the
xml:lang. This means that on an Arabic or Hebrew document, the author will
have to explicitly set dir on the root element of every document. Users
generally expect setting the language gives them the correct directionality
for the language. Perhaps tools could handle this automatically for the
user.

* Require the user to set dir on the root element, and have it optional
everywhere else.

I think we should adopt the same approach as we did for xml:lang, in that
the attribute is optional, but best practice recommends using it on the root
element of each document.

Treating directionality independently from language definitely simplifies
the algorithm, and probably makes it easier for users to understand too.

Does everyone agree that we should separate directionality from language?

Any other comments?


Best Regards,
Gershon

-----Original Message-----
From: Richard Ishida [mailto:ishida@w3.org] 
Sent: Monday, March 27, 2006 2:45 PM
To: gershon@tech-tav.com; dita-translation@lists.oasis-open.org;
mambrose@sdl.com; pcarey@lexmark.com; rfletcher@sdl.com; bhertz@sdl.com;
tony.jewtushenko@productinnovator.com; 'Lieske, Christian'; 'Jennifer
Linton'; 'Munshi, Sukumar'; 'Charles Pau'; dpooley@sdl.com; 'Reynolds,
Peter'; 'Felix Sasaki'; 'Yves Savourel'; 'Dave A Schell'; 'Bryan Schnabel'
Cc: 'Richard Ishida'
Subject: RE: Proposal - dir attribute

Hello Gershon,

[Please forward this email to lists I am unable to reach.]

I have the following comments on the proposal:

[1]
===
"1. xml:lang attribute on the document element or, if not specified, default
language assumed by the processor. Directionality is determined by the
Unicode bidirectional algorithm for this language.

"2.	xml:lang attribute on any element that overrides the inherited
language. Again, directionality is determined by the Unicode bidirectional
algorithm for the specified language."

xml:lang should be used to declare language only, and not directionality,
because:

(a) It does *not* give information about directionality.  Direction cannot
be inferred from that information.  For example, Azerbaijani is written LTR
in Azerbaijan (Latin script) but RTL in Iran (Arabic script) - yet it is
still xml:lang="az" in both cases. The same applies for non-standard
orthographies (for example an IPA transcription of Hebrew in W3C's Speech
Synthesis Markup Language should be labelled xml:lang="he", but *not*
dir="rtl").

(b) dir could be used inline within the same paragraph with values of ltr in
one place and rtl in another. xml:lang is not designed for this type of use,
and so dir is needed anyway - why complicate matters by having two ways to
designate directionality, one of which is incapable of actually doing most
of the job? Better reduce confusion and scope for error by having simple,
clear semantics to the attributes.

(c) The Unicode bidi algorithm works on the basis of the Unicode character
semantics as modified by directional embedding directives (i.e. dir), not the
language expressed in xml:lang, so 'the Unicode bidirectional algorithm for
this language' doesn't make sense.

(d) In a document that is generally in English you may have a small table
that contains only Hebrew or Arabic text.  Although it would make sense to
use xml:lang on the table markup, so that you don't have to repeat it, you
would probably *not* want the table columns to flow from right to left (as
would usually be the case when using dir="rtl" on the table), since this is
an English document. If xml:lang was associated with direction, you would
probably have no control over that.  Same goes for list items. 

(e) Note that, once you have established the general directionality of the
document, you *don't* have to specify dir for every instance of RTL text.
If I wanted to display the following HTML text, which in memory reads:
	<p>He said 'arabic arabic arabic arabic' to me.</p>
in any of the major desktop browsers today, no dir is needed for the bidi
algorithm to correctly render the text as
	<p>He said 'cibara cibara cibara cibara' to me.</p>
Although xml:lang might be useful to identify the extent of the Arabic
language, that declaration has nothing to do with the correct ordering of
characters.
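
To make points (a) and (c) concrete, here is a minimal Python sketch, not a
full bidi implementation. It looks only at the Unicode bidi class of each
character and reports the direction of the first strong one, which is roughly
what rules P2 and P3 of the Unicode bidirectional algorithm do when no
direction is supplied (isolates are ignored here). The sample strings and the
function name are my own illustrative assumptions.

```python
import unicodedata

def first_strong_direction(text):
    """Return 'ltr' or 'rtl' based on the first strongly directional
    character, or None if the text contains no strong character.
    Simplified: directional isolates are not skipped."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None

# All three samples are illustrative. The first two would both be labelled
# xml:lang="az"; the third would be labelled xml:lang="he".
azeri_latin = "Az\u0259rbaycan"
azeri_arabic = "\u0622\u0630\u0631\u0628\u0627\u06cc\u062c\u0627\u0646"
hebrew_ipa = "\u0283alom"

for sample in (azeri_latin, azeri_arabic, hebrew_ipa):
    print(sample, first_strong_direction(sample))
```

The function never sees a language tag: the same xml:lang value can yield
either direction, and the answer comes from the characters alone.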


[2]
===
"Text direction cannot be sufficiently specified by the xml:lang attribute
alone"

So really I'm saying

"Text direction cannot be sufficiently specified by the xml:lang attribute
at all."


[3]
===
Not sure whether it's worth clarifying this in the text (particularly since
you point to my article, where it is explained), but...

Note that it is not solely to deal with punctuation characters that dir is
needed. In fact, in some cases the Unicode RLM and LRM characters are a
better choice (note that RLM and LRM are *not* referring to the Unicode
characters that mirror the effect of dir!).  

dir is most often needed to ensure the correct order of directional runs, as
in the quote 'W3C ,werbeh werbeh' in an overall LTR context, where the bidi
algorithm would have put the W3C over to the right.


[4]
===
"This attribute, when set to "ltr" or "rtl", overrides the default Unicode
bidirectional algorithm on neutral characters (such as spaces and
punctuation)."

It doesn't actually override the algorithm - only the rlo and lro do that,
and as mentioned above, its use is not limited to neutral characters.  You
could say, instead, "This attribute, when set to "ltr" or "rtl", is intended
to resolve cases of ambiguous directionality in bidirectional text."
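
As a reference for the distinction drawn in [3] and [4], the short Python
sketch below lists the Unicode formatting characters involved, with their
bidi classes as reported by the standard unicodedata module. LRM and RLM are
invisible strong characters; LRE and RLE are the embedding controls that
correspond to dir="ltr" and dir="rtl" in HTML; LRO and RLO are the overrides.
The CONTROLS table and the describe helper are illustrative names only.

```python
import unicodedata

# Label -> character; the labels are the usual abbreviations.
CONTROLS = {
    "LRM": "\u200e",  # mark: behaves like an invisible strong LTR character
    "RLM": "\u200f",  # mark: behaves like an invisible strong RTL character
    "LRE": "\u202a",  # embedding: opens an LTR embedding level
    "RLE": "\u202b",  # embedding: opens an RTL embedding level
    "PDF": "\u202c",  # closes the most recent embedding or override
    "LRO": "\u202d",  # override: forces characters to be treated as LTR
    "RLO": "\u202e",  # override: forces characters to be treated as RTL
}

def describe(ch):
    """Return (Unicode name, bidi class) for a single character."""
    return unicodedata.name(ch), unicodedata.bidirectional(ch)

for label, ch in CONTROLS.items():
    name, bidi_class = describe(ch)
    print(f"{label}  U+{ord(ch):04X}  {bidi_class:<3}  {name}")
```

Only the two override characters replace the algorithm's own ordering of
characters; the marks and embeddings merely give it extra context.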


[5]
===
"This attribute is usually used in conjunction with the xml:lang attribute,
to override the default Unicode bidirectional algorithm that applies to the
specified language."

Again, I would remove this.


[6]
===
"then if the document element specifies the xml:lang attribute, the Unicode
Bidirectional Algorithm must be applied to the specified language"

The bidi algorithm should be applied and work independently of whether or
not a language has been declared, since it operates on the basis of the
characters in the text.  Again, this and other references to use of xml:lang
for direction are inappropriate.


[7]
===
"Directionality is inferred from the xml:lang value. Every language has an
associated directionality (left-to-right or right-to-left, also termed LTR
or RTL). For example, for English this default direction is LTR and for
Hebrew it's RTL."

Again, I strongly disagree with this.  I think this should say something
like:

"The default direction of a document is LTR.  This can be overridden by use
of the dir attribute set to "rtl"."

Note also that it is currently not defined what should happen if the
language of a document is not defined by the author, which is another reason
to use the wording I suggest.
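
A rough sketch of how a processor might apply the wording suggested in [7],
in Python: the base direction defaults to LTR, is overridden by dir, and is
inherited by descendants, while xml:lang is ignored for this purpose. The
sample markup, element names and function name are assumptions for
illustration only, not part of the proposal.

```python
import xml.etree.ElementTree as ET

def effective_directions(elem, inherited="ltr", path=""):
    """Yield (path, direction) for elem and all of its descendants.
    An element's own dir wins; otherwise the inherited value applies.
    xml:lang is deliberately never consulted."""
    direction = elem.get("dir", inherited)
    here = f"{path}/{elem.tag}"
    yield here, direction
    for child in elem:
        yield from effective_directions(child, direction, here)

# Hypothetical sample documents.
sample = (
    '<topic xml:lang="he" dir="rtl">'
    '<title>t</title>'
    '<body><p dir="ltr">a</p><p xml:lang="en">b</p></body>'
    '</topic>'
)
no_dir = '<topic xml:lang="ar"><title>t</title></topic>'

for path, direction in effective_directions(ET.fromstring(sample)):
    print(path, direction)
```

In the first sample the second paragraph stays RTL even though it is marked
as English, and in the second sample everything is LTR despite xml:lang="ar",
because no dir has been set.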


Hope that helps,
Richard.


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)



