Subject: RE: Proposal - dir attribute
Hi Richard,

Thanks for this feedback. Essentially, you're saying that directionality is independent of language. I was trying to reduce the work of authors and translators by inferring an initial direction from the xml:lang attribute. If we remove the language-based default for directionality, we have two choices:

* Assume a direction of LTR unless otherwise specified, regardless of xml:lang. This means that in an Arabic or Hebrew document, the author will have to explicitly set dir on the root element of every document. Users generally expect that setting the language gives them the correct directionality for that language. Perhaps tools could handle this automatically for the user.

* Require the user to set dir on the root element, and make it optional everywhere else.

I think we should adopt the same approach as we did for xml:lang: the attribute is optional, but best practice recommends using it on the root element of each document.

Treating directionality independently from language definitely simplifies the algorithm, and probably makes it easier for users to understand too. Does everyone agree that we should separate directionality from language? Any other comments?

Best Regards,
Gershon

-----Original Message-----
From: Richard Ishida [mailto:ishida@w3.org]
Sent: Monday, March 27, 2006 2:45 PM
To: gershon@tech-tav.com; dita-translation@lists.oasis-open.org; mambrose@sdl.com; pcarey@lexmark.com; rfletcher@sdl.com; bhertz@sdl.com; tony.jewtushenko@productinnovator.com; 'Lieske, Christian'; 'Jennifer Linton'; 'Munshi, Sukumar'; 'Charles Pau'; dpooley@sdl.com; 'Reynolds, Peter'; 'Felix Sasaki'; 'Yves Savourel'; 'Dave A Schell'; 'Bryan Schnabel'
Cc: 'Richard Ishida'
Subject: RE: Proposal - dir attribute

Hello Gershon,

[Please forward this email to lists I am unable to reach.]

I have the following comments on the proposal:

[1]
===
"1. xml:lang attribute on the document element or, if not specified, default language assumed by the processor.
Directionality is determined by the Unicode bidirectional algorithm for this language."

"2. xml:lang attribute on any element that overrides the inherited language. Again, directionality is determined by the Unicode bidirectional algorithm for the specified language."

xml:lang should be used to declare language only, and not directionality, because:

(a) It does *not* give information about directionality; direction cannot be inferred from it. For example, Azerbaijani is written LTR in Azerbaijan (Latin script) but RTL in Iran (Arabic script), yet it is still xml:lang="az" in both cases. The same applies to non-standard orthographies (for example, an IPA transcription of Hebrew in W3C's Speech Synthesis Markup Language should be labelled xml:lang="he", but *not* dir="rtl").

(b) dir could be used inline within the same paragraph, with a value of ltr in one place and rtl in another. xml:lang is not designed for this type of use, so dir is needed anyway. Why complicate matters by having two ways to designate directionality, one of which is incapable of actually doing most of the job? Better to reduce confusion and scope for error by giving the attributes simple, clear semantics.

(c) The Unicode bidi algorithm works on the basis of Unicode character semantics, as modified by directional embedding directives (i.e. dir), not on the language expressed in xml:lang, so 'the Unicode bidirectional algorithm for this language' doesn't make sense.

(d) In a document that is generally in English, you may have a small table that contains only Hebrew or Arabic text. Although it would make sense to put xml:lang on the table markup so that you don't have to repeat it, you would probably *not* want the table columns to flow from right to left (as would usually be the case when using dir="rtl" on the table), since this is an English document. If xml:lang were associated with direction, you would probably have no control over that. The same goes for list items.
(e) Note that once you have established the general directionality of the document, you *don't* have to specify dir for every instance of RTL text. If I wanted to display the following HTML text, which in memory reads:

<p>He said 'arabic arabic arabic arabic' to me.</p>

in any of the major desktop browsers today, no dir is needed for the bidi algorithm to render the text correctly as:

<p>He said 'cibara cibara cibara cibara' to me.</p>

Although xml:lang might be useful to identify the extent of the Arabic-language text, that declaration has nothing to do with the correct ordering of characters.

[2]
===
"Text direction cannot be sufficiently specified by the xml:lang attribute alone"

So really I'm saying "Text direction cannot be sufficiently specified by the xml:lang attribute at all."

[3]
===
Not sure whether it's worth clarifying this in the text (particularly since you point to my article, where it is explained), but note that dir is not needed solely to deal with punctuation characters. In fact, in some cases the Unicode RLM and LRM characters are a better choice (note that RLM and LRM are *not* the Unicode characters that mirror the effect of dir!). dir is most often needed to ensure the correct order of directional runs, as in the quote 'W3C ,werbeh werbeh' in an overall LTR context, where the bidi algorithm would have put the W3C over to the right.

[4]
===
"This attribute, when set to "ltr" or "rtl", overrides the default Unicode bidirectional algorithm on neutral characters (such as spaces and punctuation)."

It doesn't actually override the algorithm; only rlo and lro do that, and, as mentioned above, its use is not limited to neutral characters. You could say instead: "This attribute, when set to "ltr" or "rtl", is intended to resolve cases of ambiguous directionality in bidirectional text."
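Points (a) and (c) of comment [1] rest on the fact that the bidi algorithm starts from each character's Unicode bidirectional class, not from a language tag. The Python sketch below illustrates this with the standard-library unicodedata.bidirectional function and a simplified version of the "first strong character" rule from UAX #9 (rules P2-P3, here ignoring isolates and explicit embeddings). The helper name first_strong_direction and the sample strings are illustrative assumptions, not part of the proposal: two of the samples would both carry xml:lang="az", yet one is in Latin script and one in Arabic script, and their characters point in opposite directions.

```python
import unicodedata
from typing import Optional

# Bidi classes that count as "strong" for UAX #9 rule P2.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' from the first strong character, else None.

    Simplified sketch of UAX #9 rules P2-P3: it skips nothing for isolates,
    ignores explicit embeddings, and returns None (where P3 would fall back
    to LTR) when the text contains no strong character at all.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return None


# Illustrative samples (assumed, not taken from the thread).
samples = {
    "az, Latin script": "Azərbaycan",
    "az, Arabic script": "آذربایجان",
    "he": "שלום",
    "digits and punctuation only": "2006, 14:45",
}

for label, text in samples.items():
    print(f"{label:30} -> {first_strong_direction(text)}")
```

The result depends only on the characters; no language tag is consulted at any point. Real rendering also needs the rest of the algorithm (resolving weak and neutral types, then reordering runs), which this sketch does not attempt.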
[5]
===
"This attribute is usually used in conjunction with the xml:lang attribute, to override the default Unicode bidirectional algorithm that applies to the specified language."

Again, I would remove this.

[6]
===
"then if the document element specifies the xml:lang attribute, the Unicode Bidirectional Algorithm must be applied to the specified language"

The bidi algorithm should be applied, and should work, independently of whether or not a language has been declared, since it operates on the basis of the characters in the text. Again, this and other references to using xml:lang for direction are inappropriate.

[7]
===
"Directionality is inferred from the xml:lang value. Every language has an associated directionality (left-to-right or right-to-left, also termed LTR or RTL). For example, for English this default direction is LTR and for Hebrew it's RTL."

Again, I strongly disagree with this. I think it should say something like: "The default direction of a document is LTR. This can be overridden by use of the dir attribute set to "rtl"."

Note also that the text currently does not define what should happen if the author does not declare the language of a document, which is another reason to use the wording I suggest.

Hope that helps,
Richard.

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
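The model this exchange converges on, in which dir is optional, is inherited from the nearest ancestor that sets it, defaults to LTR as suggested in comment [7], and is resolved independently of xml:lang, can be sketched with Python's standard-library ElementTree module. The element names are hypothetical DITA-style markup chosen for illustration, and resolve is an invented helper, not part of any DITA processor. The sample also reproduces scenario (d) from comment [1]: a Hebrew table inside an English topic keeps LTR unless dir is set explicitly.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under its namespace-qualified (Clark) name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DEFAULT_DIR = "ltr"  # default direction suggested in comment [7]


def resolve(elem, inherited_dir=DEFAULT_DIR, inherited_lang=None, out=None):
    """Record (tag, effective dir, effective xml:lang) for elem and its descendants.

    dir and xml:lang are inherited separately; xml:lang never influences dir.
    """
    if out is None:
        out = []
    direction = elem.get("dir", inherited_dir)
    lang = elem.get(XML_LANG, inherited_lang)
    out.append((elem.tag, direction, lang))
    for child in elem:
        resolve(child, direction, lang, out)
    return out


# Hypothetical sample: an English topic containing a Hebrew table and an
# inline Hebrew phrase that explicitly sets dir="rtl".
SAMPLE = """\
<topic xml:lang="en">
  <title>Sample</title>
  <simpletable xml:lang="he">
    <strow><stentry>שלום</stentry></strow>
  </simpletable>
  <p>He said <ph xml:lang="he" dir="rtl">שלום</ph> to me.</p>
</topic>"""

for tag, direction, lang in resolve(ET.fromstring(SAMPLE)):
    print(f"{tag:12} dir={direction} xml:lang={lang}")
```

With this model, setting dir="rtl" once on the root element, as Gershon suggests as best practice for Arabic or Hebrew documents, is enough for every descendant to inherit RTL; without it, such a document resolves to LTR, which is the trade-off behind the first option in his reply.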