Subject: RE: [dita-translation] Draft proposal for dir attribute
Or it may have been me :)

The most obvious area where taking direction from the lang attribute breaks down is with punctuation in numbers. For example, when lang="he", if you type the following characters in the order shown:

    123.456-789

The standard Unicode rules for Hebrew will result in:

    789-456.123

Since Hebrew direction is RTL, the full stop (period) follows the text that was typed before it; but with numbers the result is not what the author expects! There are many such issues; this is the most obvious one.

Most technical manuals written in Hebrew contain English words or phrases. If the English phrase appears at the beginning or end of the sentence, again the result is often not what the author intends. Here's an example where the author types:

    <p xml:lang="he">HebrewText HebrewText <ph xml:lang="en">EnglishPhrase</ph>.</p>

If the phrase within the <p> wraps, what often happens is that the Hebrew text and the full stop appear on the first line, with the English text on the second line. The full stop is not kept with the English phrase. However, if we specify:

    <p xml:lang="he" dir="rtl">HebrewText HebrewText <ph xml:lang="en">EnglishPhrase</ph>.</p>

Then the full stop will correctly follow the English phrase, even when it wraps onto a new line.

You could argue that the latter example is a problem with authoring/publishing tools, but it's encountered in most tools that support Middle Eastern languages -- they simply have no idea what the user means without additional directionality specified.

Does this help clarify the issue?
Best Regards,
Gershon

-----Original Message-----
From: Farwell, Kevin [mailto:Kevin.Farwell@lionbridge.com]
Sent: Saturday, March 04, 2006 1:37 AM
To: Robert D Anderson
Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; Felix Sasaki; gershon@tech-tav.com; Richard Ishida; Jennifer Linton; mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: [dita-translation] Draft proposal for dir attribute

Hi,

I think that might have been me talking about the tags. I was actually saying that indicating direction in a wrapper was not enough, but the language tag has the same limitation.

Overriding defaults is indeed rare, but the number of weak or neutral characters is higher than one might think, and it always seems like the number of times they show up in a file is impossibly high. Those are the ones that cause all the problems by bending to the will of the bullies around them.

The approach we've taken is sort of what you describe. We have an automated process that goes through and finds the problem characters, looks at what's before or after them, and drops in direction codes according to what we believe to be the right thing in that case. It works pretty well, but it still needs refining. My belief is that there should be no case that can't be solved, but we haven't achieved that yet. It's also my belief that we haven't encountered all the cases yet, and may never, so this will never be totally finished.

I'd like to answer the call for examples, and I've put out the word to a colleague who happens to be preparing some Arabic text today, but it being Friday afternoon I think getting anything by Monday morning is not likely.
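
A pass of the kind described above (find the problem characters, look at what surrounds them, drop in a direction code) might look roughly like the sketch below. It is only an illustration, not the process Kevin's team actually runs: the function name add_marks, the set of character classes treated as "problem characters", and the single rule applied (punctuation that follows a run going against the base direction gets a matching direction mark after it) are all assumptions made for the example.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Bidi classes treated as "problem characters" in this sketch (an assumption):
# other neutrals plus the separator/terminator classes used around numbers.
PROBLEM_CLASSES = {"ON", "CS", "ES", "ET"}


def strong_direction(ch):
    """Return 'ltr' or 'rtl' for a strong character, else None."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "ltr"
    if cls in ("R", "AL"):
        return "rtl"
    return None


def add_marks(text, base="rtl"):
    """Hypothetical heuristic: after any problem character whose nearest
    preceding strong character runs against the base direction, insert a
    mark of that direction so the character stays with the run before it."""
    out = []
    last_strong = None
    for ch in text:
        out.append(ch)
        direction = strong_direction(ch)
        if direction is not None:
            last_strong = direction
        elif (unicodedata.bidirectional(ch) in PROBLEM_CLASSES
              and last_strong is not None
              and last_strong != base):
            out.append(LRM if last_strong == "ltr" else RLM)
    return "".join(out)


# Two Hebrew letters, then an English word followed by a full stop.
marked = add_marks("\u05d0\u05d1 ABC.")
# The full stop is now followed by an LRM, so it sits between two
# left-to-right strong characters.
```

A real process would need many more rules (paired punctuation, phone numbers, Arabic versus Hebrew differences), which is why a single rule like this one can only be a starting point.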
I can say generally that phone numbers, paired punctuation marks, and cases where neutral punctuation abuts directional text (like an English word directly after a Hebrew word that has parentheses around it, or a dash between a letter and a number) are the kinds of things that break. If concrete examples of these surface in the next couple of hours, I'll send them along.

Kevin

-----Original Message-----
From: Robert D Anderson [mailto:robander@us.ibm.com]
Sent: Friday, March 03, 2006 11:23 AM
To: Farwell, Kevin
Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; Felix Sasaki; gershon@tech-tav.com; Richard Ishida; Jennifer Linton; mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: [dita-translation] Draft proposal for dir attribute

On the call this Monday, somebody mentioned that there are cases where xml:lang is not enough to determine the direction of the text. In those cases, my current understanding is that setting the attribute means something like "The contents of this element do not follow the default language rules for the currently specified xml:lang attribute." Thus, the only time anybody needs to set the value is when they need to override the current default, which is probably somewhat rare. Is that correct?

In those rare cases, is there any way we could identify the change and automatically set the direction on output? My guess is no; otherwise the display mechanisms would already display them correctly.

If I am correct about why this is used, would it be possible for someone to give an example of where this override behavior is most often used? I think examples were given on Monday relating to numbers or punctuation, but I am not sure.
Like Kevin, some of us were a little confused by the Hebrew example, because both values showed up the same in our browsers. Perhaps this is just an indication that browsers do not yet actually support the dir attribute.

I would like to call for examples of use cases (particularly any that show up often) to help us determine if this is critical for DITA 1.1, or whether we can spend more time considering it for 1.2.

Thanks-

Robert D Anderson
Authoring Tools Development
Chief Architect, DITA Open Toolkit
(507) 253-8787, T/L 553-8787

From: "Farwell, Kevin" <Kevin.Farwell@lionbridge.com>
Sent: 03/03/2006 02:47 AM
To: Robert D Anderson/Rochester/IBM@IBMUS, <gershon@tech-tav.com>
cc: <bhertz@sdl.com>, "Bryan Schnabel" <bryan.s.schnabel@tek.com>, Charles Pau/Cambridge/IBM@Lotus, "Lieske, Christian" <christian.lieske@sap.com>, Dave A Schell/Raleigh/IBM@IBMUS, <dita-translation@lists.oasis-open.org>, <dpooley@sdl.com>, "Felix Sasaki" <fsasaki@w3.org>, "Richard Ishida" <ishida@w3.org>, "Jennifer Linton" <jennifer.linton@comtech-serv.com>, <mambrose@sdl.com>, <patrickk@scriptware.nl>, <pcarey@lexmark.com>, "Reynolds, Peter" <Peter.Reynolds@lionbridge.com>, <rfletcher@sdl.com>, "Munshi, Sukumar" <Sukumar.Munshi@lionbridge.com>, <tony.jewtushenko@productinnovator.com>, "Yves Savourel" <ysavourel@translate.com>
Subject: RE: [dita-translation] Draft proposal for dir attribute

Hello,

I alluded to this on the phone the other day, but after thinking about it for a few days I'm getting kind of sour on directional tags of any kind. The question of whether overrides are needed got me thinking that the basic idea of directional tags is based on presentation only and is display tool specific. As such, any tagging method could either do no good or do harm in all the various output tools there are.

It seems to me that applying tagging according to whether the content is going to Internet Explorer or Antenna House or something else is very much against the notion of separating content and format.
Since the two consider bi-directional text differently, tagging for one doesn't guarantee the text will work in the other. It is true that the various tools will handle bi-directional text in almost random ways, so some degree of directional control is needed, but it should be applied at output time, not stored in the source XML.

For example, in the snippet included in the bi-directional model page, the first instance of the Hebrew word "Hebrew" displays right-to-left in IE on Windows and Safari on Mac with no tagging at all because of the Unicode range the characters are in. The second one does too, but I can't figure out if that one is supposed to run backwards to demonstrate the difference between logical and display order and it's just typed in wrong. Otherwise it doesn't demonstrate much of anything.

Still, punctuation and other characters must be handled, so control is needed. The only thing that seems to work consistently is the use of the Unicode directional characters. They don't necessarily rely on nesting, which has a lot of advantages. Control can be applied to a set of characters before the next neutral character or to a span, depending on what's needed.

Relying on spans can run into problems like a period displaying right-to-left (which isn't so dramatic) at the right end of a Hebrew word (which is). If the span is just around the period, nothing happens; if the span is around the word and the period, you might get the same result because a period is a neutral character and is ignored in directional controls.

Also, I suppose just because it's fun, most tools treat Arabic and Hebrew differently, so the controls can't be the same for all languages.

Anyway, since it's too late to make a long story short, I'll just repeat that I think directional control is too messy to rely on tagging to manage.
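
For readers who have not worked with the Unicode directional characters mentioned above, the following sketch shows the two styles of control: a zero-width mark placed next to a character, and an embedding wrapped around a span. It uses Python's standard unicodedata module to report the bidirectional class of each character; the helper names embed and bidi_classes are made up for the illustration, and nothing here reproduces how any particular display tool behaves.

```python
import unicodedata

LRM, RLM = "\u200e", "\u200f"                 # direction marks (no nesting)
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"  # embeddings (opened, then popped)


def embed(text, direction):
    """Wrap a span in an explicit embedding (hypothetical helper)."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    opener = LRE if direction == "ltr" else RLE
    return opener + text + PDF


def bidi_classes(text):
    """List the Unicode bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# The full stop is reported as CS (common separator), not as a strong
# left-to-right or right-to-left character, so it takes its direction
# from its surroundings.
period_class = unicodedata.bidirectional(".")

# The marks are invisible but count as strong characters.
mark_classes = bidi_classes(LRM + RLM)

# A span-style control: the English word and its full stop inside one
# left-to-right embedding.
wrapped = embed("ABC.", "ltr")
wrapped_classes = bidi_classes(wrapped)
```

Marks suit the "one character at a time" case because they need no closing counterpart, while an embedding has to be closed with PDF and therefore behaves much like a span in markup.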
Authors or translators would have to know what display tool the content is destined for and also know all the specifics of the letters, numbers, and punctuation of the language in question and English or any other left-to-right language. If it's handled at output time, the specific tools won't necessarily be known, but at least the target output will be. The chances of getting it right get better in that case.

Kevin

From: Robert D Anderson [mailto:robander@us.ibm.com]
Sent: Thu 3/2/2006 3:57 PM
To: gershon@tech-tav.com
Cc: bhertz@sdl.com; 'Bryan Schnabel'; Charles Pau; 'Lieske, Christian'; Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; 'Felix Sasaki'; 'Richard Ishida'; 'Jennifer Linton'; mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com; 'Yves Savourel'
Subject: Re: [dita-translation] Draft proposal for dir attribute

Hello again -- only two short comments on the dir attribute.

1. As the one currently responsible for maintaining and bug-fixing the DTDs, I would strongly favor making it a universal attribute, rather than adding it to almost everything.

2. One of the points in the write-up says: "If the document element does not specify the dir attribute, assume left to right (ltr)." The previous bullet says that inline elements use the "specified language's default text direction". Wouldn't that be the case for the document as well? That is, if I indicate xml:lang="he-il" on my root topic element, then everything in the topic (such as tables, notes, and lists) should default to dir="rtl" unless otherwise specified.

I do not know about the lro and rlo values -- does anybody here have tool experience that would indicate whether these are still needed? If tools still require them, then we should probably add them.
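
One possible reading of the defaulting rule suggested in point 2 is sketched below: an explicit dir wins, otherwise the direction comes from the element's own xml:lang, otherwise it is inherited from the parent, with ltr as the last resort. This is an illustration of that reading only, not wording from the draft proposal. The list of right-to-left languages is deliberately short and incomplete, the function names are invented for the example, and letting a nested xml:lang override an inherited dir is an assumption.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative subset of right-to-left languages (an assumption, not a
# complete list).
RTL_LANGS = {"ar", "fa", "he", "ur", "yi"}


def default_dir(lang):
    """Default direction for a language tag such as 'he-il' or 'en'."""
    primary = lang.split("-")[0].lower()
    return "rtl" if primary in RTL_LANGS else "ltr"


def effective_dirs(elem, inherited="ltr"):
    """Yield (tag, direction) for elem and its descendants in document order."""
    explicit = elem.get("dir")
    lang = elem.get(XML_LANG)
    if explicit is not None:
        direction = explicit             # author override
    elif lang is not None:
        direction = default_dir(lang)    # language default
    else:
        direction = inherited            # inherit from the parent
    yield elem.tag, direction
    for child in elem:
        yield from effective_dirs(child, direction)


topic = ET.fromstring(
    '<topic xml:lang="he-il">'
    '<p>HebrewText <ph xml:lang="en">EnglishPhrase</ph>.</p>'
    '<note dir="ltr">NoteText</note>'
    '</topic>'
)
resolved = list(effective_dirs(topic))
```

Under this reading the root topic and its p resolve to rtl without any dir attribute in the source, while the English ph and the explicitly marked note resolve to ltr.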
Robert D Anderson
IBM Authoring Tools Development
Chief Architect, DITA Open Toolkit

From: "Gershon L Joseph" <gershon@tech-tav.com>
Sent: 03/02/2006 01:36 PM
Please respond to gershon
To: <dita-translation@lists.oasis-open.org>, <mambrose@sdl.com>, <pcarey@lexmark.com>, <rfletcher@sdl.com>, <bhertz@sdl.com>, "'Richard Ishida'" <ishida@w3.org>, <tony.jewtushenko@productinnovator.com>, <patrickk@scriptware.nl>, "'Lieske, Christian'" <christian.lieske@sap.com>, "'Jennifer Linton'" <jennifer.linton@comtech-serv.com>, <Sukumar.Munshi@lionbridge.com>, Charles Pau/Cambridge/IBM@Lotus, <dpooley@sdl.com>, <Peter.Reynolds@lionbridge.com>, "'Felix Sasaki'" <fsasaki@w3.org>, "'Yves Savourel'" <ysavourel@translate.com>, Dave A Schell/Raleigh/IBM@IBMUS, "'Bryan Schnabel'" <bryan.s.schnabel@tek.com>
cc:
Subject: [dita-translation] Draft proposal for dir attribute

Hi all,

Here's my draft proposal for the dir attribute. I'd appreciate review feedback via email before Monday's SC meeting so we can try closing this item on Monday to hand off to the DITA TC.

It's a working draft that I hope will invoke input from the SC members. Based on feedback I receive, I plan to prepare a closer to final draft before Monday's meeting.

I think the main questions are:

1. Should dir be a universal attribute or not?

2. Should we support dir="ltr|rtl" or dir="ltr|rtl|lro|rlo" as per HTML 4.0?

Any and all feedback will be greatly appreciated.

Best Regards,
Gershon

---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
http://www.tech-tav.com

[attachment "DirAttr.html" deleted by Robert D Anderson/Rochester/IBM]