
dita-translation message


Subject: RE: RE: [dita-translation] Draft proposal for dir attribute

I would debate the definition of the word "support" when it comes to bi-directional text. I've attached some screen shots from three editors, Epic, XMetaL, and Oxygen, as well as a screen shot of the PDF output generated from the XML. I picked a simple sample, and you can see that no editor got it right. Epic and Oxygen got the Hebrew right but got the English, the period, and the overall sentence order wrong, while XMetaL just displays the file in logical order. I also included a shot from MS Word, which is a very capable bi-directional DTP tool (although I would never recommend it as an XML tool), and it couldn't handle the XML either. A key feature in Word is the cursor direction setting, which makes typing a lot easier. Oxygen has a text direction indicator, but no setting that can override the direction Unicode insists on. With those samples as a guide, I'd be wary of tagging the file according to what appears in an editor. In the interest of science, should you choose to repeat my experiments, all of these tests were done on an English OS. Perhaps the results would be different on a native OS, but I'm skeptical.
The next question is what "100% Unicode compliant" means. Many XML tools and text editors can display every Unicode character, so they might make the claim. Other tools can display a collection of RTL Unicode values as a word, but the sentence order is still LTR, so they might make a stronger claim. Still others, like Epic and Oxygen, get the sentence order of RTL text right, but can't do anything with LTR mixed in. That's better, but still not what I suspect you mean by 100%. Again, my opinion is that trusting an editor is pretty risky.
If we suppose there could be an editor that really does represent all bi-di text as it should be, I'm not sure what would keep us from supposing the output tools do too. In that case, there would be no need for markup of any kind. Since no such editor exists, and no such output tool, we are stuck with requiring some kind of markup. However, there is no requirement on where that markup exists. It can be in the XML, which I think is not sufficient and can actually get in the way depending on what the target output is (and is different for Hebrew and Arabic scripts), or it can be integrated into the output process, in which case it is always tuned to the right output.
As a description of what I'm concerned with, I offer he_ppm.html. This file features an English phrase and that phrase translated into Hebrew. The first version of the translation has no corrective markup at all. The second and third translations show two different ways to add corrections. The first was added automatically and the second was added with much consternation by me. This represents my second try. The first had five spans, but I threw that all away and got it down to three. Looking at the representation of the string in Epic (epicppm.bmp), I can't imagine intuiting where the spans should go.
I don't think leaving out direction tags limits DITA at all. There are two levels of directional control: the direction and alignment of the whole document, and the direction of characters inside paragraphs. Both are only a concern in the output. Unless there would be a case where an entire Hebrew or Arabic document should be output LTR, that setting can be added to the output with either a parameter passed at rendition time or a condition in the XSL that sets the document to RTL if the value of the language attribute calls for it. That control can also be managed conditionally on individual paragraphs, so there needn't be a control on the paragraph other than the language.
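To make the "condition at output time" idea concrete, here is a minimal sketch in Python rather than XSL. It assumes the output step knows the language code of the document; the function names and the set of right-to-left language subtags are my own illustrative assumptions, not anything defined by DITA or by a particular toolkit.

```python
# Hypothetical helper for an output step: pick the document direction
# from the language code instead of from markup stored in the source.

# Assumption: a small, incomplete set of primary language subtags that
# are normally written right-to-left.
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}


def direction_for(lang):
    """Return "rtl" or "ltr" for a language code such as "he-il"."""
    primary = lang.replace("_", "-").split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def html_open_tag(lang):
    """Build the opening <html> tag for a deliverable in the given language."""
    return f'<html lang="{lang}" dir="{direction_for(lang)}">'


print(html_open_tag("he-il"))
print(html_open_tag("en-us"))
```

The same test could be applied per paragraph wherever a paragraph carries its own language value, which is the reason no separate direction control is needed on the paragraph in this approach.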
At the string level, controls must be applied according to the characters in the string. It is impossible to tell how the strings will come out until the output is viewed in its final form, so I think it builds inefficiency into the system to attempt to mark up the XML. I imagine an author tagging up a sentence, running HTML, viewing it, going back to the XML to make corrections, running HTML, viewing it, and so on. Then, once it's all straight, the boss comes in and says the output has to work in another browser with different CSS support and Unicode support (check the file in Safari or some other browser that isn't IE). The author is faced with making a copy of the file for each deliverable or overwriting the deliverable specific markup each time the file must be output. Neither is appealing.
One last point to help explain my position. I do a lot of work with bi-directional XML. Between me and other folks that sit around me, we have got maybe a couple of million words processed in the past few years. We don't use directional spans. I think it would be doing users a huge disservice to add tagging functionality that doesn't really translate to output functionality and translates to inefficient work. If the directional controls are added, they should be accompanied by a disclaimer that essentially says, "Your mileage may vary."

From: Gershon L Joseph [mailto:gershon@tech-tav.com]
Sent: Thursday, March 09, 2006 5:52 AM
To: Farwell, Kevin; 'Robert D Anderson'
Cc: bhertz@sdl.com; 'Bryan Schnabel'; 'Charles Pau'; 'Lieske, Christian'; 'Dave A Schell'; dita-translation@lists.oasis-open.org; dpooley@sdl.com; 'Felix Sasaki'; 'Richard Ishida'; 'Jennifer Linton'; mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com; 'Yves Savourel'
Subject: RE: RE: [dita-translation] Draft proposal for dir attribute

Hi Kevin,
In my experience, the authoring tool (if it's 100% Unicode compliant) will display the multilingual text the same way it will be rendered, so the user would apply the dir attribute correctly. Of course, if the author is not using an authoring tool that supports Unicode, he/she will have to guess how to apply the dir attribute, which probably won't work. Since the industry leading XML editors (XMetaL Author and Arbortext Editor to name but a few) now support BIDI, I think we can safely assume that the authors writing Hebrew or Arabic will mark up the direction correctly, as needed according to what they see in the editor window.
If DITA simply does not provide the dir attribute, we essentially remove DITA as an option for RTL languages. Since we are seeing more and more interest in XML (and DITA) for Hebrew and Arabic authoring in Israel and other ME countries, I think we should include dir in DITA 1.1.

Best Regards,

From: Farwell, Kevin [mailto:Kevin.Farwell@lionbridge.com]
Sent: Friday, March 03, 2006 10:48 AM
To: Robert D Anderson; gershon@tech-tav.com
Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; Felix Sasaki; Richard Ishida; Jennifer Linton; mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: [dita-translation] Draft proposal for dir attribute

I alluded to this on the phone the other day, but after thinking about it for a few days I'm getting kind of sour on directional tags of any kind. The question of whether overrides are needed got me thinking that the basic idea of directional tags is based on presentation only and is display tool specific. As such, any tagging method could either do no good or do harm in all the various output tools there are. It seems to me that applying tagging according to whether the content is going to Internet Explorer or Antenna House or something else is very much against the notion of separating content and format. Since the two consider bi-directional text differently, tagging for one doesn't guarantee the text will work in the other.
It is true that the various tools will handle bi-directional text in almost random ways, so some degree of directional control is needed, but it should be applied at output time, not stored in the source XML. For example, in the snippet included in the bi-directional model page, the first instance of the Hebrew word "Hebrew" displays right-to-left in IE on Windows and Safari on Mac with no tagging at all because of the Unicode range the characters are in. The second one does too, but I can't figure out if that one is supposed to run backwards to demonstrate the difference between logical and display order and it's just typed in wrong. Otherwise it doesn't demonstrate much of anything.
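As a rough illustration of the point about Unicode ranges, the sketch below uses Python's standard unicodedata module to look up the bidirectional class that the Unicode character database assigns to each character. The helper name bidi_classes is made up for this example, and the lookup covers only the per-character property, not how any particular browser or formatter lays out a whole line.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs for every character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


hebrew_word = "עברית"  # the Hebrew word for "Hebrew"

# Every Hebrew letter reports the strong right-to-left class "R".
print(bidi_classes(hebrew_word))

# A Latin letter, a digit, a period, and a space report four different classes.
print(bidi_classes("A1. "))
```

Letters carry a strong class (L or R), while digits, punctuation, and spaces report other classes, so their placement depends on the surrounding text and on the tool doing the display.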
Still, punctuation and other characters must be handled, so control is needed. The only thing that seems to work consistently is the use of the Unicode directional characters. They don't necessarily rely on nesting, which has a lot of advantages. Control can be applied to a set of characters before the next neutral character or to a span, depending on what's needed. Relying on spans can run into problems like a period displaying right-to-left (which isn't so dramatic) at the right end of a Hebrew word (which is). If the span is just around the period, nothing happens; if the span is around the word and the period, you might get the same result because a period is a neutral character and is ignored in directional controls. Also, I suppose just because it's fun, most tools treat Arabic and Hebrew differently, so the controls can't be the same for all languages.
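For readers who have not worked with the Unicode directional characters, here is a small sketch of the mechanics in Python. It only shows how an invisible RIGHT-TO-LEFT MARK (U+200F) can be inserted into a string and removed again at output time; the function names are hypothetical, and where a mark actually belongs depends on the text and on the display tool, which this sketch does not try to decide.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, invisible
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, invisible


def insert_mark_after(text, target, mark=RLM):
    """Insert a directional mark right after the first occurrence of target."""
    index = text.find(target)
    if index == -1:
        return text  # target not present, leave the text untouched
    end = index + len(target)
    return text[:end] + mark + text[end:]


def strip_marks(text):
    """Remove LRM and RLM characters, recovering the unmarked text."""
    return text.replace(LRM, "").replace(RLM, "")


plain = "13 עמודים לדקה (גודל Letter), 12 עמודים לדקה (גודל A4)"
marked = insert_mark_after(plain, "Letter)")

print(len(marked) - len(plain))      # one extra, invisible character
print(strip_marks(marked) == plain)  # the source text is recoverable
```

Because the marks can be added and stripped mechanically, they can live in the output process for a given deliverable while the source string stays unmarked.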
Anyway, since it's too late to make a long story short, I'll just repeat that I think directional control is too messy to rely on tagging to manage. Authors or translators would have to know what display tool the content is destined for and also know all the specifics of the letters, numbers, and punctuation of the language in question and English or any other left-to-right language. If it's handled at output time, the specific tools won't necessarily be known, but at least the target output will be. The chances of getting it right get better in that case.

From: Robert D Anderson [mailto:robander@us.ibm.com]
Sent: Thu 3/2/2006 3:57 PM
To: gershon@tech-tav.com
Cc: bhertz@sdl.com; 'Bryan Schnabel'; Charles Pau; 'Lieske, Christian'; Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; 'Felix Sasaki'; 'Richard Ishida'; 'Jennifer Linton'; mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com; 'Yves Savourel'
Subject: Re: [dita-translation] Draft proposal for dir attribute

Hello again -- only two short comments on the dir attribute.

1. As the one currently responsible for maintaining and bug-fixing the
DTDs, I would strongly favor making it a universal attribute, rather than
adding it to almost everything.

2. One of the points in the write-up says:
"If the document element does not specify the dir attribute, assume left to
right (ltr). "
The previous bullet says that inline elements use the "specified language's
default text direction". Wouldn't that be the case for the document as
well? That is, if I indicate xml:lang="he-il" on my root topic element,
then everything in the topic (such as tables, notes, and lists) should
default to dir="rtl" unless otherwise specified.
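One possible reading of the defaulting rule in point 2 can be sketched in Python with the standard xml.etree.ElementTree module. This is an assumption about how the resolution might work, not the proposal's wording: an explicit dir wins, otherwise the nearest xml:lang supplies the default, otherwise ltr. The function names and the list of right-to-left languages are illustrative only.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumption: a small, incomplete set of right-to-left language subtags.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


def lang_default(lang):
    """Default direction implied by a language code such as "he-il"."""
    primary = lang.split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def effective_dirs(elem, inherited="ltr"):
    """Yield (element, direction) pairs in document order."""
    direction = inherited
    if XML_LANG in elem.attrib:
        direction = lang_default(elem.attrib[XML_LANG])
    if "dir" in elem.attrib:  # an explicit dir overrides the language default
        direction = elem.attrib["dir"]
    yield elem, direction
    for child in elem:
        yield from effective_dirs(child, direction)


topic = ET.fromstring(
    '<topic xml:lang="he-il">'
    "<p>a</p>"
    '<p dir="ltr">b</p>'
    '<note xml:lang="en">c</note>'
    "</topic>"
)
for element, direction in effective_dirs(topic):
    print(element.tag, direction)
```

With xml:lang="he-il" on the root, the first paragraph resolves to rtl without any dir attribute, and only the elements that differ need to say so.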

I do not know about the lro and rlo values -- does anybody here have tool
experience that would indicate whether these are still needed? If tools
still require them, then we should probably add them.

Robert D Anderson
IBM Authoring Tools Development
Chief Architect, DITA Open Toolkit

From: Gershon L Joseph <gershon@tech-tav.com>
Sent: 03/02/2006 01:36 PM
Please respond to: gershon
To: <dita-translation@lists.oasis-open.org>, <mambrose@sdl.com>, <pcarey@lexmark.com>, <rfletcher@sdl.com>, <bhertz@sdl.com>, "'Richard Ishida'" <ishida@w3.org>, <tony.jewtushenko@productinnovator.com>, <patrickk@scriptware.nl>, "'Lieske, Christian'", "'Jennifer Linton'", Charles Pau/Cambridge/IBM@Lotus, "'Felix Sasaki'" <fsasaki@w3.org>, "'Yves Savourel'" <ysavourel@translate.com>, Dave A Schell/Raleigh/IBM@IBMUS, "'Bryan [...]
Subject: [dita-translation] Draft proposal for dir attribute

Hi all,

Here's my draft proposal for the dir attribute. I'd appreciate review
feedback via email before Monday's SC meeting so we can try closing this
item on Monday to hand off to the DITA TC.

It's a working draft that I hope will invite input from the SC members.
Based on feedback I receive, I plan to prepare a closer to final draft
before Monday's meeting.

I think the main questions are:
1. Should dir be a universal attribute or not?
2. Should we support dir="ltr|rtl" or dir="ltr|rtl|lro|rlo" as per HTML?

Any and all feedback will be greatly appreciated.

Best Regards,

Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
[attachment "DirAttr.html" deleted by Robert D Anderson/Rochester/IBM]

13 ppm (Letter-size), 12 ppm (A4-size)

13 עמודים לדקה (גודל Letter), 12 עמודים לדקה (גודל A4)

13 עמודים לדקה (גודל Letter)‏, 12 עמודים לדקה (גודל A4)

13 עמודים לדקה (גודל Letter), 12 עמודים לדקה (גודל A4)
