


dita-translation message


Subject: RE: RE: [dita-translation] Draft proposal for dir attribute

It looks like you have not used XMetaL's new BiDi release (I'm not sure whether it's freely available yet). Perhaps it is the only XML editor that handles this correctly (much like Word)? I don't understand the argument that this is an output-only issue. When working on documents that mix English and Hebrew (we do lots of those), or English and Arabic, not being able to control the direction makes XML useless to us. We may as well work natively in MS Word, where we can control the direction correctly at input time.
We have many users who create multilingual documents and are currently investigating a move to XML. I'm not going to push for DITA to support the dir attribute, since it's probably only needed in the Middle East, and we can continue using DocBook for these projects.
Adding dir costs very little on the DTD side. How much work is required on the toolkit side remains to be seen (I need to test some samples, which I was planning to do once DITA 1.1 with the dir attribute was designed). If the group feels it would rather do without dir, so be it.
Best Regards,

From: Farwell, Kevin [mailto:Kevin.Farwell@lionbridge.com]
Sent: Friday, March 10, 2006 3:48 AM
To: gershon@tech-tav.com; Robert D Anderson
Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; Felix Sasaki; Richard Ishida; Jennifer Linton; mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: RE: [dita-translation] Draft proposal for dir attribute

I would debate the definition of the word "support" when it comes to bidirectional text. I've attached screen shots from three editors, Epic, XMetaL, and Oxygen, as well as a screen shot of the PDF output generated from the XML. I picked a simple sample, and as you can see, no editor got it right. Epic and Oxygen got the Hebrew right but the English and the period wrong, and the whole sentence order wrong; XMetaL simply displays the file in logical (storage) order. I also included a shot from MS Word, which is a very capable bidirectional DTP tool (although I would never recommend it as an XML tool), and it couldn't handle the XML either. A key feature in Word is the cursor direction setting, which makes typing a lot easier. Oxygen has a text direction indicator, but not a setting that can override the direction Unicode insists on. With those samples as a guide, I'd be wary of tagging the file according to what appears in an editor. In the interest of science, should you choose to repeat my experiments, note that all of these tests were done on an English OS. Perhaps the results would be different on a native OS, but I'm skeptical.
The next question is what "100% Unicode compliant" means. Many XML tools and text editors can display every Unicode character, so they might make the claim. Other tools can display a run of RTL characters as a word, but the sentence order is still LTR, so they might make a stronger claim. Still others, like Epic and Oxygen, get the sentence order of RTL text right but can't handle LTR text mixed in. That's better, but still not what I suspect you mean by 100%. Again, my opinion is that trusting an editor is pretty risky.
If we suppose there could be an editor that really does represent all bidi text as it should be, I'm not sure what would keep us from supposing the output tools do too. In that case, there would be no need for markup of any kind. Since neither such an editor nor such an output tool exists, we are stuck with requiring some kind of markup. However, there is no requirement on where that markup lives. It can be in the XML, which I think is not sufficient and can actually get in the way depending on the target output (and differs between Hebrew and Arabic scripts), or it can be integrated into the output process, in which case it is always tuned to the right output.
As an illustration of my concern, I offer he_ppm.html. This file features an English phrase and that phrase translated into Hebrew. The first version of the translation has no corrective markup at all. The second and third versions show two different ways to add corrections: the first set was added automatically, and the second I added by hand, with much consternation. That hand-tagged version represents my second try; the first had five spans, but I threw it all away and got it down to three. Looking at the representation of the string in Epic (epicppm.bmp), I can't imagine intuiting where the spans should go.
I don't think leaving direction tags out limits DITA at all. There are two levels of directional control: the direction and alignment of the whole document, and the direction of characters inside paragraphs. Both matter only in the output. Unless there is a case where an entire Hebrew or Arabic document should be output LTR, that setting can be applied at output time, either with a parameter passed at rendition time or with a condition in the XSL that sets the document to RTL when the value of the language attribute calls for it. The same control can be applied conditionally to individual paragraphs, so a paragraph needs no control other than its language.
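The paragraph-level rule above, deriving direction from the language attribute alone, could look roughly like the following. This is a minimal Python sketch rather than the XSL condition the message refers to; the set of right-to-left language subtags is an assumption and not exhaustive, and `direction_for_lang` and `add_dir_from_lang` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Assumption: a small, non-exhaustive set of primary language subtags that are
# written right to left ("iw" is the deprecated code for Hebrew).
RTL_LANGS = {"ar", "fa", "he", "iw", "ur", "yi"}

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def direction_for_lang(lang):
    """Map a language tag such as 'he-IL' or 'ar_EG' to 'rtl' or 'ltr'."""
    if not lang:
        return "ltr"
    primary = lang.replace("_", "-").split("-")[0].lower()
    return "rtl" if primary in RTL_LANGS else "ltr"


def add_dir_from_lang(root):
    """Set an output dir attribute on every element that carries xml:lang.

    Elements without xml:lang get no dir, so in HTML-style output they
    inherit the direction of their nearest ancestor, just as they inherit
    the language itself."""
    for elem in root.iter():
        lang = elem.get(XML_LANG)
        if lang is not None:
            elem.set("dir", direction_for_lang(lang))
    return root


doc = ET.fromstring(
    '<topic xml:lang="he-IL"><p>שלום</p><p xml:lang="en-US">Hello</p></topic>'
)
add_dir_from_lang(doc)
print(doc.get("dir"))                  # rtl
print(doc.findall("p")[1].get("dir"))  # ltr
```

Because the direction comes only from the language, nothing beyond xml:lang has to be authored; a rendition-time parameter could still override the result for the rare Hebrew or Arabic deliverable that must be LTR.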
At the string level, controls must be applied according to the characters in the string. It is impossible to tell how the strings will come out until the output is viewed in its final form, so I think attempting to mark up the XML builds inefficiency into the system. I imagine an author tagging up a sentence, running HTML, viewing it, going back to the XML to make corrections, running HTML, viewing it, and so on. Then, once it's all straight, the boss comes in and says the output has to work in another browser with different CSS and Unicode support (check the file in Safari or some other browser that isn't IE). The author then faces either making a copy of the file for each deliverable or overwriting the deliverable-specific markup each time the file is output. Neither is appealing.
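For the string level, a character-driven pass inside the output process might look something like this minimal Python sketch. It uses the standard `unicodedata.bidirectional` lookup to find runs of strong left-to-right characters and wraps each run in the Unicode LEFT-TO-RIGHT EMBEDDING (U+202A) and POP DIRECTIONAL FORMATTING (U+202C) controls. The run heuristic and the function name `wrap_ltr_runs` are assumptions for illustration only; Unicode 6.3 and later recommend the isolate controls (LRI/PDI) instead, and a real tool would be tuned to each output target.

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def is_strong_ltr(ch):
    return unicodedata.bidirectional(ch) == "L"


def is_strong_rtl(ch):
    # Hebrew letters are class R; Arabic letters are class AL.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def wrap_ltr_runs(text):
    """Hypothetical heuristic: wrap each run of LTR text in LRE ... PDF.

    A run starts at a strong LTR character and ends at the last strong LTR
    character before the next strong RTL character (or the end of the
    string), so spaces and punctuation at the edges stay outside the run."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if not is_strong_ltr(text[i]):
            out.append(text[i])
            i += 1
            continue
        # Scan forward to the next strong RTL character, remembering the
        # position of the last strong LTR character seen along the way.
        j = last_ltr = i
        while j < n and not is_strong_rtl(text[j]):
            if is_strong_ltr(text[j]):
                last_ltr = j
            j += 1
        out.append(LRE + text[i:last_ltr + 1] + PDF)
        i = last_ltr + 1
    return "".join(out)


print(repr(wrap_ltr_runs("שלום Hello world!")))
```

Note that the heuristic leaves digits and punctuation at the edges of a Latin run outside the embedding (in "DITA 1.1" only "DITA" is wrapped), which is exactly the kind of case where the result still has to be checked in its final output form.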
One last point to help explain my position. I do a lot of work with bidirectional XML. Between me and the other folks who sit around me, we have processed perhaps a couple of million words in the past few years. We don't use directional spans. I think it would be doing users a huge disservice to add tagging functionality that doesn't really translate into output functionality and does translate into inefficient work. If the directional controls are added, they should be accompanied by a disclaimer that essentially says, "Your mileage may vary."
