dita-translation message
Subject: RE: RE: [dita-translation] Draft proposal for dir attribute
- From: "Gershon L Joseph" <gershon@tech-tav.com>
- To: "'Farwell, Kevin'" <Kevin.Farwell@lionbridge.com>,"'Robert D Anderson'" <robander@us.ibm.com>
- Date: Fri, 10 Mar 2006 10:07:13 +0200
Looks like you have not used XMetaL's new BIDI release (I'm not sure if it's
freely available yet). Maybe it's the only XML editor that works correctly
(similar to Word)? I don't understand the argument that it's an output-only
issue. When working in documents with both English and Hebrew (we do lots of
those), or both English and Arabic, not being able to control the direction
makes XML useless to us. We may as well work natively in MS Word, where we can
control the direction correctly at input time.
We have lots of users who create multilingual documents and are currently
investigating a move to XML. I'm not going to push the case for DITA to support
the dir attribute, since it's probably only necessary in the Middle East, and
we can continue using DocBook for these projects.
Adding dir to the DTD has very little overhead on the DTD side. How much work is
required on the toolkit side remains to be seen (I need to test some samples,
which I was planning to do once DITA 1.1 with the dir attribute was designed).
If the group feels they would rather do without dir, so be it.
Best Regards,
Gershon
From: Farwell, Kevin [mailto:Kevin.Farwell@lionbridge.com]
Sent: Friday, March 10, 2006 3:48 AM
To: gershon@tech-tav.com; Robert D Anderson
Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A Schell;
dita-translation@lists.oasis-open.org; dpooley@sdl.com; Felix Sasaki; Richard
Ishida; Jennifer Linton; mambrose@sdl.com; patrickk@scriptware.nl;
pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar;
tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: RE: [dita-translation] Draft proposal for dir attribute
Hi,
I would debate the definition of the word "support" when it
comes to bi-directional text. I've attached some screen shots from three
editors, Epic, XMetaL, and Oxygen, as well as a screen shot of the PDF output
generated from the XML. I picked a simple sample, and you can see no editor got
it right. Epic and Oxygen got the Hebrew right but got the English, the period,
and the overall sentence order wrong, while XMetaL simply displays the file in
logical order. I also included a shot from MS Word, which is a very capable
bi-directional DTP tool (although I would never recommend it as an XML tool),
and it couldn't handle the XML either. A key feature in Word is the cursor
direction setting, which makes typing a lot easier. Oxygen has a text direction
indicator, but no setting that can override the direction Unicode insists on.
With those samples as a guide, I'd be wary of tagging the file according to what
appears in an editor. In the interest of science, should you choose to repeat my
experiments, all of these tests were done on an English OS. Perhaps the results
would be different on a native OS, but I'm skeptical.
The next question is what "100% Unicode compliant" means.
Many XML tools and text editors can display every Unicode character, so they
might make the claim. Other tools can display a collection of RTL Unicode values
as a word, but the sentence order is still LTR, so they might make a stronger
claim. Still others, like Epic and Oxygen, get the sentence order of RTL text
right, but can't do anything with LTR mixed in. That's better, but still not
what I suspect you mean by 100%. Again, my opinion is that trusting an editor is
pretty risky.
If we suppose there could be an editor that really does
represent all bi-di text as it should be, I'm not sure what would keep us from
supposing the output tools do too. In that case, there would be no need for
markup of any kind. Since no such editor exists, and no such output tool, we are
stuck with requiring some kind of markup. However, there is no requirement on
where that markup exists. It can be in the XML, which I think is not sufficient
and can actually get in the way depending on what the target output is (and is
different for Hebrew and Arabic scripts), or it can be integrated into the
output process, in which case it is always tuned to the right
output.
As a description of what I'm concerned with, I offer
he_ppm.html. This file features an English phrase and that phrase translated
into Hebrew. The first version of the translation has no corrective markup at
all. The second and third translations show two different ways to add
corrections. The first was added automatically and the second was
added with much consternation by me. This represents my second try. The
first had five spans, but I threw that all away and got it down to three.
Looking at the representation of the string in Epic (epicppm.bmp), I can't
imagine intuiting where the spans should go.
I don't think leaving out direction tags limits DITA at
all. There are two levels of directional control: the direction and alignment of
the whole document, and the direction of characters inside paragraphs. Both are
only a concern in the output. Unless there is a case where an entire
Hebrew or Arabic document should be output LTR, that setting can be added to the
output with either a parameter passed at rendition time or a condition in the
XSL that sets the document to RTL if the value of the language attribute
calls for it. That control can also be managed conditionally on
individual paragraphs, so there needn't be a control on the paragraph other
than the language.
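
As a rough illustration of the language-driven approach described above, here
is a minimal Python sketch (the real logic would live in the XSL or in a
rendition parameter). The function names and the list of RTL language codes are
assumptions made for this example; they are not part of DITA or of any toolkit.

```python
# Hypothetical set of primary language subtags that default to right-to-left.
# ("iw" is the older code for Hebrew; extend the set as a project requires.)
RTL_LANGUAGES = {"ar", "fa", "he", "iw", "ur", "yi"}


def direction_for_lang(lang: str) -> str:
    """Return "rtl" or "ltr" for a language tag such as "he-IL" or "en_US"."""
    # Only the primary subtag decides the default direction.
    primary = lang.replace("_", "-").split("-")[0].strip().lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def paragraph_direction(paragraph_lang, document_lang):
    """Use the paragraph's own language if it has one, else the document's."""
    effective = paragraph_lang or document_lang
    return direction_for_lang(effective)
```

With this kind of rule, a Hebrew document is set RTL as a whole, and an English
paragraph inside it comes out LTR, without any direction markup in the XML.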
At the string level, controls must be applied according to
the characters in the string. It is impossible to tell how the strings will come
out until the output is viewed in its final form, so I think it builds
inefficiency into the system to attempt to mark up the XML. I imagine an author
tagging up a sentence, running HTML, viewing it, going back to the XML to make
corrections, running HTML, viewing it, and so on. Then, once it's all straight,
the boss comes in and says the output has to work in another browser
with different CSS support and Unicode support (check the file in Safari or some
other browser that isn't IE). The author is faced with making a copy of the file
for each deliverable or overwriting the deliverable specific markup each time
the file must be output. Neither is appealing.
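
To make "according to the characters in the string" concrete, here is a small
Python sketch that lists the Unicode bidirectional class of each character and
flags strings that mix strong LTR and strong RTL characters. It only inspects
character classes; it does not implement the Unicode bidirectional algorithm,
and the helper names are made up for this example.

```python
import unicodedata

# Bidirectional classes with a strong direction of their own.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}  # Hebrew letters are "R", Arabic letters are "AL"


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for a string."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def has_mixed_direction(text):
    """True if the string holds both strong LTR and strong RTL characters."""
    classes = {cls for _, cls in bidi_classes(text)}
    return bool(classes & STRONG_LTR) and bool(classes & STRONG_RTL)


# A Hebrew word followed by an English acronym and a period.
sample = "\u05e9\u05dc\u05d5\u05dd XML."
for ch, cls in bidi_classes(sample):
    print(repr(ch), cls)
```

The space and the period have no strong direction of their own, so where they
land depends on the surrounding characters and on the renderer, which is the
behavior that makes mixed strings hard to predict before the final output is
viewed.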
One last point to help explain my position. I do a lot of
work with bi-directional XML. Between me and the folks who sit around me, we
have processed maybe a couple of million words in the past few years.
We don't use directional spans. I think it would be doing users a huge
disservice to add tagging functionality that doesn't really translate to output
functionality and translates to inefficient work. If the directional
controls are added, they should be accompanied by a disclaimer that essentially
says, "Your mileage may vary."
Kevin