dita-translation message
Subject: RE: RE: [dita-translation] Draft proposal for dir attribute
- From: "Farwell, Kevin" <Kevin.Farwell@lionbridge.com>
- To: <gershon@tech-tav.com>,"Robert D Anderson" <robander@us.ibm.com>
- Date: Thu, 9 Mar 2006 20:48:00 -0500
Hi,
I would debate the definition of the word "support" when it
comes to bi-directional text. I've attached some screen shots from three
editors, Epic, XMetal, and Oxygen, as well as a screen shot of the PDF output
generated from the XML. I picked a simple sample, and you can see no editor got
it right. Epic and Oxygen got the Hebrew right and the English and period wrong
and the whole sentence order wrong, but XMetal just reads the file
logically. I also included a shot from MS Word, which is a very capable
bi-directional DTP tool (although I would never recommend it as an XML tool),
and it couldn't handle the XML either. A key feature in Word is the cursor
direction setting, which makes typing a lot easier. Oxygen has a text direction
indicator, but no setting that can override the direction Unicode insists on.
With those samples as a guide, I'd be wary of tagging the file according to what
appears in an editor. In the interest of science, should you choose to repeat my
experiments, all of these tests were done in an English OS. Perhaps the results
would be different in a native OS, but I'm skeptical.
The next question is what "100% Unicode compliant" means.
Many XML tools and text editors can display every Unicode character, so they
might make the claim. Other tools can display a collection of RTL Unicode values
as a word, but the sentence order is still LTR, so they might make a stronger
claim. Still more, like Epic and Oxygen, get the sentence order of RTL text
right, but can't do anything with LTR mixed in. That's better, but still not
what I suspect you mean by 100%. Again, my opinion is that trusting an editor is
pretty risky.
If we suppose there could be an editor that really does
represent all bi-di text as it should be, I'm not sure what would keep us from
supposing the output tools do too. In that case, there would be no need for
markup of any kind. Since no such editor exists, and no such output tool, we are
stuck with requiring some kind of markup. However, there is no requirement on
where that markup exists. It can be in the XML, which I think is not sufficient
and can actually get in the way depending on what the target output is (and is
different for Hebrew and Arabic scripts), or it can be integrated into the
output process, in which case it is always tuned to the right
output.
As a description of what I'm concerned with, I offer
he_ppm.html. This file features an English phrase and that phrase translated
into Hebrew. The first version of the translation has no corrective markup at
all. The second and third translations show two different ways to add
corrections. The first was added automatically and the second was
added with much consternation by me. This represents my second try. The
first had five spans, but I threw that all away and got it down to three.
Looking at the representation of the string in Epic (epicppm.bmp), I can't
imagine intuiting where the spans should go.
I don't think leaving direction tags off limits DITA at
all. There are two levels of directional control, the direction and alignment of
the whole document and the direction of characters inside paragraphs. Both are
only a concern in the output. Unless there would be a case where an entire
Hebrew or Arabic document should be output LTR, that setting can be added to the
output with either a parameter passed at rendition time or a condition in the
XSL that sets the document to RTL if the value of the language attribute
calls for it. That control can also be managed conditionally on
individual paragraphs, so there needn't be a control on the paragraph other
than the language.
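To make the output-time approach concrete, here is a minimal sketch of deriving the direction from the language value when the output is generated. It is only an illustration under stated assumptions: the function names, the short list of right-to-left languages, and the HTML paragraph wrapper are all hypothetical and are not taken from DITA or from any toolkit.

```python
from html import escape

# Assumed, non-exhaustive set of primary language subtags written right-to-left.
RTL_LANGUAGES = {"ar", "he", "fa", "ur", "yi"}


def direction_for(lang: str) -> str:
    """Return 'rtl' or 'ltr' for a language tag such as 'he-il' or 'en-US'."""
    primary = lang.replace("_", "-").split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def render_paragraph(text: str, lang: str) -> str:
    """Emit an HTML paragraph whose dir value is decided at output time.

    The source content carries only the language; no direction is stored in it.
    """
    return (
        f'<p lang="{escape(lang, quote=True)}" '
        f'dir="{direction_for(lang)}">{escape(text)}</p>'
    )
```

Calling `render_paragraph(text, "he-il")` would produce a paragraph marked `dir="rtl"`, while the same call with `"en-US"` would produce `dir="ltr"`; the same lookup could equally be written as a condition in the XSL.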
At the string level, controls must be applied according to
the characters in the string. It is impossible to tell how the strings will come
out until the output is viewed in its final form, so I think it builds
inefficiency into the system to attempt to mark up the XML. I imagine an author
tagging up a sentence, running HTML, viewing it, going back to the XML to make
corrections, running HTML, viewing it, and so on. Then, once it's all straight,
the boss comes in and says the output has to work in another browser
with different CSS support and Unicode support (check the file in Safari or some
other browser that isn't IE). The author is faced with making a copy of the file
for each deliverable or overwriting the deliverable specific markup each time
the file must be output. Neither is appealing.
One last point to help explain my position. I do a lot of
work with bi-directional XML. Between me and other folks that sit around me, we
have got maybe a couple of million words processed in the past few years.
We don't use directional spans. I think it would be doing users a huge
disservice to add tagging functionality that doesn't really translate to output
functionality and translates to inefficient work. If the directional
controls are added, they should be accompanied by a disclaimer that essentially
says, "Your mileage may vary."
Kevin
Hi Kevin,
In my experience, the authoring tool (if it's 100% Unicode compliant) will
display the multilingual text the same way it will be rendered, so the user
would apply the dir attribute correctly. Of course, if the author is not using
an authoring tool that supports Unicode, he/she will have to guess how to apply
the dir attribute, which probably won't work. Since the industry leading XML
editors (XMetaL Author and Arbortext Editor to name but a few) now support
BIDI, I think we can safely assume that the authors writing Hebrew or Arabic
will mark up the direction correctly, as needed according to what they see in
the editor window.
If DITA simply does not provide the dir attribute, we essentially remove DITA
as an option for RTL languages. Since we are seeing more and more interest in
XML (and DITA) for Hebrew and Arabic authoring in Israel and other ME countries,
I think we should include dir in DITA 1.1.
Best Regards,
Gershon
Hello,
I alluded to this on the phone the other
day, but after thinking about it for a few days I'm getting kind of sour on
directional tags of any kind. The question of whether overrides are needed got
me thinking that the basic idea of directional tags is based on presentation
only and is specific to the display tool. As such, any tagging method
could either do no good or do harm in all the various output tools
there are. It seems to me that applying tagging according to whether
the content is going to Internet Explorer or Antenna House or
something else is very much against the notion of separating content and
format. Since the two consider bi-directional text differently, tagging for one
doesn't guarantee the text will work in the other.
It is true that the various tools will
handle bi-directional text in almost random ways, so some degree of directional
control is needed, but it should be applied at output time, not stored in the
source XML. For example, in the snippet included in the bi-directional model
page, the first instance of the Hebrew word "Hebrew" displays right-to-left in
IE on Windows and Safari on Mac with no tagging at all because of the Unicode
range the characters are in. The second one does too, but I can't figure out
whether that one is supposed to run backwards to demonstrate the difference
between logical and display order or it's just typed in wrong. Otherwise it doesn't
demonstrate much of anything.
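The point about direction following from the Unicode range can be checked with a short sketch. It uses Python's standard `unicodedata` module to list the bidirectional class that the Unicode Character Database assigns to each character; the helper name and the sample string (the Hebrew word for "Hebrew" followed by a period) are illustrative assumptions, not part of the model page.

```python
import unicodedata


def bidi_classes(text):
    """Return (code point, bidirectional class) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


# Hebrew letters in logical (typed) order, followed by an ASCII period.
sample = "\u05e2\u05d1\u05e8\u05d9\u05ea."

for code_point, bidi_class in bidi_classes(sample):
    # Hebrew letters report "R" (strong right-to-left); the period reports
    # "CS", a class with no strong direction of its own.
    print(code_point, bidi_class)
```

Because the letters carry a strong right-to-left class, a conforming renderer can order the word without any tagging. The period has no strong direction, which is why its placement depends on the characters and context around it.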
Still, punctuation and other characters
must be handled, so control is needed. The only thing that seems to work
consistently is the use of the Unicode directional characters. They don't
necessarily rely on nesting, which has a lot of advantages. Control can be
applied to a set of characters before the next neutral character or a span,
depending on what's needed. Relying on spans can run into problems like a period
displaying right-to-left (which isn't so dramatic) at the right end of
a Hebrew word (which is). If the span is just around the period, nothing
happens; if the span is around the word and the period, you might get the same
result because a period is a neutral character and is ignored in directional
controls. Also, I suppose just because it's fun, most tools treat
Arabic and Hebrew differently, so the controls can't be the same for all
languages.
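As a rough sketch of what the Unicode directional characters look like in practice, the fragment below defines the right-to-left mark and the embedding controls and applies them to a string. The helper names are hypothetical, and whether a given browser or formatter honors these characters is exactly the kind of thing that would need testing per output.

```python
import unicodedata

# Unicode directional formatting characters (real code points).
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def anchor_trailing_punctuation(word: str, punct: str, mark: str = RLM) -> str:
    """Append a directional mark after trailing punctuation.

    With a strong right-to-left character on both sides, the punctuation is
    expected to resolve right-to-left and stay attached to the word.
    """
    return word + punct + mark


def embed_rtl(text: str) -> str:
    """Wrap text in a right-to-left embedding, closed by PDF."""
    return RLE + text + PDF


hebrew_word = "\u05e2\u05d1\u05e8\u05d9\u05ea"
fixed = anchor_trailing_punctuation(hebrew_word, ".")
```

The mark is an invisible character with a strong direction, so it needs no enclosing element and no nesting. The embedding pair behaves more like a span and does have to be closed.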
Anyway, since it's too late to make a long
story short, I'll just repeat that I think directional control is too messy to
rely on tagging to manage. Authors or translators would have to know what
display tool the content is destined for and also know all the specifics of
the letters, numbers, and punctuation of the language in question and English or
any other left-to-right language. If it's handled at output time, the specific
tools won't necessarily be known, but at least the target output will be. The
chances of getting it right get better in that case.
Kevin
From: Robert D Anderson [mailto:robander@us.ibm.com]
Sent: Thu 3/2/2006 3:57 PM
To: gershon@tech-tav.com
Cc: bhertz@sdl.com; 'Bryan Schnabel'; Charles Pau; 'Lieske, Christian';
Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com;
'Felix Sasaki'; 'Richard Ishida'; 'Jennifer Linton'; mambrose@sdl.com;
patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com;
Munshi, Sukumar; tony.jewtushenko@productinnovator.com; 'Yves Savourel'
Subject: Re: [dita-translation] Draft proposal for dir attribute
Hello again -- only two short comments on the dir attribute.
1. As the one currently responsible for maintaining and bug-fixing the DTDs,
I would strongly favor making it a universal attribute, rather than adding it
to almost everything.
2. One of the points in the write-up says:
"If the document element does not specify the dir attribute, assume left to
right (ltr)."
The previous bullet says that inline elements use the "specified language's
default text direction". Wouldn't that be the case for the document as well?
That is, if I indicate xml:lang="he-il" on my root topic element, then
everything in the topic (such as tables, notes, and lists) should default to
dir="rtl" unless otherwise specified.
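The defaulting rule suggested in point 2 can be sketched as a small tree walk. This is only an illustration under assumptions: it treats an explicit dir on an element or ancestor as overriding the language default, uses an assumed short list of right-to-left languages, and the sample topic and function name are hypothetical rather than taken from the draft.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}  # assumed, not exhaustive


def effective_dirs(elem, lang=None, explicit=None):
    """Yield (tag, direction) for elem and all of its descendants.

    Assumed precedence: nearest explicit dir, then the default direction of
    the nearest xml:lang, then 'ltr'.
    """
    lang = elem.get(XML_LANG, lang)
    explicit = elem.get("dir", explicit)
    if explicit is not None:
        direction = explicit
    elif lang is not None and lang.split("-")[0].lower() in RTL_LANGUAGES:
        direction = "rtl"
    else:
        direction = "ltr"
    yield elem.tag, direction
    for child in elem:
        yield from effective_dirs(child, lang, explicit)


topic = ET.fromstring(
    '<topic xml:lang="he-il"><title/><body>'
    '<p/><p dir="ltr"><ph/></p>'
    '</body></topic>'
)
resolved = list(effective_dirs(topic))
```

Under these assumptions every element of the Hebrew topic resolves to rtl except the paragraph that sets dir="ltr" and the phrase inside it.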
I do not know about the lro and rlo values -- does anybody here have tool
experience that would indicate whether these are still needed? If tools still
require them, then we should probably add them.
Robert D Anderson
IBM Authoring Tools Development
Chief Architect, DITA Open Toolkit
From: "Gershon L Joseph" <gershon@tech-tav.com>
Date: 03/02/2006 01:36 PM
Please respond to gershon
To: <dita-translation@lists.oasis-open.org>, <mambrose@sdl.com>,
<pcarey@lexmark.com>, <rfletcher@sdl.com>, <bhertz@sdl.com>,
"'Richard Ishida'" <ishida@w3.org>, <tony.jewtushenko@productinnovator.com>,
<patrickk@scriptware.nl>, "'Lieske, Christian'" <christian.lieske@sap.com>,
"'Jennifer Linton'" <jennifer.linton@comtech-serv.com>,
<Sukumar.Munshi@lionbridge.com>, Charles Pau/Cambridge/IBM@Lotus,
<dpooley@sdl.com>, <Peter.Reynolds@lionbridge.com>,
"'Felix Sasaki'" <fsasaki@w3.org>, "'Yves Savourel'" <ysavourel@translate.com>,
Dave A Schell/Raleigh/IBM@IBMUS, "'Bryan Schnabel'" <bryan.s.schnabel@tek.com>
Subject: [dita-translation] Draft proposal for dir attribute
Hi all,
Here's my draft proposal for the dir attribute. I'd appreciate review feedback
via email before Monday's SC meeting so we can try closing this item on Monday
to hand off to the DITA TC.
It's a working draft that I hope will invoke input from the SC members. Based
on feedback I receive, I plan to prepare a closer to final draft before
Monday's meeting.
I think the main questions are:
1. Should dir be a universal attribute or not?
2. Should we support dir="ltr|rtl" or dir="ltr|rtl|lro|rlo" as per HTML 4.0?
Any and all feedback will be greatly appreciated.
Best Regards,
Gershon
---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
http://www.tech-tav.com
[attachment "DirAttr.html" deleted by Robert D Anderson/Rochester/IBM]
oxygen.bmp
word.bmp
xmetal.bmp
PDF.bmp
epic.bmp
epicppm.bmp
13 ppm (Letter-size), 12 ppm (A4-size)
13 עמודים לדקה (גודל Letter), 12 עמודים לדקה (גודל A4)
13 עמודים לדקה (גודל Letter), 12 עמודים לדקה (גודל A4)
13 עמודים לדקה (גודל Letter), 12 עמודים לדקה (גודל A4)