
Subject: RE: [dita] [dita-translation] TC/DITA/Translation Subcommittee Proposals

From: "JoAnn Hackos" <joann.hackos@comtech-serv.com>
To: "Grosso, Paul" <pgrosso@ptc.com>,<dita@lists.oasis-open.org>
Date: Tue, 14 Mar 2006 08:04:22 -0700

Paul et al,
The SC had a considerable debate about the LRO and RLO values. I've
enclosed the last series of emails for you and others to review. I've
also asked some of the experts in this area to explain the decision.

Unfortunately, Gershon Joseph cannot be on the call today. He has been
leading the discussions. He has also been testing XMetaL's new BIDI
release (or pre-release).

JoAnn

-----Original Message-----
From: Grosso, Paul [mailto:pgrosso@ptc.com] 
Sent: Tuesday, March 14, 2006 7:49 AM
To: JoAnn Hackos; dita@lists.oasis-open.org
Subject: RE: [dita] [dita-translation] TC/DITA/Translation Subcommittee
Proposals

 
> -----Original Message-----
> From: JoAnn Hackos [mailto:joann.hackos@comtech-serv.com] 
> Sent: Tuesday, 2006 March 14 8:26
> To: dita@lists.oasis-open.org
> Subject: [dita] [dita-translation] TC/DITA/Translation 
> Subcommittee Proposals
>  
> 
> From: JoAnn Hackos, chair DITA/Translation Subcommittee 
>  
> The DITA/Translation Subcommittee approved the following proposals to
> the DITA TC on March 13, 2006.
>  
> DIR Attribute
> Proposal: That the DITA 1.1 specification include the DIR attribute
> as a universal attribute with the values LTR, RTL, LRO, and RLO. No
> default value is to be specified for the DITA DTD.
>
>  
> Discussion: The DIR attribute is used by authors of languages such as
> Hebrew and Arabic to ensure correct directionality in the output,
> especially when the standard directionality has to be modified to
> accommodate some special use of the language. The reason to include it
> is to ensure that authoring tools and transforms generate the
> correct directionality. There was discussion that the results of this
> would often be unpredictable and produce different effects in different
> browsers. The SC will now work on a statement of best practices to help
> authors and tool vendors handle the dir attribute properly.
>  

I agree that some more explanation will be necessary
before we can agree to put this in the DITA spec.  

In particular, while most people may know about LTR and
RTL (since they are part of HTML), many may not know what
the processing expectations are for LRO and RLO (unless 
they get into the details of bidi-override in either the 
CSS or XSL-FO specifications).  The subject of language 
direction and bidi-override is complex enough without 
forcing people to read the entire CSS or XSL-FO spec plus 
the Unicode spec just to be able to use the DITA dir attribute.

paul
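As a rough illustration of the LRO/RLO processing expectations raised above, the sketch below maps each proposed dir value to the Unicode explicit directional formatting characters (from the Unicode Bidirectional Algorithm) and to the CSS declarations a transform might emit. The function and table names, and the idea of wrapping text in control characters at all, are assumptions for illustration only; the proposal does not specify any processing.

```python
# Hypothetical mapping from the proposed DITA dir values to Unicode
# explicit directional formatting characters and equivalent CSS.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"

DIR_TO_CONTROL = {
    "ltr": LRE,  # embedding: start a left-to-right embedding level
    "rtl": RLE,  # embedding: start a right-to-left embedding level
    "lro": LRO,  # override: display every enclosed character left to right
    "rlo": RLO,  # override: display every enclosed character right to left
}

DIR_TO_CSS = {
    "ltr": "direction: ltr; unicode-bidi: embed",
    "rtl": "direction: rtl; unicode-bidi: embed",
    "lro": "direction: ltr; unicode-bidi: bidi-override",
    "rlo": "direction: rtl; unicode-bidi: bidi-override",
}


def wrap_with_controls(text: str, dir_value: str) -> str:
    """Wrap text in the opening control for dir_value, closed by PDF."""
    opener = DIR_TO_CONTROL[dir_value.lower()]
    return opener + text + PDF


if __name__ == "__main__":
    for value in ("LTR", "RTL", "LRO", "RLO"):
        wrapped = wrap_with_controls("abc", value)
        print(value, [hex(ord(c)) for c in wrapped], DIR_TO_CSS[value.lower()])
```

The distinction the table tries to make visible: LTR and RTL only set an embedding direction and still let each character's own directionality apply, while LRO and RLO ignore the inherent direction of every character they enclose.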

--- Begin Message ---

From: "Gershon L Joseph" <gershon@tech-tav.com>
To: "Farwell, Kevin" <Kevin.Farwell@lionbridge.com>,"Robert D Anderson" <robander@us.ibm.com>
Date: Fri, 10 Mar 2006 01:07:13 -0700

Looks like you have not used XMetaL's new BIDI release (I'm not sure if it's freely available yet). Maybe it's the only XML editor that works correctly (similar to Word)? I don't understand the argument that this is an output-only issue. When working on documents that contain both English and Hebrew (we do lots of those), or both English and Arabic, not having the ability to control the direction makes XML useless to us. We may as well work natively in MS Word, where we can control the direction correctly at input time.
 
We have lots of users who create multilingual documents and are currently investigating moving to XML. I'm not going to push the case for DITA to support the dir attribute, since it's probably only necessary in the Middle East (ME), and we can continue using DocBook for these projects.
 
Adding dir to the DTD has very little overhead on the DTD side. How much work is required on the toolkit side remains to be seen (I need to test some samples, which I was planning to do once DITA 1.1 with the dir attribute was designed). If the group feels they would rather do without dir, so be it.
 
Best Regards,
Gershon

________________________________

From: Farwell, Kevin [mailto:Kevin.Farwell@lionbridge.com] 
Sent: Friday, March 10, 2006 3:48 AM
To: gershon@tech-tav.com; Robert D Anderson
Cc: bhertz@sdl.com; Bryan Schnabel; Charles Pau; Lieske, Christian; Dave A Schell; dita-translation@lists.oasis-open.org; dpooley@sdl.com; Felix Sasaki; Richard Ishida; Jennifer Linton; mambrose@sdl.com; patrickk@scriptware.nl; pcarey@lexmark.com; Reynolds, Peter; rfletcher@sdl.com; Munshi, Sukumar; tony.jewtushenko@productinnovator.com; Yves Savourel
Subject: RE: RE: [dita-translation] Draft proposal for dir attribute


Hi,
 
I would debate the definition of the word "support" when it comes to bi-directional text. I've attached some screen shots from three editors, Epic, XMetaL, and Oxygen, as well as a screen shot of the PDF output generated from the XML. I picked a simple sample, and you can see no editor got it right. Epic and Oxygen got the Hebrew right but got the English and the period wrong, and the whole sentence order wrong; XMetaL just displays the file in logical order. I also included a shot from MS Word, which is a very capable bi-directional DTP tool (although I would never recommend it as an XML tool), and it couldn't handle the XML either. A key feature in Word is the cursor direction setting, which makes typing a lot easier. Oxygen has a text direction indicator, but no setting that can override the direction Unicode insists on. With those samples as a guide, I'd be wary of tagging the file according to what appears in an editor. In the interest of science, should you choose to repeat my experiments, all of these tests were done on an English OS. Perhaps the results would be different on a native OS, but I'm skeptical.
 
The next question is what "100% Unicode compliant" means. Many XML tools and text editors can display every Unicode character, so they might make the claim. Other tools can display a collection of RTL Unicode values as a word, but the sentence order is still LTR, so they might make a stronger claim. Still more, like Epic and Oxygen, get the sentence order of RTL text right, but can't do anything with LTR mixed in. That's better, but still not what I suspect you mean by 100%. Again, my opinion is that trusting an editor is pretty risky.
 
If we suppose there could be an editor that really does represent all bi-di text as it should be, I'm not sure what would keep us from supposing the output tools do too. In that case, there would be no need for markup of any kind. Since no such editor or output tool exists, we are stuck with requiring some kind of markup. However, there is no requirement on where that markup exists. It can be in the XML, which I think is not sufficient and can actually get in the way depending on the target output (which differs for Hebrew and Arabic scripts), or it can be integrated into the output process, in which case it is always tuned to the right output.
 
As an illustration of what concerns me, I offer he_ppm.html. This file features an English phrase and that phrase translated into Hebrew. The first version of the translation has no corrective markup at all. The second and third versions show two different ways to add corrections: in the second, the corrections were added automatically; in the third, I added them, with much consternation. That third version represents my second try. The first try had five spans, but I threw that all away and got it down to three. Looking at the representation of the string in Epic (epicppm.bmp), I can't imagine intuiting where the spans should go.
 
I don't think leaving direction tags off limits DITA at all. There are two levels of directional control: the direction and alignment of the whole document, and the direction of characters inside paragraphs. Both are only a concern in the output. Unless there is a case where an entire Hebrew or Arabic document should be output LTR, that setting can be added to the output either with a parameter passed at rendition time or with a condition in the XSL that sets the document to RTL if the value of the language attribute calls for it. That control can also be managed conditionally on individual paragraphs, so there needn't be any control on the paragraph other than the language.
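A minimal sketch of the output-side approach described above, written in Python rather than XSL for brevity: the base direction is derived from the language attribute, and a rendition-time parameter can override it. The set of right-to-left language codes and the function name are assumptions, and the list is deliberately incomplete; a real transform would do this in the XSL.

```python
from typing import Optional

# Assumed, deliberately incomplete set of primary language subtags that are
# written right to left (Hebrew, Arabic, Persian, Urdu, Yiddish).
RTL_LANGUAGES = {"he", "ar", "fa", "ur", "yi"}


def direction_for_lang(lang: Optional[str], override: Optional[str] = None) -> str:
    """Return "rtl" or "ltr" for a language value such as "he-IL".

    override stands in for a hypothetical parameter passed at rendition
    time; when it is "ltr" or "rtl", it wins over the language default.
    """
    if override in ("ltr", "rtl"):
        return override
    if not lang:
        return "ltr"
    # Compare only the primary subtag, case-insensitively.
    primary = lang.replace("_", "-").split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


if __name__ == "__main__":
    for lang in ("he-IL", "ar", "en-US", None):
        print(lang, direction_for_lang(lang))
    print("he-IL with override:", direction_for_lang("he-IL", override="ltr"))
```

The same function could be applied per paragraph as well as per document, which is the point of the argument: the language value alone is enough to choose a base direction.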
 
At the string level, controls must be applied according to the characters in the string. It is impossible to tell how the strings will come out until the output is viewed in its final form, so I think attempting to mark up the XML builds inefficiency into the system. I imagine an author tagging up a sentence, running HTML, viewing it, going back to the XML to make corrections, running HTML, viewing it, and so on. Then, once it's all straight, the boss comes in and says the output has to work in another browser with different CSS support and Unicode support (check the file in Safari or some other browser that isn't IE). The author is faced with making a copy of the file for each deliverable or overwriting the deliverable-specific markup each time the file must be output. Neither is appealing.
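The point that string-level controls depend on the characters in each string can be made concrete with Python's standard `unicodedata` module, which reports each character's Unicode bidirectional class. The helpers below are a hypothetical illustration: they flag strings that mix strong left-to-right and strong right-to-left characters, which are the cases where display order can go wrong. They do not implement the full Unicode Bidirectional Algorithm.

```python
import unicodedata

STRONG_LTR = {"L"}        # e.g. Latin letters
STRONG_RTL = {"R", "AL"}  # Hebrew letters are R; Arabic letters are AL


def strong_directions(text: str) -> set:
    """Return the strong directions ("ltr", "rtl") present in text."""
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            found.add("ltr")
        elif bidi_class in STRONG_RTL:
            found.add("rtl")
        # Neutral and weak classes (spaces, periods, digits) are skipped.
    return found


def is_mixed_direction(text: str) -> bool:
    """True when text contains both strong LTR and strong RTL characters."""
    return strong_directions(text) == {"ltr", "rtl"}


if __name__ == "__main__":
    samples = ["Hello.", "\u05e9\u05dc\u05d5\u05dd.", "Save \u05e9\u05de\u05d5\u05e8 XML."]
    for s in samples:
        print(repr(s), sorted(strong_directions(s)), is_mixed_direction(s))
```

Note that the trailing period counts as neutral here; where it ends up on screen depends on the surrounding strong characters and the paragraph direction, which is exactly why the screen shots disagree.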
 
One last point to help explain my position. I do a lot of work with bi-directional XML. Between me and the other folks who sit around me, we have processed maybe a couple of million words in the past few years. We don't use directional spans. I think it would be doing users a huge disservice to add tagging functionality that doesn't really translate into output functionality and does translate into inefficient work. If the directional controls are added, they should be accompanied by a disclaimer that essentially says, "Your mileage may vary."
 
Kevin

--- End Message ---