dita-translation message

Subject: Re: [dita-translation] FW: DITA Open Toolkit sample files worked okay with the xliffRoundTrip tool (was FW: Translation Workflow)

From: "Rodolfo M. Raya" <rmraya@heartsome.net>
To: dita-translation@lists.oasis-open.org
Date: Wed, 14 Jun 2006 10:46:44 -0300

Hi All,

Yesterday I was playing with Robert Anderson's latest DTD prototype when Bryan's message arrived in my mailbox.

Bryan did a very nice job with his tool. It shows that converting DITA to XLIFF with a simple XSL transformation is possible.

I prepared a set of XLIFF files with my program (JoAnn attached them to her message as dita_xliff.zip) to show an alternative kind of XLIFF file. The main differences between the two implementations are:

* Bryan uses the XLIFF file as a holder for untranslatable information: everything can be reconstructed from the attributes and group elements that his files contain. I used skeleton files (auxiliary external files) to store the untranslatable content.

* Segmentation is done at block (element) level in Bryan's files. I used sentence-level segmentation.
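To make the difference concrete: a block-level segment holds a whole paragraph or list item, while sentence-level segmentation splits that text into one segment per sentence. The Python sketch below is only an illustration, not the segmenter used by either tool. It assumes plain text without inline tags and uses one naive rule (split after ".", "!" or "?" when whitespace and an uppercase letter or digit follow); production tools usually rely on configurable rule sets such as SRX, with abbreviation exceptions.

```python
import re

# Naive boundary rule (an assumption of this sketch, not any tool's real rules):
# a sentence ends at ".", "!" or "?" when whitespace and an uppercase
# letter or digit follow.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(block_text: str) -> list[str]:
    """Split the text of one block-level segment into sentence-level segments."""
    return [s for s in _BOUNDARY.split(block_text.strip()) if s]


if __name__ == "__main__":
    block = "Install the toolkit. Then run the build. Check the output!"
    print(split_sentences(block))
    # ['Install the toolkit.', 'Then run the build.', 'Check the output!']

    # Known weakness: an abbreviation followed by a capital causes a false split.
    print(split_sentences("See e.g. Section 2."))
    # ['See e.g.', 'Section 2.']
```

The false split on "e.g." is the reason real segmenters keep exception lists rather than a single regular expression.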

As soon as time permits, I'll post an example of legacy TM recovery that uses Bryan's translation of the DITA sample files as TM data for my program.

Best regards,
Rodolfo

On Wed, 2006-06-14 at 06:58 -0600, JoAnn Hackos wrote:

See the attached information from Bryan Schnabel and Rodolfo Raya.
JoAnn

From: Rodolfo M. Raya [mailto:rmraya@heartsome.net]
Sent: Tuesday, June 13, 2006 9:25 PM
To: bryan.s.schnabel@exgate.tek.com
Cc: dond@us.ibm.com; JoAnn Hackos; azydron@xml-intl.com; tony.jewtushenko@productinnovator.com
Subject: Re: DITA Open Toolkit sample files worked okay with the xliffRoundTrip tool (was FW: Translation Workflow)

On Tue, 2006-06-13 at 17:38 -0700, bryan.s.schnabel@exgate.tek.com wrote:

Hi Don,

I'm sorry that it took me months instead of days to do the test you asked for. You asked me to download the samples from the DITA Open Toolkit and try them in my xliffRoundTrip tool, to see what we get.

I finally did that. I found that I could do them easily enough one at a time, but it worked better to modify the program a little so that it handles several at a time. I cooked up a little command-line routine that reads the map file in the sample directory, creates one XLIFF file (with a <file> element for each file), runs a little pseudo-translation, and then transforms the pseudo-translated XLIFF file back into the appropriate translated DITA chunks.
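The first step of that routine, reading the map and creating one XLIFF document with a <file> element per topic, might look roughly like the Python sketch below. It is not Bryan's code (his routine uses XSLT 2.0 run with Saxon 8). The topic paths, language codes, and helper names are hypothetical; only plain topicref elements are followed, and extracting text into trans-unit elements is left out.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # write XLIFF elements without a prefix


def qname(name: str) -> str:
    """Qualify an element name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{name}"


def topic_hrefs(map_xml: str) -> list[str]:
    """Return the distinct topicref/@href values of a DITA map in document order.

    Only plain <topicref> elements are followed (an assumption of this sketch);
    specializations such as <chapter> and nested maps are ignored.
    """
    root = ET.fromstring(map_xml)
    hrefs: list[str] = []
    for ref in root.iter("topicref"):
        href = ref.get("href")
        if href and href not in hrefs:  # one <file> per distinct topic
            hrefs.append(href)
    return hrefs


def xliff_container(hrefs, source_lang="en-US", target_lang="fr-FR"):
    """Build one XLIFF 1.2 document with an empty <file>/<body> per topic."""
    xliff = ET.Element(qname("xliff"), version="1.2")
    for href in hrefs:
        file_el = ET.SubElement(xliff, qname("file"), {
            "original": href,
            "source-language": source_lang,
            "target-language": target_lang,
            "datatype": "xml",
        })
        ET.SubElement(file_el, qname("body"))  # trans-units for this topic go here
    return xliff


if __name__ == "__main__":
    # Hypothetical map; the real samples directory uses its own file names.
    sample_map = """<map title="Sample">
      <topicref href="concepts/intro.dita">
        <topicref href="tasks/install.dita"/>
      </topicref>
      <topicref href="tasks/install.dita"/>
      <topicref navtitle="Heading only"/>
    </map>"""
    hrefs = topic_hrefs(sample_map)
    print(hrefs)  # ['concepts/intro.dita', 'tasks/install.dita']
    print(ET.tostring(xliff_container(hrefs), encoding="unicode"))
```

Deduplicating the hrefs is a design choice of the sketch: a topic referenced twice in the map is extracted once.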

I zipped the application and sent it to you, along with a PowerPoint file that kind of walks you through it. (Sorry for the large zip file; I included the JAR file for Saxon 8 because the modification required XSLT 2.0.)

From a glance at some of the recent DITA SC notes, it looks like Andrzej and Rodolfo are making an actual tool that will do the job (I'm sure much better than my little test application).

But it was kind of fun to run the file through and see what came out.

You're welcome to try the little sample I sent for kicks if you want. Again, I'm very sorry I wasn't able to get right on this.

Thanks,

Bryan

Hi Bryan,

Good work!

Checking the XLIFF file that you sent, I found segments like this:

    <trans-unit id="d3e5" xmrk:ancs="1">
      <source>
        <x id="prolog-x-mch2-d3e5" xmrk:ancs="1"/>
      </source>
      <target>
        <x id="prolog-x-mch2-d3e5" xmrk:ancs="1"/>
      </target>
    </trans-unit>

Would it be possible to avoid creating a segment with only an inline tag in it?
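If changing the stylesheet is inconvenient, such units could also be detected and dropped in a post-processing pass. The Python sketch below is only an illustration, not part of either tool. It treats text inside the XLIFF 1.2 <g> and <mrk> elements as translatable and every other inline element (<x>, <bx>, <ex>, <bpt>, <ept>, <ph>, <it>) as markup, and its sample omits the xmrk:ancs attribute because the message does not show that namespace. Since Bryan's files rebuild the DITA from the XLIFF alone, a real tool would have to keep the dropped placeholder somewhere else (for example in a skeleton) before removing the unit.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


# XLIFF 1.2 inline elements whose content is translatable text (assumption:
# all other inline elements stand for markup only).
TEXT_WRAPPERS = {"g", "mrk"}


def translatable_text(elem) -> str:
    """Concatenate the text of elem, skipping the content of markup placeholders."""
    parts = [elem.text or ""]
    for child in elem:
        if local(child.tag) in TEXT_WRAPPERS:
            parts.append(translatable_text(child))
        parts.append(child.tail or "")  # text after a placeholder is still text
    return "".join(parts)


def is_tag_only(unit) -> bool:
    """True if the unit's <source> holds only inline codes and whitespace."""
    source = next((c for c in unit if local(c.tag) == "source"), None)
    return source is not None and not translatable_text(source).strip()


def drop_tag_only_units(root) -> list:
    """Remove tag-only trans-units from the tree and return their ids."""
    dropped = []
    for parent in list(root.iter()):  # snapshot, so removal is safe
        for unit in list(parent):
            if local(unit.tag) == "trans-unit" and is_tag_only(unit):
                parent.remove(unit)
                dropped.append(unit.get("id"))
    return dropped


if __name__ == "__main__":
    doc = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
    <file original="sample.dita" source-language="en-US" datatype="xml"><body>
      <trans-unit id="d3e5"><source><x id="prolog-x-mch2-d3e5"/></source></trans-unit>
      <trans-unit id="d3e7"><source>Install the <g id="1">toolkit</g>.</source></trans-unit>
      <trans-unit id="d3e9"><source><x id="p1"/> Note</source></trans-unit>
    </body></file></xliff>"""
    root = ET.fromstring(doc)
    print(drop_tag_only_units(root))  # ['d3e5']
```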

I also noticed that you copied the source text to the target. Is it possible to leave the target element empty, or to omit it altogether? If not, could the state attribute of the target be set to a value indicating that it is a dummy target?

A TM engine may not add translations to a segment whose target is not empty. Without a state indication, the TM engine cannot tell whether the text in the target is bad and can be overwritten, or is a good translation. Also, it would be easier to apply segmentation rules to a segment if there were no target to worry about. Additional segmentation will be necessary, as the stylesheet creates segments at block level, not at sentence level.
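Both options, dropping a target that merely repeats the source or keeping it and marking it with state="needs-translation" (one of the state values defined in XLIFF 1.2), can be sketched in Python as a post-processing step. The tm_may_overwrite helper shows the kind of check a TM engine could then apply. The function names, and the "copy" test itself (comparing the concatenated text of source and target while ignoring inline-tag attributes), are simplifications assumed for this illustration.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified names."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """First direct child with the given local name, or None."""
    return next((c for c in elem if local(c.tag) == name), None)


def text_of(elem) -> str:
    return "".join(elem.itertext())


def flag_copied_targets(root, omit: bool = False) -> int:
    """Handle targets that just repeat the source; return how many were changed.

    omit=False marks them state="needs-translation"; omit=True deletes them.
    Targets that already carry a state attribute are left alone.
    """
    changed = 0
    for unit in list(root.iter()):
        if local(unit.tag) != "trans-unit":
            continue
        source, target = child(unit, "source"), child(unit, "target")
        if source is None or target is None or target.get("state"):
            continue
        if text_of(source) == text_of(target):
            if omit:
                unit.remove(target)
            else:
                target.set("state", "needs-translation")
            changed += 1
    return changed


def tm_may_overwrite(unit) -> bool:
    """A TM engine may fill the target if it is missing, empty, or flagged."""
    target = child(unit, "target")
    return (target is None
            or not text_of(target).strip()
            or target.get("state") == "needs-translation")


if __name__ == "__main__":
    doc = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
    <file original="sample.dita" source-language="en-US" target-language="de-DE" datatype="xml"><body>
      <trans-unit id="u1"><source>Change the oil.</source><target>Change the oil.</target></trans-unit>
      <trans-unit id="u2"><source>Check the tires.</source><target>Reifen prüfen.</target></trans-unit>
    </body></file></xliff>"""
    root = ET.fromstring(doc)
    print(flag_copied_targets(root))  # 1
    for unit in root.iter("{urn:oasis:names:tc:xliff:document:1.2}trans-unit"):
        print(unit.get("id"), tm_may_overwrite(unit))
    # u1 True
    # u2 False
```

Marking rather than deleting keeps the file usable for a pseudo-translation round trip while still telling the TM engine that the target is not a real translation.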

The attachment contains a set of individual XLIFF files that I generated from the DITA samples, and a unified XLIFF file with all segments. The larger XLIFF file was generated by merging the individual files from the translation project, keeping one <file> element for each DITA document. All files were segmented at sentence level.

Best regards,
Rodolfo


References:
- FW: DITA Open Toolkit sample files worked okay with the xliffRoundTrip tool (was FW: Translation Workflow)
  - From: "JoAnn Hackos" <joann.hackos@comtech-serv.com>