Subject: DITA SC Agenda Monday 12 June 2006
Agenda for Monday 12 June 2006
Note: Please be certain to review the ITS documents from Yves Savourel before the meeting tomorrow.
11:00 am - 12:00 pm Eastern Standard Time (GMT -5)
DITA Technical Committee teleconference
USA Toll Free Number: 866-566-4838
USA Toll Number: +1-210-280-1707
PASSCODE: 185771
Roll Call
Approve minutes from 5 June 2006 (enclosed for those who are not TC members)
http://www.oasis-open.org/apps/org/workgroup/dita-translation/
Returning Business:
1) Discussion item from Yves Savourel
As you may know, the W3C has recently published the Last Call Working Draft for ITS (See [1]) as well as the First Working Draft of a companion document: "Best Practices for XML Internationalization" (See [2]).
[1] <http://www.w3.org/TR/2006/WD-its-20060518/>
[2] <http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/>
The second document includes examples of how to use ITS with a few specific document types (See the "ITS Applied to Existing Formats"
section). In the next draft we would like to include DITA in that list.
The attached file is a first attempt at possible default rules for processing DITA with ITS. We would appreciate it very much if some of you had the time to review it and make sure we have not made any mistakes or forgotten anything. For example, I'm not sure if the dir attribute should be there or not. I'm also not sure if we have all subflow elements listed. Maybe we need two rule sets: one for the current version of DITA and one for the upcoming one (although if there is no conflict and a single rule set could be used, that would be better).
The specification document [1] should help you understand each element in these rules. The Last Call review for the specification ends on June-30. The Best Practices document will still go through several drafts.
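For SC members who have not yet read the ITS drafts, the following is a rough illustration of what a default ITS rules file for DITA might look like. This is an invented sketch using the ITS rules vocabulary from the Last Call draft, not Yves's actual attachment; the element selections (ph, uicontrol, fn, codeblock, etc.) are assumptions for illustration only, and the attached file remains the document to review.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Hypothetical defaults; Yves's attached file is the real draft. -->
  <!-- Inline DITA elements are part of the surrounding text flow -->
  <its:withinTextRule selector="//ph | //b | //i | //u | //uicontrol | //keyword"
                      withinText="yes"/>
  <!-- Footnotes are separate subflows, not part of the parent sentence -->
  <its:withinTextRule selector="//fn" withinText="nested"/>
  <!-- Code samples should not be translated -->
  <its:translateRule selector="//codeblock | //codeph" translate="no"/>
</its:rules>
```

Questions like the ones Yves raises (should dir be covered? are all subflow elements listed?) map directly onto which selectors appear in rules like these.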
2) Discuss Gershon Joseph's draft of the best practice for legacy TM
Attached to this email for non-TC members.
3) Management of conref blocks for translations
Standardized (boilerplate) text is often kept in one or more .dita files used as a source for conrefs across a document set.
- All boilerplate content for a language must be stand-alone. Boilerplate text must be stand-alone phrases to avoid problems translating it into some languages, where it does not fit into the surrounding text.
- The conref target must be translated before the parent document that refers to it is translated.
- Conreffing to an inline element may result in a badly translated phrase with respect to its surrounding content, so we should probably recommend against this. Examples: singular/plural agreement, prepositions, and acronyms, e.g. ABS (antilock braking system); if you conref to the text itself, the translated text may not read correctly.
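As a concrete (hypothetical) illustration of the point above, with invented file names and ids: a block-level conref carries a complete, stand-alone sentence into the new context, while an inline conref splices a fixed phrase into a sentence whose translation may require different case, number, or prepositions.

```xml
<!-- boilerplate.dita: shared source of reusable content (hypothetical) -->
<topic id="boilerplate">
  <body>
    <!-- Safe: a self-contained, stand-alone block -->
    <note id="safety-note">Disconnect the power cord before servicing the unit.</note>
    <!-- Risky: an inline fragment with no stand-alone meaning -->
    <p><ph id="abs">antilock braking system</ph></p>
  </body>
</topic>

<!-- service.dita: a referencing topic (hypothetical) -->
<topic id="service">
  <body>
    <!-- Block-level conref: translated once as a complete unit -->
    <note conref="boilerplate.dita#boilerplate/safety-note"/>
    <!-- Inline conref: the pulled-in phrase may not agree in case, number,
         or preposition with the surrounding translated sentence -->
    <p>Check the <ph conref="boilerplate.dita#boilerplate/abs"/> before driving.</p>
  </body>
</topic>
```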
Action Item: Andrzej will provide examples to the group for discussion.
4) XLIFF transforms
Discuss plans for Rodolfo's tests of the XLIFF transforms and their possible release as open source. Ask if there is a proposed date.
Andrzej and Rodolfo have successfully converted DITA to XLIFF and back. Rodolfo plans to publish their converter as open source.
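As background for this item, here is a skeletal sketch of how one DITA paragraph might look after extraction to XLIFF 1.1 (the version current as of this meeting). This is an invented example — the file names, ids, and segmentation are assumptions, and the actual output of Andrzej and Rodolfo's converter may differ:

```xml
<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="topic.dita" source-language="en-US" target-language="de-DE"
        datatype="xml">
    <body>
      <!-- One translatable segment per block element; inline DITA markup is
           protected as paired <g> placeholder tags -->
      <trans-unit id="1">
        <source>Press <g id="g1">OK</g> to save your changes.</source>
        <target>Drücken Sie <g id="g1">OK</g>, um die Änderungen zu speichern.</target>
      </trans-unit>
    </body>
  </file>
</xliff>
```

The round trip back to DITA depends on those placeholder tags staying paired, which is also why unmatched legacy tags cause trouble (see the legacy TM discussion in the attached minutes).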
New Business:
5) Handling multi-language documents
Charles Pau and others to provide examples to the list for discussion
JoAnn T. Hackos, PhD
President
Comtech Services, Inc.
710 Kipling Street, Suite 400
Denver, CO 80215
303-232-7586
joann.hackos@comtech-serv.com
--- Begin Message ---
Title: [dita-translation] DITA Translation Subcommittee Meeting Minutes: 5 June 2006
- From: "Gershon L Joseph" <gershon@tech-tav.com>
- To: <dita-translation@lists.oasis-open.org>,<mambrose@sdl.com>,<pcarey@lexmark.com>,<rfletcher@sdl.com>,<bhertz@sdl.com>,"Richard Ishida" <ishida@w3.org>,<tony.jewtushenko@productinnovator.com>,"Lieske, Christian" <christian.lieske@sap.com>,"Jennifer Linton" <jennifer.linton@comtech-serv.com>,"Munshi, Sukumar" <Sukumar.Munshi@lionbridge.com>,"Charles Pau" <charles_pau@us.ibm.com>,<dpooley@sdl.com>,"Reynolds, Peter" <Peter.Reynolds@lionbridge.com>,"Felix Sasaki" <fsasaki@w3.org>,"Yves Savourel" <ysavourel@translate.com>,"Dave A Schell" <dschell@us.ibm.com>,"Bryan Schnabel" <bryan.s.schnabel@tek.com>,<Howard.Schwartz@trados.com>,<kara@ca.ibm.com>
- Date: Wed, 7 Jun 2006 02:27:42 -0600
Best Regards,
Gershon
---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
http://www.tech-tav.com
DITA Translation Subcommittee Meeting Minutes: 5 June 2006
(Recorded by Gershon Joseph <gershon@tech-tav.com>)

The DITA Translation Subcommittee met on Monday, 5 June 2006 at 08:00am PT for 60 minutes.

1. Roll call

Present: Kevin Farwell, JoAnn Hackos, Gershon Joseph, Charles Pau, Rodolfo Raya, Felix Sasaki, Yves Savourel, David Walters, Andrzej Zydron, Kara Warburton
Regrets: Don Day

2. Accepted the minutes of the previous meeting.

http://lists.oasis-open.org/archives/dita-translation/200605/msg00016.html
Moved by Rodolfo, seconded by Yves, no objections.

3. Returning Business:

3.1 Discussion item from Yves Savourel

"As you may know, the W3C has recently published the Last Call Working Draft for ITS (See [1]) as well as the First Working Draft of a companion document: "Best Practices for XML Internationalization" (See [2]).

[1] http://www.w3.org/TR/2006/WD-its-20060518/
[2] http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/

The second document includes examples of how to use ITS with a few specific document types (See the "ITS Applied to Existing Formats" section). In the next draft we would like to include DITA in that list. The attached file is a try for the possible default rules to process DITA with ITS. We would appreciate very much if some of you had the time to review it and make sure we have not made any mistakes, or forgotten anything. For example, I'm not sure if the dir attribute should be there or not. I'm also not sure if we have all subflow elements listed. Maybe we need two rule sets: one for the current version of DITA and one for the upcoming one (although if there is no conflict and a single rule set could be used that would be better).

The specification document [1] should help you understand each element in these rules. The Last Call review for the specification ends on June-30. The Best Practices document will still go through several drafts."

ACTION for everyone to review the ITS proposals for discussion next week.
3.2 Discussion item from Andrzej Zydron

"LISA OSCAR's latest standard GMX/V (Global Information Metrics eXchange - Volume) has been approved and is going through its final public comment phase. GMX/V tackles the issue of word and character counts and how to exchange localization volume information via an XML vocabulary. GMX/V finally provides a verifiable, industry standard for word and character counts. GMX/V mandates XLIFF as the canonical form for word and character counts. GMX/V can be viewed at the following location: http://www.lisa.org/standards/gmx/GMX-V.html Localization tool providers have been consulted and have contributed to this standard. We would appreciate your views/comments on GMX/V."

Andrzej gave an overview of the standard and background, and requested SC members review the standard.

4. New Business: Decide the Best Practices that we need to consider.

1) Possibly to maximize usage of conref (reusable blocks)...

From Nancy Harrison: "Boilerplate text is often kept in one or more .dita files used as a source for conrefs across a document set. How should authors / implementers / processors deal with multiple sets of boilerplate files automatically? DocBook names every file containing generated text with a language extension (two letter only), including English. A similar scheme, but probably with locale, not just country, would work for DITA documents as well."

Andrzej: All boilerplate content for a language must be stand-alone. Boilerplate text must be stand-alone phrases to avoid problems translating it into some languages, where it does not fit into the surrounding text.

ACTION: Charles will provide an example of typical boilerplate fragments.

JoAnn: What about a conref to non-boilerplate text? How would this affect the translation workflow?

Andrzej: Dependency on the conref target, which would need to be translated before the parent document that refers to the conref is translated.
Again, conreffing to an inline element may result in a badly translated phrase with respect to its surrounding content, so we should probably be against this. Examples: singular/plural, prepositions, acronyms, e.g. ABS (antilock braking system); if you conref to the text itself, the translated text may not read correctly.

ACTION: Andrzej to send examples to the group for discussion.

2) Handling multi-language documents

[We did not discuss this further this week, but some members did send examples to the list for discussion on-list and at next week's meeting.]

3) Not a best practice, but the DITA to XLIFF and back mechanism needs to be completed. Andrzej and Rodolfo have successfully converted DITA to XLIFF and back. Rodolfo plans to publish their converter as open source.

4) Gershon: What's the best practice for translations for users who move from a legacy documentation system to DITA?

Andrzej: It should still be possible to run against the previous TM. Inlines may not match, or may fuzzy match. As long as memories are aligned at the sentence level, it should work (at least leverage matching).

Kevin confirmed that using TM as-is will give you 10-20% less matching than if you tweak the XLIFF to better match the DITA.

Rodolfo: A good TM engine should help you recover 70% of the inline tags, which is the main problem.

Kevin: So long as they're matched tags; however, conditional text marked up in legacy tools (e.g. FrameMaker) will only be fuzzy matched (at best).

ACTION: Gershon to write a draft proposal (with Rodolfo) and submit it to the list for input and technical assistance.

Meeting adjourned at 09:00am PT.
--- End Message ---
--- Begin Message ---
Title: First try at the legacy TM best practice
- From: "Gershon L Joseph" <gershon@tech-tav.com>
- To: "Rodolfo M. Raya" <rodolfo@heartsome.net>
- Date: Wed, 7 Jun 2006 03:18:20 -0600
Hi Rodolfo,
Here's what I've come up with. Please add the missing information. If you
prefer to discuss by phone, I've sent you a request to add you as a Skype
contact. My time zone is 7 hours ahead of New York time, or 10 hours ahead
of San Francisco time. I'm not sure how available I'll be today due to other
conference calls I've scheduled, but I should be available tomorrow until at
least 17:00 my time, later if needed.
Best Regards,
Gershon
---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
http://www.tech-tav.com
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<article>
  <title>Best Practice for Leveraging Legacy Translation Memory when Migrating to DITA</title>
  <section>
    <title>Statement of Problem</title>
    <para>Many organizations have previously translated content that was authored in non-DITA tools (such as Word and FrameMaker). When migrating their legacy content into the new DITA authoring environment, what does the organization do about their legacy translation memory? This legacy translation memory (TM) was created with a large financial investment and can't easily be thrown away simply because a new authoring architecture is being adopted.</para>
    <para>This article describes best practices that will help organizations use their legacy TM for future translation projects that are authored in DITA, in order to minimize the expense of translating DITA-based content.</para>
  </section>
  <section>
    <title>Recommended Best Practices</title>
    <para>If you keep the following points in mind, you should be able to maximize your existing translation memory when you send your DITA documents for translation:</para>
    <itemizedlist>
      <listitem>
        <para>Ensure your translation service provider uses a tool that supports TMX [is this correct?]. This will ensure you can migrate your TM between TM tools that support the industry standard for TM interchange. This is required not only to free you from dependence on a single translation service provider, but also to allow you to tweak your TM to better match the DITA-based XML source documents you'll be sending for translation.</para>
      </listitem>
      <listitem>
        <para>Provided the structure of the DITA-based content has not changed radically compared to the legacy documents, the TM software should fully match most block elements. As long as the legacy TM aligns with the DITA source at the sentence level, the translation software should be able to fully leverage matching.</para>
      </listitem>
      <listitem>
        <para>Inline elements may not match at all, or may only fuzzy match. If the TM is preprocessed to prepare it for the DITA-based translation project, then inline elements should fully match. Note that a good TM engine should help you recover 70% of the inline tags, which is the main area where matching is prone to fail.</para>
      </listitem>
      <listitem>
        <para>If text entities and/or conrefs are used as containers for reusable text (and they should be!), then these items may not fully match (only fuzzy match). However, since each of these items needs to be translated only once, and should at least fuzzy match, it should not result in significant translation expense.</para>
      </listitem>
      <listitem>
        <para>If you tweak the XLIFF (exported from the legacy TM) to better align it with the new DITA content, you should realize an improvement of 10-20% in TM matching. Whether it's worth the effort and expense depends on the size of the DITA documents to be translated. The idea is [Rodolfo, please correct me if I'm wrong!] to export the TM to XLIFF, process the XLIFF (usually via XSLT) to better align it with your DITA content, and then import the XLIFF back into your TM. Thus, your TM will now be better aligned with your DITA content, which will result in more accurate matching.</para>
      </listitem>
      <listitem>
        <para>One area where things can go wrong is if the legacy content does not use matched tags. Since XML uses matched tags, elements may not be accurately matched. This is particularly true of conditional text marked up in legacy tools (e.g. FrameMaker), where you can expect only fuzzy matching at best, or no matching at worst. Depending on how much conditional content the legacy source documents contain, it may be worth preprocessing the TM to ensure all conditional tags are paired. [Can this be automated, or would a human have to go through the TM or XLIFF to close the tags? What should we suggest they do here to resolve the issue?]</para>
      </listitem>
    </itemizedlist>
    <para>[Rodolfo, is there anything I've missed or anything else we should add?]</para>
  </section>
</article>
--- End Message ---