Subject: Re: [dita-translation] DITA SC Agenda Monday 12 2006


Hi JoAnn,

Please find attached a sample round trip transformation from xml through 
xml:tm to xliff, then through translation and all the way back to the 
target language document.
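
For anyone who has not seen the intermediate format, a minimal, hand-written 
XLIFF fragment (1.1 syntax) of the kind the extraction step produces is 
sketched below; the file name, ids, and text are invented for illustration and 
are not taken from the attached sample:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="sample.xml" source-language="en" target-language="de"
        datatype="xml">
    <body>
      <!-- One trans-unit per translatable segment extracted from the source -->
      <trans-unit id="tu1">
        <source>Press the reset button to silence the alarm.</source>
        <target>Drücken Sie die Reset-Taste, um den Alarm stummzuschalten.</target>
      </trans-unit>
    </body>
  </file>
</xliff>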

Best Regards,

AZ

JoAnn Hackos wrote:
> Agenda for Monday 12 June 2006
> 
> Note: Please be certain to review the ITS documents from Yves Savourel 
> before the meeting tomorrow.
> 
> 11:00 am - 12:00 pm Eastern Standard Time (-5 GMT)
> 
> DITA Technical Committee teleconference
> 
> USA Toll Free Number: 866-566-4838
> 
> USA Toll Number: +1-210-280-1707
> 
> PASSCODE: 185771
> 
> Roll Call
> 
> Approve Minutes from 5 June 2006 (enclosed for those who are not TC members)
> 
> http://www.oasis-open.org/apps/org/workgroup/dita-translation/
> 
> Returning Business:
> 
> 1) Discussion item from Yves Savourel
> 
> As you may know, the W3C has recently published the Last Call Working 
> Draft for ITS (See [1]) as well as the First Working Draft of a 
> companion document: "Best Practices for XML Internationalization" (See [2]).
> 
> [1] http://www.w3.org/TR/2006/WD-its-20060518/
> 
> [2] http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/
> 
> The second document includes examples of how to use ITS with a few 
> specific document types (See the "ITS Applied to Existing Formats" 
> section). In the next draft we would like to include DITA in that list.
> 
> The attached file is a first attempt at possible default rules to process 
> DITA with ITS. We would very much appreciate it if some of you had the time 
> to review it and make sure we have not made any mistakes or forgotten 
> anything. For example, I'm not sure if the dir attribute should be there 
> or not. I'm also not sure if we have all subflow elements listed. Maybe 
> we need two rule sets: one for the current version of DITA and one for 
> the upcoming one (although if there is no conflict and a single rule set 
> could be used, that would be better).
> 
> The specification document [1] should help you understand each element 
> in these rules. The Last Call review for the specification ends on 
> June 30. The Best Practices document will still go through several drafts.
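> 
> For those who have not yet opened the attachment, a minimal sketch of such 
> a rules file is shown below, written with the its:rules markup from the 
> specification (exact element and attribute names may differ between drafts). 
> The selectors are illustrative only, not the proposed defaults:
> 
> <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
>   <!-- Illustrative only: do not translate code samples -->
>   <its:translateRule selector="//codeblock | //codeph" translate="no"/>
>   <!-- Treat common DITA phrase-level elements as part of the text flow -->
>   <its:withinTextRule selector="//ph | //b | //i | //uicontrol" withinText="yes"/>
>   <!-- Footnotes are subflows nested inside the surrounding text -->
>   <its:withinTextRule selector="//fn" withinText="nested"/>
>   <!-- The dir attribute question: one possible mapping, if it is kept -->
>   <its:dirRule selector="//*[@dir='rtl']" dir="rtl"/>
> </its:rules>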
> 
> 2) Discuss Gershon Joseph's draft of the best practice for legacy TM
> 
> Attached to this email for non-TC members.
> 
> 3) Management of conref blocks for translations
> 
> Standardized (boilerplate) text is often kept in one or more .dita files 
> used as a source for conrefs across a document set.
> 
> All boilerplate content for a language must be stand-alone. Boilerplate 
> text must consist of stand-alone phrases to avoid problems translating 
> it into some languages, where it does not fit into the surrounding text.
> 
> There is a dependency on the conref target: it must be translated before 
> the parent document that refers to it is translated.
> 
> Conreffing to an inline element may result in a badly translated phrase 
> with respect to its surrounding content, so we should probably advise 
> against this. Examples: singular/plural, prepositions, and acronyms, 
> e.g. ABS (antilock braking system); if you conref to the text itself, 
> the translated text may not read correctly.
> 
> Action Item: Andrzej will provide examples to the group for discussion.
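> 
> As a starting point for that discussion, here is a hedged sketch of the 
> recommended pattern (file name and ids are invented): the boilerplate is 
> kept as a complete, stand-alone element in a warehouse topic, and the 
> conref pulls in the whole element rather than an inline fragment.
> 
> <!-- boilerplate.dita: warehouse topic holding reusable, stand-alone notes -->
> <topic id="boilerplate">
>   <title>Boilerplate</title>
>   <body>
>     <note id="esd_warning" type="caution">Wear an antistatic wrist strap
>     before opening the chassis.</note>
>   </body>
> </topic>
> 
> <!-- In the referencing topic: reuse the whole note, not a phrase inside a
>      sentence, so translators always see a complete, self-contained unit. -->
> <note conref="boilerplate.dita#boilerplate/esd_warning"/>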
> 
> 4) XLIFF transforms
> 
> Discuss plans for Rodolfo's tests of their XLIFF transforms and possible 
> release to open source. Ask if there is a proposed date.
> 
> Andrzej and Rodolfo have successfully converted DITA to XLIFF and back.
>         Rodolfo plans to publish their converter as open source.
> 
> New Business:
> 
> 5) Handling multi-language documents
> 
> Charles Pau and others to provide examples to the list for discussion
> 
>  
>  
> 
> JoAnn T. Hackos, PhD
> President
> Comtech Services, Inc.
> 710 Kipling Street, Suite 400
> Denver CO 80215
> 303-232-7586
> joann.hackos@comtech-serv.com
> 
>  
> 
>  
> 
> 
> ------------------------------------------------------------------------
> 
> Subject:
> [dita-translation] DITA Translation Subcommittee Meeting Minutes: 5 June 
> 2006
> From:
> "Gershon L Joseph" <gershon@tech-tav.com>
> Date:
> Wed, 7 Jun 2006 02:27:42 -0600
> To:
> <dita-translation@lists.oasis-open.org>, <mambrose@sdl.com>, 
> <pcarey@lexmark.com>, <rfletcher@sdl.com>, <bhertz@sdl.com>, "Richard 
> Ishida" <ishida@w3.org>, <tony.jewtushenko@productinnovator.com>, 
> "Lieske, Christian" <christian.lieske@sap.com>, "Jennifer Linton" 
> <jennifer.linton@comtech-serv.com>, "Munshi, Sukumar" 
> <Sukumar.Munshi@lionbridge.com>, "Charles Pau" <charles_pau@us.ibm.com>, 
> <dpooley@sdl.com>, "Reynolds, Peter" <Peter.Reynolds@lionbridge.com>, 
> "Felix Sasaki" <fsasaki@w3.org>, "Yves Savourel" 
> <ysavourel@translate.com>, "Dave A Schell" <dschell@us.ibm.com>, "Bryan 
> Schnabel" <bryan.s.schnabel@tek.com>, <Howard.Schwartz@trados.com>, 
> <kara@ca.ibm.com>
> 
> 
> 
> 
> Best Regards,
> Gershon
> 
> ---
> Gershon L Joseph
> Member, OASIS DITA and DocBook Technical Committees
> Director of Technology and Single Sourcing
> Tech-Tav Documentation Ltd.
> office: +972-8-974-1569
> mobile: +972-57-314-1170
> http://www.tech-tav.com
> 
> 
> ------------------------------------------------------------------------
> 
> DITA Translation Subcommittee Meeting Minutes: 5 June 2006
> 
> (Recorded by Gershon Joseph <gershon@tech-tav.com>)
> 
> The DITA Translation Subcommittee met on Monday, 5 June 2006 at 08:00am PT
> for 60 minutes.
> 
> 1.  Roll call
> 
>     Present: Kevin Farwell, JoAnn Hackos, Gershon Joseph, Charles Pau, Rodolfo 
>              Raya, Felix Sasaki, Yves Savourel, David Walters, Andrzej Zydron,
>              Kara Warburton
> 
>     Regrets: Don Day
> 
> 2.  Accepted the minutes of the previous meeting.
>     http://lists.oasis-open.org/archives/dita-translation/200605/msg00016.html
>     Moved by Rodolfo, seconded by Yves, no objections.
> 
> 3.  Returning Business:
> 
> 3.1 Discussion item from Yves Savourel
> 
>     "As you may know, the W3C has recently published the Last Call Working Draft 
>     for ITS (See [1]) as well as the First Working Draft of a companion 
>     document: "Best Practices for XML Internationalization" (See [2]).
>     [1] http://www.w3.org/TR/2006/WD-its-20060518/
>     [2] http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/
> 
>     The second document includes examples of how to use ITS with a few specific 
>     document types (See the "ITS Applied to Existing Formats" section). In the 
>     next draft we would like to include DITA in that list.
>     
>     The attached file is a first attempt at possible default rules to process 
>     DITA with ITS. We would very much appreciate it if some of you had the time 
>     to review it and make sure we have not made any mistakes or forgotten anything. 
>     For example, I'm not sure if the dir attribute should be there or not. 
>     I'm also not sure if we have all subflow elements listed. Maybe we need 
>     two rule sets: one for the current version of DITA and one for the upcoming 
>     one (although if there is no conflict and a single rule set could be used 
>     that would be better).
> 
>     The specification document [1] should help you understand each element in 
>     these rules. The Last Call review for the specification ends on June 30. 
>     The Best Practices document will still go through several drafts."
> 
>     ACTION for everyone to review the ITS proposals for discussion next week.
> 
> 3.2 Discussion item from Andrzej Zydron
> 
>     "LISA OSCAR's latest standard GMX/V (Global Information Management Metrics eXchange
>     - Volume) has been approved and is going through its final public comment 
>     phase. GMX/V tackles the issue of word and character counts and how to 
>     exchange localization volume information via an XML vocabulary. 
> 
>     GMX/V finally provides a verifiable, industry standard for word and 
>     character counts. GMX/V mandates XLIFF as the canonical form for word 
>     and character counts.
> 
>     GMX/V can be viewed at the following location:
>     http://www.lisa.org/standards/gmx/GMX-V.html
> 
>     Localization tool providers have been consulted and have contributed to 
>     this standard. We would appreciate your views/comments on GMX/V."
> 
>     Andrzej gave an overview of the standard and background, and requested
>     SC members review the standard.
> 
> 4.  New Business:
> 
>     Decide the Best Practices that we need to consider.
> 
>     1)  Possible best practice: maximize usage of conref (reusable blocks)...
> 
>         From Nancy Harrison:
>         "Boilerplate text is often kept in one or more .dita files used as a 
>         source for conrefs across a document set. How should authors / 
>         implementers / processors deal with multiple sets of boilerplate files 
>         automatically?  DocBook names every file containing generated text 
>         with a language extension (two letter only), including English.  A 
>         similar scheme, but probably with locale, not just country, would work 
>         for DITA documents as well."
> 
>         Andrzej: All boilerplate content for a language must be stand-alone.
>             Boilerplate text must be stand-alone phrases to avoid problems translating
>             it into some languages, where it does not fit into the surrounding text.
> 
>         ACTION: Charles will provide an example of typical boilerplate fragments
> 
>         JoAnn: What about a conref to non-boilerplate text? How would this
>             affect the translation workflow?
>         Andrzej: There is a dependency on the conref target, which would need 
>             to be translated before the parent document that refers to the 
>             conref is translated. Again, conreffing to an inline element may 
>             result in a badly translated phrase with respect to its surrounding 
>             content, so we should probably advise against this. Examples: 
>             singular/plural, prepositions, acronyms, e.g. ABS (antilock braking 
>             system); if you conref to the text itself, the translated text may 
>             not read correctly.
>         ACTION: Andrzej to send examples to the group for discussion.
> 
>     2)  Handling multi-language documents
>         [we did not discuss this further this week, but some members did send
>         examples to the list for discussion on-list and at next week's meeting]
> 
>     3)  Not a best practice, but the DITA to XLIFF and back mechanism needs to 
>         be completed.
> 
>         Andrzej and Rodolfo have successfully converted DITA to XLIFF and back.
>         Rodolfo plans to publish their converter as open source.
> 
>     4)  Gershon: what's the best practice for translations for users who move 
>             from a legacy documentation system to DITA?
> 
>         Andrzej: It should still be possible to run against the previous TM.
>             Inlines may not match, or may only fuzzy match. As long as memories 
>             are aligned at the sentence level, it should work (at least leverage 
>             matching).
> 
>         Kevin confirmed that using TM as-is will give you 10-20% less matching 
>             than if you tweak the XLIFF to better match the DITA.
> 
>         Rodolfo: A good TM engine should help you recover 70% of the inline tags,
>             which is the main problem.
>         
>         Kevin: so long as they're matched tags; however conditional text marked 
>             up in legacy tools (e.g. FrameMaker) will only be fuzzy matched 
>             (at best).
> 
>         ACTION: Gershon to write a draft proposal (with Rodolfo) and submit it 
>             to the list for input and technical assistance.
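> 
>         To illustrate the inline-tag point above, here is a minimal, invented
>         TMX 1.4 unit segmented at the sentence level (all attribute values are
>         placeholders). The paired <bpt>/<ept> codes are what a TM engine can
>         leverage; unpaired legacy condition markers offer nothing to match on.
> 
>         <tmx version="1.4">
>           <header creationtool="example" creationtoolversion="1.0"
>                   segtype="sentence" o-tmf="example" adminlang="en"
>                   srclang="en" datatype="xml"/>
>           <body>
>             <tu>
>               <tuv xml:lang="en">
>                 <seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept>.</seg>
>               </tuv>
>               <tuv xml:lang="fr">
>                 <seg>Cliquez sur <bpt i="1">&lt;b&gt;</bpt>Enregistrer<ept i="1">&lt;/b&gt;</ept>.</seg>
>               </tuv>
>             </tu>
>           </body>
>         </tmx>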
>        
> Meeting adjourned at 09:00am PT.
> 
> ---
> 
> 
> ------------------------------------------------------------------------
> 
> Subject:
> First try at the legacy TM best practice
> From:
> "Gershon L Joseph" <gershon@tech-tav.com>
> Date:
> Wed, 7 Jun 2006 03:18:20 -0600
> To:
> "Rodolfo M. Raya" <rodolfo@heartsome.net>
> 
> To:
> "Rodolfo M. Raya" <rodolfo@heartsome.net>
> CC:
> "JoAnn Hackos" <joann.hackos@comtech-serv.com>
> 
> 
> Hi Rodolfo,
> 
> Here's what I've come up with. Please add the missing information. If you
> prefer to discuss by phone, I've sent you a request to add you as a Skype
> contact. My time zone is 7 hours ahead of New York time, or 10 hours ahead
> of San Francisco time. I'm not sure how available I'll be today due to other
> conference calls I've scheduled, but I should be available tomorrow until at
> least 17:00 my time, later if needed.
> 
> Best Regards,
> Gershon
> 
> ---
> Gershon L Joseph
> Member, OASIS DITA and DocBook Technical Committees
> Director of Technology and Single Sourcing
> Tech-Tav Documentation Ltd.
> office: +972-8-974-1569
> mobile: +972-57-314-1170
> http://www.tech-tav.com
> 
> 
> ------------------------------------------------------------------------
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
> "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd";>
> <article>
>   <title>Best Practice for Leveraging Legacy Translation Memory when Migrating
>   to DITA</title>
> 
>   <section>
>     <title>Statement of Problem</title>
> 
>     <para>Many organizations have previously translated content that was
>     authored in non-DITA tools (such as Word and FrameMaker). When migrating
>     their legacy content into the new DITA authoring environment, what does
>     the organization do about their legacy translation memory? This legacy
>     translation memory (TM) was created with large financial investment that
>     can't easily be thrown away simply because a new authoring architecture is
>     being adopted.</para>
> 
>     <para>This article describes best practices that will help organizations
>     to use their legacy TM for future translation projects that are authored in
>     DITA, in order to minimize the expense of translating DITA-based
>     content.</para>
>   </section>
> 
>   <section>
>     <title>Recommended Best Practices</title>
> 
>     <para>If you keep the following points in mind, you should be able to
>     maximize your existing translation memory when you send your DITA
>     documents for translation:</para>
> 
>     <itemizedlist>
>       <listitem>
>         <para>Ensure your translation service provider uses a tool that
>         supports TMX [is this correct?]. This will ensure you can migrate your
>         TM between TM tools that support the industry standard for TM
>         interchange. This is required not only to free you from dependence on
>         a single translation service provider, but also to allow you to tweak
>         your TM to better match your DITA-based XML source documents you'll be
>         sending for translation.</para>
>       </listitem>
> 
>       <listitem>
>         <para>Provided the structure of the DITA-based content has not changed
>         radically compared to the legacy documents, the TM software should
>         fully match most block elements. As long as the legacy TM aligns with
>         the DITA source at the sentence level, the translation software should
>         be able to fully leverage matching.</para>
>       </listitem>
> 
>       <listitem>
>         <para>Inline elements may not match at all, or may only fuzzy match.
>         If the TM is preprocessed to prepare it for the DITA-based translation
>         project, then inline elements should fully match. Note that a good TM
>         engine should help you recover 70% of the inline tags, which is the
>         main area where matching is prone to fail.</para>
>       </listitem>
> 
>       <listitem>
>         <para>If text entities and/or conrefs are used as containers for
>         reusable text (and they should be!), then these items may not fully
>         match (only fuzzy match). However, since each of these items needs to
>         be translated only once, and should at least fuzzy match, it should
>         not result in significant translation expense.</para>
>       </listitem>
> 
>       <listitem>
>         <para>If you tweak the XLIFF (exported from the legacy TM to better
>         align it with the new DITA content), you should realize an improvement
>         of 10-20% on TM matching. Whether it's worth the effort and expense in
>         doing this depends on the size of the DITA documents to be translated.
>         The idea is [Rodolfo, please correct me if I'm wrong!] to export the
>         TM to XLIFF, process the XLIFF (usually via XSLT) to better align it
>         with your DITA content, and then import the XLIFF back into your TM.
>         Thus, your TM will now be better aligned with your DITA content, which
>         will result in more accurate matching.</para>
>       </listitem>
> 
>       <listitem>
>         <para>One area where things can go wrong is if the legacy content does
>         not use matched tags. Since XML uses matched tags, elements may not be
>         accurately matched. This is particularly true of conditional text
>         marked up in legacy tools (e.g. FrameMaker), where you can expect only
>         fuzzy matching at best, or no matching at worst. Depending on how much
>         conditional content the legacy source documents contain, it may be
>         worth preprocessing the TM to ensure all conditional tags are paired.
>         [can this be automated, or would a human have to go through the TM or
>         XLIFF to close the tags? What should we suggest they do here to
>         resolve the issue?]</para>
>       </listitem>
>     </itemizedlist>
> 
>     <para>[Rodolfo, is there anything I've missed or anything else we should
>     add?]</para>
>   </section>
> </article>
> 
> 


-- 


email - azydron@xml-intl.com
smail - c/o Mr. A.Zydron
        PO Box 2167
        Gerrards Cross
        Bucks SL9 8XF
        United Kingdom
Mobile +(44) 7966 477 181
FAX    +(44) 1753 480 465
www - http://www.xml-intl.com

This message contains confidential information and is intended only
for the individual named.  If you are not the named addressee you
may not disseminate, distribute or copy this e-mail.  Please
notify the sender immediately by e-mail if you have received this
e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or error-free
as information could be intercepted, corrupted, lost, destroyed,
arrive late or incomplete, or contain viruses.  The sender therefore
does not accept liability for any errors or omissions in the contents
of this message which arise as a result of e-mail transmission.  If
verification is required please request a hard-copy version. Unless
explicitly stated otherwise this message is provided for informational
purposes only and should not be construed as a solicitation or offer.




alarm.zip


