
Subject: DITA Translation Subcommittee Meeting Minutes: 5 June 2006

From: "Gershon L Joseph" <gershon@tech-tav.com>
To: <dita-translation@lists.oasis-open.org>,<mambrose@sdl.com>,<pcarey@lexmark.com>,<rfletcher@sdl.com>,<bhertz@sdl.com>,"'Richard Ishida'" <ishida@w3.org>,<tony.jewtushenko@productinnovator.com>,"'Lieske, Christian'" <christian.lieske@sap.com>,"'Jennifer Linton'" <jennifer.linton@comtech-serv.com>,"'Munshi, Sukumar'" <Sukumar.Munshi@lionbridge.com>,"'Charles Pau'" <charles_pau@us.ibm.com>,<dpooley@sdl.com>,"'Reynolds, Peter'" <Peter.Reynolds@lionbridge.com>,"'Felix Sasaki'" <fsasaki@w3.org>,"'Yves Savourel'" <ysavourel@translate.com>,"'Dave A Schell'" <dschell@us.ibm.com>,"'Bryan Schnabel'" <bryan.s.schnabel@tek.com>,<Howard.Schwartz@trados.com>,<kara@ca.ibm.com>
Date: Wed, 7 Jun 2006 11:27:42 +0300



Best Regards,
Gershon

---
Gershon L Joseph
Member, OASIS DITA and DocBook Technical Committees
Director of Technology and Single Sourcing
Tech-Tav Documentation Ltd.
office: +972-8-974-1569
mobile: +972-57-314-1170
http://www.tech-tav.com

DITA Translation Subcommittee Meeting Minutes: 5 June 2006

(Recorded by Gershon Joseph <gershon@tech-tav.com>)

The DITA Translation Subcommittee met on Monday, 5 June 2006 at 08:00am PT
for 60 minutes.

1.  Roll call

    Present: Kevin Farwell, JoAnn Hackos, Gershon Joseph, Charles Pau, Rodolfo 
             Raya, Felix Sasaki, Yves Savourel, David Walters, Andrzej Zydron,
             Kara Warburton

    Regrets: Don Day

2.  Accepted the minutes of the previous meeting.
    http://lists.oasis-open.org/archives/dita-translation/200605/msg00016.html
    Moved by Rodolfo, seconded by Yves, no objections.

3.  Returning Business:

3.1 Discussion item from Yves Savourel

    "As you may know, the W3C has recently published the Last Call Working Draft 
    for ITS (See [1]) as well as the First Working Draft of a companion 
    document: "Best Practices for XML Internationalization" (See [2]).
    [1] http://www.w3.org/TR/2006/WD-its-20060518/
    [2] http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/

    The second document includes examples of how to use ITS with a few specific 
    document types (See the "ITS Applied to Existing Formats" section). In the 
    next draft we would like to include DITA in that list.
    
    The attached file is a try for the possible default rules to process DITA 
    with ITS. We would appreciate very much if some of you had the time to 
    review it and make sure we have not made any mistakes, or forgotten anything. 
    For example, I'm not sure if the dir attribute should be there or not. 
    I'm also not sure if we have all subflow elements listed. Maybe we need 
    two rule sets: one for the current version of DITA and one for the upcoming 
    one (although if there is no conflict and a single rule set could be used 
    that would be better).

    The specification document [1] should help you understand each element in 
    these rules. The Last Call review for the specification ends on June-30. 
    The Best Practices document will still go through several drafts."

    ACTION: Everyone to review the ITS proposals for discussion next week.

3.2 Discussion item from Andrzej Zydron

    "LISA OSCAR's latest standard GMX/V (Global Information Metrics eXchange
    - Volume) has been approved and is going through its final public comment 
    phase. GMX/V tackles the issue of word and character counts and how to 
    exchange localization volume information via an XML vocabulary. 

    GMX/V finally provides a verifiable, industry standard for word and 
    character counts. GMX/V mandates XLIFF as the canonical form for word 
    and character counts.

    GMX/V can be viewed at the following location:
    http://www.lisa.org/standards/gmx/GMX-V.html

    Localization tool providers have been consulted and have contributed to 
    this standard. We would appreciate your views/comments on GMX/V."

    Andrzej gave an overview of the standard and its background, and asked
    SC members to review the standard.

4.  New Business:

    Decide the Best Practices that we need to consider.

    1)  Possibly to maximize usage of conref (reusable blocks)...

        From Nancy Harrison:
        "Boilerplate text is often kept in one or more .dita files used as a 
        source for conrefs across a document set. How should authors / 
        implementers / processors deal with multiple sets of boilerplate files 
        automatically?  DocBook names every file containing generated text 
        with a language extension (two letter only), including English.  A 
        similar scheme, but probably with locale, not just country, would work 
        for DITA documents as well."

        Andrzej: All boilerplate content for a language must be stand-alone.
            Boilerplate text must be stand-alone phrases to avoid problems translating
            it into some languages, where it does not fit into the surrounding text.

        ACTION: Charles will provide an example of typical boilerplate fragments.

        JoAnn: What about a conref to non-boilerplate text? How would this
            affect the translation workflow?
        Andrzej: This creates a dependency on the conref target, which would
            need to be translated before the parent document that refers to
            it is translated. Again, a conref to an inline element may result
            in a badly translated phrase with respect to its surrounding
            content, so we should probably recommend against this. Examples:
            singular/plural, prepositions, and acronyms such as ABS (anti-lock
            braking system). If you conref to the text itself, the translated
            text may not read correctly.
        ACTION: Andrzej to send examples to the group for discussion.
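
The dependency Andrzej describes (conref targets translated before the documents that refer to them) can be illustrated with a small ordering script. This is only a sketch, not something presented at the meeting: the file names and topic content are invented, the function names are hypothetical, and it assumes conref values of the form "file.dita#topicid/elementid" and Python 3.9 or later.

```python
from graphlib import TopologicalSorter
import xml.etree.ElementTree as ET


def conref_targets(xml_text):
    """Return the set of file names that a topic's conref attributes point to."""
    root = ET.fromstring(xml_text)
    targets = set()
    for elem in root.iter():
        conref = elem.get("conref")
        if conref:
            # Keep only the file part in front of '#'. A same-file conref
            # ("#topicid/elementid") has no file part and adds no dependency.
            file_part = conref.split("#", 1)[0]
            if file_part:
                targets.add(file_part)
    return targets


def translation_order(topics):
    """Order files so that every conref target precedes the files that use it."""
    # Map each file to the files it depends on, ignoring targets outside the set.
    graph = {
        name: conref_targets(text).intersection(topics)
        for name, text in topics.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Invented sample content: one topic pulls a warning from a boilerplate file.
topics = {
    "install.dita": (
        '<topic id="install"><title>Install</title><body>'
        '<p conref="boilerplate.dita#bp/warning"/></body></topic>'
    ),
    "boilerplate.dita": (
        '<topic id="bp"><title>Boilerplate</title><body>'
        '<p id="warning">Disconnect power first.</p></body></topic>'
    ),
}

order = translation_order(topics)
print(order)
```

Here boilerplate.dita is listed before install.dita, so it would go to translation first. Circular conrefs would make TopologicalSorter raise a CycleError, which a real workflow would need to handle.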

    2)  Handling multi-language documents
        [We did not discuss this further this week, but some members did send
        examples to the list for discussion on-list and at next week's meeting.]

    3)  Not a best practice, but the DITA to XLIFF and back mechanism needs to 
        be completed.

        Andrzej and Rodolfo have successfully converted DITA to XLIFF and back.
        Rodolfo plans to publish their converter as open source.

    4)  Gershon: What's the best practice for translations for users who move 
            from a legacy documentation system to DITA?

        Andrzej: It should still be possible to run against the previous TM.
            Inlines may not match, or may fuzzy match. As long as memories are 
            aligned at the sentence level, it should work (at least leverage
            matching).

        Kevin confirmed that using TM as-is will give you 10-20% less matching 
            than if you tweak the XLIFF to better match the DITA.

        Rodolfo: A good TM engine should help you recover 70% of the inline
            tags, which are the main problem.
        
        Kevin: So long as they're matched tags; however, conditional text marked 
            up in legacy tools (e.g. FrameMaker) will only be fuzzy matched 
            (at best).

        ACTION: Gershon to write a draft proposal (with Rodolfo) and submit it 
            to the list for input and technical assistance.
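
The point about inline tags and fuzzy matching can be illustrated with a rough comparison. This is a sketch under stated assumptions only: the two segments are invented, the function names are hypothetical, and Python's difflib ratio stands in for whatever scoring a real TM engine uses, so the numbers are not comparable to the percentages quoted above.

```python
import re
from difflib import SequenceMatcher

# Matches any inline tag such as <b>, </b> or <uicontrol>.
TAG = re.compile(r"<[^>]+>")


def similarity(a, b):
    """Character-level similarity between two segments, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def strip_inline(segment):
    """Remove inline tags and normalize whitespace, leaving only the text."""
    return re.sub(r"\s+", " ", TAG.sub("", segment)).strip()


# Invented example: the same sentence in a legacy markup and in DITA.
legacy = "Press the <b>Start</b> button."
dita = "Press the <uicontrol>Start</uicontrol> button."

raw = similarity(legacy, dita)
text_only = similarity(strip_inline(legacy), strip_inline(dita))

print(f"with inline tags: {raw:.2f}")
print(f"text only:        {text_only:.2f}")
```

With the tags left in, the two segments only fuzzy match. Once the inline markup is removed the sentence text is identical, which is why sentence-level alignment of the legacy memory still gives useful leverage.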
       
Meeting adjourned at 09:00am PT.

---