Subject: RE: [xliff-comment] using xliff to translate html
Hi Brian,

I'll try to answer your questions:

> Is it intended that the internal structure of
> the source/target elements to show up as
> elements in the XLIFF DOM?

Yes.

> If XLIFF is intended to be a holder then I'm
> unclear on the advantages of forcing the
> source/target data to be well formed XML.

Yes, it is the intent of XLIFF to be a holder of text with possibly inline codes. One of the aims of XLIFF is to *abstract* the translation units that have inline codes, so that, regardless of what the original codes are, they can be processed in a uniform way for most localization tasks (translation memory matching, spell-checking, word counting, terminology extraction, etc.).

A small example:

Original code in RTF:
"The picture is {\b missing}."
XLIFF content:
<source>The picture is <bpt id='1'>{\b </bpt>missing<ept id='1'>}</ept>.</source>

Original code in HTML:
"The picture is <B>missing</B>."
XLIFF content:
<source>The picture is <bpt id='1'>&lt;B&gt;</bpt>missing<ept id='1'>&lt;/B&gt;</ept>.</source>

The idea is that, in both cases, the XLIFF content is equivalent and already parsed (from the original format's point of view). In other words: text is already separated from codes.

Actually, using the <g> tags you could even write the same content for both formats:

<source>The picture is <g id='1'>missing</g>.</source>

This allows tools to treat the inline codes without distinction. For example, we could get a 100% match when leveraging the RTF text in an HTML file.

> Would there be an advantage to allowing or
> making source/target data CDATA? It would
> remove the requirement that the source/target
> data be well formed XML. In my case this would
> make the handling of HTML much much simpler.
If we had the content as CDATA:

<source><![CDATA[The picture is {\b missing}.]]></source>
<source><![CDATA[The picture is <B>missing</B>.]]></source>

all the translation tools would have to come up with their own parsing for both formats (and any other format), and they would have to do so each time they manipulate the source/target content.

The need for pre-parsing comes from the goal of having a common way to understand and manipulate the inline codes, regardless of the original format (HTML, RTF, MIF, RC, RESX, Java properties, JSP, Photoshop files, etc.).

Keep also in mind that, as with other formats, only the inline elements of HTML (<b>, <em>, <img>, etc.) will be in the source/target content, not any of the structural elements (<table>, <li>, <tr>, etc.). From a translation tool's viewpoint there is no reason to treat them differently from other formats.

I hope this is helpful.

Kind regards,
-yves
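
P.S. To illustrate the point about uniform processing, here is a rough Python sketch, not taken from any XLIFF tool. It assumes the two <source> fragments from the RTF/HTML example (with the HTML tags escaped as entities), and the helper names abstract_codes and plain_text are made up for this illustration. It shows that a tool can compare or count the text without knowing anything about the native codes.

```python
import xml.etree.ElementTree as ET

# The two <source> fragments from the example; HTML tags are escaped as entities.
RTF_SOURCE = r"<source>The picture is <bpt id='1'>{\b </bpt>missing<ept id='1'>}</ept>.</source>"
HTML_SOURCE = "<source>The picture is <bpt id='1'>&lt;B&gt;</bpt>missing<ept id='1'>&lt;/B&gt;</ept>.</source>"


def abstract_codes(source_xml: str) -> str:
    """Return the segment with each inline code replaced by a neutral placeholder.

    Hypothetical helper: the native code inside <bpt>/<ept> is ignored on
    purpose; only the element name and its id are kept.
    """
    root = ET.fromstring(source_xml)
    parts = [root.text or ""]
    for child in root:
        parts.append("{%s:%s}" % (child.tag, child.get("id")))
        parts.append(child.tail or "")
    return "".join(parts)


def plain_text(source_xml: str) -> str:
    """Return only the translatable text, dropping the inline codes entirely."""
    root = ET.fromstring(source_xml)
    return (root.text or "") + "".join(child.tail or "" for child in root)


if __name__ == "__main__":
    # Both formats reduce to the same abstract segment, so a translation
    # memory lookup on one could be an exact match for the other.
    print(abstract_codes(RTF_SOURCE) == abstract_codes(HTML_SOURCE))
    # Word counting works on the text alone, with no RTF or HTML parser.
    print(len(plain_text(HTML_SOURCE).split()))
```

With CDATA content, each of these helpers would first need a format-specific parser (one for RTF, one for HTML, and so on) just to find where the codes are.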