Subject: RE: [xliff] XLIFF 2.0 example files for segmentation
- From: David Walters <waltersd@us.ibm.com>
- To: Yves Savourel <ysavourel@enlaso.com>
- Date: Thu, 10 Nov 2011 09:48:20 -0600
I'm sorry my first note was so confusing, but I think that the discussion has been good.
In my original note, there were 5 different XLIFF examples shown:
- XLIFF created by an extraction tool (tool A). Two variations are provided, depending on whether <segment> is required (core) or not (module).
File 1. <segment> element was not included.
File 2. <segment> was included. Translatable content of <unit> is the same as the content of <segment>.
- XLIFF file created and used in a translation tool (tool B), created from the tool A XLIFF file.
File 3. Contains sentence-segmented text, translated into Spanish.
- XLIFF file returned to product after translation.
File 4. Expected output if file 1 was used as input.
File 5. Expected output if file 2 was used as input.
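A minimal Python sketch of the structural difference between File 1 and File 2. It assumes simplified, un-namespaced <unit>, <segment> and <source> elements as named in this thread (not a published XLIFF 2.0 schema); the function names `extract_unit` and `source_text` and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

def extract_unit(unit_id: str, text: str, use_segment: bool) -> ET.Element:
    """Build a simplified <unit> for one block of extracted text.

    use_segment=False mirrors File 1 (<source> directly under <unit>);
    use_segment=True mirrors File 2 (the same <source> wrapped in one <segment>).
    """
    unit = ET.Element("unit", id=unit_id)
    parent = ET.SubElement(unit, "segment") if use_segment else unit
    ET.SubElement(parent, "source").text = text
    return unit

def source_text(unit: ET.Element) -> str:
    """Concatenate every <source> in document order, wherever it is nested."""
    return "".join(s.text or "" for s in unit.iter("source"))

# Hypothetical sample block of text from the original (non-XLIFF) file.
text = "Click Save. The file is written to disk."
file1_unit = extract_unit("u1", text, use_segment=False)
file2_unit = extract_unit("u1", text, use_segment=True)
print(ET.tostring(file1_unit, encoding="unicode"))
print(ET.tostring(file2_unit, encoding="unicode"))
```

Both units carry the same translatable text; the only difference is the extra <segment> wrapper in the File 2 shape.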
Rodolfo:
<source> can be a child of both <unit> and <segment>.
David:
Yes, if <segment> were not core, <unit> would have children of either 1 <source> element, or 1+ <segment> and <ignorable> elements. I was primarily thinking about simplicity for the creator of XLIFF rather than the complexity of implementing the function in a tool.
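The content model described here can be expressed as a small check. This is a sketch under assumptions: the element names are the simplified ones used in this thread, the helper `follows_proposed_model` is hypothetical, and both allowing a <target> after the lone <source> and requiring at least one <segment> in the second alternative are readings made here, not stated in the note.

```python
import xml.etree.ElementTree as ET

def follows_proposed_model(unit: ET.Element) -> bool:
    """Check a simplified <unit> against the proposed content model.

    Alternative A: exactly one <source> (an optional trailing <target> is
    assumed here, so that translated units also pass).
    Alternative B: only <segment> and <ignorable> children, with at least one
    <segment> (requiring a <segment> is also an assumption).
    """
    tags = [child.tag for child in unit]
    if tags in (["source"], ["source", "target"]):
        return True
    return set(tags) <= {"segment", "ignorable"} and "segment" in tags

plain = ET.fromstring('<unit id="u1"><source>Click Save.</source></unit>')
segmented = ET.fromstring(
    '<unit id="u2"><segment><source>Click Save.</source></segment>'
    '<ignorable><source> </source></ignorable>'
    '<segment><source>Close the window.</source></segment></unit>'
)
mixed = ET.fromstring(
    '<unit id="u3"><source>Click Save.</source>'
    '<segment><source>Close the window.</source></segment></unit>'
)
for unit in (plain, segmented, mixed):
    print(unit.get("id"), follows_proposed_model(unit))
```

Under this model a unit mixing a bare <source> with <segment> children is rejected, which keeps the two alternatives cleanly separate.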
Yves:
A realistic scenario, in my opinion, would be for the third file (translated and sentence-segmented) to come back to the product developer. And the merging tool should be able to work with it.
[XLIFF examples] 1 and 4, and 5 is not needed. But the merger should work with either 3 or 5.
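To illustrate the kind of merger described here, a minimal Python sketch that rebuilds each unit's translated text by concatenating its <segment> and <ignorable> parts in document order, so a File 3 style unit and a File 5 style unit give the same result. It assumes plain-text content, simplified un-namespaced elements, and hypothetical names (`unit_target_text`, `merge`); it ignores inline markup and any module data, which a real merge tool would also have to handle.

```python
import xml.etree.ElementTree as ET

def unit_target_text(unit: ET.Element) -> str:
    """Rebuild the translated text of one simplified <unit>, however it was segmented.

    Each <segment>/<ignorable> contributes its <target> text, falling back to
    its <source> when there is no <target> (e.g. whitespace in <ignorable>).
    """
    parts = [c for c in unit if c.tag in ("segment", "ignorable")]
    if not parts:
        # File 4 style: no <segment>; the <target> sits directly under <unit>.
        return unit.findtext("target", default="")
    pieces = []
    for part in parts:
        node = part.find("target")
        if node is None:
            node = part.find("source")
        pieces.append("" if node is None else (node.text or ""))
    return "".join(pieces)

def merge(document: ET.Element) -> dict:
    """Map each unit id to its translated text, ready to write back."""
    return {u.get("id"): unit_target_text(u) for u in document.iter("unit")}

# Hypothetical File 5 style unit: one <segment>, translated as a whole.
file5 = ET.fromstring(
    '<xliff><unit id="u1"><segment>'
    '<source>Click Save. The file is written to disk.</source>'
    '<target>Haga clic en Guardar. El archivo se escribe en el disco.</target>'
    '</segment></unit></xliff>'
)
# Hypothetical File 3 style unit: the same text re-segmented by tool B.
file3 = ET.fromstring(
    '<xliff><unit id="u1">'
    '<segment><source>Click Save.</source><target>Haga clic en Guardar.</target></segment>'
    '<ignorable><source> </source></ignorable>'
    '<segment><source>The file is written to disk.</source>'
    '<target>El archivo se escribe en el disco.</target></segment>'
    '</unit></xliff>'
)
print(merge(file5) == merge(file3))
```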
David:
The extraction tool (tool A) and the merge tool are probably the same tool. They have to have the same processing rules in order to extract and replace the text the same way. What you are implying is that the extraction tool could create the XLIFF file using only the "core" elements, but the associated merge tool would have to be aware of all of the core and module elements in order to accurately process the translated XLIFF file. That seems like an unrealistic expectation. Can a merge tool be expected to handle every possible change to an XLIFF file which any translation tool makes to that file? That would be difficult to develop and thoroughly test.
Rodolfo:
In essence, using <segment> moves segmentation from the text extraction domain to the translation domain. Tools that create XLIFF files don't need to worry about segmentation, as that is something translators would be able to adjust at translation time.
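A minimal sketch of what a translation tool (tool B) might do when segmentation is adjusted at translation time, inside the existing <unit>. It assumes simplified elements and a deliberately naive sentence-break regex (not a standard rule set such as SRX): each untranslated <segment> is split into sentence <segment>s, and the whitespace between them is kept in <ignorable> elements so the unit's overall text is unchanged. The function `resegment` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Naive sentence-break rule (an assumption): a sentence ends at ".", "!" or "?"
# followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])(\s+)")

def resegment(unit: ET.Element) -> ET.Element:
    """Split each untranslated <segment> of a simplified <unit> into sentences."""
    new_children = []
    for child in list(unit):
        if child.tag != "segment" or child.find("target") is not None:
            new_children.append(child)  # leave ignorables and translated segments alone
            continue
        text = child.findtext("source", default="")
        # With a capturing group, re.split() also returns the separators,
        # which land at the odd indexes; those become <ignorable> elements.
        for i, piece in enumerate(SENTENCE_BREAK.split(text)):
            if not piece:
                continue
            wrapper = ET.Element("ignorable" if i % 2 else "segment")
            ET.SubElement(wrapper, "source").text = piece
            new_children.append(wrapper)
    # Replace the unit's children while keeping its attributes (e.g. id).
    for child in list(unit):
        unit.remove(child)
    unit.extend(new_children)
    return unit

unit = ET.fromstring(
    '<unit id="u1"><segment><source>Click Save. The file is written to disk.'
    '</source></segment></unit>'
)
resegment(unit)
print(ET.tostring(unit, encoding="unicode"))
```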
David:
If tools that create XLIFF files don't need to worry about segmentation, then why is the <segment> element required? The reason for requiring <segment> seems to be that it will make later processing (further segmentation) easier for other tools used later in the process. Will there be other "module" features that would benefit from having a "stub" element added to the core?
Rodolfo:
The elements that contain segment text and allow freedom in the segmentation process should be part of the core. Infrastructure is a must-have; the process is optional.
David:
Yes, I can agree that infrastructure is critical. So you are implying that anything which may be essential for possible future processing must be part of the core? That may be difficult to use as a criterion for core versus module.
David:
A couple of general comments:
- All tools processing an XLIFF file have to assume that all core and module features are completely supported.
Is this really the case? I had not been thinking along this line. If so, what is the value of having a core and modules if you always have to support all of the modules too?
- Core.
One of the most basic XLIFF functions is when the XLIFF file is created from a non-XLIFF file. So an idea for how to determine what is "core" or "module" could be based on this:
XML elements and attributes required to define the extracted translatable text from a non-XLIFF file so that the translated content can be integrated back into that original file format.
David
Corporate Globalization Tool Development
EMail: waltersd@us.ibm.com
Phone: (507) 253-7278, T/L:553-7278, Fax: (507) 253-1721
CHKPII: http://w3-03.ibm.com/globalization/page/2011
TM file formats: http://w3-03.ibm.com/globalization/page/2083
TM markups: http://w3-03.ibm.com/globalization/page/2071
Yves Savourel ---11/09/2011 05:34:53 PM---
From: Yves Savourel <ysavourel@enlaso.com>
To: "'XLIFF TC'" <xliff@lists.oasis-open.org>
Date: 11/09/2011 05:34 PM
Subject: RE: [xliff] XLIFF 2.0 example files for segmentation
Sent by: <xliff@lists.oasis-open.org>
> So the scenario I pulled from Dave's example,
> file with <segment>s is refactored as file
> with different <segment>s but has to be
> restored to original <segment>s is a chimera.
A bad dream indeed.
> ..then Dave's example would need to lose
> snippets two and five. Is that right?
1 and 4, and 5 is not needed. But the merger should work with either 3 or 5.
> If segmentation is moved to the translation domain
> and adjusted at translation time, aren't we saying
> that segmentation is really a function of the import
> filter (and hence beyond the scope of XLIFF),
> just as it would be for any other file format?
> I don't need to indicate segmentation in Word or
> IDML, so what makes XLIFF different in this regard?
Well, the "re-" segmentation is moved out of the extraction tool domain. But there is block-level segmentation too, and you do use it in Word and IDML: paragraphs, footnotes, table cells, etc.
Likewise, that "block"-level segmentation is done by the extraction tool: a <unit> holds a "block", which is initially stored in a single <segment>.
Why use both <unit> and <segment> initially? Why not just <unit> like in David's snippets 1 and 4?
Because it's just more efficient. Maybe forget about block/sentence and see it this way: until it (optionally) gets segmented further, that content is the "segment" for that unit. So why shouldn't it be in a <segment>?
Think about Word and pages: your content is on a single page or several, and you decide where the page breaks are. Even when there are no page breaks, there is a page. A <segment> is similar to a page.
Cheers,
-ys