Subject: RE: [xliff] XLIFF 2.0 Core
From: "Rodolfo M. Raya" <rmraya@maxprograms.com>
To: <xliff@lists.oasis-open.org>
Date: Wed, 6 Apr 2011 09:20:28 -0300
Hi Yves,

To facilitate handling, the elements containing source and target text should have a common parent, something like this:

	<segment>
	  <source>text</source>
	  <target>translation</target>
	</segment>

With the above structure we can clearly associate the source text with its translation. It does not require mapping via attributes.
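
As a minimal sketch of why the common parent helps, here is how a tool could read the pairs with Python's standard xml.etree.ElementTree. The element names are simply the ones from the example above, not a settled schema, and the helper name pairs is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Element names follow the example above; they are not a settled schema.
XML = """
<unit>
  <segment>
    <source>text</source>
    <target>translation</target>
  </segment>
</unit>
"""

def pairs(unit):
    """Return one (source, target) text pair per <segment> (hypothetical helper)."""
    result = []
    for seg in unit.iter("segment"):
        # Both children sit under the same parent, so no id-based lookup is needed.
        source = seg.findtext("source", default="")
        target = seg.findtext("target", default="")
        result.append((source, target))
    return result

print(pairs(ET.fromstring(XML)))
```

The pairing falls out of the document structure itself; with separate source and target lists, the same function would need to match ids.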

I don't care if the translatable chunk of text is called a "segment", "translation-unit" or whatever. However, if there is interest in differentiating between segmented and unsegmented text, there should also be a very clear definition of what constitutes the text to translate, what a translation unit is, and how it differs from a segment.

For me, the real issue we need to consider is the ability to take a paragraph extracted as a unit, split it into sentences, and merge them again without loss before converting back to the original format. This would support changing segmentation at translation time.

Suppose we start with a simple paragraph like this:

	John D. Williams went to the park. He watched the birds. John enjoys nature.

A tool can extract the paragraph as:

	<unit>
	  <segment>
	    <source>John D. Williams went to the park. He watched the birds. John enjoys nature.</source>
	    <target></target>
	  </segment>
	</unit>

A process may decide it would be better to segment on sentences and changes the <unit> to:

	<unit>
	  <segment>
	    <source>John D.</source>
	    <target></target>
	  </segment>
	  <segment>
	    <source> Williams went to the park.</source>
	    <target></target>
	  </segment>
	  <segment>
	    <source> He watched the birds.</source>
	    <target></target>
	  </segment>
	  <segment>
	    <source> John enjoys nature.</source>
	    <target></target>
	  </segment>
	</unit>
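
For illustration only: a rule as simple as "break after every period" produces this kind of four-way split. The sketch below is an assumption about how such a process might work, not a description of any real segmenter, and naive_split is a made-up name:

```python
import re

PARAGRAPH = ("John D. Williams went to the park. "
             "He watched the birds. John enjoys nature.")

def naive_split(text):
    """Break after every period; whitespace stays at the start of the next piece."""
    # Each match runs up to and including the next '.', so no character is lost.
    # The second alternative catches trailing text that has no final period.
    return re.findall(r"[^.]*\.|[^.]+", text)

segments = naive_split(PARAGRAPH)
for piece in segments:
    print(repr(piece))

# Because nothing is dropped, joining the pieces gives back the paragraph.
print("".join(segments) == PARAGRAPH)
```

The rule cannot tell the period in the initial "D." from a sentence end, which is why the name gets cut in two.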

A translator notices the segmentation error and joins the first two segments. We get this:

	<unit>
	  <segment>
	    <source>John D. Williams went to the park.</source>
	    <target></target>
	  </segment>
	  <segment>
	    <source> He watched the birds.</source>
	    <target></target>
	  </segment>
	  <segment>
	    <source> John enjoys nature.</source>
	    <target></target>
	  </segment>
	</unit>

There we have all we need to recreate the original <unit> by merging all <source> and <target> elements. If translations for the segments were included, we could also generate a translation for the paragraph.
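
The join and the final merge can be sketched as follows. This assumes plain text only (no inline markup inside <source> or <target>) and that whitespace between sentences is kept inside the segments; join_segments and merge_unit are hypothetical helper names:

```python
import xml.etree.ElementTree as ET

# The over-segmented unit; inter-sentence spaces are kept inside the segments.
UNIT = """<unit>
  <segment><source>John D.</source><target></target></segment>
  <segment><source> Williams went to the park.</source><target></target></segment>
  <segment><source> He watched the birds.</source><target></target></segment>
  <segment><source> John enjoys nature.</source><target></target></segment>
</unit>"""

def join_segments(unit, index):
    """Merge segment index+1 into segment index, in place."""
    segments = unit.findall("segment")
    first, second = segments[index], segments[index + 1]
    for tag in ("source", "target"):
        node = first.find(tag)
        # Empty elements have text None, so fall back to "".
        node.text = (node.text or "") + (second.findtext(tag) or "")
    unit.remove(second)

def merge_unit(unit):
    """Concatenate all segments into (source, target) text for the whole unit."""
    segments = unit.findall("segment")
    source = "".join(s.findtext("source") or "" for s in segments)
    target = "".join(s.findtext("target") or "" for s in segments)
    return source, target

unit = ET.fromstring(UNIT)
join_segments(unit, 0)          # the translator's fix
source, target = merge_unit(unit)
print(len(unit.findall("segment")))
print(source)
```

The merge only works without loss if every character of the paragraph, spaces included, ends up in exactly one place.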

The example described above could be improved by adding optional elements between segments that would hold stuff we don't want translators to see, like the spaces at the start of a sentence or perhaps some formatting that applies to the whole sentence.
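
A rough sketch of that improvement, under loud assumptions: the element name <ignorable> is made up here for the optional in-between element, the targets are placeholders rather than real translations, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# <ignorable> is a hypothetical name; [T1]..[T3] are placeholder targets.
UNIT = """<unit>
  <segment><source>John D. Williams went to the park.</source><target>[T1]</target></segment>
  <ignorable><source> </source><target> </target></ignorable>
  <segment><source>He watched the birds.</source><target>[T2]</target></segment>
  <ignorable><source> </source><target> </target></ignorable>
  <segment><source>John enjoys nature.</source><target>[T3]</target></segment>
</unit>"""

def translator_view(unit):
    """Source texts shown to the translator: segments only, no ignorables."""
    return [seg.findtext("source") or "" for seg in unit.findall("segment")]

def merge(unit, tag):
    """Rebuild the full 'source' or 'target' text, ignorables included."""
    # Iterating over the unit yields its direct children in document order.
    return "".join(child.findtext(tag) or "" for child in unit)

unit = ET.fromstring(UNIT)
print(translator_view(unit))
print(merge(unit, "source"))
print(merge(unit, "target"))
```

The translator sees clean sentences, while the merge step still finds every space it needs to rebuild the paragraph.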

Do you agree with the model/processes described above?

Regards,
Rodolfo
--
Rodolfo M. Raya   <rmraya@maxprograms.com>
Maxprograms      http://www.maxprograms.com


> -----Original Message-----
> From: Yves Savourel [mailto:ysavourel@translate.com]
> Sent: Wednesday, April 06, 2011 8:42 AM
> To: xliff@lists.oasis-open.org
> Subject: [xliff] XLIFF 2.0 Core
> 
> To follow up on the teleconference discussion about the core.
> 
> One possible way to move forward can be to define the basic unit of
> extraction and build from there.
> 
> 
> ===== Segments
> 
> I would argue that segments need to be part of that basic structure.
> 
> The main reason for this is that if the segmentation representation is done
> through some optional structure, such structure would not be able to be as
> simple and as integrated as if it is part of the core.
> 
> --- What if the content is not "segmented"?
> 
> It's fine: Even if no segmentation process has been applied to a content, the
> result of the extraction of an item constitutes already a segment. The
> content of an extracted unit is simply made of at least one segment.
> 
> This has several advantages:
> 
> - there is no difference between accessing segmented content and content that
> is not.
> - any property applicable to a segment can be set at the proper level right
> from extraction.
> - there is no reason to duplicate the content.
> 
> If there is a need to know whether a content has been through a
> segmentation process or not, we could also have an attribute for this.
> 
> 
> --- What about tools that do not handle "segments"?
> 
> Such a tool would import the XLIFF data in a way that each XLIFF segment
> corresponds to one of the basic units for that tool.
> 
> I suppose another option for such a tool could be to re-assemble all the
> segments of the unit and use that as the basic unit. Segmentation change is
> one of the aspects that should not cause problems for the original tool when
> it merges back the extracted text.
> 
> 
> ===== Representation
> 
> I see two possible main ways to represent this: grouping by segments or
> grouping by language.
> 
> All segments of the same language grouped together:
> 
> <unit id='1'>
>  <source>
>   <seg id='1'>source segment 1</seg>
>   <seg id='2'>source segment 2</seg>
>  </source>
>  <target>
>   <seg id='1'>target segment 1</seg>
>   <seg id='2'>target segment 2</seg>
>  </target>
> </unit>
> 
> Or all the languages of the same segment grouped together:
> 
> <unit id='1'>
>   <seg id='1'>
>    <source>source content 1</source>
>    <target>target content 1</target>
>   </seg>
>   <seg id='2'>
>    <source>source content 2</source>
>    <target>target content 2</target>
>   </seg>
> </unit>
> 
> They both have small advantages and drawbacks.
> But the important point is the extra segment level 2.0 would introduce.
> 
> Cheers,
> -ys
> 