OASIS Mailing List Archives

 



xliff message



Subject: RE: [xliff] Simplified XLIFF element tree


Hi,

Please find my comments below.



> -----Original Message-----
> From: Asgeir Frimannsson [mailto:asgeirf@redhat.com]
> Sent: Monday, August 23, 2010 6:09 PM
> To: xliff
> Subject: Re: [xliff] Simplified XLIFF element tree
> Replies inline below.
> 
> >
> > You could represent unsegmented XLIFF with something like:
> >
> > <body>
> >   <extr-text id="block-1">Sentence 1. Sentence 2.</extr-text>
> >   <extr-text id="block-2">Sentence 3. Sentence 4.</extr-text>
> > </body>
> 
> Yes, this is starting to look like something I would be comfortable with.

Good. 


 
> > And represent the segmented XLIFF with:
> >
> > <body>
> >   <extr-text id="block-1" segmented="yes">Sentence 1. Sentence 2.</extr-text>
> >   <group id="block-1">
> >     <trans-unit id="block-1_seg-1">
> >       <source>Sentence 1.</source>
> >     </trans-unit>
> >     <trans-unit id="block-1_seg-2">
> >       <source>Sentence 2.</source>
> >     </trans-unit>
> >   </group>
> >   <extr-text id="block-2" segmented="yes">Sentence 3. Sentence 4.</extr-text>
> >   <group id="block-2">
> >     <trans-unit id="block-2_seg-1">
> >       <source>Sentence 3.</source>
> >     </trans-unit>
> >     <trans-unit id="block-2_seg-2">
> >       <source>Sentence 4.</source>
> >     </trans-unit>
> >   </group>
> > </body>
> 
> However, the main problem I see with this approach is the lack of
> encapsulation and connectivity between extracted text and its translation
> units.


In my example each <extr-text> element is associated with a <group> using the same "id". I placed each <group> immediately after the <extr-text> element.

I deliberately kept <extr-text> apart from the <trans-unit> elements, even outside the <group> elements. My plan is to ignore <extr-text> (or whatever ends up holding unsegmented text) in third-party XLIFF files at translation time and work only with <trans-unit> elements (or whatever ends up holding a segment).
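The id-based pairing, and the idea of translating only from <trans-unit> elements, can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal sketch, assuming the un-namespaced element names from the example above (<extr-text>, <group>, <trans-unit>, <source>); these are proposals in this thread, not XLIFF 1.2 elements, and the helper names pair_blocks and translatable_segments are hypothetical.

```python
import xml.etree.ElementTree as ET

SEGMENTED = """<body>
  <extr-text id="block-1" segmented="yes">Sentence 1. Sentence 2.</extr-text>
  <group id="block-1">
    <trans-unit id="block-1_seg-1"><source>Sentence 1.</source></trans-unit>
    <trans-unit id="block-1_seg-2"><source>Sentence 2.</source></trans-unit>
  </group>
  <extr-text id="block-2" segmented="yes">Sentence 3. Sentence 4.</extr-text>
  <group id="block-2">
    <trans-unit id="block-2_seg-1"><source>Sentence 3.</source></trans-unit>
    <trans-unit id="block-2_seg-2"><source>Sentence 4.</source></trans-unit>
  </group>
</body>"""


def pair_blocks(body: ET.Element) -> dict[str, tuple[str, ET.Element]]:
    """Hypothetical helper: map each block id to (unsegmented text, matching <group>).

    An <extr-text> and a <group> belong together when they share the same id.
    """
    groups = {g.get("id"): g for g in body.iter("group")}
    return {
        e.get("id"): (e.text or "", groups[e.get("id")])
        for e in body.iter("extr-text")
        if e.get("id") in groups
    }


def translatable_segments(body: ET.Element) -> list[tuple[str, str]]:
    """Hypothetical helper: (trans-unit id, source text) pairs; <extr-text> is ignored."""
    return [
        (tu.get("id"), tu.findtext("source", default=""))
        for tu in body.iter("trans-unit")
    ]


body = ET.fromstring(SEGMENTED)
before = translatable_segments(body)
print(pair_blocks(body)["block-2"][0])  # Sentence 3. Sentence 4.
print(before[0])                        # ('block-1_seg-1', 'Sentence 1.')

# <extr-text> is optional: removing it does not change what gets translated.
for extr in body.findall("extr-text"):
    body.remove(extr)
print(translatable_segments(body) == before)  # True
```

Because the translation step never reads <extr-text>, a file that omits it entirely is handled the same way.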
 



> Perhaps something similar to this could be created in the extraction process:
> 
> <body>
>   ...
>   <ex-unit id='block1'>
>     <content xml:space='default'>
>       This is the first sentence. This is the second sentence.
>     </content>
>   </ex-unit>
>   ...
> </body>
>
> Then a process such as segmentation could annotate this content with
> segment-markers:
> 
> <body>
>   ...
>   <ex-unit id='block1'>
>     <content xml:space='default'>
>       <m type='seg' id='seg1'>This is the first sentence.</m>
>       <m type='seg' id='seg2'>This is the second sentence.</m>
>     </content>
>   </ex-unit>
>   ...
> </body>

In this approach you are altering the unsegmented text by inserting the <m> segment markers into it.
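A minimal Python sketch of what that insertion does, assuming segment boundaries are already known as a list of substrings. The helper add_segment_markers is hypothetical; it wraps each segment in <m type="seg"> as in the quoted proposal and leaves any text between segments (here a single space) as tail text outside the markers. The characters survive, but the element's markup no longer matches what the extractor produced.

```python
import xml.etree.ElementTree as ET


def add_segment_markers(content: ET.Element, segments: list[str]) -> ET.Element:
    """Hypothetical helper: return a new element with each segment wrapped in <m>.

    Assumes `segments` occur in order inside content.text. Text between
    segments becomes the tail of the preceding <m>, i.e. it sits outside
    any segment.
    """
    text = content.text or ""
    result = ET.Element(content.tag, content.attrib)
    pos = 0
    last = None
    for i, seg in enumerate(segments, start=1):
        start = text.index(seg, pos)  # raises ValueError if seg is not found
        gap = text[pos:start]
        if last is None:
            result.text = gap
        else:
            last.tail = gap
        last = ET.SubElement(result, "m", {"type": "seg", "id": f"seg{i}"})
        last.text = seg
        pos = start + len(seg)
    if last is None:
        result.text = text
    else:
        last.tail = text[pos:]
    return result


original = ET.fromstring(
    "<content>This is the first sentence. This is the second sentence.</content>"
)
segmented = add_segment_markers(
    original, ["This is the first sentence.", "This is the second sentence."]
)
print(ET.tostring(segmented, encoding="unicode"))
# The character content is unchanged, but the markup is not.
print("".join(segmented.itertext()) == original.text)  # True
print(ET.tostring(segmented) == ET.tostring(original))  # False
```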

 
> (Perhaps a better example would be a unit where whitespace should be
> preserved and you'd have a single space character outside of the segment
> boundaries)
> 
> From this, translation units could be managed:
> 
> <body>
>   ...
>   <ex-unit id='block1'>
>     <content xml:space='default'>
>       <m type='seg' id='seg1'>This is the first sentence.</m>
>       <m type='seg' id='seg2'>This is the second sentence.</m>
>     </content>
>     <trans-unit seg-id='seg1'>
>       <target>Første setning.</target>
>     </trans-unit>
>     <trans-unit seg-id='seg2'>
>       <target>Andre setning.</target>
>     </trans-unit>
>   </ex-unit>
>   ...
> </body>


I don't like this idea. 

Your <trans-unit> elements don't have <source> elements. In an XLIFF file each segment should have a source and a target.
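A check for this rule is straightforward to sketch. The Python below is a minimal illustration, assuming the un-namespaced element names from the quoted <ex-unit> proposal; find_incomplete_units is a hypothetical helper, not part of any existing XLIFF tool. Run against that proposal, it flags both units because neither carries a <source>.

```python
import xml.etree.ElementTree as ET

PROPOSAL = """<body>
  <ex-unit id="block1">
    <content>
      <m type="seg" id="seg1">This is the first sentence.</m>
      <m type="seg" id="seg2">This is the second sentence.</m>
    </content>
    <trans-unit seg-id="seg1"><target>Første setning.</target></trans-unit>
    <trans-unit seg-id="seg2"><target>Andre setning.</target></trans-unit>
  </ex-unit>
</body>"""


def find_incomplete_units(body: ET.Element) -> list[tuple[str, list[str]]]:
    """Hypothetical helper: list trans-units lacking <source> or <target>."""
    problems = []
    for tu in body.iter("trans-unit"):
        missing = [name for name in ("source", "target") if tu.find(name) is None]
        if missing:
            # The proposal identifies units by seg-id rather than id.
            problems.append((tu.get("id") or tu.get("seg-id"), missing))
    return problems


print(find_incomplete_units(ET.fromstring(PROPOSAL)))
# [('seg1', ['source']), ('seg2', ['source'])]
```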

Unsegmented text must be optional and independent of translatable segments. In fact, I expect it not to be present in common XLIFF files (my tools will probably never include the unsegmented text).

Placing <trans-unit> inside <ex-unit> looks very bad to me. It mixes unsegmented and segmented content in the same element.



> With this, structural elements such as <group> live outside of segmentation,
> and are used for their intended purpose of representing structure in the
> original content.


A structure can also be a paragraph that needs to be split into sentences. Don't forget this.


Rodolfo
--
Rodolfo M. Raya   <rmraya@maxprograms.com>
Maxprograms      http://www.maxprograms.com

 


