Subject: RE: [xliff] Simplified XLIFF element tree
Hi,

Please find my comments below.

> -----Original Message-----
> From: Asgeir Frimannsson [mailto:asgeirf@redhat.com]
> Sent: Monday, August 23, 2010 6:09 PM
> To: xliff
> Subject: Re: [xliff] Simplified XLIFF element tree
>
> replies inline below.
>
> > You could represent unsegmented XLIFF with something like:
> >
> > <body>
> >   <extr-text id="block-1">Sentence 1. Sentence 2.</extr-text>
> >   <extr-text id="block-2">Sentence 3. Sentence 4.</extr-text>
> > </body>
>
> Yes, this is starting to look like something I would be comfortable with.

Good.

> > And represent the segmented XLIFF with:
> >
> > <body>
> >   <extr-text id="block-1" segmented="yes">Sentence 1. Sentence 2.</extr-text>
> >   <group id="block-1">
> >     <trans-unit id="block-1_seg-1">
> >       <source>Sentence 1.</source>
> >     </trans-unit>
> >     <trans-unit id="block-1_seg-2">
> >       <source>Sentence 2.</source>
> >     </trans-unit>
> >   </group>
> >   <extr-text id="block-1" segmented="yes">Sentence 1. Sentence 2.</extr-text>
> >   <group id="block-2">
> >     <trans-unit id="block-2_seg-1">
> >       <source>Sentence 3.</source>
> >     </trans-unit>
> >     <trans-unit id="block-2_seg-2">
> >       <source>Sentence 4.</source>
> >     </trans-unit>
> >   </group>
> > </body>
>
> However, the main problem I see with this approach is the lack of
> encapsulation and connectivity between extracted text and its translation
> units.

In my example each <extr-text> element is associated with a <group> through a shared "id", and each <group> is placed immediately after its <extr-text> element.

I deliberately kept <extr-text> separate from <trans-unit>, even outside the <group> elements. My plan is to ignore <extr-text>, or whatever element holds unsegmented text, in third-party XLIFF files at translation time and work only with <trans-unit> elements, or whatever element holds a segment.

> Perhaps something similar to this could be created in the extraction process:
>
> <body>
>   ...
>   <ex-unit id='block1'>
>     <content xml:space='default'>
>       This is the first sentence.
>       This is the second sentence.
>     </content>
>   </ex-unit>
>   ...
> </body>
>
> Then a process such as segmentation could annotate this content with
> segment-markers:
>
> <body>
>   ...
>   <ex-unit id='block1'>
>     <content xml:space='default'>
>       <m type='seg' id='seg1'>This is the first sentence.</m>
>       <m type='seg' id='seg2'>This is the second sentence.</m>
>     </content>
>   </ex-unit>
>   ...
> </body>

This approach alters the unsegmented text.

> (Perhaps a better example would be a unit where whitespace should be
> preserved and you'd have a single space character outside of the segment
> boundaries)
>
> From this, translation units could be managed:
>
> <body>
>   ...
>   <ex-unit id='block1'>
>     <content xml:space='default'>
>       <m type='seg' id='seg1'>This is the first sentence.</m>
>       <m type='seg' id='seg2'>This is the second sentence.</m>
>     </content>
>     <trans-unit seg-id='seg1'>
>       <target>Første setning.</target>
>     </trans-unit>
>     <trans-unit seg-id='seg2'>
>       <target>Andre setning.</target>
>     </trans-unit>
>   </ex-unit>
>   ...
> </body>

I don't like this idea. Your <trans-unit> elements don't have <source> elements. In an XLIFF file each segment should have a source and a target.

Unsegmented text must be optional and independent from translatable segments. In fact, I expect it to be absent from most XLIFF files (my tools will probably never include the unsegmented text).

Placing <trans-unit> inside <ex-unit> looks very bad to me. It mixes unsegmented content with segmented content.

> With this, structural elements such as <group> live outside of segmentation,
> and are used for their intended purpose of representing structure in the
> original content.

A structure can also be a paragraph that needs to be split into sentences. Don't forget this.

Rodolfo
--
Rodolfo M. Raya <rmraya@maxprograms.com>
Maxprograms http://www.maxprograms.com
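
The plan described above, ignoring <extr-text> at translation time and working only with <trans-unit> elements, could be sketched roughly as follows. This is only an illustration: the element names come from the proposals in this thread and are not part of any published XLIFF schema, and the helper name collect_segments and the SAMPLE document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical segmented document using the element names proposed in this
# thread (not a published XLIFF schema).
SAMPLE = """\
<body>
  <extr-text id="block-1" segmented="yes">Sentence 1. Sentence 2.</extr-text>
  <group id="block-1">
    <trans-unit id="block-1_seg-1"><source>Sentence 1.</source></trans-unit>
    <trans-unit id="block-1_seg-2"><source>Sentence 2.</source></trans-unit>
  </group>
</body>
"""


def collect_segments(xml_text):
    """Return (id, source text) pairs for every <trans-unit>.

    <extr-text> elements are never visited, so the unsegmented text is
    ignored whether or not it is present in the file.
    """
    root = ET.fromstring(xml_text)
    segments = []
    for unit in root.iter("trans-unit"):  # document order, any depth
        source = unit.find("source")
        if source is not None:  # skip units that carry no <source>
            segments.append((unit.get("id"), source.text or ""))
    return segments


for seg_id, text in collect_segments(SAMPLE):
    print(seg_id, text)
```

Because the loop only looks for <trans-unit>, the same code handles a file that omits the unsegmented text entirely, which matches the expectation that <extr-text> is optional.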