Subject: RE: [xliff] Simplified XLIFF element tree
Hello,

I do not wish to add substantive input to this excellent thread yet (just lurking, and being impressed at the moment). But I would like to interject a suggestion that we take a look at samples for places where we have IDs that reference IDs, and possibly change them to IDREFs. I think as the complexity builds this will help to keep straight the hierarchy and dependencies.

Thanks,
Bryan

-----Original Message-----
From: Asgeir Frimannsson [mailto:asgeirf@redhat.com]
Sent: Monday, August 23, 2010 2:09 PM
To: xliff
Subject: Re: [xliff] Simplified XLIFF element tree

Hi Rodolfo,

Please see replies inline below.

----- "Rodolfo M. Raya" <rmraya@maxprograms.com> wrote:

> If you want to separate "extracted text" from "segmented text", you
> can use a new element to contain unsegmented extracted text and the
> traditional <trans-unit> to contain the final segments.
>
> You could represent unsegmented XLIFF with something like:
>
> <body>
>   <extr-text id="block-1">Sentence 1. Sentence 2.</extr-text>
>   <extr-text id="block-2">Sentence 3. Sentence 4.</extr-text>
> </body>

Yes, this is starting to look like something I would be comfortable with.

> And represent the segmented XLIFF with:
>
> <body>
>   <extr-text id="block-1" segmented="yes">Sentence 1. Sentence 2.</extr-text>
>   <group id="block-1">
>     <trans-unit id="block-1_seg-1">
>       <source>Sentence 1.</source>
>     </trans-unit>
>     <trans-unit id="block-1_seg-2">
>       <source>Sentence 2.</source>
>     </trans-unit>
>   </group>
>   <extr-text id="block-1" segmented="yes">Sentence 1. Sentence 2.</extr-text>
>   <group id="block-2">
>     <trans-unit id="block-2_seg-1">
>       <source>Sentence 3.</source>
>     </trans-unit>
>     <trans-unit id="block-2_seg-2">
>       <source>Sentence 4.</source>
>     </trans-unit>
>   </group>
> </body>

However, the main problem I see with this approach is the lack of encapsulation and connectivity between extracted text and its translation units.
Perhaps something similar to this could be created in the extraction process:

<body>
  ...
  <ex-unit id='block1'>
    <content xml:space='default'>
      This is the first sentence.
      This is the second sentence.
    </content>
  </ex-unit>
  ...
</body>

Then a process such as segmentation could annotate this content with segment-markers:

<body>
  ...
  <ex-unit id='block1'>
    <content xml:space='default'>
      <m type='seg' id='seg1'>This is the first sentence.</m>
      <m type='seg' id='seg2'>This is the second sentence.</m>
    </content>
  </ex-unit>
  ...
</body>

(Perhaps a better example would be a unit where whitespace should be preserved and you'd have a single space character outside of the segment boundaries)

From this, translation units could be managed:

<body>
  ...
  <ex-unit id='block1'>
    <content xml:space='default'>
      <m type='seg' id='seg1'>This is the first sentence.</m>
      <m type='seg' id='seg2'>This is the second sentence.</m>
    </content>
    <trans-unit seg-id='seg1'>
      <target>Første setning.</target>
    </trans-unit>
    <trans-unit seg-id='seg2'>
      <target>Andre setning.</target>
    </trans-unit>
  </ex-unit>
  ...
</body>

With this, structural elements such as <group> live outside of segmentation, and are used for their intended purpose of representing structure in the original content.

> Tools that support XLIFF 1.0 and 1.1 can translate segmented files
> simply ignoring the new <extr-text> element. Notice that after
> segmentation has been done, the <extr-text> elements could be deleted;
> in my example I added an attribute to indicate that the text has been
> segmented.

As far as I understand, there are no backwards-compatibility requirements for XLIFF 2.0, so we can be creative in the way this is implemented, rather than working around limitations in the old format.

> Notice that in any case doing segmentation after the XLIFF has been
> created means preparing a new XLIFF document.

This is where I believe this approach to segmentation is fundamentally flawed.
There should be no need to create a new XLIFF representation for segmented content. It should simply be a processing/annotation step.

cheers,
asgeir
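
As a rough illustration of the "annotation step" idea in the message above, the following Python sketch segments an <ex-unit> in place rather than producing a new document, and then attaches <trans-unit> elements that point back at the segment markers. It only mirrors the element and attribute names proposed in this thread (<ex-unit>, <content>, <m type='seg'>, seg-id), which are not part of any published XLIFF schema. The function names, the naive sentence splitter, and the document-wide seg1, seg2, ... numbering are assumptions made for the example.

```python
import itertools
import re
import xml.etree.ElementTree as ET

# Hypothetical extracted document, shaped like the <ex-unit> proposal in the thread.
EXTRACTED = """<body>
  <ex-unit id="block1">
    <content xml:space="default">This is the first sentence. This is the second sentence.</content>
  </ex-unit>
</body>"""


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def annotate_segments(body):
    """Wrap each sentence of every <content> in <m type='seg'>, in place.

    Whitespace between sentences is normalised to one space and kept
    outside the markers, as the tail of the preceding <m> element.
    """
    counter = itertools.count(1)  # segment ids are unique across the document
    for unit in body.iter("ex-unit"):
        content = unit.find("content")
        sentences = split_sentences(content.text or "")
        content.text = None
        for i, sentence in enumerate(sentences):
            marker = ET.SubElement(
                content, "m", {"type": "seg", "id": f"seg{next(counter)}"}
            )
            marker.text = sentence
            if i < len(sentences) - 1:
                marker.tail = " "


def add_translations(unit, targets):
    """Attach <trans-unit seg-id=...> children to one <ex-unit>.

    A seg-id with no matching marker in the unit is rejected, which is a
    hand-rolled stand-in for the ID/IDREF check a validating parser would do.
    """
    known = {m.get("id") for m in unit.iter("m") if m.get("type") == "seg"}
    for seg_id, text in targets.items():
        if seg_id not in known:
            raise KeyError(f"no segment marker with id {seg_id!r}")
        trans_unit = ET.SubElement(unit, "trans-unit", {"seg-id": seg_id})
        ET.SubElement(trans_unit, "target").text = text


body = ET.fromstring(EXTRACTED)
annotate_segments(body)
unit = body.find("ex-unit")
add_translations(unit, {"seg1": "Første setning.", "seg2": "Andre setning."})
print(ET.tostring(body, encoding="unicode"))
```

The same tree object is modified at each step, so segmentation and translation both stay inside the <ex-unit> that owns the extracted text. The seg-id lookup is also the kind of ID-to-ID reference that Bryan suggests declaring as an IDREF.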