Subject: RE: [xliff] Simplified XLIFF element tree
From: <bryan.s.schnabel@tektronix.com>
To: <asgeirf@redhat.com>, <xliff@lists.oasis-open.org>
Date: Mon, 23 Aug 2010 14:59:06 -0700

Hello,

I do not wish to add substantive input to this excellent thread yet (just lurking, and being impressed at the moment).

But I would like to suggest that we look through the samples for places where one ID references another ID, and consider changing those references to IDREFs. As the complexity builds, I think this will help keep the hierarchy and dependencies straight.
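
To illustrate the kind of check that IDREF typing would hand to a validating parser, here is a rough Python sketch. It assumes a hypothetical `seg-id` attribute whose value is meant to match an `id` elsewhere in the document; the function name `dangling_refs` and the sample markup are made up for illustration and are not part of any XLIFF draft.

```python
import xml.etree.ElementTree as ET

def dangling_refs(xml_text, ref_attr="seg-id"):
    """Return the values of ref_attr that match no id in the document."""
    root = ET.fromstring(xml_text)
    # Collect every id declared anywhere in the tree.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    # Keep only the references that point at nothing.
    return [
        el.get(ref_attr)
        for el in root.iter()
        if el.get(ref_attr) is not None and el.get(ref_attr) not in ids
    ]

# Illustrative sample only: seg9 is deliberately left unresolved.
SAMPLE = """<body>
  <ex-unit id='block1'>
    <content><m type='seg' id='seg1'>One.</m></content>
    <trans-unit seg-id='seg1'><target>En.</target></trans-unit>
    <trans-unit seg-id='seg9'><target>Ni.</target></trans-unit>
  </ex-unit>
</body>"""

print(dangling_refs(SAMPLE))  # ['seg9']
```

If the referencing attribute were declared as an IDREF in the schema or DTD, a validating parser would report the unresolved `seg9` without any custom code like this.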

Thanks,

Bryan

-----Original Message-----
From: Asgeir Frimannsson [mailto:asgeirf@redhat.com] 
Sent: Monday, August 23, 2010 2:09 PM
To: xliff
Subject: Re: [xliff] Simplified XLIFF element tree

Hi Rodolfo,

Please see replies inline below.

----- "Rodolfo M. Raya" <rmraya@maxprograms.com> wrote:
> If you want to separate "extracted text" from "segmented text", you
> can use a new element to contain unsegmented extracted text and the
> traditional <trans-unit> to contain the final segments.
> 
> You could represent unsegmented XLIFF with something like:
> 
> <body>
>   <extr-text id="block-1">Sentence 1. Sentence 2.</extr-text>
>   <extr-text id="block-2">Sentence 3. Sentence 4.</extr-text>
> </body>

Yes, this is starting to look like something I would be comfortable with. 

> And represent the segmented XLIFF with:
> 
> <body>
>    <extr-text id="block-1" segmented="yes">Sentence 1. Sentence 2.</extr-text>
>    <group id="block-1">
>     <trans-unit id="block-1_seg-1">
>       <source>Sentence 1.</source>
>     </trans-unit>
>     <trans-unit id="block-1_seg-2">
>       <source>Sentence 2.</source>
>     </trans-unit>
>   </group>
>    <extr-text id="block-2" segmented="yes">Sentence 3. Sentence 4.</extr-text>
>    <group id="block-2">
>     <trans-unit id="block-2_seg-1">
>       <source>Sentence 3.</source>
>     </trans-unit>
>     <trans-unit id="block-2_seg-2">
>       <source>Sentence 4.</source>
>     </trans-unit>
>   </group>
> </body>

However, the main problem I see with this approach is the lack of encapsulation and connectivity between extracted text and its translation units. 

Perhaps something similar to this could be created in the extraction process:

<body>
  ...
  <ex-unit id='block1'>
    <content xml:space='default'>
      This is the first sentence. This is the second sentence.
    </content>
  </ex-unit>
  ...
</body>

Then a process such as segmentation could annotate this content with segment-markers:

<body>
  ...
  <ex-unit id='block1'>
    <content xml:space='default'>
      <m type='seg' id='seg1'>This is the first sentence.</m>
      <m type='seg' id='seg2'>This is the second sentence.</m>
    </content>
  </ex-unit>
  ...
</body>

(Perhaps a better example would be a unit where whitespace should be preserved and you'd have a single space character outside of the segment boundaries)
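
A minimal Python sketch of segmentation as an in-place annotation step, under some loud assumptions: the element and attribute names follow the samples above, the function name `annotate_segments` is hypothetical, and the sentence-breaking rule (split after `.`, `!` or `?` followed by whitespace) is a naive placeholder for a real segmenter. Because the sample uses xml:space='default', the sketch collapses whitespace before splitting; a whitespace-preserving unit would need different handling.

```python
import re
import xml.etree.ElementTree as ET

def annotate_segments(ex_unit):
    """Wrap each sentence of <content> in an <m type='seg'> marker, in place."""
    content = ex_unit.find("content")
    if content is None or len(content) > 0:
        return 0  # no content, or markers are already present
    # Assumes xml:space='default', so runs of whitespace can be collapsed.
    text = " ".join((content.text or "").split())
    # Placeholder rule: break after ., ! or ? when whitespace follows.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    content.text = None
    for n, sentence in enumerate(sentences, start=1):
        marker = ET.SubElement(content, "m", {"type": "seg", "id": f"seg{n}"})
        marker.text = sentence
        # A single space stays outside the segment boundaries.
        marker.tail = " " if n < len(sentences) else None
    return len(sentences)

unit = ET.fromstring(
    "<ex-unit id='block1'><content xml:space='default'>\n"
    "  This is the first sentence. This is the second sentence.\n"
    "</content></ex-unit>"
)
print(annotate_segments(unit))  # 2
print(ET.tostring(unit, encoding="unicode"))
```

The same tree object is modified and returned to the caller; nothing here builds a second document.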

From this, translation units could be managed:

<body>
  ...
  <ex-unit id='block1'>
    <content xml:space='default'>
      <m type='seg' id='seg1'>This is the first sentence.</m>
      <m type='seg' id='seg2'>This is the second sentence.</m>
    </content>
    <trans-unit seg-id='seg1'>
      <target>Første setning.</target>
    </trans-unit>
    <trans-unit seg-id='seg2'>
      <target>Andre setning.</target>
    </trans-unit>
  </ex-unit>
  ...
</body>
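
As a rough illustration of how a tool might consume this layout, the Python sketch below pairs each segment marker with the target of the trans-unit that points at it through `seg-id`. The function name `aligned_pairs` is hypothetical, and the structure is assumed to be exactly the one shown in the sample above.

```python
import xml.etree.ElementTree as ET

def aligned_pairs(ex_unit):
    """Return (source segment, target or None) pairs in document order."""
    # Index the targets by the segment id each trans-unit refers to.
    targets = {
        tu.get("seg-id"): tu.findtext("target")
        for tu in ex_unit.findall("trans-unit")
    }
    return [
        ("".join(m.itertext()), targets.get(m.get("id")))
        for m in ex_unit.iter("m")
        if m.get("type") == "seg"
    ]

EX_UNIT = """<ex-unit id='block1'>
  <content xml:space='default'>
    <m type='seg' id='seg1'>This is the first sentence.</m>
    <m type='seg' id='seg2'>This is the second sentence.</m>
  </content>
  <trans-unit seg-id='seg1'><target>Første setning.</target></trans-unit>
  <trans-unit seg-id='seg2'><target>Andre setning.</target></trans-unit>
</ex-unit>"""

for source, target in aligned_pairs(ET.fromstring(EX_UNIT)):
    print(source, "->", target)
```

A segment with no matching trans-unit simply comes back with `None` as its target, which is one way to spot untranslated segments.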

With this, structural elements such as <group> live outside of segmentation, and are used for their intended purpose of representing structure in the original content.

> Tools that support XLIFF 1.0 and 1.1 can translate segmented files
> simply ignoring the new <extr-text> element. Notice that after
> segmentation has been done, the <extr-text> elements could be deleted;
> in my example I added an attribute to indicate that the text has been
> segmented.

As far as I understand, there are no backwards-compatibility requirements for XLIFF 2.0, so we can be creative in the way this is implemented, rather than working around limitations in the old format.

> Notice that in any case doing segmentation after the XLIFF has been
> created means preparing a new XLIFF document. 

This is where I believe this approach to segmentation is fundamentally flawed. There should be no need to create a new XLIFF representation for segmented content. It should simply be a processing/annotation step.

cheers,
asgeir
