
Subject: Re: [xliff] Simplified XLIFF element tree
From: Asgeir Frimannsson <asgeirf@redhat.com>
To: xliff <xliff@lists.oasis-open.org>
Date: Tue, 24 Aug 2010 11:37:19 -0400 (EDT)
----- "Rodolfo M. Raya" <rmraya@maxprograms.com> wrote:
> > -----Original Message-----
> > From: Asgeir Frimannsson [mailto:asgeirf@redhat.com]
> > Sent: Tuesday, August 24, 2010 9:47 AM
> > To: xliff
> > Subject: Re: [xliff] Simplified XLIFF element tree
> > 
> > > I agree with your goals. There should be one representation for
> > > segmented content, which could very well be a traditional <trans-unit>
> > > with a <source>/<target> pair.
> > >
> > > The 2nd goal can be achieved by keeping the optional unsegmented text
> > > separate from the segmented version. In other words, separate from the
> > > traditional <trans-unit> that could be used for the first goal.
> > 
> > One concern I have is that we replicate what we have with <seg-source> -
> > where we have two different representations of the source content, one
> > with segment markers, and one without. I'm not sure what the intentions of
> > this was, but I'm guessing it has to do with the ideal of keeping the content of
> > <source> immutable - which is a good goal.
> 
> Please separate two things: unsegmented text and translatable
> segments.
> 
> There should be only one representation of source text in translatable
> segments and it should not include segmentation markers.
> 
> If segmentation markers are added or not to unsegmented text during
> segmentation process is a very different thing. I would not alter the
> unsegmented text.

I am trying to understand how your approach would work, but find it very hard to come up with a way of working with 'optional' unsegmented content. I think we do agree that a <trans-unit> should hold the translation of a segment, and it should have access to the source-language segment. 

What does concern me about my earlier mock example is the verbosity of the model when working with content that is typically a single segment. For instance:

<body>
  ...
  <ex-unit id='block1'>
    <content xml:space='default'>
      <m type='seg' id='seg1'>This is the first sentence.</m>
    </content>
    <trans-unit seg-id='seg1'>
      <target>Første setning.</target>
    </trans-unit>
  </ex-unit>
  <ex-unit id='block2'>
    <content xml:space='default'>
      <m type='seg' id='seg1'>This is the second sentence.</m>
    </content>
    <trans-unit seg-id='seg1'>
      <target>Andre setning.</target>
    </trans-unit>
  </ex-unit>
  ...
</body>

In that sense, a model more similar to what we have today in trans-unit (but eliminating <seg-source>) would be easier, for instance:

extraction model:
<trans-unit>
  <source>
    This is the first sentence. This is the second sentence.
  </source>
</trans-unit>

after segmentation:

<trans-unit>
  <source>
    <seg id='seg1'>This is the first sentence.</seg>
    <seg id='seg2'>This is the second sentence.</seg>
  </source>
</trans-unit>

after translation:
<trans-unit>
  <source>
    <seg id='seg1'>This is the first sentence.</seg>
    <seg id='seg2'>This is the second sentence.</seg>
  </source>
  <target>
    <seg id='seg1'>Første setning.</seg>
    <seg id='seg2'>Andre setning.</seg>
  </target>
</trans-unit>

With this approach, segments could be optional. This is conceptually very similar to the above, and might be easier to work with. However, there probably needs to be some ability to manage state and workflow on the segment level, but that could similarly live under the <trans-unit> element and reference the segment ids.
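
To make the step from the extraction model to the segmented form concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the <seg> element and its id scheme come from the mock-up above and are not part of any XLIFF specification, the function name segment_source is made up, and the regex sentence splitter is a naive stand-in for real segmentation rules.

```python
import re
import xml.etree.ElementTree as ET

# Naive sentence pattern (an assumption for illustration only):
# a run of text ending in ., ! or ?, or running to the end of the string.
SENTENCE = re.compile(r"[^.!?]+(?:[.!?]+|$)")

def segment_source(trans_unit):
    """Wrap each sentence of <source> in a hypothetical <seg> child, in place."""
    source = trans_unit.find("source")
    # Collapse layout whitespace, as xml:space='default' would allow.
    text = " ".join((source.text or "").split())
    source.text = None
    for number, match in enumerate(SENTENCE.finditer(text), start=1):
        seg = ET.SubElement(source, "seg", {"id": f"seg{number}"})
        seg.text = match.group().strip()
    return trans_unit

# Extraction model: one unsegmented <source>.
unit = ET.fromstring(
    "<trans-unit><source>"
    "This is the first sentence. This is the second sentence."
    "</source></trans-unit>"
)
segment_source(unit)
segments = [(seg.get("id"), seg.text) for seg in unit.iter("seg")]
print(segments)
```

A unit that is never passed through segment_source simply keeps its plain <source> text, which is the sense in which segments stay optional here.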

As I mentioned previously in the thread, defining a model for how <source> could be annotated with e.g. segment spans while retaining its immutability should be a goal, as that's probably the easiest way of eliminating <seg-source>. It would be interesting to know to what extent a segment needs to be self-contained, or if the management of segments works better within the context of an extraction unit.
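
One possible reading of "annotating <source> while keeping it immutable" is stand-off spans: segments are recorded as character offsets into the source text, which itself is never rewritten. The sketch below assumes that reading. The function name segment_spans, the (id, start, end) tuple layout and the regex splitter are all hypothetical, not something proposed in this thread or defined by XLIFF.

```python
import re

# Naive sentence pattern (illustrative assumption): starts at a non-space
# character and runs to the next ., ! or ?, or to the end of the string.
SENTENCE = re.compile(r"\S[^.!?]*(?:[.!?]+|$)")

def segment_spans(source_text):
    """Return (id, start, end) spans over source_text without modifying it."""
    return [
        (f"seg{number}", match.start(), match.end())
        for number, match in enumerate(SENTENCE.finditer(source_text), start=1)
    ]

source = "This is the first sentence. This is the second sentence."
spans = segment_spans(source)

# Each segment is recovered by slicing the untouched source string.
recovered = [source[start:end] for _, start, end in spans]
print(spans)
print(recovered)
```

The trade-off is that offsets only stay valid as long as the source text really is immutable, and inline markup inside <source> would need a defined way of being counted.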

cheers,
asgeir