xliff-inline message

Subject: RE: [xliff-inline] Splitting non-clonable codes

From: "Estreen, Fredrik" <Fredrik.Estreen@lionbridge.com>
To: Yves Savourel <ysavourel@enlaso.com>, "xliff-inline@lists.oasis-open.org" <xliff-inline@lists.oasis-open.org>
Date: Mon, 13 Feb 2012 16:42:39 +0000

Hi Yves, all

The problem I see with the current standard, and with the 2.0 work so far, is that there is no standard way to tell that an <sc>/<ec> pair was once a <pc>/<g> tag. This is necessary to be able to undo the transform. You want to undo the conversions you once did, but might not want to convert an <sc>/<ec> pair that was not originally a <pc> into a <pc> tag. There is also no wording that allows a tool to do the transform.

I can imagine cases where you get an XLIFF file from some very picky source and want to process it through a chain of tools and then deliver the result back with as little structural change as possible compared to what you got initially. Today you can solve it by adding your own extension attributes during processing and removing them before you deliver the file back. But in most cases that limits your processing chain to a specific set of tools (your own).

It may be possible to tell that the pair was once a <pc> by other means; the extra attribute is just one way to do it. In any case I think we need to explicitly specify that an intermediate tool is allowed to do this transform, and when it is OK, instead of the 1.2 wording, which does not allow segmenting in the middle of the span.
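
To make the round trip concrete, here is a minimal Python sketch of the extra-attribute approach. It is only an illustration under stated assumptions: the marker attribute name "wasPc" is hypothetical (it plays the same role as orig-g-id in my earlier example and is not defined in any draft), the function names are invented for the example, and only <pc> elements that are direct children of <source> are handled.

```python
import xml.etree.ElementTree as ET

MARKER = "wasPc"  # hypothetical attribute name, not defined by any XLIFF draft


def lower_pc(source):
    """Replace each direct <pc> child of `source` with a marked <sc/>/<ec/> pair."""
    for pc in [child for child in source if child.tag == "pc"]:
        index = list(source).index(pc)
        sc = ET.Element("sc", {"id": pc.get("id"), MARKER: "yes"})
        ec = ET.Element("ec", {"id": pc.get("id"), MARKER: "yes"})
        if pc.get("nidStart") is not None:
            sc.set("nid", pc.get("nidStart"))
        if pc.get("nidEnd") is not None:
            ec.set("nid", pc.get("nidEnd"))
        sc.tail = pc.text   # text that followed the opening of the span
        ec.tail = pc.tail   # text that followed the closing of the span
        replacement = [sc, *list(pc), ec]
        source.remove(pc)
        for offset, node in enumerate(replacement):
            source.insert(index + offset, node)
    return source


def raise_marked_pairs(source):
    """Undo lower_pc: rebuild <pc> only from pairs that carry the marker."""
    for sc in [c for c in source if c.tag == "sc" and c.get(MARKER) == "yes"]:
        children = list(source)
        if sc not in children:
            continue  # already moved inside a rebuilt <pc>
        start = children.index(sc)
        end = next((i for i in range(start + 1, len(children))
                    if children[i].tag == "ec"
                    and children[i].get("id") == sc.get("id")), None)
        if end is None:
            continue  # the closing half is in another segment; leave it alone
        ec = children[end]
        pc = ET.Element("pc", {"id": sc.get("id")})
        if sc.get("nid") is not None:
            pc.set("nidStart", sc.get("nid"))
        if ec.get("nid") is not None:
            pc.set("nidEnd", ec.get("nid"))
        pc.text, pc.tail = sc.tail, ec.tail
        inner = children[start + 1:end]
        for node in children[start:end + 1]:
            source.remove(node)
        pc.extend(inner)
        source.insert(start, pc)
    return source
```

An <sc>/<ec> pair without the marker is left untouched by raise_marked_pairs, which is the distinction I am after: only conversions the tool chain made itself are undone.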

I did overlook the nidStart and nidEnd pair for the <pc>, which should make back conversion of the split tag easier but does not generally affect intermediate tools with little or no knowledge of native codes. I think it could also, in many cases, help give a better display in editors if the inline code was represented in a standard way. But it does not solve the undo, or the identification of the pair as once having been a spanning tag.

If it is possible to identify that a start/end pair was once a spanning tag, that in itself conveys a bit of information on the semantics of the native code: that it follows XML rules with respect to nesting and overlap, for example. But this is a minor point.

The case where you want to re-group text in the translation and need to create a clone is not possible with non-clonable inline tags. That is, I would say, the whole point of non-clonable tags. They should ideally only represent truly non-duplicable things that therefore never need cloning, or items where the back-converting tool is unable to generate a copy because of constraints in the tool or format. As far as possible, non-placeholder tags should be clonable. The burden of this falls on the extracting and back-converting tools. We can only recommend that tools avoid creating unnecessary non-clonable tags.

Perhaps we should allow tools to freely convert between the <pc> and the <sc>/<ec> pair in any circumstance, not just when segmenting in a span?

Regards,
Fredrik Estreen

-----Original Message-----
From: xliff-inline@lists.oasis-open.org [mailto:xliff-inline@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: den 13 februari 2012 16:22
To: xliff-inline@lists.oasis-open.org
Subject: [xliff-inline] Splitting non-clonable codes

Hi Fredrik, all

> "This is an example of <pc id="1" nid="cm1">the issue.
> Where a tag spans</pc> two sentences."
>
> I propose that when we split this into two segments we lower the <pc> 
> into a <sc>/<ec> pair. ...
> "This is an example of <sc id="sc1" nid="cm1" 
> isolated="yes" orig-g-id="1">the issue.", " ", "Where a tag spans<ec 
> id="ec1" nid="cm1" isolated="yes"
> orig-g-id="1"> two sentences."

I think we already handle this type of split.

Maybe the problem you saw was about the original code? You use nid in <pc>, but currently we actually have nidStart and nidEnd for <pc>. (See the "Summary" table in http://tools.oasis-open.org/version-control/svn/xliff/trunk/inline-markup/inlineMarkupWorkingDraft.html)

So before the segmentation:

<unit>
 <segment>
  <source>This is an example of <pc id="1" nidStart="cm1s" nidEnd="cm1e">the issue. Where a tag spans</pc> two sentences.</source>
 </segment>
</unit>


After the segmentation:

<unit>
 <segment>
  <source>This is an example of <sc id="1" nid="cm1s" isolated="yes"/>the issue.</source>
 </segment>
 <ignorable>
  <source> </source>
 </ignorable>
 <segment>
  <source>Where a tag spans<ec id="1" nid="cm1e" isolated="yes"/> two sentences.</source>
 </segment>
</unit>

There is only one inline code with id="1" in that <unit>; it is just represented by one <pc> element in the first case, and by two elements (one <sc/> and one <ec/>) in the second case.
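
As a rough illustration of why the shared id is enough, here is a small Python sketch that collects the <sc/> and <ec/> halves of each code across all the segments of one unit. The function name is made up for the example, and it assumes the unit holds only <source> content (a real tool would have to treat source and target separately).

```python
import xml.etree.ElementTree as ET

# The unit after segmentation, as in the example above.
UNIT = """<unit>
 <segment>
  <source>This is an example of <sc id="1" nid="cm1s" isolated="yes"/>the issue.</source>
 </segment>
 <ignorable>
  <source> </source>
 </ignorable>
 <segment>
  <source>Where a tag spans<ec id="1" nid="cm1e" isolated="yes"/> two sentences.</source>
 </segment>
</unit>"""


def split_codes(unit):
    """Map each code id to (start nid, end nid) for <sc/>/<ec/> halves in one unit."""
    starts = {sc.get("id"): sc.get("nid") for sc in unit.iter("sc")}
    ends = {ec.get("id"): ec.get("nid") for ec in unit.iter("ec")}
    # Keep only ids that have both halves somewhere in the unit.
    return {code_id: (starts[code_id], ends[code_id])
            for code_id in starts if code_id in ends}


print(split_codes(ET.fromstring(UNIT)))  # {'1': ('cm1s', 'cm1e')}
```

The pair found this way gives back exactly the nidStart and nidEnd values of the original <pc>.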

I don't think we need something like orig-g-id="1" to recreate the <pc> if needed later. But maybe I'm missing something.


That said, I think the non-clonable code issue still exists. But the problem is not created when we segment; it exists when a translation needs to split a single span-like code into two parts:

For example: "Äter <b>katter möss</b>?" into "Do <b>cats</b> eat <b>mice</b>?"

If instead of "<b>...</b>" we have some non-clonable code from some format (I can't think of any now, maybe MIF or IDML?), then I'm not sure how the issue could be resolved.


Cheers,
-yves



