xliff-inline message

Subject: RE: [xliff-inline] Splitting non-clonable codes

From: "Estreen, Fredrik" <Fredrik.Estreen@lionbridge.com>
To: Andrew Pimlott <andrew@spartanconsultinginc.com>, Yves Savourel <ysavourel@enlaso.com>
Date: Tue, 14 Feb 2012 22:45:01 +0000

Hi Andrew,

Most tools would not need to care. The attribute is only used to record that a transformation was done, so that it can be undone at a later stage. If a tool performs this transform, it needs to handle the attribute. If a tool does not handle the attribute, and so misses the fact that another tool performed the transform on its tag, it would most likely error out.

The main case I see is when we need to segment in the middle of a spanning tag and later need to undo the split. As you say, this would not be needed if all tools supported both methods. But in my experience that is unlikely to happen. In particular, extraction / back-conversion tools developed by customers rather than by CAT tool developers often require the document to follow a strict subset, and any deviation is a blocking problem. In these cases you might sub-segment the document as part of your processing and then merge the segments back to the initial segmentation when returning the files. Here it would be useful to be able to undo the tag conversion as well. To know which tags need to be changed and which to leave alone, some indicator is needed. This is especially true if you use a chain of unrelated tools to do the processing.

<unit>
  <segment>
    <source>This <sc id="1" nid="n1s" />is a <pc id="2" nidstart="n2s" nidend="n2e">sample. Of the</pc> problem <ec id="1" nid="n1e" />case.</source>
  </segment>
</unit>

<unit>
  <segment>
    <source>This <sc id="1" nid="n1s" isolated="yes"/>is a <sc id="2" nid="n2s" isolated="yes" />sample.</source>
  </segment>
  <ignorable>
    <source> </source>
  </ignorable>
  <segment>
    <source>Of the<ec id="2" nid="n2e" isolated="yes" /> problem <ec id="1" nid="n1e" isolated="yes" />case.</source>
  </segment>
</unit>

If you receive such a file without any additional information, it is not possible to know that only the second span should be returned to <pc> form and not the first. Using an attribute to indicate that a tag was transformed allows you to identify only those you need to touch. The rule would be that an <sc>/<ec> pair converted to <pc> gets transformed="yes", and vice versa. When you reverse the operation, or if you want to apply it again, the tag returns to its native form. The flag acts like a toggle.
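As a rough illustration of the reverse step, here is a minimal Python sketch using the standard-library ElementTree. It assumes the segments have already been merged back into one <source>, that the splitting tool put the proposed transformed="yes" on the <sc>/<ec> halves it created, and that nid maps to nidstart/nidend as in the sample above. The function name rejoin_transformed is made up for this example, and the attribute itself is only a proposal in this thread.

```python
import xml.etree.ElementTree as ET

def rejoin_transformed(source):
    """Rewrite <sc>/<ec> pairs flagged transformed="yes" as <pc>, in place."""
    while True:
        kids = list(source)
        sc = next((k for k in kids
                   if k.tag == "sc" and k.get("transformed") == "yes"), None)
        if sc is None:
            return source  # nothing (left) to undo at this level
        ec = next((k for k in kids
                   if k.tag == "ec" and k.get("id") == sc.get("id")), None)
        if ec is None or kids.index(ec) < kids.index(sc):
            raise ValueError("no matching <ec> after <sc id=%r>" % sc.get("id"))
        i, j = kids.index(sc), kids.index(ec)
        # The flag is dropped: the code is back in its native form.
        pc = ET.Element("pc", {"id": sc.get("id")})
        if sc.get("nid"):
            pc.set("nidstart", sc.get("nid"))
        if ec.get("nid"):
            pc.set("nidend", ec.get("nid"))
        pc.text = sc.tail   # text right after <sc> becomes the start of <pc>
        pc.tail = ec.tail   # text right after <ec> now follows </pc>
        for k in kids[i + 1:j]:
            source.remove(k)
            pc.append(k)    # codes between the pair move inside <pc>
        source.remove(sc)
        source.remove(ec)
        rejoin_transformed(pc)  # handle flagged pairs nested inside
        source.insert(i, pc)

# Merged form of the sample: only pair 2 was split by the tool.
merged = ET.fromstring(
    '<source>This <sc id="1" nid="n1s"/>is a '
    '<sc id="2" nid="n2s" transformed="yes"/>sample. Of the'
    '<ec id="2" nid="n2e" transformed="yes"/> problem '
    '<ec id="1" nid="n1e"/>case.</source>')
rejoin_transformed(merged)
print(ET.tostring(merged, encoding="unicode"))
```

Pair 1 carries no flag, so it is left as an <sc>/<ec> pair, which is exactly the distinction that cannot be made without the attribute.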

Since we need to be able to break the <pc> into a pair for this type of segmentation operation when the tag is not clone/copyable, the question was raised whether we should simply allow it to be done in other cases too.

I do see a potential problem in handling <sc>/<ec> pairs that were converted to <pc>, if we allow that. For this to be safe, the tool doing the conversion must be sure that the native code follows XML semantics and not a looser set of rules like SGML or some other unknown format. But at least the information that it was once a pair is there as a hint. In the other direction it is not a problem: a tool that sees an <sc>/<ec> pair that was originally a <pc> can be sure the native code should be treated as having XML semantics.
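To make the risk concrete, here is a small Python sketch of the structural check a tool could run before turning a pair into <pc>. It assumes the inline codes of one unit are given as a list of (kind, id) tuples in document order; that representation and the name can_become_pc are invented for this example. Passing the check is necessary but not sufficient: it only shows the codes nest properly in the XLIFF, and says nothing about whether the native format really has XML semantics.

```python
def can_become_pc(events, pair_id):
    """Return True if the <sc>/<ec> pair `pair_id` encloses only
    complete, properly nested pairs.

    events: list of (kind, id) tuples, kind is "sc" or "ec",
    in document order.
    """
    try:
        start = events.index(("sc", pair_id))
        end = events.index(("ec", pair_id))
    except ValueError:
        return False  # one half is missing, e.g. an isolated code
    if end < start:
        return False
    stack = []
    for kind, code_id in events[start + 1:end]:
        if kind == "sc":
            stack.append(code_id)
        elif not stack or stack.pop() != code_id:
            return False  # closes a code opened outside, or codes cross
    return not stack      # anything still open would cross the end

nested = [("sc", "1"), ("sc", "2"), ("ec", "2"), ("ec", "1")]
overlapping = [("sc", "1"), ("sc", "2"), ("ec", "1"), ("ec", "2")]
```

With these two lists, both pairs in nested pass the check, while neither pair in overlapping does, since each one would cut the other in half.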

Regards,
Fredrik Estreen
________________________________________
From: xliff-inline@lists.oasis-open.org [xliff-inline@lists.oasis-open.org] on behalf of Andrew Pimlott [andrew@spartanconsultinginc.com]
Sent: Tuesday, February 14, 2012 9:05 PM
To: Yves Savourel
Cc: xliff-inline@lists.oasis-open.org
Subject: Re: [xliff-inline] Splitting non-clonable codes

I'm not sure I understand--what does a tool do if it sees transformed="yes" and can't handle it?  Raise an error?  How is that an improvement?  If you see transformed="yes", does that even let you determine the original form unambiguously?  (Since the flag is binary, it only works if there are only two possible forms.)

My first thought is that this line of thinking is catering to poor implementations, and we just shouldn't do it.  If the standard specifies that two representations are equivalent, implementations should honor both.

But my second thought is that if we do want to make it easier for "limited" implementations, we should specify a "normal form" that uses the simplest markup.  Then, you could put a flag at the top of the file indicating whether it is in normal form.  If a tool accepts only normal form, it can reject non-normalized input and tell you to run it through a normalizer first.

Andrew

On Tue, Feb 14, 2012 at 8:29 AM, Yves Savourel <ysavourel@enlaso.com> wrote:
Hi Bryan, all,

> ... I've been thinking about the quandary expressed
> by Fredrik regarding the difficulty in knowing how
> to preserve the original span elements upon the
> conversion back to original format in cases where
> a split occurred.

Actually we discussed this. One idea that Fredrik suggested would be an optional transformed='yes|no' attribute, with 'yes' meaning the current notation was modified and 'no' meaning the current notation is the original.

Cheers,
-yves

