RE: [xliff] Section 2.6.7 "Target Content Modification"

Hi Jung,

The sentence in 2.6.7.1 about leaving the existing target unchanged should be removed, or reworded to say that the user agent may leave the segment without a target.

On 2.6.7.2, regarding the current rules on splitting and joining: when there is no target, it is safe to split the source string into two strings. If there is a target, it would take language analysis to find the right point (if one exists) at which to split both source and target so that the two new pairs still contain a source and a target that are linguistically connected. Since this is very hard to do in general, that case is forbidden. On the other hand, there is no risk of ending up with linguistically non-matching source and target nodes if you completely remove the target. Once you have removed it, you can go on and split the source as you want. There is one other subtle point here: if a tool with proprietary knowledge created the target (2.6.7 “The extraction tool can create the initial target content as it sees fit.”), it could have placed inline tags in it that a generic tool would not be allowed to place. These would be lost if the target is removed.
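
To make the rule concrete, here is a minimal Python sketch of how a tool might apply it. It is only an illustration under assumptions: the function name and the drop_target flag are made up, the XLIFF namespace is ignored, and only plain-text sources (no inline tags) are handled.

```python
import xml.etree.ElementTree as ET

def split_segment(unit, segment, offset, drop_target=False):
    """Split a segment's source at a character offset (hypothetical helper)."""
    target = segment.find("target")
    if target is not None:
        if not drop_target:
            # With a target present there is no safe generic split point.
            raise ValueError("segment has a target; splitting is not allowed")
        # Deleting the target puts us back in the "no existing target" case.
        # Any inline tags the extraction tool placed in it are lost here.
        segment.remove(target)
    source = segment.find("source")
    if len(source):
        raise NotImplementedError("sketch handles plain-text sources only")
    text = source.text or ""
    source.text = text[:offset]
    new_segment = ET.Element("segment")
    ET.SubElement(new_segment, "source").text = text[offset:]
    # Place the new segment directly after the one that was split.
    unit.insert(list(unit).index(segment) + 1, new_segment)
    return new_segment
```

Calling split_segment on a segment that still has a target raises an error unless the caller explicitly asks for the target to be dropped first, which mirrors the two cases in 2.6.7.1 and 2.6.7.2.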

I think the general idea on sub-segmentation so far is that tools are allowed to do it most of the time in order to fit the requirements of their process, and that this was seen as desirable in past discussions. I can see why you might want to restrict this ability in order to match your process, and I do not think you are alone in wanting to do so. This all comes down to the dynamic vs. static behavior of the <segment> and <ignorable> elements.

An alternative to the attributes on <segment>, and one that I feel is cleaner, is to leverage the static structure property of <unit>. If you have non-subdividable pieces that you want to preserve, put each one in its own <unit> containing a single <segment>. If logical grouping is needed, group with the <group> element instead of using segmentation. This way, at the end of the processing chain you can go over the whole document and merge all <segment> and <ignorable> nodes in each <unit> back into a single <segment>. The result is that you get back the structure you initially created, regardless of what other tools did along the processing path. The advantage is that this needs no extra processing requirements and still allows a certain amount of flexibility to the downstream tools, while the segmentation remains as the extraction tool wanted it.
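
As a rough sketch of that final merge pass, something like the following Python could work. The helper names are made up for illustration, the XLIFF namespace is ignored, attributes on the original segments are not carried over, and parts without a target simply contribute nothing to the merged target, which a real tool would need to handle more carefully.

```python
import xml.etree.ElementTree as ET

def append_content(dst, src):
    """Append the text and inline children of src to the end of dst."""
    if src.text:
        if len(dst):
            # dst already ends with an inline element: text goes in its tail.
            dst[-1].tail = (dst[-1].tail or "") + src.text
        else:
            dst.text = (dst.text or "") + src.text
    for child in list(src):
        dst.append(child)  # the child's tail text travels with it

def merge_unit(unit):
    """Collapse all <segment>/<ignorable> children of unit into one <segment>."""
    parts = [c for c in unit if c.tag in ("segment", "ignorable")]
    if len(parts) < 2:
        return  # nothing to merge
    merged = ET.Element("segment")
    source = ET.SubElement(merged, "source")
    has_target = any(p.find("target") is not None for p in parts)
    target = ET.SubElement(merged, "target") if has_target else None
    for part in parts:
        append_content(source, part.find("source"))
        part_target = part.find("target")
        if part_target is not None:
            append_content(target, part_target)
    # Put the merged segment where the first original part was.
    position = list(unit).index(parts[0])
    for part in parts:
        unit.remove(part)
    unit.insert(position, merged)

def merge_document(root):
    """Apply merge_unit to every <unit> in the document."""
    for unit in root.iter("unit"):
        merge_unit(unit)
```

Run as the last step of the chain, merge_document leaves exactly one <segment> per <unit>, whatever splitting or joining happened in between.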

Regards,

Fredrik Estreen

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Jung Nicholas Ryoo
Sent: den 6 december 2012 16:41
To: xliff@lists.oasis-open.org
Subject: [xliff] Section 2.6.7 "Target Content Modification" - Split/Merge

Our Translation Infrastructure team has some concerns on the Section 2.6.7 "Target Content Modification".
This mail is about one of the concerns.

Our main comments are in blue, and questions in red. (For non-HTML mail readers, I used "==>" to start a comment, and brackets to mark a question).

The specification says about split/merge actions in the processing requirements.

2.6.7.1 Without an Existing Target

User agents may leave the existing target unchanged. ==> This is contradictory to the title. No existing target.
User agents may split the segment into two segments.
User agents may join the segment with the following one.

2.6.7.2 With an Existing Target

User agents may join the segment with the following segment
==> No PR about splitting segments. Does this mean it is not allowed? If <unit> has one <segment>, will agents be prohibited from splitting the segment? [We would like to prevent any split/merge actions by agents. How?]
User agents may delete the existing target and start over as if working without an existing target ==> If an existing target can be deleted, this implies the processing requirements in 2.6.7.2 can be completely ignored. [How can we prevent it?]

Content in our translation kits is pre-segmented, and an agreed processing requirement is that segments should not be altered (further split or merged). As it stands, there is no mechanism at unit level in XLIFF 2 that would support our use case/requirement, and we would have to create a proprietary extension to ensure our translation vendors would support that PR, which implies losing xyz (standardisation, interoperability, etc., you name it).

[Suggestions]

Attributes (canJoin, canMerge) at <segment> level to prevent changes in the number of segments inside a <unit>. By default such changes can be allowed.

Or, alternatively, a validation can be designed, but I think this approach requires a more sophisticated design.
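
As a rough illustration of the validation idea (not something the specification defines), a check could compare the number of <segment> elements per <unit> in the kit that was sent out with the file that comes back. The Python sketch below ignores the XLIFF namespace for brevity, assumes every <unit> has an id, and uses function names made up for this example.

```python
import xml.etree.ElementTree as ET

def segment_counts(root):
    """Map each unit id to the number of <segment> children it has."""
    return {u.get("id"): len(u.findall("segment")) for u in root.iter("unit")}

def changed_units(original, returned):
    """Return ids of units whose segment count differs or that went missing."""
    before = segment_counts(original)
    after = segment_counts(returned)
    return sorted(uid for uid in before if after.get(uid) != before[uid])
```

A count comparison only detects that the number of segments changed; it would not catch a split followed by a join at a different position, which is part of why a full validation needs more design.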

Thanks
Oracle
