Subject: RE: [xliff] Section 2.6.7 "Target Content Modification" - Split/Merge
The sentence in 220.127.116.11 should be removed, or reworded to say that the user agent may leave the segment without a target.
On 18.104.22.168, regarding the current rules on splitting and joining: when there is no target, it is safe to split the source string into two strings. If there is a target, language analysis would be needed to find the right point (if one exists) at which to split both source and target, so that each of the two new pairs still contains a source and target that match linguistically. Since this is very hard to do in general, that case is forbidden. On the other hand, if you completely remove the target there is no risk of ending up with linguistically mismatched source and target nodes; once it is removed, you can go on and split the source as you want. There is one other subtle point here: if a tool with proprietary knowledge created the target (2.6.7: “The extraction tool can create the initial target content as it sees fit.”), it could have placed inline tags in it that a generic tool would not be allowed to place. These would be lost if the target is removed.
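To make the split rule concrete, here is a minimal Python sketch of how a generic tool might apply it. Everything in it is an assumption made for illustration: the function name `split_segment`, the `drop_target` flag, the namespace URI (taken from the published XLIFF 2.0 namespace, which may differ from the draft under discussion), and the restriction to plain-text sources without inline tags.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; adjust to whatever the draft schema in use declares.
NS = "urn:oasis:names:tc:xliff:document:2.0"
SEGMENT = f"{{{NS}}}segment"
SOURCE = f"{{{NS}}}source"
TARGET = f"{{{NS}}}target"


def split_segment(unit, segment, offset, drop_target=False):
    """Split `segment` at character `offset` of its source text.

    Hypothetical helper: refuses to split while a target is present,
    unless the caller explicitly agrees to remove the target first.
    """
    source = segment.find(SOURCE)
    target = segment.find(TARGET)
    if target is not None:
        if not drop_target:
            raise ValueError("segment has a target; splitting is not allowed")
        # Removing the target also discards any inline tags a
        # proprietary tool may have placed in it.
        segment.remove(target)
    if len(source):
        raise ValueError("inline markup in source is not handled in this sketch")
    text = source.text or ""
    if not 0 < offset < len(text):
        raise ValueError("offset must fall strictly inside the source text")

    # Keep the first part in the original segment, move the rest to a new one.
    second = ET.Element(SEGMENT)
    ET.SubElement(second, SOURCE).text = text[offset:]
    source.text = text[:offset]
    position = list(unit).index(segment)
    unit.insert(position + 1, second)
    return second
```

Called with the default `drop_target=False`, the helper rejects any segment that already has a target, which matches the forbidden case; passing `drop_target=True` models the "remove the target, then split" route, including the loss of tool-specific inline tags.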
I think the general idea on sub-segmentation so far is that tools are allowed to do it most of the time in order to fit the requirements of their process, and that this was seen as desirable in past discussions. I can see why you might want to restrict this ability in order to match your process, and I do not think you are alone in wanting to do so. This all comes down to the dynamic vs. static behavior of the <segment> and <ignorable> elements.
An alternative to the attributes on segment, and one that I feel is cleaner, is to leverage the static structure property of <unit>. If you have non-subdividable pieces that you want to preserve, put each of them in its own <unit>, with a single <segment> inside that <unit>. If logical grouping is needed, group by using the <group> element instead of using segmentation. This way, at the end of the processing chain you can go over the whole document and merge all <segment> and <ignorable> nodes in each <unit> back into a single <segment>. The result is that you get back the structure you initially created, regardless of what other tools did along the processing path. The advantage is that this needs no extra processing requirements and still allows a certain amount of flexibility to the downstream tools, while the segmentation remains as the extraction tool wanted it.
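The end-of-chain merge described above could look roughly like the following Python sketch. The names `merge_unit` and `append_content` are hypothetical, the namespace URI is assumed from the published XLIFF 2.0 namespace, and attributes on the original <segment> and <ignorable> elements are simply dropped, so treat it as an outline rather than a conforming implementation.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; adjust to whatever the draft schema in use declares.
NS = "urn:oasis:names:tc:xliff:document:2.0"
SEGMENT = f"{{{NS}}}segment"
IGNORABLE = f"{{{NS}}}ignorable"
SOURCE = f"{{{NS}}}source"
TARGET = f"{{{NS}}}target"


def append_content(dst, src):
    """Append the text and child elements of `src` to the end of `dst`."""
    if src.text:
        if len(dst):
            # Text after the last child lives in that child's tail.
            dst[-1].tail = (dst[-1].tail or "") + src.text
        else:
            dst.text = (dst.text or "") + src.text
    for child in list(src):
        dst.append(child)  # the child's own tail travels with it


def merge_unit(unit):
    """Collapse all <segment>/<ignorable> children of `unit` into one <segment>."""
    parts = [c for c in unit if c.tag in (SEGMENT, IGNORABLE)]
    if not parts:
        return None
    merged = ET.Element(SEGMENT)
    merged_source = ET.SubElement(merged, SOURCE)
    merged_target = None
    for part in parts:
        src = part.find(SOURCE)
        if src is not None:
            append_content(merged_source, src)
        tgt = part.find(TARGET)
        if tgt is not None:
            if merged_target is None:
                merged_target = ET.SubElement(merged, TARGET)
            append_content(merged_target, tgt)
    # Put the merged segment where the first original part was.
    position = list(unit).index(parts[0])
    for part in parts:
        unit.remove(part)
    unit.insert(position, merged)
    return merged
```

Running `merge_unit` over every <unit> in the document restores the one-segment-per-unit layout the extraction tool started with. Note that if only some parts carry a target, the merged target is the concatenation of those that do, so a real tool would probably want to check for that case before merging.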