Subject: RE: [xliff] RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy
Hi Yves, all,
Thanks Yves, for the conversation. So, if there are matching IDs between source and target inline codes, then the type of code must match. This makes sense and is something that can be easily validated. We are in agreement then on this constraint. However, I think it would create less confusion for other implementers if the meaning of correspondence was clarified in the spec to reflect this.
Hi Ryan, all,
I guess it is like the translation itself: the specification says the <target> contains the translated text of the <source>, but there is no real way to validate this. The inline codes are the same: if they have the same ID, we can at least verify that they are also of the same kind (spanning or placeholder), and that is it. That is what defines them as ‘corresponding’.
As for the text “The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.”: it just says that an ID value can be duplicated in the source and target only when the two codes are corresponding codes.
There is probably a better way to express that.
But at least it says that if a code has an ID value that does not exist in the source, it is not a “corresponding code”, and therefore it is an added one.
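To illustrate the rule above, here is a minimal sketch of how a tool could list the target codes that count as added because their ID does not occur in the source. It is an assumption-laden example, not taken from any existing implementation: the function names (code_ids, added_code_ids) and the sample segment are hypothetical, and it only looks at the id attribute of <pc>, <sc>, <ec> and <ph>.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:xliff:document:2.0}"
# Inline code elements that can carry an id (hypothetical helper constant).
CODE_ELEMENTS = {NS + "pc", NS + "sc", NS + "ec", NS + "ph"}

def code_ids(container):
    """Collect the id values of all inline codes inside <source> or <target>."""
    return {
        el.get("id")
        for el in container.iter()
        if el.tag in CODE_ELEMENTS and el.get("id") is not None
    }

def added_code_ids(segment):
    """Return target code ids that have no counterpart in the source."""
    source_ids = code_ids(segment.find(NS + "source"))
    target_ids = code_ids(segment.find(NS + "target"))
    return sorted(target_ids - source_ids)

# Hypothetical sample: the target keeps code 1 and adds code 2.
SEGMENT = """<segment xmlns="urn:oasis:names:tc:xliff:document:2.0">
  <source>Press <ph id="1"/> to continue</source>
  <target>Appuyez sur <ph id="1"/> pour <pc id="2">continuer</pc></target>
</segment>"""

print(added_code_ids(ET.fromstring(SEGMENT)))
```

A code reported here is simply treated as new in the target; whether that is allowed is a separate question for the agent processing the file.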
OK, if we go with correspondence being defined as the same kind of code and the same ID, then what is the purpose of validating that the corresponding codes have the same ID, if that was already a criterion for correspondence? It makes even less sense to me now how to validate this constraint.
I don’t think the merger could necessarily resolve the correspondence because it would need the <data> for the target, which the extractor may or may not provide.
I think it would also make life a lot more difficult in general to have mismatched IDs between source and target. Also, we wouldn’t be able to tell which codes are new and which already exist in the source.
So if two codes have the same ID and are of the same kind (spanning or placeholder) that should be enough to indicate that they are corresponding codes.
With that the merger can choose what it does with the data: use the one from the source, or the target, or even from its own internal mechanism.
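The check described above (same ID plus same kind of code) could be sketched as follows. This is only an illustration under stated assumptions, not the Lynx code: the helper names (code_kinds, non_corresponding_ids) and the sample segment are hypothetical, and an <ec> is only considered when it carries its own id (the isolated case).

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:xliff:document:2.0}"

def code_kinds(container):
    """Map each inline code id to 'spanning' or 'placeholder'."""
    kinds = {}
    for el in container.iter():
        name = el.tag.replace(NS, "")
        code_id = el.get("id")
        if code_id is None:
            continue  # e.g. an <ec> that refers to its <sc> via startRef
        if name in ("pc", "sc", "ec"):
            kinds[code_id] = "spanning"
        elif name == "ph":
            kinds[code_id] = "placeholder"
    return kinds

def non_corresponding_ids(segment):
    """Return ids used in both source and target with different kinds."""
    target = segment.find(NS + "target")
    if target is None:
        return []
    src = code_kinds(segment.find(NS + "source"))
    trg = code_kinds(target)
    return sorted(i for i in src.keys() & trg.keys() if src[i] != trg[i])

# Hypothetical sample: id 1 is spanning in the source, placeholder in the target.
SEGMENT = """<segment xmlns="urn:oasis:names:tc:xliff:document:2.0">
  <source>Click <pc id="1">here</pc> now<ph id="2"/></source>
  <target>Cliquez <ph id="1"/>ici maintenant<ph id="2"/></target>
</segment>"""

print(non_corresponding_ids(ET.fromstring(SEGMENT)))
```

Nothing else is compared here, which leaves the merger free to take the data from the source, from the target, or from its own mechanism.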
Hi Yves, all,
After discussing the issue of how to determine what constitutes a ‘corresponding code’ with others I think it is probably too restrictive to even expect most of the attributes listed below to be the same.
Leveraging from TM is a frequent scenario and one cannot expect to get similar attributes for things like canDelete or subFlow. For example, the extractor that created the leveraged content may have had different rules on what can be deleted or not, or a previous version of the same code may have no sub-flow (like <a> without alt in version 1, an <a> with alt in version 2).
Such differences are frequent, but they don’t mean the target code does not correspond to the source code.
Maybe only two pieces of information need to be identical in both codes: the id value and the kind of code (spanning or placeholder).
Ultimately it is the merger agent that should decide what to do with the target codes. As long as it can associate a target code to a source one (and only id/kind-of-code are needed for this) it should be fine. Otherwise we may end up prohibiting the processing of files that are being translated and have leveraged target content.
Hi Ryan, all,
> Thanks Yves, when you have something working in Lynx, can you share
> the full implementation with me and we'll look at what we can do to
> at least have parity.
I've implemented a verification that tries to detect when two source/target codes with the same ID are not "corresponding". It’s just my best solution so far, but I’m obviously open to adjustments. I have not yet made a formal release with the change: It would be nice to get the feedback from other implementers and TC members.
Two such codes are seen as corresponding if:
- Both are either spanning (<pc>/<sc><ec>) or both standalone (<ph>)
And if they have, at least, the following properties identical:
They may have different values for all the other properties: the rationale for not including the others (like data, dir, subType, etc.) is that they might be changed when they are in the target.
For the annotations: I check for identical values on:
You can see the source code here:
Note that the code performs the verification on the parsed elements, so, for example, there is no distinction between the attributes disp, dispStart and dispEnd: In the object model it's just the disp property of that tag object.
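As a rough illustration of that last point, the sketch below shows one way parsed inline elements could be turned into tag objects that expose a single disp property, whatever the attribute was called in the XML. The Tag class and the tags_from_element function are hypothetical names made up for this example; they are not the actual object model of the library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tag:
    """Hypothetical tag object: one disp property for every kind of tag."""
    id: str
    kind: str  # "opening", "closing" or "standalone"
    disp: Optional[str] = None

def tags_from_element(name, attrs):
    """Build tag objects from an inline element name and its attributes."""
    if name == "ph":
        return [Tag(attrs["id"], "standalone", attrs.get("disp"))]
    if name == "pc":
        # A <pc> yields two tags: dispStart and dispEnd both become disp.
        return [
            Tag(attrs["id"], "opening", attrs.get("dispStart")),
            Tag(attrs["id"], "closing", attrs.get("dispEnd")),
        ]
    if name == "sc":
        return [Tag(attrs["id"], "opening", attrs.get("disp"))]
    if name == "ec":
        # A non-isolated <ec> points at its <sc> through startRef.
        ref = attrs.get("startRef") or attrs["id"]
        return [Tag(ref, "closing", attrs.get("disp"))]
    raise ValueError(f"not an inline code element: {name}")

from_pc = tags_from_element("pc", {"id": "1", "dispStart": "<b>", "dispEnd": "</b>"})
from_sc_ec = (
    tags_from_element("sc", {"id": "1", "disp": "<b>"})
    + tags_from_element("ec", {"startRef": "1", "disp": "</b>"})
)
print(from_pc == from_sc_ec)
```

With a model like this, a comparison of source and target codes sees the same properties whether the code was written as <pc> or as an <sc>/<ec> pair.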
You can play with the new behavior in the online validator: http://okapi-lynx.appspot.com/validation
(The issue Nesho found with <cp hex='7FFFFFFF'/> is also fixed)
There are a few test files (bad_NotCorrespondingCode*.xlf) here:
But we should also add valid test files.