Subject: RE: [xliff] RE: Inline attributes and canCopy
I don’t think the merger could necessarily resolve the correspondence because it would need the <data> for the target, which the extractor may or may not provide.
I think it would also make life a lot more difficult in general to have mismatched IDs between source and target. Also, we wouldn’t be able to tell which codes are new and which already exist in the source.
So if two codes have the same ID and are of the same kind (spanning or placeholder) that should be enough to indicate that they are corresponding codes.
With that the merger can choose what it does with the data: use the one from the source, or the target, or even from its own internal mechanism.
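To make the rule above concrete, here is a minimal sketch in Python. It is not taken from any existing implementation: the InlineCode class, the corresponds() and pick_data() functions and the "prefer" parameter are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InlineCode:
    """Hypothetical, simplified view of an inline code."""
    id: str
    kind: str                    # "spanning" or "placeholder"
    data: Optional[str] = None   # original data, if the document provides it


def corresponds(source: InlineCode, target: InlineCode) -> bool:
    # Only the ID and the kind of code are compared; everything else may differ.
    return source.id == target.id and source.kind == target.kind


def pick_data(source: InlineCode, target: InlineCode,
              prefer: str = "source") -> Optional[str]:
    """One possible merger policy: use the preferred side, fall back to the other."""
    if not corresponds(source, target):
        raise ValueError("codes do not correspond")
    first, second = (target, source) if prefer == "target" else (source, target)
    return first.data if first.data is not None else second.data
```

A merger relying on its own internal mechanism would simply ignore both data values once corresponds() returns True.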
Hi Yves, all,
After discussing with others how to determine what constitutes a ‘corresponding code’, I think it is probably too restrictive to expect even most of the attributes listed below to be the same.
Leveraging from TM is a frequent scenario, and one cannot expect to get the same attribute values for things like canDelete or subFlow. For example, the extractor that created the leveraged content may have had different rules on what can be deleted, or a previous version of the same code may have had no sub-flow (like an <a> without alt in version 1 and an <a> with alt in version 2).
Such differences are frequent, but they don’t mean the target code does not correspond to the source code.
Maybe only two pieces of information need to be identical in both codes: the id value and the kind of code (spanning or placeholder).
Ultimately it is the merger agent that should decide what to do with the target codes. As long as it can associate a target code to a source one (and only id/kind-of-code are needed for this) it should be fine. Otherwise we may end up prohibiting the processing of files that are being translated and have leveraged target content.
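As an illustration of that association step, here is a rough Python sketch. The dictionary layout and the match_target_codes() function are assumptions made for the example, not part of any tool. Codes are keyed on (id, kind) only, so differences in canDelete or subFlow do not prevent a match.

```python
from typing import Dict, List, Tuple

Code = Dict[str, str]  # hypothetical flat representation of an inline code


def match_target_codes(
    source_codes: List[Code], target_codes: List[Code]
) -> Tuple[List[Tuple[Code, Code]], List[Code]]:
    """Pair each target code with its source code using only id and kind.

    Returns (matched pairs, target codes with no counterpart in the source).
    """
    by_key = {(c["id"], c["kind"]): c for c in source_codes}
    matched: List[Tuple[Code, Code]] = []
    unmatched: List[Code] = []
    for target in target_codes:
        source = by_key.get((target["id"], target["kind"]))
        if source is None:
            unmatched.append(target)   # new in the target, e.g. added by a translator
        else:
            matched.append((source, target))
    return matched, unmatched
```

What the merger then does with the unmatched codes, or with attribute differences inside a matched pair, stays its own decision.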
Hi Ryan, all,
> Thanks Yves, when you have something working in Lynx, can you share
> the full implementation with me and we'll look at what we can do to
> at least have parity.
I've implemented a verification that tries to detect when two source/target codes with the same ID are not "corresponding". It’s just my best solution so far, and I’m obviously open to adjustments. I have not yet made a formal release with the change: it would be nice to get feedback from other implementers and TC members first.
Two such codes are seen as corresponding if:
- Both are spanning (<pc> or <sc>/<ec>) or both are standalone (<ph>)
And if at least the following properties are identical:
They may have different values for all the other properties: the rationale for not including the others (like data, dir, subType, etc.) is that they might be changed when they are in the target.
For the annotations: I check for identical values on:
You can see the source code here:
Note that the code performs the verification on the parsed elements, so, for example, there is no distinction between the attributes disp, dispStart and dispEnd: In the object model it's just the disp property of that tag object.
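To show what comparing on the parsed elements can look like, here is a small Python sketch. It is not the Lynx object model: the Tag class and the tags_from_element() function are hypothetical, and the fragments are parsed without the XLIFF namespace to keep the example short. The dispStart/dispEnd attributes of <pc> and the disp attribute of <sc>, <ec> and <ph> all end up in a single disp property.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical tag object; one <pc> yields an opening and a closing Tag."""
    id: Optional[str]
    tag_type: str            # "opening", "closing" or "standalone"
    disp: Optional[str]


def tags_from_element(elem: ET.Element) -> List[Tag]:
    cid = elem.get("id")
    if elem.tag == "pc":
        # dispStart and dispEnd map to the disp of the opening and closing tags.
        return [Tag(cid, "opening", elem.get("dispStart")),
                Tag(cid, "closing", elem.get("dispEnd"))]
    if elem.tag == "sc":
        return [Tag(cid, "opening", elem.get("disp"))]
    if elem.tag == "ec":
        # A non-isolated <ec> refers to its <sc> through startRef.
        return [Tag(elem.get("startRef") or cid, "closing", elem.get("disp"))]
    if elem.tag == "ph":
        return [Tag(cid, "standalone", elem.get("disp"))]
    raise ValueError(f"not an inline code element: {elem.tag}")
```

With this kind of model, a <pc id="1" dispStart="[b]" dispEnd="[/b]"> and the pair <sc id="1" disp="[b]"/> ... <ec startRef="1" disp="[/b]"/> produce the same two tag objects, so a comparison cannot tell them apart.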
You can play with the new behavior in the online validator: http://okapi-lynx.appspot.com/validation
(The issue Nesho found with <cp hex='7FFFFFFF'/> is also fixed)
There are a few test files (bad_NotCorrespondingCode*.xlf) here:
But we should also add valid test files.