Subject: RE: [xliff] RE: Inline attributes and canCopy
Regarding the highlighted one: assuming canDelete is yes, I understand that they can delete the code and re-add it. But since all the attributes except id are the same, I would argue it is the same tag, and we can therefore prevent them from using a different id by failing validation.
If it is valid, then how can I ever validate this?
From: Yves Savourel
Sent: 6/18/2015 7:56 PM
To: Ryan King; 'Estreen, Fredrik'; email@example.com
Subject: RE: [xliff] RE: Inline attributes and canCopy
Here is my take. But others should check too.
One note: some attributes (e.g. dataRef*) can have different values between source and target.
Please tell me if I have the correct understanding then.
x and y stand for all attributes other than id, where x and y are different values.
In your example (and in all cases really), you know the relationship between source and target inline codes by their id values: a code with id='1' in the source corresponds to the code with id='1' in the target.
See the id definition in the specification:
Also, I’m not sure I understand the following text in your old message:
“Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won’t really know which was the original, because our tools can’t use copyOf, because we always carry <originalData>.”
The “arbitrarily” part seems wrong: you should always know which source code corresponds to which target code, because they have identical ids. copyOf is meant for the case where a new code is introduced in the target and has no associated originalData; in that case copyOf points to an existing code for which the merger knows how to get the original data.
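To make the id rule concrete, here is a minimal Python sketch (not taken from the specification or from any tool) of how a merger might resolve each target code to the source code it represents: same id first, then copyOf, and otherwise the code can only be interpreted through its original data. The function names are hypothetical, the XLIFF namespace is omitted for brevity, and only <ph>, <pc>, <sc> and <ec> are considered.

```python
import xml.etree.ElementTree as ET

# Inline code elements handled by this sketch (XLIFF namespace omitted).
CODE_TAGS = {"ph", "pc", "sc", "ec"}


def codes(container):
    """Return {id: element} for every inline code inside <source> or <target>."""
    return {e.get("id"): e for e in container.iter() if e.tag in CODE_TAGS}


def resolve(segment_xml):
    """Hypothetical helper: map each target code id to the source code id it represents."""
    segment = ET.fromstring(segment_xml)
    source = codes(segment.find("source"))
    target = codes(segment.find("target"))
    mapping = {}
    for tid, code in target.items():
        if tid in source:
            mapping[tid] = tid  # same id: it is the same code
        elif code.get("copyOf") in source:
            mapping[tid] = code.get("copyOf")  # new code duplicating a source code
        else:
            mapping[tid] = None  # only interpretable through its original data
    return mapping


example = """<segment>
  <source>Press <ph id="1"/> or <ph id="2"/>.</source>
  <target>Appuyez sur <ph id="2"/> ou <ph id="1"/> ou <ph id="3" copyOf="1"/>.</target>
</segment>"""
print(resolve(example))  # {'2': '2', '1': '1', '3': '1'}
```

Reordering ('2' before '1') does not matter here; the ids alone carry the correspondence.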
Hi Fredrik, all,
I’m resurrecting this thread. It seems to me that the only way to tell whether a target <pc> corresponds to a source <pc> is to check that their attributes, except id, are identical, and then validate the constraint. However, as Fredrik mentions below, this constraint exists to help mergers when <originalData> is not present, so that they know which <pc> tags correspond to which original codes (which may be stored outside of the XLIFF document). BUT if I have:
<source><pc id="1"><pc id="2">text</pc></pc></source>
With no attributes on <pc> other than id, how do I know which source id to match? Is the tag in the target <b> or <i>? How can I apply the constraint without knowing the original data?
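A small sketch of the attribute-comparison check described above shows why it breaks down: when codes differ only by id, every source code is an equally good candidate. Everything here is an assumption for illustration (the hypothetical function names, and the choice to ignore id and the dataRef* attributes when comparing); namespaces are omitted.

```python
import xml.etree.ElementTree as ET

CODE_TAGS = {"ph", "pc", "sc", "ec"}
# Assumption: id and the dataRef* attributes may differ between source and
# target for the same code; every other attribute must match.
IGNORED = {"id", "dataRef", "dataRefStart", "dataRefEnd"}


def signature(code):
    """Element name plus all attributes that are expected to be identical."""
    kept = sorted((k, v) for k, v in code.attrib.items() if k not in IGNORED)
    return (code.tag, tuple(kept))


def candidates(source_xml, target_code_xml):
    """Hypothetical helper: source ids whose signature matches the target code."""
    source = ET.fromstring(source_xml)
    wanted = signature(ET.fromstring(target_code_xml))
    return [c.get("id") for c in source.iter()
            if c.tag in CODE_TAGS and signature(c) == wanted]


# Only ids differ: both source codes match, so the check cannot decide.
print(candidates('<source><pc id="1"><pc id="2">text</pc></pc></source>',
                 '<pc id="5">texte</pc>'))  # ['1', '2']

# With distinguishing attributes the match is unique.
print(candidates('<source><pc id="1" type="fmt" subType="xlf:b">'
                 '<pc id="2" type="fmt" subType="xlf:i">text</pc></pc></source>',
                 '<pc id="5" type="fmt" subType="xlf:i">texte</pc>'))  # ['2']
```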
Thanks for the detailed explanation, Fredrik! Somehow I missed the copyOf processing requirement. Given your rationale, as long as we have <originalData>, we don’t really need ids for merging, which is our case. Additionally, we decode inline tags back to native codes when we store data in our TMS, so we perform normalization for matching, etc. in a way that does not depend on XLIFF.
Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won’t really know which was the original, because our tools can’t use copyOf, because we always carry <originalData>.
Thanks for the help,
I interpret the specification the same way as you do with respect to the IDs and agree with your added sentence clarifying it.
The rule that an inline element representing the exact same element in both source and target uses the same ID in both locations is there to facilitate merging back to the native format for agents that do not put native data in the XLIFF document. Without it, an agent would not be able to detect reordering or addition of codes. Safe substitution of tags in matches, in systems that allow it, also needs this. It also enables the storage of inline elements in TMs without the actual native code. Storing the native code in the TM offers more options for validation and match transformation, so it may be a good thing anyway.
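As an illustration of why shared ids matter to agents without native data, here is a minimal sketch (a hypothetical helper, assuming the code ids have already been extracted from source and target in document order) that detects added, deleted and reordered codes from ids alone:

```python
def compare_codes(source_ids, target_ids):
    """Hypothetical helper: classify target codes against source codes by id only."""
    source_ids, target_ids = list(source_ids), list(target_ids)
    added = [i for i in target_ids if i not in source_ids]
    deleted = [i for i in source_ids if i not in target_ids]
    # Compare the relative order of the codes present on both sides.
    common_source = [i for i in source_ids if i in target_ids]
    common_target = [i for i in target_ids if i in source_ids]
    return {"added": added, "deleted": deleted,
            "reordered": common_source != common_target}


print(compare_codes(["1", "2", "3"], ["2", "1", "4"]))
# {'added': ['4'], 'deleted': ['3'], 'reordered': True}
```

If a tool renumbered ids freely between source and target, none of these three results could be computed without looking at native data.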
“copyOf” is not really optional. It is required if the copied inline element does not have associated original data.
In the section “Duplicating an existing code”:
This requirement makes sure that a merger can always know what an inline element in the target means, as long as it knows the meaning of the inline codes in the source. The expectation is that a merger not storing original data would be able to learn the meaning of the source inline elements at merge time through some method external to XLIFF (database, original file, etc.).
If the inline elements have associated original data, comparing that data allows re-associating the copies with the originals, at least to the degree mergers need.
Following the rules for inline IDs and copyOf also allows more tag substitution to happen in matches. I personally believe that using “copyOf” even for codes that have original data would allow slightly more known-safe substitutions than relying on comparison of original data. Unfortunately, we don’t allow that behavior. The case where it makes a difference is when you have two identical inline codes in the source and three in the target: which of the source codes is the third target code a copy of? In most situations that is not important to know; so far I have found only one TM-related situation where it would help. Making copyOf always required would also solve your use case.
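The two-versus-three case can be sketched as follows, assuming only <ph> codes whose original data is referenced through dataRef (the helper name and the simplified unit are illustrative, namespaces omitted). Comparing original data tells the merger what the extra code is, but not which source code it was copied from:

```python
import xml.etree.ElementTree as ET


def copy_candidates(unit_xml):
    """Hypothetical helper: for each target <ph> with no same-id source code,
    list the source codes whose original data is identical."""
    unit = ET.fromstring(unit_xml)
    data = {d.get("id"): d.text for d in unit.iter("data")}  # <originalData> entries
    segment = unit.find("segment")
    source = {c.get("id"): data.get(c.get("dataRef"))
              for c in segment.find("source").iter("ph")}
    result = {}
    for code in segment.find("target").iter("ph"):
        if code.get("id") in source:
            continue  # same id: already associated with its source code
        native = data.get(code.get("dataRef"))
        result[code.get("id")] = [sid for sid, text in source.items()
                                  if native is not None and text == native]
    return result


unit = """<unit id="u1">
  <originalData><data id="d1">&lt;br/&gt;</data></originalData>
  <segment>
    <source>A<ph id="1" dataRef="d1"/>B<ph id="2" dataRef="d1"/>C</source>
    <target>A<ph id="1" dataRef="d1"/>B<ph id="2" dataRef="d1"/>C<ph id="3" dataRef="d1"/></target>
  </segment>
</unit>"""
print(copy_candidates(unit))  # {'3': ['1', '2']}: same native code, origin unknown
```

For merging this ambiguity is harmless (both candidates produce the same native code); it only matters to processes that need to know the exact origin of the copy.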
The only XLIFF solution to the problem you present is the one you describe: perform processing at modification time to make sure that the PRs and co-constraints are met. Alternatively, operate internally on a slightly modified model where you require copyOf regardless of whether the code has native data, then have an export/cleanup step that uses copyOf to make sure that one tag has the source ID and finally removes any “copyOf” information that violates the spec.
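The internal-model approach could look roughly like this hypothetical export step, assuming copies always carry copyOf internally. It restores the source id on the first copy of a code that no longer appears in the target, and drops copyOf from copies that have original data, following this thread's description of copyOf on such codes as a spec violation. It does not update startRef or other references to a renamed id, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

CODE_TAGS = {"ph", "pc", "sc", "ec"}
DATA_REFS = ("dataRef", "dataRefStart", "dataRefEnd")


def export_cleanup(segment):
    """Hypothetical export step for a model where every copy carries copyOf."""
    source_ids = {c.get("id") for c in segment.find("source").iter()
                  if c.tag in CODE_TAGS}
    target_codes = [c for c in segment.find("target").iter() if c.tag in CODE_TAGS]
    used = {c.get("id") for c in target_codes}
    for code in target_codes:
        base = code.get("copyOf")
        if base is None:
            continue
        if base in source_ids and base not in used:
            # The original was deleted and re-added as a copy:
            # give this first copy the source id back.
            used.discard(code.get("id"))
            code.set("id", base)
            used.add(base)
            del code.attrib["copyOf"]
        elif any(code.get(attr) is not None for attr in DATA_REFS):
            # The copy has its own original data, so copyOf is dropped.
            del code.attrib["copyOf"]
    return segment


seg = ET.fromstring(
    '<segment>'
    '<source><ph id="1" dataRef="d1"/> and <ph id="2" dataRef="d2"/></source>'
    '<target><ph id="2" dataRef="d2"/> und <ph id="3" copyOf="1" dataRef="d1"/>'
    '<ph id="4" copyOf="1" dataRef="d1"/><ph id="5" copyOf="2"/></target>'
    '</segment>')
export_cleanup(seg)
# Target ids become 2, 1, 4, 5; only ph 5 (no original data) keeps copyOf="2".
```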