RE: [xliff] RE: Inline attributes and canCopy

Hi Ryan,

Here is my take. But others should check too.

One note: some attributes (e.g. dataRef*) can have different values between source and target.

Valid	Source	Target
Yes		<pc Id=1 x>	OK (added target code)
Yes	<pc Id=1 x>		OK (if canDelete='yes' for <pc id=1>)
Yes	<pc Id=1 x>	<pc Id=1 x><pc Id=2 x>	OK (added target code)
No	<pc Id=1 x>	<pc Id=1 y>	Correct: This is not valid
No	<pc Id=1 x>	<pc Id=1 y><pc Id=2 x>	Correct: This is not valid because some attributes in <pc id=1> must be the same in source and target
Yes	<pc Id=1 x>	<pc Id=1 x><pc Id=2 y>	OK
No	<pc Id=1 x>	<pc Id=2 x>	No is wrong: This is valid. If canDelete='yes' for <pc id=1> then in the target you can delete it and add a new code that has the same attributes (except for id). But it will be seen as an added code. It is probably not something one wants to do, but it is difficult to prevent.
Yes	<pc Id=1 x>	<pc Id=2 y>	OK (if canDelete='yes' for <pc id=1>)
No	<pc Id=1 x>	<pc Id=1 x><pc Id=1 x>	Correct: This is not valid
Yes	<pc Id=1 x>	<pc Id=1 x><pc Id=2 x>	OK
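
The rows above can be condensed into a small check. This is a minimal sketch of one reading of the table, not a normative validator: codes are modelled as (id, attrs) pairs where attrs stands for x or y, check_target and can_delete are hypothetical names, and attributes such as dataRef* that may differ between source and target are left out of attrs.

```python
def check_target(source, target, can_delete=frozenset()):
    """Return a list of problems; an empty list means the pair passes.

    source, target: lists of (id, attrs) tuples. attrs stands for "x" or "y",
    i.e. the attributes other than id that must match between source and target.
    can_delete: hypothetical set of source ids whose canDelete is 'yes'.
    """
    problems = []
    src = dict(source)
    seen = set()
    for code_id, attrs in target:
        # ids must be unique among the codes of the target
        if code_id in seen:
            problems.append(f"duplicate id {code_id} in target")
        seen.add(code_id)
        # a target code sharing an id with a source code is the same code
        if code_id in src and src[code_id] != attrs:
            problems.append(f"id {code_id}: attributes differ from source")
    for code_id in src:
        # a source code missing from the target counts as deleted
        if code_id not in seen and code_id not in can_delete:
            problems.append(f"id {code_id} removed but canDelete is not 'yes'")
    return problems
```

For example, check_target([(1, "x")], [(2, "x")], can_delete={1}) returns an empty list: the code with id 2 is simply treated as an added code.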

Cheers,

-yves

From: Ryan King [mailto:ryanki@microsoft.com]
Sent: Friday, June 19, 2015 2:09 AM
To: Yves Savourel; Estreen, Fredrik (Fredrik.Estreen@lionbridge.com); 'xliff@lists.oasis-open.org'
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi Yves,

Please tell me if I have the correct understanding then.

Valid	Source	Target
Yes		<pc Id=1 x>
Yes	<pc Id=1 x>
Yes	<pc Id=1 x>	<pc Id=1 x><pc Id=2 x>
No	<pc Id=1 x>	<pc Id=1 y>
No	<pc Id=1 x>	<pc Id=1 y><pc Id=2 x>
Yes	<pc Id=1 x>	<pc Id=1 x><pc Id=2 y>
No	<pc Id=1 x>	<pc Id=2 x>
Yes	<pc Id=1 x>	<pc Id=2 y>
No	<pc Id=1 x>	<pc Id=1 x><pc Id=1 x>
Yes	<pc Id=1 x>	<pc Id=1 x><pc Id=2 x>

x and y stand for all attributes other than id, where x and y are different values.

Thanks,

Ryan

From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Tuesday, June 16, 2015 8:26 PM
To: Ryan King; 'Estreen, Fredrik'; xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi Ryan,

In your example (and in all cases really), you know the relationship between source and target inline codes by their id values: A code with id=’1’ in the source corresponds to the code with id=’1’ in the target.

See the id definition in the specification:

When used in <segment>, <ignorable>, <mrk>, <sm>, <pc>, <sc>, <ec>, or <ph> elements:
- The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.
- Except for the above exception, the value MUST be unique among all of the above within the enclosing <unit> element.

Also, I’m not sure I understand the following text in your old message:

“Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won’t really know which was the original, because our tools can’t use copyOf, because we always carry <originalData>.”

The “arbitrarily” part seems wrong: You should always know which source code corresponds to which target code because they have identical ids. And copyOf is to be used when a new code is introduced in the target and has no associated originalData; in that case copyOf points to an existing code for which the merger knows how to get the original data.
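
The lookup a merger would do under this reading can be sketched as follows. It is only an illustration: resolve_original and source_data are hypothetical names, the base of a copy is assumed to be a source code, and target codes that carry their own original data are not handled.

```python
def resolve_original(target_code, source_data):
    """Return the native data a merger should use for one target code.

    target_code: dict with "id" and optionally "copyOf".
    source_data: hypothetical mapping from source code id to native data,
    obtained by the merger by whatever means it has (database, original file).
    """
    # Same id as a source code: it is that code.
    if target_code["id"] in source_data:
        return source_data[target_code["id"]]
    # New code in the target: copyOf names the code it was cloned from.
    base = target_code.get("copyOf")
    if base in source_data:
        return source_data[base]
    raise LookupError(f"no original data known for code {target_code['id']}")
```

With source_data = {"1": "<b>"}, both {"id": "1"} and {"id": "2", "copyOf": "1"} resolve to "<b>".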

Cheers,

-yves

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Wednesday, June 17, 2015 12:36 AM
To: Estreen, Fredrik (Fredrik.Estreen@lionbridge.com); 'xliff@lists.oasis-open.org'
Subject: [xliff] RE: Inline attributes and canCopy

Hi Frederik, all,

I’m resurrecting this thread. It seems to me that the only way to tell if a target <pc> corresponds to source <pc> is to make sure their attributes, with the exception of id, are identical, then validate the constraint. However, as Frederik mentions below, this constraint is to help mergers when <originalData> is not present so that they know which <pc> tags correspond to which original codes (which may be stored outside of the xliff). BUT if I have:

xml

text

xlf

With no other attributes in <pc> other than id, how do I know which id to match in source? Is the tag in target or ? How can I apply the constraint without knowing the original data?

Thanks,

Ryan

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Thursday, January 29, 2015 9:02 PM
To: Estreen, Fredrik; 'xliff@lists.oasis-open.org'
Subject: [xliff] RE: Inline attributes and canCopy

Thanks for the detailed explanation, Frederik! Somehow I missed the copyOf processing requirement. With your rationale, as long as we have <originalData>, we don’t really need ids for merge, which is our case. Additionally, we decode inline tags back to native codes when we store data in our TMS, so we perform normalization for matching, etc. in a non-XLIFF dependent way.

Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won’t really know which was the original, because our tools can’t use copyOf, because we always carry <originalData>.

Thanks for the help,

Ryan

From: Estreen, Fredrik [mailto:Fredrik.Estreen@lionbridge.com]
Sent: Thursday, January 29, 2015 6:28 PM
To: Ryan King; 'xliff@lists.oasis-open.org'
Subject: RE: Inline attributes and canCopy

Hi Ryan,

I interpret the specification the same way as you do with respect to the IDs and agree with your added sentence clarifying it.

The rule that an inline element that represents the exact same element in both source and target uses the same ID in both locations is there to facilitate merging to the native format for agents that do not put native data in the XLIFF document. Without it, an agent would not be able to detect reordering or addition of codes. Safe substitution of tags in matches, in systems that allow it, also needs this. It also enables the storage of inline elements in TMs without the actual native code. Storing the native code in the TM offers more options for validation and match transformation, so it may be a good thing anyway.

“copyOf” is not really optional. It is required if the copied inline element does not have associated original data.

In “4.7.2.4.1 Duplicating an existing code”:

Processing Requirements

Modifiers MUST NOT clone a code that has its canCopy attribute is set to no.
The copyOf attribute MUST be used when, and only when, the base code has no associated original data.

This requirement makes sure that a merger can always know what an inline element in target means as long as it knows the meaning of the inline codes in source. The expectation is that a merger not storing original data would be able to learn the meaning of the source inline elements at merge time through some method external to XLIFF (database, original file, etc.).
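
The two processing requirements quoted above can be checked mechanically once a modifier knows which code it is cloning. A minimal sketch, assuming codes are plain dicts of attribute names to values, that any attribute whose name starts with dataRef means the code has original data, and that a missing canCopy counts as 'yes'; check_copy and has_original_data are hypothetical helpers.

```python
def has_original_data(code):
    """True if the code references original data (dataRef, dataRefStart, ...)."""
    return any(name.startswith("dataRef") for name in code)


def check_copy(base, copy):
    """Check one cloned code against the code it was cloned from.

    base, copy: dicts of attribute name -> value.
    Returns a list of violations; an empty list means the clone is acceptable.
    """
    errors = []
    # Assumption: an absent canCopy is treated as 'yes'.
    if base.get("canCopy", "yes") == "no":
        errors.append("base code has canCopy='no' and must not be cloned")
    if has_original_data(base) and "copyOf" in copy:
        errors.append("copyOf used although the base code has original data")
    if not has_original_data(base) and "copyOf" not in copy:
        errors.append("copyOf missing although the base code has no original data")
    return errors
```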

If the inline elements have associated original data, a comparison of that data will allow re-associating the copies with the originals, at least to the degree needed by mergers.

Following the rules of inline IDs and copyOf also allows more tag substitution to happen in matches. I personally believe that use of “copyOf” even for codes that have original data will allow a little bit more known safe substitutions than relying on comparison of original data. Unfortunately we don’t allow that behavior. The case where it makes a difference is if you have two identical inline codes in source and three in target. Which of the source ones is the third target one a copy of? But in most situations that will not be important to know, I have so far only found one TM related situation where it would help. Making it always required would also solve your use case.

The only XLIFF solution to the problem you present is the one you include: perform processing at modification time to make sure that the PRs and co-constraints are met. Alternatively, operate internally on a slightly modified model where you require the use of copyOf regardless of whether the code has native data or not. Then have an export/cleanup step that uses copyOf to make sure that one tag has the source ID and finally removes the “copyOf” information that is in violation of the spec.
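
Such an export/cleanup step might look roughly like the sketch below. It is an illustration under assumptions, not a description of any existing tool: cleanup, source_ids and has_data are hypothetical names, every copy in the internal model is assumed to carry a copyOf that points at a source id, and when the original was deleted the first remaining copy is promoted to take its id.

```python
def cleanup(source_ids, target, has_data):
    """Normalize an internal target code list for export.

    source_ids: ids of the inline codes in <source>.
    target: list of dicts with "id" and optional "copyOf".
    has_data: hypothetical set of source ids that have original data.
    Returns a new list; the input is not modified.
    """
    out = [dict(code) for code in target]
    present = {code["id"] for code in out}
    for sid in source_ids:
        if sid in present:
            continue
        # The original was deleted: let the first copy carry the source id.
        for code in out:
            if code.get("copyOf") == sid:
                code["id"] = sid
                del code["copyOf"]
                break
    for code in out:
        # copyOf is only allowed when the base code has no original data.
        if code.get("copyOf") in has_data:
            del code["copyOf"]
    return out
```

For example, with source_ids ["1"], has_data {"1"} and a target holding two copies with ids "2" and "3", the result is one code with id "1" and one with id "3", neither carrying copyOf.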

Regards,

Fredrik Estreen

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: 30 January 2015 09:18
To: 'xliff@lists.oasis-open.org'
Subject: [xliff] Inline attributes and canCopy

Hi TC, we’ve run into a dilemma and require some expert guidance. The XLIFF 2.0 spec says this about Id:

When used in <segment>, <ignorable>, <mrk>, <sm>, <pc>, <sc>, <ec>, or <ph> elements:
- The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.
- Except for the above exception, the value MUST be unique among all of the above within the enclosing <unit> element.

So, when an inline element is copied, I might get this example from the spec:

<unit id="1">
  <segment>
    <source>Äter <pc id="1">katter möss</pc>?</source>
    <target>Do <pc id="1">cats</pc> eat <pc id="2" copyOf="1">mice</pc>?
    </target>
  </segment>
</unit>

In order for this to meet the above constraint, the intended meaning is probably something like:

- The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist. *Copies of inline elements are not considered to be corresponding to the original elements enclosed within the sibling <source> element and do not need to have the same id.*

And if that is true, when we validate the constraint to make sure the *original* source and target inline element ids match, how do we know which one is the original if copyOf is not required and the elements can be reordered? If I just rely on making sure at least one of the ids matches regardless of position, what happens if I delete the original elements in <target> and add back two new ones? Do I have to make sure at least one of the elements has an id that matches? It seems like a lot of processing just to satisfy the constraint.

What is the logical reason for this constraint?

Thanks,

Ryan
