RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

Hi Ryan, all,

I guess it is like the translation itself: the specification says the <target> contains the translated text of the <source>, but there is no real way to validate this. Those inline codes are the same: if they have the same ID we can at least verify that they are also of the same kind (spanning/placeholder), and that is it. That is what defines them as ‘corresponding’.

As for the text “The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist”: it just says that an ID value can be duplicated in the source and target only when the two codes are corresponding codes.

There is probably a better way to express that.

But at least it says that if a code has an ID value that does not exist in the source, it is not a “corresponding code”, and therefore it’s an added one.

Cheers,

-ys

From: Ryan King [mailto:ryanki@microsoft.com]
Sent: Monday, June 29, 2015 7:44 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

Hi Yves,

OK, if we go with correspondence being defined as the same kind of code and the same ID, then what is the purpose of validating that the corresponding codes have the same ID if that was already a criterion for correspondence? It makes even less sense to me now how to validate this constraint.

Ryan

From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Sunday, June 28, 2015 9:35 PM
To: Ryan King; xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi Ryan,

I don’t think the merger could necessarily resolve the correspondence because it would need the <data> for the target, which the extractor may or may not provide.

I think it would also make life a lot more difficult in general to have mismatched IDs between source and target. Also, we wouldn’t be able to tell which codes are new and which already exist in the source.

So if two codes have the same ID and are of the same kind (spanning or placeholder) that should be enough to indicate that they are corresponding codes.
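
As a rough illustration of this rule, the Python sketch below classifies each code in a target by comparing its id and kind against the source. It is only a sketch under assumptions: the fragments carry no XML namespace, the function names (collect_codes, classify_target_codes) are made up for this example, and it does not reflect how any particular tool implements the check.

```python
import xml.etree.ElementTree as ET

# Inline code elements grouped by kind (XLIFF 2.0 element names).
SPANNING = {"pc", "sc", "ec"}
PLACEHOLDER = {"ph"}


def collect_codes(xml_fragment):
    """Map each inline code id to its kind ('spanning' or 'placeholder')."""
    codes = {}
    for el in ET.fromstring(xml_fragment).iter():
        if el.tag in SPANNING:
            kind = "spanning"
        elif el.tag in PLACEHOLDER:
            kind = "placeholder"
        else:
            continue  # not an inline code (e.g. the <source> wrapper itself)
        # An <ec> paired with an <sc> refers to it through startRef.
        code_id = el.get("id") or el.get("startRef")
        if code_id is not None:
            codes[code_id] = kind
    return codes


def classify_target_codes(source_xml, target_xml):
    """Label every target code as 'corresponding', 'added' or 'mismatch'."""
    source_codes = collect_codes(source_xml)
    result = {}
    for code_id, kind in collect_codes(target_xml).items():
        if code_id not in source_codes:
            result[code_id] = "added"          # id unknown in the source
        elif source_codes[code_id] == kind:
            result[code_id] = "corresponding"  # same id and same kind
        else:
            result[code_id] = "mismatch"       # same id, different kind
    return result
```

For a source containing <pc id="1"> and a target containing <pc id="2">, this labels the target code as added rather than corresponding, which is the distinction the same-ID rule is meant to preserve.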

With that the merger can choose what it does with the data: use the one from the source, or the target, or even from its own internal mechanism.

Cheers,

-ys

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Sunday, June 28, 2015 7:35 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi Yves, all,

It is interesting, Yves, that you bring up the TM and merger scenario together because in my mind then there isn't any reason why I couldn't populate my target with the following from a TM:

<source>Hello <pc id="1">World</pc></source>
<target>Hola <pc id="2">Mundo</pc></target>

Especially since the merger could resolve the original data in the source and then use the same original data in the target, i.e. treat them as corresponding. Or would you argue I shouldn't be able to leverage this high similarity match from my TM just because of the ID difference? The trouble would be where I might have more than one tag:

<source><pc id="1">Hello</pc> <pc id="2">World</pc></source>
<target><pc id="3">Hello</pc> <pc id="4">World</pc></target>

Now how does the merger determine which codes correspond? This brings me back to my original thought that you can only really tell whether two codes correspond if they both have the same original data reference. So the phrase "corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist" seems to indicate to me that once original data references are resolved to find correspondences, the IDs must match. But again, what is the purpose of the matching IDs if I've already resolved the original data?
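
To make the question concrete, here is a small Python sketch of matching codes purely by their resolved original data, ignoring IDs. Everything in it is an assumption made for illustration: the codes are taken to be already resolved into plain dicts of id to original data string, the data values such as "<b>" are hypothetical, and match_by_original_data is an invented name.

```python
from collections import defaultdict


def match_by_original_data(source_codes, target_codes):
    """For each target id, list the source ids whose original data is equal.

    Both arguments map a code id to its resolved original data string.
    An empty list means no candidate, one entry means a unique match,
    and several entries mean the correspondence is ambiguous.
    """
    ids_by_data = defaultdict(list)
    for source_id, data in source_codes.items():
        ids_by_data[data].append(source_id)
    return {
        target_id: sorted(ids_by_data.get(data, []))
        for target_id, data in target_codes.items()
    }
```

When the two spans carry different data the match is unique, but when both carry the same data (two bold spans, say) each target code has two candidates and the data alone cannot settle which source code it corresponds to.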

Ryan

From: Yves Savourel
Sent: 6/27/2015 9:20 PM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi everyone,

After discussing with others the issue of how to determine what constitutes a ‘corresponding code’, I think it is probably too restrictive to even expect most of the attributes listed below to be the same.

Leveraging from TM is a frequent scenario and one cannot expect to get similar attributes for things like canDelete or subFlow. For example, the extractor that created the leveraged content may have had different rules on what can be deleted or not, or a previous version of the same code may have no sub-flow (like an <a> without alt in version 1, and an <a> with alt in version 2).

Such differences are frequent, but they don’t mean the target code is not corresponding to the source code.

Maybe only two pieces of information need to be identical in both codes: the id value and the kind of code (spanning or placeholder).

Ultimately it is the merger agent that should decide what to do with the target codes. As long as it can associate a target code to a source one (and only id/kind-of-code are needed for this) it should be fine. Otherwise we may end up prohibiting the processing of files that are being translated and have leveraged target content.

Thoughts?

-yves

_____________________________________________
From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Sunday, June 21, 2015 7:20 AM
To: 'xliff@lists.oasis-open.org'
Subject: [xliff] RE: Inline attributes and canCopy

Hi Ryan, all,

> Thanks Yves, when you have something working in Lynx, can you share
> the full implementation with me and we'll look at what we can do to
> at least have parity.

I've implemented a verification that tries to detect when two source/target codes with the same ID are not "corresponding". It’s just my best solution so far, but I’m obviously open to adjustments. I have not yet made a formal release with the change: It would be nice to get the feedback from other implementers and TC members.

Two such codes are seen as corresponding if:

- Both are either spanning (<pc>/<sc><ec>) or both standalone (<ph>)

And if they have, at least, the following properties identical:

- type

- canOverlap

- canDelete

- canRemove

- canReorder

- subFlows/subFlowStart/subFlowEnd

- canCopy

- copyOf

- disp/dispStart/dispEnd

- equiv/equivStart/equivEnd

They may have different values for all the other properties: the rationale for not including the others (like data, dir, subType, etc.) is that they might be changed when they are in the target.
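
The comparison described above could be sketched in Python roughly as follows. This is not the Lynx code (that one is in Java); it is a minimal illustration that assumes each code has already been parsed into a plain dict, with a hypothetical "kind" key for spanning vs. standalone, and it copies the property names from the list above as written. A real implementation would also apply the default values from the specification before comparing.

```python
# Property names copied from the list in this message; a code is modelled
# as a dict, and "kind" is a made-up key for spanning vs. placeholder.
COMPARED_PROPERTIES = (
    "type", "canOverlap", "canDelete", "canRemove", "canReorder",
    "subFlows", "canCopy", "copyOf", "disp", "equiv",
)


def non_matching_properties(source_code, target_code):
    """Return the names of the compared properties that differ.

    An empty list means the two codes (already known to share an id)
    would be treated as corresponding under this stricter check.
    """
    differences = []
    if source_code.get("kind") != target_code.get("kind"):
        differences.append("kind")
    for name in COMPARED_PROPERTIES:
        # A property missing on both sides counts as equal (None == None).
        if source_code.get(name) != target_code.get(name):
            differences.append(name)
    return differences
```

Properties outside the tuple, such as data or dir, are deliberately ignored, so a target code whose data was changed still passes as long as the listed properties agree.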

For the annotations: I check for identical values on:

- type

- translate

You can see the source code here:

https://bitbucket.org/okapiframework/xliff-toolkit/src/1349e1470f2422752f675a98cf3bcff433569e6d/okapi/libraries/lib-xliff/src/main/java/net/sf/okapi/lib/xliff2/reader/XLIFFReader.java?at=master#cl-1627

Note that the code performs the verification on the parsed elements, so, for example, there is no distinction between the attributes disp, dispStart and dispEnd: In the object model it's just the disp property of that tag object.
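
As a rough picture of what that means, the Python sketch below folds the element-level attributes into one disp property per tag object. The dict layout, the "role" key and the function name tag_objects are assumptions for illustration only and are not the library's actual object model.

```python
def tag_objects(element_name, attributes):
    """Turn one inline element into tag-level dicts with a single 'disp'.

    <pc> yields an opening and a closing tag, fed by dispStart and dispEnd;
    <sc>, <ec> and <ph> yield one tag each, fed by disp.
    """
    code_id = attributes.get("id") or attributes.get("startRef")
    if element_name == "pc":
        return [
            {"id": code_id, "role": "opening", "disp": attributes.get("dispStart")},
            {"id": code_id, "role": "closing", "disp": attributes.get("dispEnd")},
        ]
    roles = {"sc": "opening", "ec": "closing", "ph": "standalone"}
    if element_name not in roles:
        raise ValueError(f"not an inline code element: {element_name}")
    return [{"id": code_id, "role": roles[element_name], "disp": attributes.get("disp")}]
```

Once the tags are in this shape, a comparison only ever looks at disp, whichever attribute it originally came from.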

You can play with the new behavior in the online validator: http://okapi-lynx.appspot.com/validation

(The issue Nesho found with <cp hex='7FFFFFFF'/> is also fixed)

There are a few test files (bad_NotCorrespondingCode*.xlf) here:

https://bitbucket.org/okapiframework/xliff-toolkit/src/master/okapi/libraries/lib-xliff/src/test/resources/invalid/

But we should also add valid test files.

Thanks,

-yves
