I am a bit unsure whether we can get all this done in a hurry, particularly the discussions about the version integration that Yves mentions below.
Anyway, here is a preliminary proposal, for discussion, to add to XLIFF 2.1 the capability to point to internal matches.
Translation Candidate Reference Annotation
This annotation can be used to mark up content with a reference to other content which can be used as a translation proposal, but where
the translation is not yet known at the time of annotation.
This annotation can reference any source spans of content that are referencable via the
attribute is REQUIRED
attribute is REQUIRED and set to
attribute is REQUIRED and points to source content which can be used as translation candidate
attribute is OPTIONAL and if used represents the similarity value for the translation proposal in the range from 0.0 to 100.0
attribute is OPTIONAL
<source>He is my friend.</source>
<source><mrk id="m1" type="mtc:imatch" ref="#u=u1/s1" value="100.0">He is my friend.</mrk></source>
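To make the example concrete, here is a minimal Python sketch of how a consuming tool might resolve such an annotation. The wrapping elements are assumptions, not part of the proposal text: the first <source> is placed in unit u1, segment s1, the annotated one in unit u2, and the reference #u=u1/s1 is read as "unit u1, segment s1". The type value mtc:imatch is taken from the example above; the helper names parse_ref and internal_matches are hypothetical.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.0 core namespace; the prefix "x" is only used for lookups in this sketch.
NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}

# Assumed document layout: the unit and segment wrappers are illustrative.
XLIFF = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en" trgLang="de">
  <file id="f1">
    <unit id="u1">
      <segment id="s1">
        <source>He is my friend.</source>
      </segment>
    </unit>
    <unit id="u2">
      <segment id="s1">
        <source><mrk id="m1" type="mtc:imatch" ref="#u=u1/s1" value="100.0">He is my friend.</mrk></source>
      </segment>
    </unit>
  </file>
</xliff>"""


def parse_ref(ref):
    """Split a reference of the form '#u=<unit>/<segment>' into its two ids."""
    unit_part, segment_id = ref.lstrip("#").split("/")
    if not unit_part.startswith("u="):
        raise ValueError("unsupported reference: " + ref)
    return unit_part[2:], segment_id


def internal_matches(root):
    """Yield (mrk id, referenced source text, similarity or None) per annotation."""
    units = {u.get("id"): u for u in root.iterfind(".//x:unit", NS)}
    for mrk in root.iterfind(".//x:mrk", NS):
        if mrk.get("type") != "mtc:imatch":
            continue
        unit_id, segment_id = parse_ref(mrk.get("ref"))
        segment = units[unit_id].find("x:segment[@id='%s']" % segment_id, NS)
        source = segment.find("x:source", NS)
        value = mrk.get("value")  # the similarity value is optional
        yield mrk.get("id"), "".join(source.itertext()), float(value) if value else None


for mrk_id, text, score in internal_matches(ET.fromstring(XLIFF)):
    print(mrk_id, repr(text), score)
```

Once the referenced segment in u1 has a target, a tool could offer that target as the proposal for u2 without running any matching of its own.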
As you see, there are at least two problems:
Unlike in the original concept of the matches module, the internal matches have to cross unit boundaries.
To make the referenced content of an internal match re-segmentable, it would be best to mark it with <mrk> tags, too. In case there is
a translation added to that reference, it needs <mrk> tags, too.
The question is whether one should make it a requirement that the ref attribute always points to a <mrk> (to enable resegmentation of the referenced
content without breaking the match).
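The resegmentation concern can be illustrated with a small Python sketch. It uses simplified unit fragments without namespaces, and the ids s1, s2 and m1, the second sentence, and the helper resolve are all made up for illustration. It assumes that joining two segments keeps the id of the first one; under that assumption a reference to the segment silently changes meaning after the join, while a reference to the <mrk> still yields the original text.

```python
import xml.etree.ElementTree as ET

# Unit before resegmentation: the matched sentence is its own segment.
BEFORE = """<unit id="u1">
  <segment id="s1">
    <source><mrk id="m1">He is my friend.</mrk></source>
  </segment>
  <segment id="s2">
    <source>She is not.</source>
  </segment>
</unit>"""

# Same unit after the two segments were joined (assumed to keep id "s1").
AFTER = """<unit id="u1">
  <segment id="s1">
    <source><mrk id="m1">He is my friend.</mrk> She is not.</source>
  </segment>
</unit>"""


def resolve(unit, target_id):
    """Return the text of the element with the given id inside the unit, or None."""
    for elem in unit.iter():
        if elem.get("id") == target_id:
            return "".join(elem.itertext()).strip()
    return None


before, after = ET.fromstring(BEFORE), ET.fromstring(AFTER)
print(resolve(before, "s1"))  # text the segment reference pointed to originally
print(resolve(after, "s1"))   # text it points to after the join
print(resolve(after, "m1"))   # text the <mrk> reference still points to
```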
From: Yves Savourel [mailto:firstname.lastname@example.org]
Sent: Thursday, 13 November 2014 18:51
To: 'Dr. David Filip'; Schurig, Joachim
Subject: RE: [xliff] internal matches
There are 5 more days until closure of the 2.1 features. So if this is to have any chance of making it, someone needs to file a proposal very soon.
Also, will we have anyone willing to implement it? (before January).
It would also bring an interesting first case for implementing (or not) backward compatibility with modules:
Can a 2.1 document have 2.0 Translation Candidates (or both 2.0 and 2.1)?
Would the 2.1 core schema have to include both Translation Candidates schemas?
A lot of questions about dealing with updated modules will need to be resolved (which is a good thing).
From: email@example.com [mailto:firstname.lastname@example.org]
On Behalf Of Dr. David Filip
Sent: Thursday, November 13, 2014 10:12 AM
To: Schurig, Joachim
Subject: Re: [xliff] internal matches
I think that this custom annotation would be a natural extension to the mtc module and should not be too difficult to add.
OASIS XLIFF TC Secretary, Editor, and Liaison Officer
University of Limerick, Ireland
On Thu, Nov 13, 2014 at 12:12 PM, Schurig, Joachim <Joachim.Schurig@lionbridge.com> wrote:
Yes, thanks! I had been thinking about using a custom <mrk>, and it would work (by adding match quality in the value attribute), but it
is custom and hence not easily understood across tools from multiple parties.
I know that translation candidates are optional too, but at least if they are understood they should be understood in a common manner.
So I guess we really missed an important use case for them.
Internal matches cannot be neglected, given their impact on translation cost, and to support them we now need either to implement a custom
annotation or to implement a database and fuzzy matching engine in translation clients. The latter renders translation candidates in the XLIFF superfluous, as we would then actually need to merge them into the same database, and in that case we would have fared better with
TMX embedded in the XLIFF file header (because it would avoid duplicates) and not associated with single segments/units.
I do not actually want to dramatize; I am only thinking that we missed by a hair the chance to have a match proposal mechanism that works
without a dynamic database of some kind.
Am I the only one worrying?
Yes, you are correct: there is no official way to link content to other content that is the same or very
similar (a duplicate or a fuzzy duplicate).
You could probably define some kind of annotation for this:
An mrk element spanning the duplicated/repetition content with ‘ref’ pointing to the original, and possibly ‘value’
as some indicator of the type of match (exact/fuzzy). If there is a need for more info one would have to define a module for that.
I wonder if we have overlooked a use case in the translation candidates module.
As you know, with XLIFF 2.0, it is easily possible to include reference data into the translatable file, such as matches and glossary data. The agent working
on these data does not need to perform searches or comparison on the content or reference data, as all reference data can be linked to specific portions of the content data.
However, for reference data which is only created during the modification process, I do not currently see a method to link it.
Think of content data in one segment which, after translation, is a reasonable translation candidate in another segment. This relationship is easy to detect in
the XLIFF creation or enricher phase.
But because this relationship cannot be expressed properly by reference mechanisms, one would still need to include e.g. fuzzy matching logic into the translation
That is, if I did not overlook something. Did I?
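As a rough illustration of how such relationships could be detected in the creation or enricher phase, here is a minimal Python sketch. It uses difflib from the standard library as a stand-in for a real fuzzy matching engine; the 75.0 threshold, the example segments, the reference strings, and the function name find_internal_matches are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def find_internal_matches(segments, threshold=75.0):
    """Link each segment to the most similar earlier segment.

    segments: list of (reference, source_text) in document order.
    Returns a list of (reference, earlier_reference, similarity), with
    similarity on a 0.0 to 100.0 scale, for pairs at or above the threshold.
    """
    results = []
    for i, (ref, text) in enumerate(segments):
        best = None
        for prev_ref, prev_text in segments[:i]:
            score = round(SequenceMatcher(None, prev_text, text).ratio() * 100.0, 1)
            # Keep the first earlier segment with the highest score.
            if score >= threshold and (best is None or score > best[1]):
                best = (prev_ref, score)
        if best is not None:
            results.append((ref, best[0], best[1]))
    return results


segments = [
    ("#u=u1/s1", "He is my friend."),
    ("#u=u2/s1", "He is my friend."),
    ("#u=u3/s1", "She is my friend."),
    ("#u=u4/s1", "Completely unrelated text"),
]

for ref, earlier_ref, similarity in find_internal_matches(segments):
    print(ref, "->", earlier_ref, similarity)
```

The enricher would run this once; the resulting pairs are exactly what is missing a standard way to be written into the file, so that the translation client does not have to repeat the comparison.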
Senior Technical Director Language Technology,
1240 Route des Dolines
06560 Sophia Antipolis