xliff message

Subject: Re: [xliff] XLIFF 2.0 spec - dF sanity check of translation candidates module

From: "Dr. David Filip" <David.Filip@ul.ie>
To: Yves Savourel <ysavourel@enlaso.com>
Date: Mon, 4 Mar 2013 20:34:41 +0000

Thanks Yves,

answering inline.

> I think the text/attributes/values in the current draft don't reflect the later discussions.
> See for example https://lists.oasis-open.org/archives/xliff/201212/msg00065.html
> and some of the emails (before and after) from Ryan, Shirley, Helena, etc.
OK, I see. I will update the spec with the reference during this week.
But the issue of types and subtypes is still entangled with the scores
issue below. Still, I see value in changing the spec to reflect the
discussion for now.
>
> As for similarity: we had that discussion long ago and agreed that it was not meant to represent how good the translation was, just how close the source text of the candidate was to the source text of the original. As for 'match-quality' (score), if I recall correctly the consensus at the time was that the value was not meaningful across tools, so we didn't have one.

I agree with you and Helena that scores are meaningless across tools
[until a matching standard is specified, by ULI? when?], no matter
whether you call them similarity, quality, etc.
IMHO the solution is not to ban the attribute but to make a tool
identifier mandatory if the scores are present.
In my view both similarity and match-quality can be subsumed under
something like match suitability; both are values between 0 and 100,
so I do not see value in splitting them. Where would I store the
mtConfidence given the current proposals?
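
To make the suggestion concrete, here is a minimal sketch of the rule, written in Python rather than schema language. The names MatchCandidate, suitability, tool_id and comparable are hypothetical and do not come from the draft; the only things carried over from the discussion are the 0 to 100 range and the idea that a score without a tool identifier gets rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchCandidate:
    """Hypothetical model of a translation candidate; not the draft's schema."""
    match_type: str                      # e.g. "tm" or "mt"
    suitability: Optional[float] = None  # assumed merged score, 0 to 100
    tool_id: Optional[str] = None        # assumed identifier of the scoring tool

    def __post_init__(self) -> None:
        if self.suitability is None:
            return  # no score, so no tool identifier is required
        if not 0.0 <= self.suitability <= 100.0:
            raise ValueError("score must be between 0 and 100")
        if not self.tool_id:
            raise ValueError("a score requires a tool identifier")


def comparable(a: MatchCandidate, b: MatchCandidate) -> bool:
    """Treat two scores as comparable only when the same tool produced both."""
    return (
        a.suitability is not None
        and b.suitability is not None
        and a.tool_id == b.tool_id
    )
```

With this shape a consumer can still read any score, but it only ranks candidates against each other when their tool identifiers agree.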

> Some of those ideas are in https://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/Translation%20Proposals
>
> cheers,
> -yves
>
>
> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Dr. David Filip
> Sent: Monday, March 04, 2013 11:07 AM
> To: xliff@lists.oasis-open.org
> Subject: [xliff] XLIFF 2.0 spec - dF sanity check of translation candidates module
>
> Hi all, I have an issue with semantics of the attributes on this module.
>
> Clearly a match can carry far fewer attributes than an alt-trans can.
> This is in principle good because we do not want loads of garbage that no one is using. The Metadata module is allowed here, but NOT general extensibility, so with the minimum of predefined attributes, we had better get them right.
>
> Apart from id and reference flag, we have three attributes that can be considered three dimensions specifying the match.
> We have the free-text origin, which is probably best used to identify the leverage source within a business process, so we are actually left with similarity and type to describe the match.
>
> I have an issue with similarity, as I think that it is too narrow.
> alt-trans had a match-quality, which was nevertheless defined as similarity of sources.
>
> Now, the issue is that match quality defined as similarity of sources is, strictly speaking, only relevant for one of the match types, i.e. tm.
> Value   Description
> am      Assembled Match
> ebm     Example-based Machine Translation
> idm     ID-based Match
> ice     In-Context Exact Match
> mt      Machine Translation
> tm      Translation Memory Match
>
> For all match types except the TM match, similarity of source is largely irrelevant.
>
> I think there is an easy fix. Call the attribute "match-quality" and use the source similarity just as an example of a valid use of it.[1] Other match types might have their own methods of determining match-quality, e.g. machine translation systems are capable of reporting their confidence, which however has nothing to do with similarity of sources.
>
> One of the motivations for doing this would be to be able to map the ITS 2.0 metadata category mtConfidence.
>
> In this scenario I can use "origin" to point to the specific MT engine that self-reports its confidence, and the "mt" "type" to indicate that the source is MT, and not TM or something else. But "similarity", as defined now in the spec, does not make sense. If it were defined more broadly as quality of match and named "match-quality", it could actually be used to report mtConfidence.
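
A minimal sketch of this scenario, assuming mtConfidence is reported on a 0 to 1 scale and that the proposed "match-quality" takes a decimal between 0.0 and 100.0. The function names and the engine label are hypothetical; only the attribute names "origin", "type" and "match-quality" come from the text.

```python
def mt_confidence_to_match_quality(mt_confidence: float) -> float:
    """Rescale an assumed 0..1 MT confidence to the proposed 0.0..100.0 range."""
    if not 0.0 <= mt_confidence <= 1.0:
        raise ValueError("mtConfidence is assumed to lie between 0 and 1")
    # Round to suppress binary floating-point noise after scaling.
    return round(mt_confidence * 100.0, 4)


def match_attributes(engine: str, mt_confidence: float) -> dict:
    """Build the three descriptive attributes for an MT candidate."""
    return {
        "origin": engine,  # free text naming the specific MT engine
        "type": "mt",      # match type value for Machine Translation
        "match-quality": mt_confidence_to_match_quality(mt_confidence),
    }


attrs = match_attributes("example-engine", 0.875)
```

Here attrs carries the engine name in "origin", "mt" in "type", and the rescaled confidence in "match-quality", so no source similarity is involved at any point.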
>
> BTW, why are matches allowed on <unit>? Does it make any sense with at least one segment obligatory?
>
> Thanks for your attention
> dF
>
>
> [1] [proposed new text for the replacement attribute]
>
> B.1.3.2 match-quality
>
> Match-quality - indicates how suitable the match might be for eventual use as translated <target> in the parent <segment>. The most common usage would be to indicate the similarity level between the content of the <source> child of a <match> element and the translatable text being matched. Another usage might be to self-report MT confidence on machine translated translation candidates.
>
> Value description: a decimal number between 0.0 and 100.0.
>
> Default value: undefined
>
> Used in: <match>.
>
>
> Dr. David Filip
> =======================
> LRC | CNGL | LT-Web | CSIS
> University of Limerick, Ireland
> telephone: +353-6120-2781
> cellphone: +353-86-0222-158
> facsimile: +353-6120-2734
> mailto: david.filip@ul.ie
>

Follow-Ups:
- RE: [xliff] XLIFF 2.0 spec - dF sanity check of translation candidates module
  - From: Yves Savourel <ysavourel@enlaso.com>
- RE: [xliff] XLIFF 2.0 spec - dF sanity check of translation candidates module
  - From: Uwe Stahlschmidt <uwes@microsoft.com>

References:
- XLIFF 2.0 spec - dF sanity check of translation candidates module
  - From: "Dr. David Filip" <David.Filip@ul.ie>
- RE: [xliff] XLIFF 2.0 spec - dF sanity check of translation candidates module
  - From: Yves Savourel <ysavourel@enlaso.com>