Subject: RE: [xliff] XLIFF 2.0 spec - dF sanity check of translation candidates module


Hi David,

I think the text/attributes/values in the current draft don't reflect the later discussions.
See for example https://lists.oasis-open.org/archives/xliff/201212/msg00065.html
and some of the emails (before and after) from Ryan, Shirley, Helena, etc.

As for similarity: we had that discussion long ago and agreed that it was not meant to represent how good the translation was, only how close the source text of the candidate was to the source text of the original. As for 'match-quality' (score), if I recall correctly the consensus at the time was that the value was not meaningful across tools, so we didn't have one.
Some of those ideas are in https://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/Translation%20Proposals

cheers,
-yves


-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Dr. David Filip
Sent: Monday, March 04, 2013 11:07 AM
To: xliff@lists.oasis-open.org
Subject: [xliff] XLIFF 2.0 spec - dF sanity check of translation candidates module

Hi all, I have an issue with semantics of the attributes on this module.

Clearly a match can carry far fewer attributes than an alt-trans can.
This is in principle good because we do not want loads of garbage that no one is using. The Metadata module is allowed here, but NOT general extensibility, so with a minimum of predefined attributes, we had better get them right.

Apart from id and reference flag, we have three attributes that can be considered three dimensions specifying the match.
We have the free-text origin, which is probably best used to identify the leverage source within a business process, so we are actually left with similarity and type to describe the match.

I have an issue with similarity, as I think that it is too narrow.
alt-trans had a match-quality, which was nevertheless defined as similarity of sources.

Now, the issue is that match quality defined as similarity of sources is, strictly speaking, only relevant for one of the match types, i.e. tm.
Value	Description
am	Assembled Match
ebm	Example-based Machine Translation
idm	ID-based Match
ice	In-Context Exact Match
mt	Machine Translation
tm	Translation Memory Match

For all match types except the TM match, similarity of source is largely irrelevant.

I think there is an easy fix. Call the attribute "match-quality" and use source similarity just as an example of a valid use of it.[1] Other match types might have their own methods of determining match-quality, e.g. machine translation systems can self-report their confidence, which however has nothing to do with similarity of sources.

One of the motivations for doing this would be the ability to map the ITS 2.0 metadata category mtConfidence.

In this scenario I can use "origin" to point to the specific MT engine that self-reports its confidence, and the "mt" "type" to indicate that the source is MT, and not TM or something else. But "similarity", as defined now in the spec, does not make sense. If it were defined more broadly as quality of match and named "match-quality", it could actually be used to report mtConfidence.
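
To make the mapping concrete, here is a minimal sketch, assuming mtConfidence is reported on a 0 to 1 scale (as in ITS 2.0) and that the proposed match-quality uses the 0.0 to 100.0 range from [1]. The function name and the simple linear scaling are illustrative assumptions, not anything agreed in the TC.

```python
def mt_confidence_to_match_quality(mt_confidence: float) -> float:
    """Scale a 0..1 MT confidence to the proposed 0.0..100.0 range.

    Hypothetical helper: linear scaling is an assumption, not spec text.
    """
    if not 0.0 <= mt_confidence <= 1.0:
        raise ValueError(f"mtConfidence out of range: {mt_confidence}")
    # Round to two decimals to avoid float noise such as 87.24999999999999.
    return round(mt_confidence * 100.0, 2)


if __name__ == "__main__":
    print(mt_confidence_to_match_quality(0.8725))
```

Whether a tool-specific confidence and a TM source similarity should share one numeric scale is exactly the open question here; the sketch only shows that the value ranges are compatible.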

BTW, why are matches allowed on <unit>? Does it make any sense with at least one segment obligatory?

Thanks for your attention
dF


[1] [proposed new text for the replacement attribute]

B.1.3.2 match-quality

Match-quality - indicates how suitable the match might be for eventual use as translated <target> in the parent <segment>. The most common usage would be to indicate the similarity level between the content of the <source> child of a <match> element and the translatable text being matched. Another usage might be to self-report MT confidence on machine translated translation candidates.

Value description: a decimal number between 0.0 and 100.0.

Default value: undefined

Used in: <match>.
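
As an illustration of the proposed text, the following sketch builds a <match> carrying the attribute and enforces the value range and the match types listed above. It assumes un-namespaced element names and a <target> child next to <source>; build_match and its parameters are hypothetical names used only for this example.

```python
import xml.etree.ElementTree as ET

# Match type values from the table in this message.
MATCH_TYPES = {"am", "ebm", "idm", "ice", "mt", "tm"}


def build_match(source_text, target_text, match_type,
                match_quality=None, origin=None):
    """Build an illustrative <match> element (hypothetical helper)."""
    if match_type not in MATCH_TYPES:
        raise ValueError(f"unknown match type: {match_type}")
    match = ET.Element("match", {"type": match_type})
    if origin is not None:
        match.set("origin", origin)
    # Default is undefined, so the attribute is omitted when no value is given.
    if match_quality is not None:
        if not 0.0 <= match_quality <= 100.0:
            raise ValueError(f"match-quality out of range: {match_quality}")
        match.set("match-quality", f"{match_quality:.1f}")
    ET.SubElement(match, "source").text = source_text
    ET.SubElement(match, "target").text = target_text
    return match


if __name__ == "__main__":
    m = build_match("Hello", "Bonjour", "mt", 87.5, origin="engine-x")
    print(ET.tostring(m, encoding="unicode"))
```

The same call with type "tm" would carry a source similarity instead, which is the dual use the proposed definition is meant to allow.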


Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie
