
Subject: XLIFF 2.0 spec - dF sanity check of translation candidates module

From: "Dr. David Filip" <David.Filip@ul.ie>
To: xliff@lists.oasis-open.org
Date: Mon, 4 Mar 2013 18:07:13 +0000

Hi all, I have an issue with semantics of the attributes on this module.

Clearly a match can carry far fewer attributes than an alt-trans can.
This is in principle good because we do not want loads of garbage that
no one is using. The Metadata module is allowed here, but NOT general
extensibility, so with the minimum of predefined attributes, we had
better get them right.

Apart from id and the reference flag, we have three attributes that can
be considered three dimensions specifying the match.
We have the free-text origin, which can probably best be used to identify
the leverage source within a business process, so we are actually left
with similarity and type to describe the match.

I have an issue with similarity, as I think that it is too narrow.
alt-trans had a match-quality, which was nevertheless defined as
similarity of sources.

Now, the issue is that match quality defined as similarity of sources
is, strictly speaking, only relevant for one of the match types, i.e.
tm.

Value   Description
am      Assembled Match
ebm     Example-based Machine Translation
idm     ID-based Match
ice     In-Context Exact Match
mt      Machine Translation
tm      Translation Memory Match

For all match types except the TM match, similarity of source is
largely irrelevant.

I think there is an easy fix. Call the attribute "match-quality" and
give source similarity as just one example of a valid use of it.[1]
Other match types might have their own methods of determining
match-quality, e.g. machine translation systems are capable of
reporting their confidence, which has nothing to do with similarity
of sources.

One of the motivations for doing this would be to be able to map the
ITS 2.0 metadata category mtConfidence.

In this scenario I can use "origin" to point to the specific MT engine
that self-reports its confidence, and the "mt" "type" to indicate that
the source is MT, and not TM or something else. But "similarity", as
defined now in the spec, does not make sense. If it were defined more
broadly as quality of match and named "match-quality", it could
actually be used to report mtConfidence.
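
To make the mapping concrete, here is a minimal Python sketch, not a
definitive implementation. It assumes that ITS 2.0 mtConfidence is
reported in the interval 0 to 1, that the proposed "match-quality" uses
the 0.0 to 100.0 range from [1], and that a simple linear scaling
between the two is acceptable. The engine identifier
"example-mt-engine", the helper names, the <target> child, and the
un-namespaced element names are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def mt_confidence_to_match_quality(mt_confidence: float) -> float:
    """Scale a confidence in [0, 1] to the proposed 0.0..100.0 range."""
    if not 0.0 <= mt_confidence <= 1.0:
        raise ValueError("mtConfidence must lie in the interval [0, 1]")
    return round(mt_confidence * 100.0, 2)


def build_match(origin: str, mt_confidence: float,
                source: str, target: str) -> ET.Element:
    """Build an illustrative <match> carrying the three attributes discussed."""
    # "match-quality" is the name proposed in this message, not a
    # confirmed attribute of the specification.
    match = ET.Element("match", {
        "origin": origin,   # which MT engine produced the candidate
        "type": "mt",       # machine translation, not tm or another type
        "match-quality": str(mt_confidence_to_match_quality(mt_confidence)),
    })
    ET.SubElement(match, "source").text = source
    ET.SubElement(match, "target").text = target
    return match


match = build_match("example-mt-engine", 0.5, "Hello world", "Bonjour le monde")
print(ET.tostring(match, encoding="unicode"))
```

With this reading, "origin" and "type" keep their roles, and only the
numeric attribute changes meaning depending on the match type.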

BTW, why are matches allowed on <unit>? Does it make any sense, given
that at least one segment is obligatory?

Thanks for your attention
dF


[1] [proposed new text for the replacement attribute]

B.1.3.2 match-quality

Match-quality - indicates how suitable the match might be for eventual
use as the translated <target> in the parent <segment>. The most common
usage would be to indicate the similarity level between the content of
the <source> child of a <match> element and the translatable text
being matched. Another usage might be to self-report MT confidence on
machine-translated candidates.

Value description: a decimal number between 0.0 and 100.0.

Default value: undefined

Used in: <match>.
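
As an illustration of the value description above, a rough Python check
could look like the following. It is only a sketch under assumptions:
the bounds are taken as inclusive, and exponent notation and signs are
rejected, neither of which the proposed text states. The function name
is hypothetical.

```python
import re
from decimal import Decimal

# Plain decimal notation only: digits with an optional fractional part.
_DECIMAL_PATTERN = re.compile(r"[0-9]+(\.[0-9]+)?")


def is_valid_match_quality(value: str) -> bool:
    """Return True if value is a decimal number within 0.0..100.0 (assumed inclusive)."""
    if _DECIMAL_PATTERN.fullmatch(value) is None:
        return False
    return Decimal("0.0") <= Decimal(value) <= Decimal("100.0")


for candidate in ("89.82", "100.0", "100.1", "abc"):
    print(candidate, is_valid_match_quality(candidate))
```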


Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie

Follow-Ups:
- RE: [xliff] XLIFF 2.0 spec - dF sanity check of translation candidates module
  - From: Yves Savourel <ysavourel@enlaso.com>