Subject: XLIFF 2.0 spec - dF sanity check of translation candidates module
Hi all,

I have an issue with the semantics of the attributes in this module. Clearly a match can carry far fewer attributes than an alt-trans can. This is good in principle, because we do not want loads of garbage that no one is using. The Metadata module is allowed here, but NOT general extensibility, so with a minimum of predefined attributes, we had better get them right.

Apart from id and the reference flag, we have three attributes that can be considered three dimensions specifying the match. The free-text origin is probably best used to identify the leverage source within a business process, so we are actually left with similarity and type to describe the match.

I have an issue with similarity, as I think it is too narrow. alt-trans had a match-quality, which was nevertheless defined as similarity of sources. The issue is that match quality defined as similarity of sources is, strictly speaking, relevant for only one of the match types, i.e. tm:

Value  Description
am     Assembled Match
ebm    Example-based Machine Translation
idm    ID-based Match
ice    In-Context Exact Match
mt     Machine Translation
tm     Translation Memory Match

For all match types except the TM match, similarity of sources is largely irrelevant.

I think there is an easy fix: call the attribute "match-quality" and use source similarity just as one example of a valid use.[1] Other match types might have their own methods of determining match quality; e.g. machine translation systems can report their confidence, which has nothing to do with similarity of sources.

One of the motivations for doing this is to be able to map the ITS 2.0 metadata category mtConfidence. In this scenario I can use "origin" to point to the specific MT engine that self-reports its confidence, and the "mt" "type" to indicate that the source is MT, and not TM or something else. But "similarity", as defined now in the spec, does not make sense.
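To make the mapping scenario above concrete, here is a minimal Python sketch of how an MT candidate might be serialized under the proposal. Everything in it is an assumption for illustration only: the unnamespaced `match` element, the `match-quality` attribute name (the name proposed in this mail, not an attribute of any published schema), the helper names, the engine label, and the choice to scale mtConfidence (which ITS 2.0 defines on a 0 to 1 scale) linearly onto the proposed 0.0 to 100.0 range.

```python
import xml.etree.ElementTree as ET


def mt_confidence_to_match_quality(mt_confidence: float) -> float:
    """Scale an ITS 2.0 mtConfidence value (0 to 1) onto the proposed 0.0-100.0 range.

    The linear scaling and the rounding to one decimal place are assumptions
    of this sketch, not something the proposal prescribes.
    """
    if not 0.0 <= mt_confidence <= 1.0:
        raise ValueError("mtConfidence must lie between 0 and 1")
    return round(mt_confidence * 100.0, 1)


def build_mt_match(engine: str, mt_confidence: float,
                   source: str, target: str) -> ET.Element:
    """Build a hypothetical <match> element describing an MT candidate."""
    match = ET.Element("match", {
        "type": "mt",        # the candidate comes from machine translation
        "origin": engine,    # free text: which engine self-reported the score
        # Hypothetical attribute name, as proposed in this mail.
        "match-quality": str(mt_confidence_to_match_quality(mt_confidence)),
    })
    ET.SubElement(match, "source").text = source
    ET.SubElement(match, "target").text = target
    return match


# "example-mt-engine" is a made-up origin label.
example = build_mt_match("example-mt-engine", 0.873, "Hello world", "Hallo Welt")
print(ET.tostring(example, encoding="unicode"))
```

The point of the sketch is only that "origin", "type" and a broadly defined quality value together carry the MT confidence information, which a source-similarity attribute cannot express.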
If it were defined more broadly as quality of match and named "match-quality", it could actually be used to report mtConfidence.

BTW, why are matches allowed on <unit>? Does that make any sense with at least one segment obligatory?

Thanks for your attention
dF

[1] [proposed new text for the replacement attribute]

B.1.3.2 match-quality

Match-quality - indicates how suitable the match might be for eventual use as translated <target> in the parent <segment>. The most common usage would be to indicate the similarity level between the content of the <source> child of a <match> element and the translatable text being matched. Another usage might be to self-report MT confidence on machine-translated translation candidates.

Value description: a decimal number between 0.0 and 100.0.
Default value: undefined
Used in: <match>.

Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie