Re: [xliff] Attributes for translation candidates


Subject: Re: [xliff] Attributes for translation candidates

From: Jung Nicholas Ryoo <jungwoo.ryoo@oracle.com>

To: Yves Savourel <ysavourel@enlaso.com>

Date: Mon, 27 Feb 2012 16:36:45 +0000

Hi Yves,

I have a couple of questions and comments on the proposal.

1) Data type of score (similarity) and quality:
* Is there any reason why the score should be an integer? In our case, it has always been a real number ranging from 0 to 100.00. You may ask what the benefit of real numbers is. Our scoring logic is fairly sophisticated, and we want to sort suggestions correctly (99.9 is definitely preferred to 99). Real numbers may also be better for interoperability, since they are a superset of the integers.
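
A minimal sketch of the sorting concern, using made-up candidate labels and scores (not output from any real tool): once real-valued scores are rounded to integers, two distinct suggestions can tie, and the preferred order between them is lost.

```python
# Hypothetical suggestions: (label, real-valued score in the 0-100 range).
candidates = [("tm-a", 99.0), ("tm-b", 99.4), ("mt-c", 87.25)]

# Sorting on the real-valued score keeps the intended preference order.
by_real = sorted(candidates, key=lambda c: c[1], reverse=True)

# Rounding to an integer first makes tm-a and tm-b indistinguishable.
as_int = [(label, round(score)) for label, score in candidates]
by_int = sorted(as_int, key=lambda c: c[1], reverse=True)

print([label for label, _ in by_real])
print(by_int)
```

With integer scores, the order of tm-a and tm-b depends only on how they happened to be listed, not on which one is the better suggestion.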

2) Score and quality?
* I understand the point of having two attributes. However, our scoring logic considers many factors, including similarity, quality, content domain, and content type. In our case the score is a combined score, so we can list the suggestions clearly in our order of preference.

* Therefore, "similarity" is not a suitable name for our case. I suggest having "match-score" as the main attribute and allowing two more attributes (similarity, quality) for tools that want them. That said, all of these together may increase confusion rather than help. If two attributes are fine and three are too many, then my suggestion is to name the first attribute "score".

3) content-type, content-domain, match-type

* Due to cross-file/type leverage, we need to deliver the content type (xml, html, properties, etc.) and the content domain. Do you think "origin" can be used for that purpose?
* "type" requires a clearly defined list of values. For MT suggestions, translators should post-edit rather than translate, and CAT tools may have specific features for MT suggestions. Therefore, XLIFF documents should all use the same value in the type attribute for MT suggestions.

Regards
Jung

On 27/02/2012 12:45, Yves Savourel wrote:

Hi Rodolfo, all,

1) Change the name of "score" to "similarity".
That would be clearer.

Done.

2) Define an optional module for storing the metadata associated with a match.

Yes, I think such metadata could be re-used for other features. For example QA annotations, etc.

Perhaps we would need to provide some directions for handling the combination of "score/similarity" with "quality".
It may be hard for a user to select the best match from two matches that have these properties:
a) similarity="60" quality="90"
b) similarity="80" quality="60"

That would be something useful. But, based on some discussions I've seen in use cases like Microsoft Translator's MatchDegree (similarity) and Rating (quality) I'm not sure there would be a single answer. Often it ends up being a user preference that needs to be decided at usage time.
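
As a rough illustration of why there may be no single answer, here is a small sketch that ranks the two matches above under two different user preferences. The weighted-sum formula and the weight values are assumptions made for illustration only; they are not part of the proposal or of any tool's actual logic.

```python
# The two matches from the example above.
matches = [
    {"id": "a", "similarity": 60, "quality": 90},
    {"id": "b", "similarity": 80, "quality": 60},
]

def rank(matches, w_similarity, w_quality):
    """Order matches by a weighted sum; the weights stand for a user preference."""
    def combined(m):
        return w_similarity * m["similarity"] + w_quality * m["quality"]
    return [m["id"] for m in sorted(matches, key=combined, reverse=True)]

print(rank(matches, 0.7, 0.3))  # similarity-leaning preference
print(rank(matches, 0.3, 0.7))  # quality-leaning preference
```

The similarity-leaning weights put b first, while the quality-leaning weights put a first, so the "best" match depends on a choice made at usage time.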

This also brings up the question: should we have a processing expectation that user agents should preserve the order of the matches? And should we have specific processing expectations about how new matches are added?

My guess is that we probably want to keep this simple: XLIFF provides the structure to hold the information, but lets tools do what they want with it. For example, a processing expectation that the matches must be re-written in the same order wouldn't work with a tool whose task is precisely to apply some ranking to the matches.
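
To make the ranking-tool case concrete, here is a minimal sketch of such a tool. The `<matches>` and `<match>` element names are placeholders invented for this example, not the actual XLIFF markup; only the similarity and quality attribute names come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names are placeholders, not the XLIFF schema.
doc = """<matches>
  <match id="a" similarity="60" quality="90"/>
  <match id="b" similarity="80" quality="60"/>
</matches>"""

def rerank(xml_text, key):
    """Rewrite the <match> children in the order given by key (best first)."""
    root = ET.fromstring(xml_text)
    ordered = sorted(root.findall("match"), key=key, reverse=True)
    for m in list(root):
        root.remove(m)
    root.extend(ordered)
    return root

# Rank by similarity only; a different tool might pass a different key.
root = rerank(doc, key=lambda m: float(m.get("similarity")))
print([m.get("id") for m in root])
```

A tool like this necessarily changes the order in which the matches are written back, which is why an order-preservation requirement would conflict with it.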

Cheers,
-yves




--
Jung Nicholas Ryoo | Principal Software Engineer
Phone: +35318031918 | | Fax: +35318031918 |
Oracle WPTG Infrastructure

ORACLE Ireland | Block P5, Eastpoint Business Park Dublin 3