OASIS Mailing List Archives



xliff message


Subject: RE: [xliff] Y22 - Translation proposals



Similarity is not defined in terms of edit distance in our spec.




Rodolfo M. Raya       rmraya@maxprograms.com


From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Helena S Chapman
Sent: Tuesday, October 02, 2012 1:53 PM
To: Yves Savourel
Cc: xliff@lists.oasis-open.org
Subject: RE: [xliff] Y22 - Translation proposals


Ideally, I'd like synonyms and the like to receive a "similarity" score of 100 as well, so that similarity is not limited to a strict edit-distance calculation. If we can define similarity a little more broadly, I would be more comfortable with it.

From:        Yves Savourel <ysavourel@enlaso.com>
To:        <xliff@lists.oasis-open.org>
Date:        10/02/2012 12:45 PM
Subject:        RE: [xliff] Y22 - Translation proposals
Sent by:        <xliff@lists.oasis-open.org>

Hi Helena,

> I am curious whether, when we say "similarity", we also
> mean "synonymity". For example, "big" and "large"
> often have the same meaning, even in context.
> There are situations where we do automatic
> replacements even if the words are not "similar" but have the same meaning.

I would say that if the source of the match uses synonyms rather than the same words as the entry source, its similarity would be less than 100.
I'm not sure that answers your question, though.

> What is the likelihood of other types of "cb" in maxprog
> that are not "exact-context"? Or of "am" type matches that
> are not composed of substrings from various segments in okp?

The idea is to have broad categories and not assume the details. For example, a tool could get context information using a fuzzy threshold, or an assembled translation may be built partly from substring TM matches, partly from MT text, and partly from glossary matches.

> Having said that, I can see we might use,
> type="tm/hadoop:short"
> type="mt/lucy:long"
> where the payment of each segment match is determined
> by the length of the segment. Is that what you are thinking?

I guess that's one use case. The customized part of the match type is for each workflow/tool to define as it sees fit.

