Subject: RE: [xliff] Y22 - Translation proposals
Hi Klemens,

There is no way to standardize the meaning of a match quality percentage. If a tool requests a match from an MT engine like Google or Bing, the source text sent to the engine would probably be the content of <source> from <segment>. The similarity between the <source> element in <match> and the one in <segment> would then be 100%, but the quality of the match may not be perfect.

The similarity value should not be considered alone; it has to be considered in the context of the match “type”. That is something that depends on the tool and the user.

Regards,
Rodolfo
--
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Dr. Klemens Waldhör

Maybe one could add something like a comment or note, in the header or somewhere else in the match, that gives a reference to or an explanation of the kind of “quality measure” that was used.

In my Araya I have long had what I call phrase matches. Such a match is built from term matches (similar, I think, to MultiCorpora's), and it might be interesting for the user to know that its match quality is computed differently from that of a match from a TM entry. Even for TM entry matches, different systems use different algorithms, even when all of them use edit distance (Levenshtein or another). Even if you use Levenshtein, the conversion from the edit distance to a % value can be computed in various ways, not to mention that edit distances can be weighted differently for insertions, deletions and replacements.

Another point: how are inline elements and their differences matched? A penalty, a stringified element difference… Many options.

Creating a comparable quality measure is quite hard, at least as long as this is not standardised too.

Klemens
---------------------------------------
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Shirley Coady

We also have TermBase matches at MultiCorpora, as well as fuzzy matches, which I’m sure are already on everyone’s list.
I’m not in favor of having a special category for what Helena describes as “global matches” or “optimized matches”. I’m sure every organization has its own way of pulling out the most relevant matches, and that each organization’s way is different. In the end they are still exact or fuzzy matches, and Lucia’s comment about provenance could handle these situations.

Regards,
Shirley Coady
Product Manager
(819) 778-7070 ext. 229

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Helena S Chapman

I tend to agree with Rodolfo on the quality/score attribute: keep it simple, with just one well-defined attribute. Is there any reason why something like edit distance could not be used for "similarity"? If it can, why not just call it "edit_distance"?
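Klemens's point that one edit distance can be turned into several different percentages bears directly on Helena's "edit_distance" question. The minimal Python sketch below illustrates it. The names `weighted_levenshtein` and `similarity_pct`, the two normalizations ("longer" and "total") and the example strings are all hypothetical illustrations. They are not taken from Araya, MultiCorpora, any other tool, or the XLIFF specification.

```python
"""Hypothetical sketch: one edit distance, several "similarity %" values."""


def weighted_levenshtein(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Weighted edit distance turning sequence a into sequence b.

    a and b may be strings (character level) or lists of words (word level).
    With all costs set to 1 this is the classic Levenshtein distance.
    """
    prev = [j * ins_cost for j in range(len(b) + 1)]  # a's empty prefix -> b[:j]
    for i in range(1, len(a) + 1):
        cur = [i * del_cost] + [0.0] * len(b)  # a[:i] -> empty prefix of b
        for j in range(1, len(b) + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            cur[j] = min(prev[j] + del_cost,      # delete a[i-1]
                         cur[j - 1] + ins_cost,   # insert b[j-1]
                         prev[j - 1] + cost)      # keep or substitute
        prev = cur
    return prev[len(b)]


def similarity_pct(a, b, norm="longer", **costs):
    """Convert a weighted edit distance into a 0-100 score.

    norm="longer": divide by the length of the longer sequence.
    norm="total":  divide by the combined length of both sequences.
    Both are illustrative choices; real tools may use neither.
    """
    if norm == "longer":
        denom = max(len(a), len(b))
    elif norm == "total":
        denom = len(a) + len(b)
    else:
        raise ValueError(f"unknown normalization: {norm!r}")
    if denom == 0:
        return 100.0  # two empty segments are identical
    d = weighted_levenshtein(a, b, **costs)
    # Clamp: heavy weights can push the distance past the denominator.
    return max(0.0, 100.0 * (1 - d / denom))


if __name__ == "__main__":
    seg = "the cat sat on the mat"   # hypothetical <segment> source
    tm = "the cat sat on a mat"      # hypothetical TM match source
    variants = {
        "chars, unit costs, / longer": similarity_pct(seg, tm),
        "chars, unit costs, / total": similarity_pct(seg, tm, norm="total"),
        "chars, substitution = 2, / longer": similarity_pct(seg, tm, sub_cost=2.0),
        "words, unit costs, / longer": similarity_pct(seg.split(), tm.split()),
    }
    for label, pct in variants.items():
        print(f"{label:36} {pct:5.1f}%")
```

For this single pair, the four variants report scores ranging from roughly 82% to 93%, although the underlying segments are the same. Inline-element handling (a fixed penalty versus comparing stringified tags) would add yet another axis of variation. A bare "similarity" number is therefore only comparable between tools if the computation behind it is known.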