Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


I have some concerns about the similarity attribute. Until there is an openly acknowledged standard for match proximity, having that attribute does not make sense to me. What does it mean when your tool says 75%? What happens if my tool does not agree with the calculation? Note that I am not suggesting the information is not useful, but I think the cart is before the horse.
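
To illustrate the concern, here is a minimal Python sketch in which two plausible scoring methods give different percentages for the same segment pair. Both functions and the sample segments are hypothetical choices made for this illustration; neither is taken from XLIFF or from any particular tool.

```python
from difflib import SequenceMatcher


def char_ratio_score(source: str, candidate: str) -> float:
    """Character-level similarity as a percentage (difflib's ratio)."""
    return 100.0 * SequenceMatcher(None, source, candidate).ratio()


def word_jaccard_score(source: str, candidate: str) -> float:
    """Word-level similarity as a percentage (Jaccard index on word sets)."""
    a = set(source.lower().split())
    b = set(candidate.lower().split())
    if not a and not b:
        return 100.0  # two empty segments are treated as identical
    return 100.0 * len(a & b) / len(a | b)


new_segment = "Click the Save button"
stored_segment = "Click the Save icon"

# The same pair of segments gets a different "similarity" from each method.
print(round(char_ratio_score(new_segment, stored_segment), 1))
print(round(word_jaccard_score(new_segment, stored_segment), 1))
```

A consumer that only sees the resulting number cannot tell which method produced it, which is why the value is hard to interpret across tools.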

Given an agreed-upon matching standard, I do not believe there would be a need for subType. The information carried by similarity would mostly be sufficient to determine what the subType would be.




From:        Ryan King <ryanki@microsoft.com>
To:        Ryan King <ryanki@microsoft.com>, "Dr. David Filip" <David.Filip@ul.ie>
Cc:        Shirley Coady <scoady@multicorpora.com>, Yves Savourel <ysavourel@enlaso.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
Date:        12/15/2012 10:54 PM
Subject:        RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Sent by:        <xliff@lists.oasis-open.org>




Further comments or discussion? :)
 
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Tuesday, December 11, 2012 10:35 PM
To: Dr. David Filip
Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

 
To be honest, I originally proposed concatenated because I thought that was what we agreed on for subState at the f2f and I wanted to follow suit…but maybe I misremembered that. I actually think a separate attribute is better. It is cleaner as you say, and I don’t think it is really a heavy requirement to ask user agents to drop the subtype when the main type changes (or is deleted), which I agree is the correct behavior.
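
As a rough sketch of the behavior described above, assuming a separate subtype attribute, a user agent could clear the subtype whenever the main type changes or is removed. The Match class and its field names are hypothetical, made up for this illustration, and are not part of any XLIFF draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    """Hypothetical in-memory model of a <match> element."""
    match_type: Optional[str] = None
    subtype: Optional[str] = None

    def set_type(self, new_type: Optional[str]) -> None:
        # A subtype only qualifies the main type it was set with, so it is
        # dropped when the main type changes or is deleted (new_type=None).
        if new_type != self.match_type:
            self.subtype = None
        self.match_type = new_type


m = Match(match_type="tm", subtype="xlf:fuzzy")
m.set_type("mt")  # main type changed, so the subtype is cleared
print(m)
```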
 
Should we define any sub values in XLIFF such as “fuzzy” or “exact”? I would actually put “ice” here as well and not in the main type attribute. I reference Wikipedia for my reasoning :) http://en.wikipedia.org/wiki/Translation_memory:
Retrieval
Several different types of matches can be retrieved from a TM.

Exact match
Exact matches appear when the match between the current source segment and the stored one is a character by character match. When translating a sentence, an exact match means the same sentence has been translated before. Exact matches are also called "100 % matches".
In-Context Exact (ICE) match or Guaranteed Match
An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph. Context is often defined by the surrounding sentences and attributes such as document file name, date, and permissions.
Fuzzy match
When the match is not exact, it is a "fuzzy" match. Some systems assign percentages to these kinds of matches, in which case a fuzzy match is greater than 0% and less than 100%. Those figures are not comparable across systems unless the method of scoring is specified.
 
So now we would have something like this:

<match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy">
<match id="1" similarity="99.0" type="tm" subtype="ms:near-exact">
<match id="1" similarity="100.0" type="tm" subtype="xlf:exact">
<match id="1" similarity="100.0" type="tm" subtype="xlf:ice">
 
Thanks,
ryan
 
From: Dr. David Filip [mailto:David.Filip@ul.ie]
Sent: Tuesday, December 11, 2012 4:06 PM
To: Ryan King
Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals

 
I support adding private subtype
 
Pending issues:
- Freeze of the normative top level list
- Mechanics of subtype: we should probably use the same mechanics consistently, i.e. either concatenated or separate attributes. This is a spec-wide issue.
 
Separate seems cleaner, but concatenation seems better for processing: the subtype is automatically dropped when the main type changes, which seems desirable?
 
Cheers
dF
 
Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie
 
On Tue, Dec 11, 2012 at 11:32 PM, Ryan King <ryanki@microsoft.com> wrote:
Thanks Yves and Shirley, while we are discussing the correct list of match values, I'd like to know from the list if we have consensus on adding a subtype for match.

Thanks,
ryan


-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Shirley Coady
Sent: Tuesday, December 4, 2012 3:34 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

Yves,

I still believe we need to add termbase matches to the list. I don't see any category below in which a termbase match could be grouped.
While I'm not disputing there may be some, I'm not personally aware of any tool that does not separate the terminology base from the TM. I understand that frequently the termbase is used to identify or replace terminology within a segment, and that's not a segment "match", but there are a lot of valid situations in which the entire segment is replaced from the termbase.
One of the best examples I have is when translating UN documents / conference meeting minutes, there is always a list, many pages long, of all participating delegates. We advise the users of our software to enter these in a termbase - I understand this is not traditional terminology but if you can automatically translate these, it's about saving time. Same thing with slogans, titles of government ministries that change routinely (at least in Canada they do!), standard disclaimers, etc.

Shirley

-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Saturday, December 01, 2012 10:05 AM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

Hi Ryan, all,


> ... see my inline to your inline.
> Please let me know if there is anything I can do to help you document
> and get this added to the specification.
>  Do you feel we need to have a roll call vote on these items in the next TC call?

(This is related to the proposed changes in the match module; see below.)

Personally I think it's best to work by consensus first, and only go to ballot when there is no consensus.
This TC is very ballot-driven, so you should do whatever makes sense in your opinion.

As for moving things forward:

- type probably needs a revised list

- subType and ref probably need to be defined as they would appear in the specification.

So people can see it and provide feedback if they want.
If there is no feedback, one can assume there is no dissent and update the specification.

I'm afraid I don't have much time for specification updates currently, but Bryan, Tom or David may.

cheers, (and sorry for being slow to answer emails) -yves


-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Wednesday, November 28, 2012 7:48 PM
To: Ryan King; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

Hi Ryan, all,

Sorry for the delay: I'm just swamped and can't find the time to read emails anymore.

> 1. Be able to specify optional custom values for match type in
> <mtc:matches>

I suppose some mechanism similar to the subType we're using in inline codes and other places could allow for custom values while making sure a top-level category is also declared.

Since we are discussing values for match type: I'm still not convinced that the latest list makes sense:

am - Assembled Match
ebm - Example-based Machine Translation
idm - ID-based Match
ice - In-Context Exact Match
mt - Machine Translation
tm - Translation Memory Match

- 'Example-based Machine Translation' should not be there IMO: it's just MT, what type of MT is not relevant (but could be a candidate for the subtype)
- 'In-Context Exact Match' IMO should be 'in-context' only: the fact that it's an exact one is captured in the similarity (and it could be an in-context fuzzy too).

[ryanki] I think this makes sense. For example, there's no reason each of these couldn't be valid (note ic instead of ice):
<match id="1" similarity="100.0" type="ic/xlf:exact">
<match id="1" similarity="100.0" type="mt/xlf:exact">
<match id="1" similarity="100.0" type="tm/xlf:exact">
<match id="1" similarity="75.0" type="ic/xlf:fuzzy">
<match id="1" similarity="75.0" type="mt/xlf:fuzzy">
<match id="1" similarity="75.0" type="tm/xlf:fuzzy">
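
For illustration only, a processor reading the concatenated form could split such a value into its main type, authority prefix, and sub value roughly as follows. The function name and the exact "type/prefix:value" grammar are assumptions based on the examples above, not agreed specification text.

```python
from typing import Optional, Tuple


def split_match_type(value: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'tm/xlf:fuzzy' into ('tm', 'xlf', 'fuzzy').

    A bare main type such as 'mt' yields ('mt', None, None).
    """
    main, slash, sub = value.partition("/")
    if not main:
        raise ValueError(f"missing main type in {value!r}")
    if not slash:
        return main, None, None
    prefix, colon, name = sub.partition(":")
    if not colon or not prefix or not name:
        raise ValueError(f"subtype must look like 'prefix:value' in {value!r}")
    return main, prefix, name


print(split_match_type("tm/xlf:fuzzy"))
print(split_match_type("mt"))
```

With this form, overwriting the whole attribute value replaces the subtype together with the main type, so no extra cleanup step is needed.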

> 2. Support Reference Language in <mtc:matches> • Allow zero, one or
> more <mtc:matches> at each extension point, because you might have
> both recycling and reference language data.

I assume you mean: allow more than one <mtc:matches> where we currently allow one? Not at *all* extension points, right?

[ryanki] exactement :)

> • Add an optional attribute reference=”yes|no” with no as default.
> Additionally, PR for a “reference match” would be to allow an xml:lang
> on the target different from the document and allow the <source> not
> to be present as it would be redundant information with the core
> <source>, e.g. Spanish reference for Quechua might look like this:

- reference='yes|no' and allowing a different language for xml:lang in those with reference='yes' seems OK to me.
- source not being present... I don't know. If we do that for those 'matches', why not for the normal matches as well, if the source is the same?
I think we originally mandated the source to simplify processing: testing for the presence or absence of the source may be cumbersome for some processors (XSLT maybe?).

[ryanki] In principle, we could carry around the redundant <source>; the only real side effect would be bloat in the XLIFF (but metadata will do that anyway...). I suggested it this way simply because <alt-trans>, the element used for reference language in 1.2, does not require <source>, so this was for parity.

We would need to update the definition of what a "match" is as well.

hope this helps,
-ys



---------------------------------------------------------------------
To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: xliff-help@lists.oasis-open.org



