
xliff message


Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

To be honest, I originally proposed concatenated because I thought that was what we agreed on for subState at the f2f and I wanted to follow suit…but maybe I misremembered that. I actually think a separate attribute is better. It is cleaner as you say, and I don’t think it is really a heavy requirement to ask user agents to drop the subtype when the main type changes (or is deleted), which I agree is the correct behavior.


Should we define any sub values in XLIFF such as “fuzzy” or “exact”? I would actually put “ice” here as well and not in the main type attribute. I reference Wikipedia for my reasoning :) http://en.wikipedia.org/wiki/Translation_memory:

Several different types of matches can be retrieved from a TM.

Exact match

Exact matches appear when the match between the current source segment and the stored one is a character by character match. When translating a sentence, an exact match means the same sentence has been translated before. Exact matches are also called "100 % matches".

In-Context Exact (ICE) match or Guaranteed Match

An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph. Context is often defined by the surrounding sentences and attributes such as document file name, date, and permissions.

Fuzzy match

When the match is not exact, it is a "fuzzy" match. Some systems assign percentages to these kinds of matches, in which case a fuzzy match is greater than 0% and less than 100%. Those figures are not comparable across systems unless the method of scoring is specified.


So now we would have something like this:

<match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy">

<match id="1" similarity="99.0" type="tm" subtype="ms:near-exact">

<match id="1" similarity="100.0" type="tm" subtype="xlf:exact">

<match id="1" similarity="100.0" type="tm" subtype="xlf:ice">
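A minimal sketch of the user-agent behavior discussed above, assuming the separate-attribute form: when the main type changes or is deleted, the subtype is dropped. The helper name set_match_type is hypothetical, and the element is treated as a plain un-namespaced <match> for illustration only; nothing here is taken from the specification.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def set_match_type(match: ET.Element, new_type: Optional[str]) -> None:
    """Change (or delete, if new_type is None) the main type of a match.

    Hypothetical helper: the subtype only refines the old main type,
    so it is removed whenever the main type actually changes.
    """
    old_type = match.get("type")
    if new_type == old_type:
        return  # main type unchanged, the subtype is still meaningful
    if new_type is None:
        match.attrib.pop("type", None)
    else:
        match.set("type", new_type)
    match.attrib.pop("subtype", None)


match = ET.fromstring(
    '<match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy"/>'
)
set_match_type(match, "mt")
print(ET.tostring(match, encoding="unicode"))
```

The cost to a user agent is one extra attribute removal per type change, which is the "not really a heavy requirement" point made above.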





From: Dr. David Filip [mailto:David.Filip@ul.ie]
Sent: Tuesday, December 11, 2012 4:06 PM
To: Ryan King
Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals


I support adding private subtype


Pending issues:

- Freeze of the normative top level list

- Mechanics of subtype: we should probably be using the same mechanics consistently, i.e. either concatenated or separate attributes. This is a spec-wide issue.


Separate seems cleaner, but concatenation seems better for processing: the subtype is automatically dropped when the main type changes, which seems desirable?





Dr. David Filip



University of Limerick, Ireland

telephone: +353-6120-2781

cellphone: +353-86-0222-158

facsimile: +353-6120-2734

On Tue, Dec 11, 2012 at 11:32 PM, Ryan King <ryanki@microsoft.com> wrote:

Thanks Yves and Shirley, while we are discussing the correct list of match values, I'd like to know from the list if we have consensus on adding a subtype for match.


-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Shirley Coady
Sent: Tuesday, December 4, 2012 3:34 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


I still believe we need to add termbase matches to the list. I don't see any category below in which a termbase match could be grouped.
While I'm not disputing there may be some, I'm not personally aware of any tool that does not separate the terminology base from the TM. I understand that frequently the termbase is used to identify or replace terminology within a segment, and that's not a segment "match", but there are a lot of valid situations in which the entire segment is replaced from the termbase.
One of the best examples I have is when translating UN documents / conference meeting minutes, there is always a list, many pages long, of all participating delegates. We advise the users of our software to enter these in a termbase - I understand this is not traditional terminology but if you can automatically translate these, it's about saving time. Same thing with slogans, titles of government ministries that change routinely (at least in Canada they do!), standard disclaimers, etc.


-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Saturday, December 01, 2012 10:05 AM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

Hi Ryan, all,

> ... see my inline to your inline.
> Please let me know if there is anything I can do to help you document
> and get this added to the specification.
>  Do you feel we need to have a roll call vote on these items in the next TC call?

(this is related to the proposed changes in the match module; see below).

Personally I think it's best to work by consensus first, and only go to ballot when there is no consensus.
This TC is very ballot-driven so you should do whatever makes sense in your opinion.

As for moving things forward:

- type probably needs a revised list

- subType and ref probably need to be defined as they would appear in the specification.

So people can see it and provide feedback if they want.
If there is no feedback, one can assume there is no dissent and update the specification.

I'm afraid I don't have much time to do specification updates currently, but Bryan, Tom or David may.

cheers, (and sorry for being slow to answer emails) -yves

-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Wednesday, November 28, 2012 7:48 PM
To: Ryan King; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

Hi Ryan, all,

Sorry for the delay: I'm just swamped and can't find the time to read emails anymore.

> 1. Be able to specify optional custom values for match type in
> <mtc:matches>

I suppose some mechanism similar to the subType we're using in inline codes and other places could allow for custom values while making sure a top-level category is also declared.

Since we are discussing values for match type: I'm still not convinced that the latest list makes sense:

am - Assembled Match
ebm - Example-based Machine Translation
idm - ID-based Match
ice - In-Context Exact Match
mt - Machine Translation
tm - Translation Memory Match

- 'Example-based Machine Translation' should not be there IMO: it's just MT, what type of MT is not relevant (but could be a candidate for the subtype)
- 'In-Context Exact Match' IMO should be 'in-context' only: the fact that it's an exact one is captured in the similarity (and it could be an in-context fuzzy too).

[ryanki] I think this makes sense. For example, there's no reason each of these couldn't be valid (note ic instead of ice):
<match id="1" similarity="100.0" type="ic/xlf:exact">
<match id="1" similarity="100.0" type="mt/xlf:exact">
<match id="1" similarity="100.0" type="tm/xlf:exact">
<match id="1" similarity="75.0" type="ic/xlf:fuzzy">
<match id="1" similarity="75.0" type="mt/xlf:fuzzy">
<match id="1" similarity="75.0" type="tm/xlf:fuzzy">
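For comparison, a rough sketch of how a processor might read the concatenated form used in the examples above, splitting a value such as tm/xlf:fuzzy into main type, prefix and subtype name. The names MatchType and parse_match_type are hypothetical, and the "main/prefix:name" grammar is only inferred from the examples in this thread, not from any agreed specification text.

```python
from typing import NamedTuple, Optional


class MatchType(NamedTuple):
    main: str               # top-level category, e.g. "tm", "mt", "ic"
    prefix: Optional[str]   # authority prefix of the subtype, e.g. "xlf"
    sub: Optional[str]      # subtype name, e.g. "fuzzy"


def parse_match_type(value: str) -> MatchType:
    """Split a concatenated type value; the subtype part is optional."""
    main, slash, subtype = value.partition("/")
    if not main:
        raise ValueError(f"missing main type in {value!r}")
    if not slash:
        return MatchType(main, None, None)  # main type only
    prefix, colon, name = subtype.partition(":")
    if not (prefix and colon and name):
        raise ValueError(f"subtype must look like 'prefix:name' in {value!r}")
    return MatchType(main, prefix, name)


print(parse_match_type("ic/xlf:exact"))
print(parse_match_type("tm"))
```

With this form, overwriting the whole attribute value replaces the main type and discards the subtype in one step, at the price of every consumer having to parse the value before it can test the main type.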

> 2. Support Reference Language in <mtc:matches> • Allow zero, one or
> more <mtc:matches> at each extension point, because you might have
> both recycling and reference language data.

I assume you mean: allow more than one <mtc:matches> where we currently allow one? Not in *all* extension points, right?

[ryanki] exactement :)

> • Add an optional attribute reference="yes|no" with no as default.
> Additionally, PR for a “reference match” would be to allow an xml:lang
> on the target different from the document and allow the <source> not
> to be present as it would be redundant information with the core
> <source>, e.g. Spanish reference for Quechua might look like this:

- reference='yes|no' and allowing a different language for xml:lang in those with reference='yes' seems ok to me.
- source not being present... I don't know. If we do that for those 'matches', why not for the normal matches as well, if the source is the same?
I think we mandated the source originally to simplify processing: testing for the presence or absence of the source may be cumbersome for some processors (XSLT maybe?).

[ryanki] In principle, we could carry around the redundant <source>, the only side effect really being bloat to the XLIFF (but metadata will do that anyway...). I suggested it this way simply because <alt-trans>, the previous element used for reference language in 1.2, does not require <source>, so this was for parity.
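To make the processing cost concrete, here is a rough sketch of what a consumer would have to do if <source> became optional inside a match: use the match's own <source> when present, otherwise fall back to the core <source>. The document layout below (un-namespaced <unit>, <matches>, <match>), the sample text and the helper name effective_source are all assumptions for illustration, not the schema under discussion.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified layout; real XLIFF would use namespaces.
DOC = """
<unit>
  <source>Hello</source>
  <matches>
    <match id="1" reference="yes">
      <target xml:lang="es">Hola</target>
    </match>
    <match id="2" similarity="75.0" type="tm">
      <source>Hello!</source>
      <target>placeholder translation</target>
    </match>
  </matches>
</unit>
"""


def effective_source(unit: ET.Element, match: ET.Element) -> Optional[str]:
    """Return the match's own source text, else the core source text."""
    own = match.find("source")
    if own is not None:
        return own.text
    core = unit.find("source")  # direct child only, not a match's source
    return core.text if core is not None else None


unit = ET.fromstring(DOC)
for match in unit.iter("match"):
    is_reference = match.get("reference", "no") == "yes"  # "no" is the default
    print(match.get("id"), is_reference, effective_source(unit, match))
```

In a general-purpose language the fallback is a few lines; the concern raised above is that the same test may be more awkward in other processors.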

We would need to update the definition of what a "match" is as well.

hope this helps,

To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: xliff-help@lists.oasis-open.org


