xliff message
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
- From: Helena S Chapman <hchapman@us.ibm.com>
- To: Ryan King <ryanki@microsoft.com>
- Date: Mon, 17 Dec 2012 11:41:57 -0500
I have some concerns about the similarity attribute. Until there is an
openly acknowledged standard for match proximity, having that attribute
does not make sense to me. What does it mean when your tool says 75%?
What happens if my tool does not agree with that calculation? Note that
I am not suggesting it is not useful information, but I think the cart is
before the horse.
Given an agreed-upon matching standard, I do not believe there will be
a need for subType. The information specified by similarity would mostly
be sufficient for determining what the subType would be.
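To illustrate the concern above, here is a minimal sketch of two plausible scoring methods giving different percentages for the same segment pair. Both functions (`char_similarity`, `word_similarity`) are hypothetical and are not taken from any tool or from the XLIFF specification; they only show why a bare percentage is hard to interpret without knowing how it was computed.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance between two strings (character level)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Hypothetical tool A: 100 * (1 - edit distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - edit_distance(a, b) / longest)


def word_similarity(a: str, b: str) -> float:
    """Hypothetical tool B: Jaccard overlap of lowercased word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 100.0
    return 100.0 * len(wa & wb) / len(wa | wb)


if __name__ == "__main__":
    src, stored = "Save the file", "Save the files"
    # The same pair of segments gets a different score from each method.
    print(round(char_similarity(src, stored), 1))
    print(round(word_similarity(src, stored), 1))
```

Under these assumptions the character-based score is well above 90 while the word-based score is 50, so a consumer that receives only the number cannot tell which kind of match it is looking at.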
From: Ryan King <ryanki@microsoft.com>
To: Ryan King <ryanki@microsoft.com>, "Dr. David Filip" <David.Filip@ul.ie>
Cc: Shirley Coady <scoady@multicorpora.com>, Yves Savourel <ysavourel@enlaso.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
Date: 12/15/2012 10:54 PM
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Sent by: <xliff@lists.oasis-open.org>
Further comments or discussion? :)
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Tuesday, December 11, 2012 10:35 PM
To: Dr. David Filip
Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
To be honest, I originally proposed concatenated
because I thought that was what we agreed on for subState at the f2f and
I wanted to follow suit…but maybe I misremembered that. I actually think
a separate attribute is better. It is cleaner as you say, and I don’t
think it is really a heavy requirement to ask user agents to drop the subtype
when the main type changes (or is deleted), which I agree is the correct
behavior.
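As a rough illustration of the behavior described above (separate attributes, with the subtype dropped whenever the main type changes or is deleted), a user agent could do something like the following. The helper name `set_match_type` is hypothetical, and the element and attribute names simply follow the examples in this thread, not a finalized schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def set_match_type(match: ET.Element, new_type: Optional[str]) -> None:
    """Set (or delete, if new_type is None) the main type of a match.

    The subtype only makes sense relative to its main type, so it is
    dropped whenever the main type changes or is removed.
    """
    old_type = match.get("type")
    if new_type is None:
        match.attrib.pop("type", None)
    else:
        match.set("type", new_type)
    if new_type != old_type:
        match.attrib.pop("subtype", None)


if __name__ == "__main__":
    m = ET.fromstring(
        '<match id="1" similarity="99.0" type="tm" subtype="ms:near-exact"/>'
    )
    set_match_type(m, "mt")  # main type changes, so the subtype is dropped
    print(ET.tostring(m, encoding="unicode"))
```

With a concatenated value the same effect comes for free when the attribute is overwritten; with separate attributes it takes the one extra step shown here.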
Should we define any sub values in XLIFF,
such as “fuzzy” or “exact”? I would actually put “ice” here as well,
and not in the main type attribute. I reference Wikipedia for my reasoning :)
http://en.wikipedia.org/wiki/Translation_memory:
Retrieval
Several different types of matches can be retrieved from a TM.
Exact match
Exact matches appear when the match between
the current source segment and the stored one is a character by character
match. When translating a sentence, an exact match means the same sentence
has been translated before. Exact matches are also called "100 % matches".
In-Context Exact (ICE) match or Guaranteed Match
An ICE match is an exact match that occurs
in exactly the same context, that is, the same location in a paragraph.
Context is often defined by the surrounding sentences and attributes such
as document file name, date, and permissions.
Fuzzy match
When the match is not exact, it is a "fuzzy"
match. Some systems assign percentages to these kinds of matches, in which
case a fuzzy match is greater than 0% and less than 100%. Those figures
are not comparable across systems unless the method of scoring is specified.
So now we would have something like this:
<match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy">
<match id="1" similarity="99.0" type="tm" subtype="ms:near-exact">
<match id="1" similarity="100.0" type="tm" subtype="xlf:exact">
<match id="1" similarity="100.0" type="tm" subtype="xlf:ice">
Thanks,
ryan
From: Dr. David Filip [mailto:David.Filip@ul.ie]
Sent: Tuesday, December 11, 2012 4:06 PM
To: Ryan King
Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals
I support adding private subtype
Pending issues:
- Freeze of the normative top level
list
- Mechanics of subtype: we should
probably be using the same mechanics consistently, i.e. either concatenated
or separate attributes. This is a spec-wide issue.
Separate seems cleaner, but concatenation
seems better for processing: the subtype is automatically dropped when the
main type changes, which seems desirable?
Cheers
dF
Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie
On Tue, Dec 11, 2012 at 11:32 PM, Ryan King <ryanki@microsoft.com> wrote:
Thanks Yves and Shirley. While
we are discussing the correct list of match values, I'd like to know from
the list if we have consensus on adding a subtype for match.
Thanks,
ryan
-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Shirley Coady
Sent: Tuesday, December 4, 2012 3:34 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Yves,
I still believe we need to add termbase matches to the list. I don't see
any category below in which a termbase match could be grouped.
While I'm not disputing there may be some, I'm not personally aware of
any tool that does not separate the terminology base from the TM. I understand
that frequently the termbase is used to identify or replace terminology
within a segment, and that's not a segment "match", but there
are a lot of valid situations in which the entire segment is replaced from
the termbase.
One of the best examples I have is when translating UN documents / conference
meeting minutes, there is always a list, many pages long, of all participating
delegates. We advise the users of our software to enter these in a termbase
- I understand this is not traditional terminology but if you can automatically
translate these, it's about saving time. Same thing with slogans, titles
of government ministries that change routinely (at least in Canada they
do!), standard disclaimers, etc.
Shirley
-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Saturday, December 01, 2012 10:05 AM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Hi Ryan, all,
> ... see my inline to your inline.
> Please let me know if there is anything I can do to help you document
> and get this added to the specification.
> Do you feel we need to have a roll call vote on these items
> in the next TC call?
(This is related to the proposed changes in the match module; see below.)
Personally I think it's best to work by consensus first, and only go to
ballot when there is no consensus.
This TC is very ballot-driven, so you should do whatever makes sense in your
opinion.
As for moving things forward:
- type probably needs a revised list
- subType and ref probably need to be defined as they would appear in the
specification.
So people can see it and provide feedback if they want.
If there is no feedback, one can assume there is no dissent and update
the specification.
I'm afraid I do not have much time for specification updates currently, but
Bryan, Tom or David may.
cheers, (and sorry for being slow to answer emails) -yves
-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Wednesday, November 28, 2012 7:48 PM
To: Ryan King; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Hi Ryan, all,
Sorry for the delay: I'm just swamped and can't find the time to read emails
anymore.
> 1. Be able to specify optional custom values for match type in
> <mtc:matches>
I suppose some mechanism similar to the subType we're using in inline codes
and other places could allow for custom values while making sure a top-level
category is also declared.
Since we are discussing values for match type: I'm still not convinced
that the latest list makes sense:
am - Assembled Match
ebm - Example-based Machine Translation
idm - ID-based Match
ice - In-Context Exact Match
mt - Machine Translation
tm - Translation Memory Match
- 'Example-based Machine Translation' should not be there IMO: it's just
MT, what type of MT is not relevant (but could be a candidate for the subtype)
- 'In-Context Exact Match' IMO should be 'in-context' only: the fact that it's
an exact one is captured in the similarity (and it could be an in-context
fuzzy too).
[ryanki] I think this makes sense. For example, there's no reason each
of these couldn't be valid (note ic instead of ice):
<match id="1" similarity="100.0" type="ic/xlf:exact">
<match id="1" similarity="100.0" type="mt/xlf:exact">
<match id="1" similarity="100.0" type="tm/xlf:exact">
<match id="1" similarity="75.0" type="ic/xlf:fuzzy">
<match id="1" similarity="75.0" type="mt/xlf:fuzzy">
<match id="1" similarity="75.0" type="tm/xlf:fuzzy">
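For the concatenated form used in the examples above, a consumer would need to split the value back into its parts. The sketch below assumes a `main/prefix:value` shape inferred only from those examples (it is not a grammar defined by the specification), and the names `MatchType` and `parse_match_type` are hypothetical.

```python
from typing import NamedTuple, Optional


class MatchType(NamedTuple):
    main: str               # top-level category, e.g. "tm", "mt", "ic"
    prefix: Optional[str]   # authority of the sub value, e.g. "xlf", "ms"
    sub: Optional[str]      # sub value, e.g. "exact", "fuzzy"


def parse_match_type(value: str) -> MatchType:
    """Split 'main' or 'main/prefix:sub' into its components."""
    main, slash, subtype = value.partition("/")
    if not main:
        raise ValueError(f"missing main type in {value!r}")
    if not slash:
        # No subtype given: only the top-level category is present.
        return MatchType(main, None, None)
    prefix, colon, sub = subtype.partition(":")
    if not (colon and prefix and sub):
        raise ValueError(f"subtype must look like 'prefix:value' in {value!r}")
    return MatchType(main, prefix, sub)


if __name__ == "__main__":
    for raw in ("ic/xlf:exact", "tm/xlf:fuzzy", "mt"):
        print(raw, "->", parse_match_type(raw))
```

Because the main type always comes first, a processor that does not recognize the prefix can still fall back to the top-level category.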
> 2. Support Reference Language in <mtc:matches>
> • Allow zero, one or more <mtc:matches> at each extension point,
> because you might have both recycling and reference language data.
I assume you mean: allow more than one <mtc:matches> where we currently
allow one? Not at *all* extension points, right?
[ryanki] exactement :)
> • Add an optional attribute reference="yes|no" with no as default.
> Additionally, PR for a “reference match” would be to allow an xml:lang
> on the target different from the document and allow the <source> not
> to be present as it would be redundant information with the core
> <source>, e.g. Spanish reference for Quechua might look like this:
- reference='yes|no' and allowing a different language for xml:lang in
those with reference='yes' seems ok to me.
- source not being present... I don't know. If we do that for those 'matches',
why not for the normal matches as well, if the source is the same?
I think we originally mandated the source to simplify processing:
testing for the presence or absence of the source may be cumbersome for some
processors (XSLT maybe?).
[ryanki] In principle, we could carry around the redundant <source>,
the only side effect really being bloat to the XLIFF (but metadata will
do that anyway...). I suggested it this way simply because <alt-trans>,
the previous element used for reference language in 1.2, does not require
<source>, so this was for parity.
We would need to update the definition of what a "match" is as
well.
hope this helps,
-ys
---------------------------------------------------------------------
To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: xliff-help@lists.oasis-open.org