Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
- From: Helena S Chapman <hchapman@us.ibm.com>
- To: Ryan King <ryanki@microsoft.com>
- Date: Thu, 17 Jan 2013 21:24:08 -0500
Wait, so are we suggesting to include a cost
(element) model along with the matching attribute, all in XLIFF? To the tools,
why would it matter whether a match is an in-context exact match, an
exact match from a sister product's translation in the last 6 months, an exact match
from public domain memories, or an exact match from another product 20 years
ago, etc.? Most of us really only care about two types of exact matches:
1) a real exact match within context 2) everything else. How a vendor is
paid within the cost model of that company is established by a contractual
agreement that lives outside of the XLIFF document. The same principle should
apply to fuzzy matches.
Unless we are talking about mixing MS
content along with Oracle content in the same document and therefore there
is a need to distinguish between which one is which when you pay your vendor,
within the same organization, what's the value of having the ms namespace
tagged along with the content?
From: Ryan King <ryanki@microsoft.com>
To: Helena S Chapman/San Jose/IBM@IBMUS
Cc: "Dr. David Filip" <David.Filip@ul.ie>, Shirley Coady <scoady@multicorpora.com>,
"xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>,
Yves Savourel <ysavourel@enlaso.com>
Date: 01/17/2013 01:08 PM
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
One of the main reasons why
having an extensible match subtype makes sense is that the cost and
billing models between content providers and localization suppliers can
differ from one to the next. If I have a 100% match from a TM database,
that match might just be an exact match or it might be an in-context exact
match. Microsoft might have a contract to pay their localization supplier
to review the exact match, but not the in-context exact match. Another
company might have a different cost model where they pay to have the
in-context exact match reviewed as well.
Thanks,
Ryan
From: xliff@lists.oasis-open.org
[mailto:xliff@lists.oasis-open.org]
On Behalf Of Helena S Chapman
Sent: Monday, December 17, 2012 8:42 AM
To: Ryan King
Cc: Dr. David Filip; Ryan King; Shirley Coady; xliff@lists.oasis-open.org;
Yves Savourel
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
I have some concerns about the similarity
attribute. Until there is an openly acknowledged standard around matching
proximity, having that attribute does not make sense to me. What does it
mean when your tool says 75%? What happens if my tool does not agree with
the calculation? Note that I am not suggesting it is not
useful information, but I think the cart is in front of the horse.
Given an agreed-upon matching standard, I do not believe there will
be a need for subType. The information specified by similarity
would be sufficient for determining what the subType would be.
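The concern about what "75%" means can be illustrated with a small sketch. It assumes two stand-in scoring methods that are not taken from any particular tool: a character-level ratio from Python's difflib and a word-level Jaccard overlap. The function names and the sample segments are invented for illustration.

```python
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Character-level score: difflib's ratio, scaled to a percentage."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def word_similarity(a, b):
    """Word-level score: Jaccard overlap of lowercased word sets, as a percentage."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 100.0  # two empty segments are treated as identical
    return 100.0 * len(words_a & words_b) / len(words_a | words_b)


# Hypothetical segments: the new source text and a stored TM entry.
new_segment = "Click the Save button"
stored_segment = "Click the Cancel button"

char_score = char_similarity(new_segment, stored_segment)
word_score = word_similarity(new_segment, stored_segment)
print(f"character-level: {char_score:.1f}  word-level: {word_score:.1f}")
```

The same pair of segments gets two different percentages, so a bare similarity value is only meaningful to a consumer that knows which scoring method produced it.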
From: Ryan King <ryanki@microsoft.com>
To: Ryan King <ryanki@microsoft.com>, "Dr. David Filip" <David.Filip@ul.ie>
Cc: Shirley Coady <scoady@multicorpora.com>, Yves Savourel <ysavourel@enlaso.com>,
"xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
Date: 12/15/2012 10:54 PM
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Sent by: <xliff@lists.oasis-open.org>
Further comments or discussion :)?
From: xliff@lists.oasis-open.org
[mailto:xliff@lists.oasis-open.org]
On Behalf Of Ryan King
Sent: Tuesday, December 11, 2012 10:35 PM
To: Dr. David Filip
Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
To be honest, I originally proposed concatenated because I thought that
was what we agreed on for subState at the f2f and I wanted to follow suit…but
maybe I misremembered that. I actually think a separate attribute is better.
It is cleaner as you say, and I don’t think it is really a heavy requirement
to ask user agents to drop the subtype when the main type changes (or is
deleted), which I agree is the correct behavior.
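A minimal sketch of that behavior, assuming type and subtype are kept as two separate attributes. The helper name set_match_type and the plain-dict representation of a match are hypothetical, chosen only to show the rule, and are not part of any specification text.

```python
def set_match_type(attrs, new_type):
    """Return a copy of a match's attributes with the main type changed.

    Hypothetical helper: `attrs` is a plain dict of attribute names to values.
    The subtype is dropped whenever the main type changes or is deleted,
    because it only had meaning relative to the old main type.
    """
    updated = dict(attrs)
    old_type = updated.get("type")
    if new_type is None:
        updated.pop("type", None)      # main type deleted
    else:
        updated["type"] = new_type     # main type set or changed
    if new_type != old_type:
        updated.pop("subtype", None)   # subtype no longer applies
    return updated


match = {"id": "1", "similarity": "100.0", "type": "tm", "subtype": "xlf:exact"}
changed = set_match_type(match, "mt")    # subtype dropped
unchanged = set_match_type(match, "tm")  # same type, subtype kept
removed = set_match_type(match, None)    # type and subtype both gone
```

The extra work for a user agent is one conditional, which is the sense in which the requirement is not a heavy one.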
Should we define any sub values in Xliff such as “fuzzy” or “exact”?
I would actually put “ice” here as well and not in the main type attribute.
I reference Wikipedia for my reasoning :)
http://en.wikipedia.org/wiki/Translation_memory:
Retrieval
Several different types of matches can be retrieved from a TM.
Exact match
Exact matches appear when the match between the current source segment
and the stored one is a character by character match. When translating
a sentence, an exact match means the same sentence has been translated
before. Exact matches are also called "100 % matches".
In-Context Exact (ICE) match or Guaranteed Match
An ICE match is an exact match that occurs in exactly the same context,
that is, the same location in a paragraph. Context is often defined by
the surrounding sentences and attributes such as document file name, date,
and permissions.
Fuzzy match
When the match is not exact, it is a "fuzzy" match. Some systems
assign percentages to these kinds of matches, in which case a fuzzy match
is greater than 0% and less than 100%. Those figures are not comparable
across systems unless the method of scoring is specified.
So now we would have something like this:
<match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy">
<match id="1" similarity="99.0" type="tm" subtype="ms:near-exact">
<match id="1" similarity="100.0" type="tm" subtype="xlf:exact">
<match id="1" similarity="100.0" type="tm" subtype="xlf:ice">
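As a rough sketch of how a tool might read the separate-attribute form, the following parses matches like the ones above and splits each subtype into its prefix and name. Several things here are assumptions made so the fragment parses: the elements are self-closed, wrapped in a made-up <matches> root without namespaces, and given distinct ids. The rule that any prefix other than "xlf" marks a private value is also an assumption, not something the thread has settled.

```python
import xml.etree.ElementTree as ET

# Illustrative input only: self-closed matches inside an invented root element.
SAMPLE = """
<matches>
  <match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy"/>
  <match id="2" similarity="99.0" type="tm" subtype="ms:near-exact"/>
  <match id="3" similarity="100.0" type="tm" subtype="xlf:exact"/>
  <match id="4" similarity="100.0" type="tm" subtype="xlf:ice"/>
</matches>
"""


def split_subtype(value):
    """Split 'prefix:name' into (prefix, name); prefix is None if there is no colon."""
    prefix, sep, name = value.partition(":")
    return (prefix, name) if sep else (None, value)


def describe(match):
    """Collect the attributes a consumer would look at for one match element."""
    prefix, name = split_subtype(match.get("subtype", ""))
    return {
        "type": match.get("type"),
        "similarity": float(match.get("similarity")),
        "authority": prefix,
        "subtype": name,
        # Assumption: any prefix other than "xlf" is a private extension.
        "private": prefix not in (None, "xlf"),
    }


rows = [describe(m) for m in ET.fromstring(SAMPLE.strip()).iter("match")]
for row in rows:
    print(row)
```

A consumer that does not know the "ms" prefix can still fall back on type and similarity for that match.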
Thanks,
ryan
From: Dr. David Filip [mailto:David.Filip@ul.ie]
Sent: Tuesday, December 11, 2012 4:06 PM
To: Ryan King
Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals
I support adding private subtype
Pending issues:
- Freeze of the normative top level list
- Mechanics of subtype: we should probably be using the same mechanics consistently,
i.e. either concatenated or separate attributes. This is a spec-wide issue.
Separate seems cleaner, but concatenation seems better for processing:
the subtype is automatically dropped when the main type changes, which seems desirable
??
Cheers
dF
Dr. David Filip
=======================
LRC | CNGL | LT-Web | CSIS
University of Limerick, Ireland
telephone: +353-6120-2781
cellphone: +353-86-0222-158
facsimile: +353-6120-2734
mailto: david.filip@ul.ie
On Tue, Dec 11, 2012 at 11:32 PM, Ryan King <ryanki@microsoft.com>
wrote:
Thanks Yves and Shirley, while we are discussing the correct list of match
values, I'd like to know from the list if we have consensus on adding a
subtype for match.
Thanks,
ryan
-----Original Message-----
From: xliff@lists.oasis-open.org
[mailto:xliff@lists.oasis-open.org]
On Behalf Of Shirley Coady
Sent: Tuesday, December 4, 2012 3:34 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Yves,
I still believe we need to add termbase matches to the list. I don't see
any category below in which a termbase match could be grouped.
While I'm not disputing there may be some, I'm not personally aware of
any tool that does not separate the terminology base from the TM. I understand
that frequently the termbase is used to identify or replace terminology
within a segment, and that's not a segment "match", but there
are a lot of valid situations in which the entire segment is replaced from
the termbase.
One of the best examples I have is when translating UN documents / conference
meeting minutes, there is always a list, many pages long, of all participating
delegates. We advise the users of our software to enter these in a termbase
- I understand this is not traditional terminology but if you can automatically
translate these, it's about saving time. Same thing with slogans, titles
of government ministries that change routinely (at least in Canada they
do!), standard disclaimers, etc.
Shirley
-----Original Message-----
From: xliff@lists.oasis-open.org
[mailto:xliff@lists.oasis-open.org]
On Behalf Of Yves Savourel
Sent: Saturday, December 01, 2012 10:05 AM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Hi Ryan, all,
> ... see my inline to your inline.
> Please let me know if there is anything I can do to help you document
> and get this added to the specification.
> Do you feel we need to have a roll call vote on these items
> in the next TC call?
(This is related to the proposed changes in the match module; see below.)
Personally I think it's best to work by consensus first, and only go to
ballot when there is no consensus.
This TC is very ballot-driven, so you should do whatever makes sense in your
opinion.
As for moving things forward:
- type probably needs a revised list
- subType and ref probably need to be defined as they would appear in the
specification.
So people can see it and provide feedback if they want.
If there is no feedback, one can assume there is no dissent and update
the specification.
I'm afraid I don't have much time to do specification updates currently, but
Bryan, Tom or David may.
cheers, (and sorry for being slow to answer emails) -yves
-----Original Message-----
From: xliff@lists.oasis-open.org
[mailto:xliff@lists.oasis-open.org]
On Behalf Of Yves Savourel
Sent: Wednesday, November 28, 2012 7:48 PM
To: Ryan King; xliff@lists.oasis-open.org
Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
Hi Ryan, all,
Sorry for the delay: I'm just swamped and can't find the time to read emails
anymore.
> 1. Be able to specify optional custom values for match type in
> <mtc:matches>
I suppose some mechanism similar to the subType we're using in inline codes
and other places could allow for custom values while making sure a top-level
category is also declared.
Since we are discussing values for match type: I'm still not convinced
that the latest list makes sense:
am - Assembled Match
ebm - Example-based Machine Translation
idm - ID-based Match
ice - In-Context Exact Match
mt - Machine Translation
tm - Translation Memory Match
- 'Example-based Machine Translation' should not be there IMO: it's just
MT, what type of MT is not relevant (but could be a candidate for the subtype)
- 'In-Context Exact Match' IMO should be 'in-context' only: the fact that it's
an exact one is captured in the similarity (and it could be an in-context
fuzzy too).
[ryanki] I think this makes sense. For example, there's no reason each
of these couldn't be valid (note ic instead of ice):
<match id="1" similarity="100.0" type="ic/xlf:exact">
<match id="1" similarity="100.0" type="mt/xlf:exact">
<match id="1" similarity="100.0" type="tm/xlf:exact">
<match id="1" similarity="75.0" type="ic/xlf:fuzzy">
<match id="1" similarity="75.0" type="mt/xlf:fuzzy">
<match id="1" similarity="75.0" type="tm/xlf:fuzzy">
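For the concatenated notation shown above, a hedged sketch of how a processor might take a type value apart. The function name parse_concatenated_type is hypothetical, and the handling of values without a "/" or without a ":" is an assumption, since the examples do not cover those cases.

```python
def parse_concatenated_type(value):
    """Split a value like 'ic/xlf:exact' into (main_type, prefix, subtype).

    Hypothetical parser for the concatenated notation. A value with no '/'
    has no subtype; a subtype with no ':' has no prefix.
    """
    main, slash, sub = value.partition("/")
    if not slash:
        return main, None, None
    prefix, colon, name = sub.partition(":")
    if not colon:
        return main, None, sub
    return main, prefix, name


examples = ["ic/xlf:exact", "mt/xlf:fuzzy", "tm/xlf:exact", "tm"]
parsed = [parse_concatenated_type(v) for v in examples]
for value, parts in zip(examples, parsed):
    print(value, "->", parts)
```

With this notation, overwriting the whole attribute value with a new main type discards the subtype as a side effect, at the price of every consumer having to split the string.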
> 2. Support Reference Language in <mtc:matches>
> • Allow zero, one or more <mtc:matches> at each extension point,
> because you might have both recycling and reference language data.
I assume you mean: allow more than one <mtc:matches> where we currently
allow one? Not at *all* extension points, right?
[ryanki] exactement :)
> • Add an optional attribute reference="yes|no" with no as default.
> Additionally, PR for a "reference match" would be to allow an xml:lang
> on the target different from the document and allow the <source> not
> to be present as it would be redundant information with the core
> <source>, e.g. Spanish reference for Quechua might look like this:
- reference='yes|no' and allowing a different language for xml:lang in
those with reference='yes' seems ok to me.
- source not being present... I don't know. If we do that for those 'matches',
why not for the normal matches as well, if the source is the same?
I think we mandated the source originally to simplify processing:
testing for the presence or not of the source may be cumbersome for some
processors (XSLT maybe?).
[ryanki] In principle, we could carry around the redundant <source>,
the only side effect really being bloat to the XLIFF (but metadata will
do that anyway...). I suggested it this way simply because <alt-trans>,
the previous element used for reference language in 1.2, does not require
<source>, so this was for parity.
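To make the trade-off concrete, here is a small sketch of the test a processor would need if <source> became optional inside a match: use the match's own source when present, otherwise fall back to the core source. The XML shape is a simplified stand-in without namespaces, the bracketed segment text is placeholder content, and effective_source is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, invented structure: one unit with a core <source> and two matches.
SAMPLE = """
<unit>
  <source>Hello</source>
  <match id="1" reference="yes">
    <target xml:lang="es">[Spanish reference text]</target>
  </match>
  <match id="2">
    <source>Hello!</source>
    <target>[target text]</target>
  </match>
</unit>
"""

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_source(unit, match):
    """Return the match's own source text, or the core source if it has none."""
    own = match.find("source")
    if own is not None:
        return own.text
    return unit.find("source").text


unit = ET.fromstring(SAMPLE.strip())
sources = {m.get("id"): effective_source(unit, m) for m in unit.iter("match")}
reference_lang = unit.find("match[@reference='yes']/target").get(XML_LANG)
print(sources, reference_lang)
```

The fallback is a single presence check in a general-purpose language; whether it is equally cheap in every processor is the open question raised above.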
We would need to update the definition of what a "match" is as
well.
hope this helps,
-ys
---------------------------------------------------------------------
To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: xliff-help@lists.oasis-open.org
---------------------------------------------------------------------