xliff message
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation
- From: Helena S Chapman <hchapman@us.ibm.com>
- To: "Schnabel, Bryan S" <bryan.s.schnabel@tektronix.com>
- Date: Mon, 16 Jun 2014 14:37:18 -0400
Forgot to mention: the ULI stuff is sentence-level only. I personally think
we should stay away from standardizing XLIFF segmentation behavior and let
the data drive the appropriate behavior where users see fit. I don't know
if that's what you meant below, though.
From: "Schnabel, Bryan S" <bryan.s.schnabel@tektronix.com>
To: Helena S Chapman/San Jose/IBM@IBMUS
Cc: "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>, Yves Savourel <ysavourel@enlaso.com>
Date: 06/16/2014 02:22 PM
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation
Thanks, Helena. I will take a look.
Hmmm. Starting to wonder if standardizing the way XLIFF references
segmentation exceptions could become a module for 2.x? Perhaps overkill
though . . .
From: Helena S Chapman [mailto:hchapman@us.ibm.com]
Sent: Monday, June 16, 2014 10:54 AM
To: Schnabel, Bryan S
Cc: xliff@lists.oasis-open.org; Yves Savourel
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation
Another source of exceptions is http://unicode.org/uli/trac/changeset/48
from the Unicode Consortium. The exception data is mostly based on DBpedia
(http://www.dbpedia.org), with vetting from IBM and Microsoft. The
implementation is included in ICU 53 as a tech preview and will be released
as part of the main distribution in ICU 54 in 3Q2014, which goes into many
partner service environments (at the OS or application software level), such
as Google, Apple, and IBM.
It's currently available for English, German, French, Spanish, Italian,
Portuguese, and Russian. We are working on another set of 19 languages with
DBpedia in 2H2014.
Best regards,
Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts
From: "Schnabel, Bryan S" <bryan.s.schnabel@tektronix.com>
To: Yves Savourel <ysavourel@enlaso.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
Date: 06/16/2014 12:55 PM
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation
Sent by: <xliff@lists.oasis-open.org>
[note: As I see this thread quickly drifting away from XLIFF-proper-subject-matter,
I will only cc the list on this reply, then take the remainder of the thread
offline - those who are interested in segmentation as it pertains to XLIFF
2.0 and my tool are welcome to opt in to the remaining conversation]
Yves,
This is very useful and has guided me to what I think is the most rational
approach for my tool. I now understand that SRX is useful for two types of
rules: Breaks and Exceptions. I think that supporting Breaks is more trouble
than it is worth for my tool. So I will simply "hard wire" standard
breaks that generally work for the sentence rules that I am aware of.
But I think I will support Exceptions. I will hard wire in the ones I know
of (for example to correctly segment the excellent example you sent, and
a few others I can think of). And in addition, I will support a look-up
config file that will allow users to add Exceptions (for sure, there will
be way more than I can think of).
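
[Editorial sketch, not part of the original message.] The look-up config file described above could be as simple as a plain-text list merged with the hard-wired set. This is a minimal sketch under stated assumptions: the file format (one exception per line, `#` for comments), the names `BUILT_IN_EXCEPTIONS`, `parse_exceptions`, and `load_exceptions`, and the sample abbreviations are all hypothetical and are not taken from Bryan's tool or from SRX.

```python
from pathlib import Path

# Hypothetical hard-wired exceptions; a real tool would choose its own list.
BUILT_IN_EXCEPTIONS = frozenset({"Mr.", "Mrs.", "Dr.", "e.g.", "i.e."})


def parse_exceptions(lines):
    """Parse user exceptions: one entry per line, '#' starts a comment."""
    found = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            found.add(entry)
    return found


def load_exceptions(path=None):
    """Return the built-in exceptions plus any found in the user's file.

    A missing or unspecified file is not an error: the built-in set is
    returned unchanged.
    """
    exceptions = set(BUILT_IN_EXCEPTIONS)
    if path is not None and Path(path).is_file():
        with open(path, encoding="utf-8") as handle:
            exceptions |= parse_exceptions(handle)
    return exceptions
```

Keeping the user file additive means a tool still segments sensibly with no configuration at all, and users only list the abbreviations the built-in set misses.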
Here is a follow-on question: are there any public Exceptions files
for given languages? It seems like there are enough known need-to-code-for
exceptions that the community could benefit from a shared set. Do you know
of any that are available?
Thanks,
Bryan
-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Sunday, June 15, 2014 12:52 PM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation
Hi Bryan,
> I just added support for segmentation-at-the-sentence-level
> to my XLIFF 2.0 tool.
> ...
> I'm contemplating if this level of segmentation is useful?
> Or do I need to add full SRX support?.
> I'm really hoping the reply is "oh, no need for SRX - SRX is overkill
> for most - sentences are fine."
My experience is that very quickly any segmenter needs some way to define
exceptions and just "basic" detection is not good enough.
But it all depends on how you define "basic", and obviously one could
define exceptions by means other than SRX.
If you can properly break something like:
[[
Mr. Holmes is from the U.K. not the U.S. <pc id="1">Is
Dr. Watson from there too?</pc> Yes: both are.<ph id="2"/>
]]
Then your segmentation engine is quite good already.
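
[Editorial sketch, not part of the original message.] To make the difficulty concrete, here is one minimal way to segment the example above with hard-wired breaks plus an exception list. It is an illustration under assumptions, not SRX and not any existing tool's engine: the break rule (a terminator, optional closing tags, whitespace, then an uppercase letter, possibly behind opening tags), the exception set, and the names `NO_BREAK_AFTER`, `CANDIDATE`, `TAG`, and `segment` are all hypothetical. Inline codes are treated as opaque text.

```python
import re

# Hypothetical exceptions: abbreviations that never end a sentence here.
NO_BREAK_AFTER = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof."}

# Candidate break: a terminator, any closing tags, whitespace, and then
# (lookahead only) any opening tags followed by an uppercase letter.
CANDIDATE = re.compile(r'[.?!](?:</[^>]+>)*\s+(?=(?:<[^/][^>]*>)*[A-Z])')

# Any inline tag, used only to isolate the word before a candidate break.
TAG = re.compile(r'<[^>]+>')


def segment(text, no_break_after=NO_BREAK_AFTER):
    """Split text into segments; trailing whitespace stays with the segment."""
    segments = []
    start = 0
    for match in CANDIDATE.finditer(text):
        # Word that carries the terminator, with inline tags stripped.
        words = TAG.sub(" ", text[start:match.start() + 1]).split()
        if words and words[-1] in no_break_after:
            continue  # exception: do not break after this abbreviation
        segments.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        segments.append(text[start:])
    return segments


example = ('Mr. Holmes is from the U.K. not the U.S. <pc id="1">Is '
           'Dr. Watson from there too?</pc> Yes: both are.<ph id="2"/>')
for part in segment(example):
    print(repr(part))
```

On the example this yields three segments: the first ends after "U.S. ", the second is the whole `<pc>` element plus its trailing space, and the third is `Yes: both are.<ph id="2"/>`. The sketch only gets "U.K. not" right because a lowercase letter follows, and it would wrongly break in a phrase such as "the U.S. Army", which is the kind of case a richer exception mechanism has to cover.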
Cheers,
-ys