OASIS Mailing List Archives: xliff message



Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation


> I hope this helps

 

Wow. Immensely. Thanks!

 

From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Monday, June 16, 2014 10:19 AM
To: Schnabel, Bryan S
Cc: xliff@lists.oasis-open.org
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation

 

> Here is a follow-on question: are there any public Exceptions-files for given languages? It seems like there are enough known need-to-code-for exceptions that maybe the community could benefit from. Do you know of any that are available?

 

Indeed, one might think the “community” would have come up with a common public set of lists for such a simple and useful resource. Well, the community is not quite there yet, I’m afraid.

 

The closest thing to such lists is probably here:

https://code.google.com/p/srx-repository/source/browse/

 

Those are public SRX files, mostly from the LanguageTool and the Okapi framework projects.

But that repository has not been updated in a long while.

 

Another place with a good set of lists is the source code of the OmegaT segmenter:

http://sourceforge.net/p/omegat/code/ci/master/tree/src/org/omegat/core/segmentation/defaultRules.srx
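
In SRX, the exceptions discussed in this thread are simply rules marked `break="no"`, so public SRX files like the ones above can double as exception lists. Below is a minimal Python sketch that pulls those rules out of an SRX document, assuming SRX 2.0 files that use the namespace `http://www.lisa.org/srx20`. The tiny `SAMPLE_SRX` document and the function name `exception_rules` are hypothetical, for illustration only:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of SRX 2.0 documents (assumed for the files being read).
NS = {"srx": "http://www.lisa.org/srx20"}

# A trimmed-down, hypothetical SRX 2.0 file with one exception and one break.
SAMPLE_SRX = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="no"/>
  <body>
    <languagerules>
      <languagerule languagerulename="English">
        <rule break="no">
          <beforebreak>\bMr\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[.?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern="EN.*" languagerulename="English"/>
    </maprules>
  </body>
</srx>"""


def exception_rules(srx_text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each language rule name to its (beforebreak, afterbreak) exception pairs."""
    root = ET.fromstring(srx_text)
    result: dict[str, list[tuple[str, str]]] = {}
    for lang_rule in root.iterfind(".//srx:languagerule", NS):
        pairs = []
        for rule in lang_rule.iterfind("srx:rule", NS):
            if rule.get("break") == "no":  # break="no" marks an exception
                before = rule.findtext("srx:beforebreak", default="", namespaces=NS)
                after = rule.findtext("srx:afterbreak", default="", namespaces=NS)
                pairs.append((before, after))
        result[lang_rule.get("languagerulename", "")] = pairs
    return result


print(exception_rules(SAMPLE_SRX))
```

For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.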

 

 

I hope this helps,

-yves

 

 

From: Schnabel, Bryan S [mailto:bryan.s.schnabel@tektronix.com]
Sent: Monday, June 16, 2014 10:55 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation

 

[note: As this thread is quickly drifting away from XLIFF proper, I will cc the list only on this reply and then take the remainder of the thread offline. Those who are interested in segmentation as it pertains to XLIFF 2.0 and my tool are welcome to opt in to the rest of the conversation.]

 

Yves,

 

This is very useful and has guided me to what I think is the most rational approach for my tool. I now understand that SRX supports two types of rules: Breaks and Exceptions. I think that supporting Breaks is more trouble than it is worth for my tool, so I will simply hard-wire standard breaks that generally work for the sentence rules I am aware of.

 

But I think I will support Exceptions. I will hard-wire the ones I know of (for example, enough to correctly segment the excellent example you sent, plus a few others I can think of). In addition, I will support a look-up config file that lets users add their own Exceptions (there will surely be far more than I can think of).
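
The plan above (hard-wired breaks, hard-wired exceptions, plus a user look-up file) could look roughly like the following minimal Python sketch. The break pattern, the built-in exception set, the function names and the look-up file format (one exception per line, `#` for comments) are all hypothetical, not part of any tool discussed here:

```python
from __future__ import annotations

import re
from collections.abc import Iterable
from pathlib import Path

# Hard-wired break (assumed): terminal punctuation followed by whitespace.
BREAK = re.compile(r"[.?!]+(?=\s)")

# Hard-wired exceptions (assumed): tokens whose final period is not a sentence end.
BUILTIN_EXCEPTIONS = frozenset({"Mr.", "Mrs.", "Dr.", "Prof.", "e.g.", "i.e."})


def parse_exceptions(lines: Iterable[str]) -> set[str]:
    """Merge user-supplied exceptions with the built-in ones.

    Hypothetical look-up file format: one exception per line;
    blank lines and lines starting with '#' are ignored.
    """
    exceptions = set(BUILTIN_EXCEPTIONS)
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            exceptions.add(line)
    return exceptions


def load_exceptions(path: Path) -> set[str]:
    """Read the user look-up file from disk and merge it with the built-ins."""
    return parse_exceptions(path.read_text(encoding="utf-8").splitlines())


def split_sentences(text: str, exceptions: set[str]) -> list[str]:
    segments, start = [], 0
    for m in BREAK.finditer(text):
        # The whitespace-delimited token ending at this punctuation mark.
        token = text[start:m.end()].split()[-1]
        if token in exceptions:
            continue  # an exception suppresses the hard-wired break
        segments.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


# A user file containing the line "approx." would add one exception.
user = parse_exceptions(["# site-specific abbreviations", "approx."])
print(split_sentences("It costs approx. 5 dollars. Cheap.", user))
# ['It costs approx. 5 dollars.', 'Cheap.']
```

A plain token list like this still breaks after "U.K." in "the U.K. not the U.S.", which is exactly the kind of case Yves's example targets; handling it needs rules that also look at what follows the period.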

 

Here is a follow-on question: are there any public Exceptions-files for given languages? It seems like there are enough known need-to-code-for exceptions that maybe the community could benefit from. Do you know of any that are available?

 

Thanks,

 

Bryan

 

 

 

-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Sunday, June 15, 2014 12:52 PM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation

 

Hi Bryan,

 

> I just added support for segmentation-at-the-sentence-level

> to my XLIFF 2.0 tool.

> ...

> I'm contemplating if this level of segmentation is useful?

> Or do I need to add full SRX support?

> I'm really hoping the reply is "oh, no need for SRX - SRX is overkill

> for most - sentences are fine."

 

In my experience, any segmenter very quickly needs some way to define exceptions; "basic" detection alone is not good enough.

But it all depends on how you define "basic", and obviously one could define exceptions without using SRX.

 

If you can properly break something like:

 

[[

Mr. Holmes is from the U.K. not the U.S. <pc id="1">Is Dr. Watson from there too?</pc> Yes: both are.<ph id="2"/>

]]

 

Then your segmentation engine is quite good already.
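
One way to get an example like this right is the SRX-style cascade: each rule pairs a "before" pattern with an "after" pattern, the rules are tried in order at every candidate position, and the first rule that matches decides whether to break. Below is a minimal Python sketch under that model. The two English rules are hypothetical and tuned only to this sentence, and inline codes are handled naively with regular expressions rather than as real XLIFF markup:

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One SRX-style rule: a break decision plus before/after patterns."""
    is_break: bool
    before: re.Pattern
    after: re.Pattern


def rule(is_break: bool, before: str, after: str) -> Rule:
    # 'before' must end exactly at the candidate position;
    # 'after' is matched starting at that position.
    return Rule(is_break, re.compile(f"(?:{before})\\Z"), re.compile(after))


# Hypothetical English rules. Exceptions (is_break=False) come first,
# because the first matching rule wins.
RULES = [
    rule(False, r"\b(?:Mr|Mrs|Dr|Prof)\.", r"\s"),
    # Terminal punctuation, optional closing tags, then whitespace,
    # optional opening tags and an uppercase letter.
    rule(True, r"[.?!]+(?:</[^>]+>)*", r"\s+(?:<[^/][^>]*>)*[A-Z]"),
]


def segment(text: str, rules: list[Rule] = RULES) -> list[str]:
    breaks = []
    for i in range(1, len(text)):
        for r in rules:
            if r.before.search(text[:i]) and r.after.match(text, i):
                if r.is_break:
                    breaks.append(i)
                break  # first matching rule decides; skip the rest
    bounds = [0, *breaks, len(text)]
    segments = []
    for a, b in zip(bounds, bounds[1:]):
        piece = text[a:b].strip()  # inter-segment whitespace is just trimmed
        if piece:
            segments.append(piece)
    return segments


sample = ('Mr. Holmes is from the U.K. not the U.S. <pc id="1">Is Dr. Watson '
          'from there too?</pc> Yes: both are.<ph id="2"/>')
for seg in segment(sample):
    print(seg)
```

On the sample this yields three segments: `Mr. Holmes is from the U.K. not the U.S.`, `<pc id="1">Is Dr. Watson from there too?</pc>` and `Yes: both are.<ph id="2"/>`. In an actual XLIFF 2.0 document the whitespace between segments would more likely be kept in an `<ignorable>` element than discarded.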

 

Cheers,

-ys

 

 

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]