[Note: Since this thread is quickly drifting away from XLIFF-proper subject matter, I will cc the list only on this reply and then take the remainder of the thread offline. Anyone interested in segmentation as it pertains to XLIFF 2.0 and my tool is welcome to opt in to the remaining conversation.]
This is very useful and has guided me to what I think is the most rational approach for my tool. I now understand that SRX is useful for two types of rules: Breaks and Exceptions. I think that supporting configurable Breaks is more trouble than it is worth for my tool, so I will simply "hard wire" standard breaks that generally work for the sentence rules I am aware of.
But I think I will support Exceptions. I will hard wire the ones I know of (for example, to correctly segment the excellent example you sent, and a few others I can think of). In addition, I will support a look-up config file that lets users add their own Exceptions (for sure, there will be far more than I can think of).
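To make the idea concrete, here is a rough sketch of what I have in mind, written in Python purely for illustration. Everything in it is an assumption on my part rather than anything from SRX or from my actual tool: the names `BUILTIN_EXCEPTIONS`, `load_exceptions`, and `segment` are hypothetical, the config format (one exception per line, `#` for comments) is invented, and it works on plain text only, ignoring inline codes such as `<pc>` and `<ph>`.

```python
import re

# Hard-wired Exceptions: tokens after which a period does not end a sentence.
# (Illustrative list only; a real one would be per-language and much longer.)
BUILTIN_EXCEPTIONS = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof.", "e.g.", "i.e."}

# Hard-wired Break: sentence-ending punctuation followed by whitespace.
BREAK = re.compile(r"[.?!]\s+")


def load_exceptions(lines):
    """Merge user Exceptions (one per line, '#' starts a comment) with the built-ins."""
    user = {
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }
    return BUILTIN_EXCEPTIONS | user


def segment(text, exceptions=None):
    """Split plain text into sentences, skipping breaks that hit an Exception."""
    if exceptions is None:
        exceptions = BUILTIN_EXCEPTIONS
    segments = []
    start = 0
    for match in BREAK.finditer(text):
        end = match.start() + 1  # index just past the punctuation mark
        token = text[start:end].split()[-1]  # last word before the candidate break
        next_char = text[match.end():match.end() + 1]
        if token in exceptions:
            continue  # known abbreviation: not a sentence end
        if next_char.islower():
            continue  # next word is lowercase: treat as mid-sentence
        segments.append(text[start:end])
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    sample = ("Mr. Holmes is from the U.K. not the U.S. "
              "Is Dr. Watson from there too? Yes: both are.")
    for sentence in segment(sample):
        print(sentence)
```

On the plain-text version of your example this yields three segments, breaking after "U.S.", after "too?", and at the end. The lowercase check is only a heuristic: it is what keeps "U.K. not" together, but it would still split "the U.S. Army" incorrectly, which is exactly the kind of case the user-supplied Exceptions file is meant to cover.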
Here is a follow-on question: are there any public Exceptions files for given languages? It seems like there are enough known exceptions that everyone needs to code for that the community could benefit from shared lists. Do you know of any that are available?
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Yves Savourel
Sent: Sunday, June 15, 2014 12:52 PM
Subject: RE: [xliff] Seeking opinions on XLIFF 2.0 tool support for *basic* segmentation
> I just added support for segmentation-at-the-sentence-level
> to my XLIFF 2.0 tool.
> I'm contemplating if this level of segmentation is useful?
> Or do I need to add full SRX support?
> I'm really hoping the reply is "oh, no need for SRX - SRX is overkill
> for most - sentences are fine."
My experience is that very quickly any segmenter needs some way to define exceptions, and just "basic" detection is not good enough.
But it all depends on how you define "basic", and obviously one could define exceptions by means other than SRX.
If you can break properly something like:
Mr. Holmes is from the U.K. not the U.S. <pc id="1">Is Dr. Watson from there too?</pc> Yes: both are.<ph id="2"/>
Then your segmentation engine is quite good already.