Subject: RE: [xliff-seg] Views on segments and segmentation
Hi Christian,

Thank you for starting this thread! Here are my comments on this topic:

A) In my opinion, a segment could perhaps be better defined as something like:

(n) “A piece of text-based data that is linguistically suitable for translation.”

Such a definition allows for different types of segmentation, such as sentence-based segmentation, paragraph-based segmentation, and even phrase- and term-based segmentation.

Note that the word “segment” can also be used as a verb (e.g. “to segment a file”), in which case it could be defined as something like:

(v) “The process of dividing text-based data into (segments) / (pieces that are individually linguistically suitable for translation).”

Segmentation, in turn, could be defined as something like:

(n) “The division of text-based data into (segments) / (pieces that are individually linguistically suitable for translation).”

For optimal reuse of previous translations, e.g. through a translation memory tool, experience shows that sentence-based segmentation is the most efficient in most cases, though there are cases where paragraph or phrase segmentation yields better results. Term-based segmentation is problematic: even though the terms themselves may be suitable for individual translation, the surrounding text (without the terms) is often difficult to treat as segments, since it does not always make linguistic sense without the terms.

B) We touched upon SRX in our
first and second sub-committee meetings. SRX is a standard for expressing segmentation rules for data in TMX format. Thus, to be able to apply SRX fully, we would need to present the data in the XLIFF files in a TMX-compliant format. The conclusion was that we need to look into the possibility of introducing TMX as a namespace in XLIFF files. For this to be possible, TMX must have an XML schema. Currently only a DTD is available, and Yves has an action item to push the TMX committee to provide a schema that we could use. He will bring this up in the next TMX committee meeting.

Looking forward to additional comments on this topic!

Magnus

From: Lieske, Christian [mailto:christian.lieske@sap.com]

Dear all,

During the sub-committee meeting on 23-Mar-04, I developed
the feeling that I, and possibly others, would benefit from a discussion of the notions of 'segment' and 'segmentation'. Since the statement of purpose for the sub-committee reads "The XLIFF Segmentation Subcommittee goal is to recommend segmentation representations within an XLIFF document.", a common understanding of these notions seems vital.

Best regards,
Christian

A. Segment

One way to start a discussion is to look at a kind of
standard definition of 'segment' in the realm of localization: "A segment is what a program considers the smallest translatable unit, usually a sentence."

Starting from here, I wonder what to say about a French phrase like "Chaque patient à l'hôpital a une carte vitale" ("Every patient in the hospital has a carte vitale"). Here, 'carte vitale' is a term (roughly English "health insurance card"), something which, as I understand it, is the smallest translatable unit. Accordingly, I would see the French phrase as a concatenation of two segments.

One question that thus comes to mind when I think about the sub-committee is the following: would the sub-committee's work result in a recommendation like "Phrases which contain terms that are available in a glossary attached to an XLIFF file are not to be split into different segments"?

B. Segmentation

I wonder, for example, how the sub-committee's work relates
to TR-29 of the Unicode standard (see http://www.unicode.org/reports/tr29/) or to the ongoing work at LISA on the Segmentation Rules Exchange format (SRX). I have the feeling that many observers will ask questions like this.
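To make the rule-based side of this question a little more concrete, here is a minimal Python sketch of how SRX-style segmentation rules might be applied to the French example from this thread. As we understand SRX, each rule pairs a break/no-break decision with one regular expression for the text before a candidate break position and one for the text after it, and rules are tried in order so that the first match decides. The `Rule` class, the `segment` function and the two sample rules are hypothetical illustrations: they are not part of SRX, XLIFF or any TMX tool, the sketch does not read actual SRX files, and treating positions that match no rule as "no break" is an assumption of the sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of an SRX-style segmentation rule."""
    is_break: bool  # True = break here, False = explicitly do not break
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


def segment(text: str, rules: list[Rule]) -> list[str]:
    """Split text at each position where the first matching rule says 'break'.

    Assumption of this sketch: positions matching no rule are not breaks.
    """
    # Anchor each 'before' pattern with \Z so it must end exactly at the position.
    compiled = [
        (rule.is_break, re.compile(rf"(?:{rule.before})\Z"), re.compile(rule.after))
        for rule in rules
    ]
    segments: list[str] = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # endpos=pos makes the search treat text[:pos] as the whole string.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides this position
    segments.append(text[start:])
    # Trim whitespace left at segment edges and drop empty pieces.
    return [s.strip() for s in segments if s.strip()]


# Hypothetical sample rules: an exception for "Dr." listed before the general rule.
RULES = [
    Rule(is_break=False, before=r"\bDr\.", after=r"\s"),
    Rule(is_break=True, before=r"[.?!]+", after=r"\s"),
]

if __name__ == "__main__":
    # "Dr. Martin arrives. Every patient in the hospital has a carte vitale."
    text = "Le Dr. Martin arrive. Chaque patient à l'hôpital a une carte vitale."
    for seg in segment(text, RULES):
        print(seg)
```

With these rules the sketch yields two sentence-level segments and does not break after "Dr.", because the no-break exception is listed first. The term 'carte vitale' simply stays inside its sentence; treating it as a segment of its own, as discussed above, would need rules or glossary information beyond this sketch.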