xliff message
Subject: RE: [xliff] Segmentation as core or not
- From: Helena S Chapman <hchapman@us.ibm.com>
- To: "Rodolfo M. Raya" <rmraya@maxprograms.com>
- Date: Tue, 8 Nov 2011 10:12:40 -0500
Rodolfo, you brought up an interesting point:
"To apply segmentation process to an already existing XLIFF file is an optional
task. Recording that such task has been performed is the optional part. For the
process to be possible, the text must already be in the XLIFF file and it has to
be in some containers."
I believe we are talking about two very distinct process activities here:
1. partition content into parts (core)
2. refine the definition of #1 into segments (module)
I agree any existing XLIFF file will already include "parts" of content. How
these parts were defined, and by which tools, is something the module can then
describe. For example, one might expect metadata about what the parts mean
according to another standard or non-standard definition: word vs. sentence
according to UAX#29, or paragraph vs. chapter based on Acme Translation Agency
Inc.'s internal definition. The latter is what Steven is referring to as logging.
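As a rough illustration of the two activities (not something taken from the thread), the sketch below first partitions content into parts and then refines each part into segments. The function names, the blank-line paragraph rule, and the naive regex sentence splitter are all assumptions made for the example; a real tool would follow a defined rule set such as UAX#29 and record which one it used.

```python
import re

def partition(content):
    """Activity 1 (core): split content into parts.
    Assumption: parts are paragraphs separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", content) if p.strip()]

def refine(part):
    """Activity 2 (module): refine one part into sentence-like segments.
    Assumption: a naive split on whitespace after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", part) if s]

text = "Sentence one. Sentence two.\n\nAnother paragraph."
parts = partition(text)
segments = [refine(p) for p in parts]

# Hypothetical "logging" record describing how parts and segments were defined.
definition_log = {"parts": "blank-line paragraphs", "segments": "naive regex"}
```

Note that `partition` alone already yields usable containers; `refine` only adds detail on top of them.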
We definitely should rethink the taxonomy
of what we call "segmentation" today. Note that I didn't use
the word "terminology" to further pollute the conversation.
Best regards,
Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts
From: "Rodolfo M. Raya" <rmraya@maxprograms.com>
To: <xliff@lists.oasis-open.org>
Date: 11/08/2011 04:37 AM
Subject: RE: [xliff] Segmentation as core or not
Sent by: <xliff@lists.oasis-open.org>
Hi,
I think there is a huge confusion between the segmentation process and
storing segments in XLIFF.
Text extracted for translation and stored in an XLIFF file needs to be
stored in some elements that act as containers. If XLIFF doesn't have containers
for holding localizable text, then the localizable text can't be exchanged
and the "L" and "I" fail in the XLIFF acronym.
Extracted text can be segmented before the XLIFF file is created (my tools
have been doing this for years) or after the XLIFF has been created. A
tool processing XLIFF files should not care about when segmentation was
done. Moreover, the segmentation process is completely optional.
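A minimal sketch of that point, assuming the draft-style `<unit>`/`<segment>`/`<source>` layout quoted elsewhere in this thread (the actual schema draft may differ, and `unit_source_text` is a hypothetical helper): a consumer can recover a unit's source text the same way whether the unit holds one segment or several, so it does not need to know when segmentation happened.

```python
import xml.etree.ElementTree as ET

def unit_source_text(unit_xml):
    """Return the full source text of a unit, regardless of how many
    <segment> containers it was split into."""
    unit = ET.fromstring(unit_xml)
    # itertext() also picks up text nested inside any inline child elements.
    return "".join("".join(src.itertext()) for src in unit.iter("source"))

# The same content stored as one segment, and as two segments.
unsegmented = (
    "<unit id='1'><segment>"
    "<source>Sentence one. Sentence two.</source>"
    "</segment></unit>"
)
segmented = (
    "<unit id='1'>"
    "<segment><source>Sentence one. </source></segment>"
    "<segment><source>Sentence two.</source></segment>"
    "</unit>"
)

same_text = unit_source_text(unsegmented) == unit_source_text(segmented)
```

This only works because the reader knows about the container elements themselves, which is the case being made here for keeping them required.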
To apply segmentation process to an already existing XLIFF file is an optional
task. Recording that such task has been performed is the optional
part. For the process to be possible, the text must already be in the XLIFF
file and it has to be in some containers.
Storing translatable text in XLIFF files is not optional. Elements for
holding that text are required and elements for holding the translations
of that text are also an integral part of XLIFF.
What we have so far in the XLIFF schema draft is a set of elements and
attributes for holding translatable text and its translations.
In the schema we don't have information that indicates how and when the
segmentation process occurred.
In the wiki we have a proposal for decorating the current schema draft with
elements and attributes containing information about the segmentation process.
The proposal in the wiki augments the scope of the basic elements already
present in the schema draft by adding attributes and processing expectations
to elements that must be present in any XLIFF file.
Although some attributes mentioned in the segmentation section in the wiki
are not really necessary when an XLIFF file is created, the elements in
which they appear are absolutely necessary. We can't document an element
as part of the "core" schema and leave some of its attributes
as optional in a separate "module".
Minimalism is a fancy trend. I like it very much and find it useful in some
cases. We should not try to apply minimalism to the concept of XLIFF core;
this would be a mistake as big as the mistake in XLIFF 1.2 that enabled
custom extensions everywhere.
Balance is important.
Regards,
Rodolfo
--
Rodolfo M. Raya rmraya@maxprograms.com
Maxprograms http://www.maxprograms.com
> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf
> Of Yves Savourel
> Sent: Tuesday, November 08, 2011 3:24 AM
> To: xliff@lists.oasis-open.org
> Subject: RE: [xliff] Segmentation as core or not
>
> Hi Steven, all,
>
> > We discussed this a little bit in IBM today.
> > Our view would still be that segmentation does not need to be in core
> > for interchange.
>
> I think most (all hopefully) of us would probably agree that one important
> criterion for an optional module is that it does not prevent tools
> implementing only the core from working properly.
>
> So if the representation of sentence-segmentation is optional, it should not
> prevent a tool XYZ, which understands only the core elements, from working.
>
> The question then is: how can tool XYZ work with a sentence-segmented
> file without knowing about <segment>?
>
> <unit id='1'>
>   <segment>
>     <source>Sentence one. </source>
>   </segment>
>   <segment>
>     <source>Sentence two.</source>
>   </segment>
> </unit>
>
> I don't think it can.
>
> The only way it could would be if a unit were to store two copies of the same
> content: one not sentence-segmented, and the other one reserved for the
> tools that would implement the optional segmentation representation
> module.
>
> Needless to say, this would result in a slew of troubles: Where does tool ABC
> (which implements segmentation) put its translation? How can tool XYZ (which
> does not implement segmentation) access it? How do we resolve
> differences in source? Where do we put segment status? etc. Basically it's all
> the problems of 1.2 all over again. In 1.2 we had no choice because we
> needed to be backward compatible. But in 2.0 we can have a clean way of
> dealing with segments.
>
> So far, the only rationale I've heard for making <segment> optional is the
> argument that segmentation is a different process and therefore should not
> be part of the core. But I think we have seen that segmentation in general is
> broader than sentence-segmentation and clearly happens also during
> extraction (see the example with ITS <withinTextRule/>), so that rationale
> doesn't really hold true.
>
> But maybe I'm missing other things: what are the advantages of keeping the
> segmentation representation optional?
>
> Cheers,
> -yves
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: xliff-help@lists.oasis-open.org