Subject: RE: [xliff] Proposal for Segmentation Notation in XLIFF
Hi David,

Thank you for your feedback. I will try to
comment on the different issues you have raised one by one, to make the
discussion a bit easier to follow.

1) The reason for our choice of the <mrk> element to represent
segmentation, as opposed to introducing a new element (e.g. <seg>), is
that we feel that the introduction of a new element inside the <target>
content would cause too many potential incompatibility issues with existing
XLIFF implementations. We felt that introducing a new element inside
<target> would severely reduce the chances of our proposal being
accepted as an amendment to the XLIFF 1.1 specification. However, we did also agree
that introducing a specific element for representing segmentation would
be beneficial for the standard in the next major version, and when we start our
work on that I will raise this topic.

2) Regarding the issue of segment boundaries, it is important to note
that we must not make assumptions about what kind of segmentation will be represented.
The purpose of the segmentation is to increase recycling rates when used with
tools such as translation memories. Different CAT tools and translation
memories use different segmentation algorithms, and not all of them require the
entire text content to be segmented. Quite often, markup such as tags also
affects the segment boundaries. Some segmentation algorithms may, for example, exclude
from the segment any tags or formatting that appear before or after a sentence,
while others do not. It is important to leave this flexibility to the segmentation
tools rather than enforcing a particular approach in the XLIFF standard. I
would also like to point out that it is still the entire content of the
<target> element that makes up the actual full translation, rather than
what is in the individual segments. The segments are there to aid certain tools
in safely recycling content at the sub-<trans-unit> level.

3) Regarding the use of SRX, this topic has also been discussed in the
segmentation subcommittee. In our most recent discussion, our conclusion was that
the specifics of embedding and/or referencing SRX are a topic that should be
pursued by the main XLIFF committee, in particular as it is likely to involve
closer cooperation with other standards groups.

4) The issue of whether segments should be represented by elements
spanning the segment content was also discussed in detail over a longer
period in the subcommittee. In the proposal, which in the end we all voted for, we
chose the approach of using <mrk> elements to span the segment
content. Here are some of the reasons:

a. It is important to use a representation that is easy to process. This
approach has many benefits in this respect. In particular, XML DOM-based tools
can be used to process content, which is not easily achievable with some of the
other suggested approaches.

b. The issue with non-clonable <g> elements represents a bigger problem
than allowing or not allowing segmentation. If non-clonable <g> elements are
used in such a way that the content they span may include more than single words or
isolated expressions, they represent highly localisation-unfriendly
content, and they are very likely to cause difficult problems during
translation. Not being able to break a segment inside such an element may be the
smallest of the problems that tools would face. In this case it is actually
rather an advantage that segmentation is not allowed at such points, as
the non-clonable <g> element clearly
represents a piece of content that must be translated as one piece,
no matter what.

Perhaps I can illustrate what I mean with an example
translation from English to “Yoda-English” (for Star Wars fans):

<source>This is a <g>sentence. It has</g> markup.</source>

The translation into “Yoda-English” would be:

<target>A <g>sentence</g> this is. Markup <g>it has</g>.</target>

However, if the <g> element cannot be
cloned this is not possible, and as a result the content cannot be correctly
localised. This is in fact irrespective of whether segments are introduced here
or not.

I hope this addresses your questions and
concerns, and I look forward to an interesting discussion on this topic later
today.

Best regards,

From: David Pooley [mailto:dpooley@sdl.com]

I'm more than a little concerned that
non-clonable <g> elements prohibit segmentation of text. I'm also unclear as to why it is
necessary to potentially exclude any text from the original <source> when marking the
segment boundaries. In this case, we can have the situation where the sum of
the parts does not equal the whole.

If SRX (which is based on Unicode TR-29) is being
considered for use with XLIFF, this standard defines where a segmentation break
should occur, not where a segment begins and ends. As such, there's no
provision for excluding text once it is segmented. Given the amount of assumed
functionality that is being passed on to the XLIFF editor, I think it would be
reasonable to assume that this editor would also be capable of stripping
unwanted whitespace from the start or end of the segment where necessary.

Is there a documented reason why the <mrk> element
was chosen to represent segmentation and not a new, empty element such as <seg/>?

David Pooley

-----Original Message-----

Hi all,

The segmentation subcommittee has voted unequivocally to put
forward the following proposal to the main XLIFF committee on how to represent segmentation
in XLIFF files. I would hereby like to request a formal review of the
proposal by the XLIFF Committee for its inclusion in the XLIFF draft
specification.

The following document explains and details the proposed
changes to the XLIFF specification:

Best regards,
on behalf of the XLIFF Segmentation Subcommittee
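
As an illustration of the point made in the reply at the top of this thread, that the entire content of <target> makes up the translation rather than the individual segments, here is a minimal Python sketch. It is not taken from the proposal: the `mtype="seg"` attribute on <mrk>, the sample sentences, and the helper names `segments` and `full_text` are assumptions made for the example, and the standard-library `xml.etree.ElementTree` stands in for whichever XML DOM-based tool an implementation would use.

```python
import xml.etree.ElementTree as ET

# Hypothetical <target> content: two segments marked with <mrk>, with the
# whitespace between them deliberately left outside any segment.
# The mtype="seg" attribute value is an assumption for this sketch.
TARGET_XML = (
    "<target>"
    '<mrk mtype="seg">A sentence this is.</mrk>'
    " "
    '<mrk mtype="seg">Markup it has.</mrk>'
    "</target>"
)


def segments(target):
    """Return the text of each <mrk mtype="seg"> element, in document order."""
    return [
        "".join(mrk.itertext())
        for mrk in target.iter("mrk")
        if mrk.get("mtype") == "seg"
    ]


def full_text(target):
    """Return all text inside <target>, including text outside the segments."""
    return "".join(target.itertext())


target = ET.fromstring(TARGET_XML)
print(segments(target))   # text of the individual segments
print(full_text(target))  # the complete translation, inter-segment text included
```

In this sketch the segments concatenated on their own lose the space between the two sentences, while the full <target> content keeps it, which is why a tool would read the whole element for the translation and use the <mrk> spans only for recycling.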