Thank you for showing this alternative idea. I've never had a situation where
I've needed to split sentences from the same element into separate
trans-units. It seems like needing to split trans-units this way would
present some challenges.
In your example, I have a concern. I notice there are escaped tags
included. We specifically advised against this approach in section 2.4 of
the HTML profile:
2.4. Including Escaped Markup
The XLIFF specification allows for marking "beginning tags" and "ending tags"
(<bpt, <ept) in a way that the markup may be escaped and preserved.
This is generally seen as a way for non-XSLT based tools to abstract the
markup. However, XSLT does not parse escaped code efficiently. Since
there are efficient alternate ways to preserve the HTML code, it is not
recommended to use the <bpt and <ept tags.
While the following HTML could be expressed using the <bpt and <ept
tags, along with escaped HTML code:
big-air<ept id='1-2'></i></ept>, and
<bpt id='1-3'><i></bpt> yard-sale<ept
It is recommended to use the cleaner, more XSLT-friendly approach, like
<g id='n1' ctype='x-html-i'>picabo, big-air</g>,
and <g id='n2' ctype='x-html-i'>
Thank you for your follow-up,
On Mon, 2006-03-06 at 16:25 -0800,
I thought about this when I wrote that portion of
the HTML profile.
From a philosophical view, I strongly think bpt/ept should only be used in
XLIFF files that are derived from non-markup formats (RTF, for example).
I really don't like the idea of using bpt/ept on
XLIFF files derived from HTML, XHTML, or XML files. I see "begin
paired tag" and "end paired tag" as an artificial device. It could
easily lead to malformed XML on the conversion from XLIFF back to
Assuming the source file is well formed, it would
be a shame to have to delimit inline elements in an artificial way. If
<g tags are defined in the spec in such a way that they are thought to be
for non-translatable text, I would vote to either update the specification,
or come up with a new element for identifying translatable inline elements
in <target elements.
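As a rough illustration of the round trip argued for above, here is a minimal Python sketch (not part of the HTML profile). It assumes every inline element is carried as a <g> whose ctype has the form 'x-html-NAME', and the sample <target> is assembled from the fragments quoted earlier, so its exact wording is an assumption. The helper name g_to_html is hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIX = "x-html-"  # assumed ctype convention, as in ctype='x-html-i'

def g_to_html(elem):
    """Rename each <g ctype='x-html-NAME'> descendant to <NAME>, in place."""
    for g in list(elem.iter("g")):
        ctype = g.get("ctype", "")
        if ctype.startswith(PREFIX):
            g.tag = ctype[len(PREFIX):]
            g.attrib.clear()  # drop the XLIFF-only id and ctype attributes
    return elem

# Sample target assembled from the fragments quoted above (an assumption).
target = ET.fromstring(
    "<target><g id='n1' ctype='x-html-i'>picabo, big-air</g>, and "
    "<g id='n2' ctype='x-html-i'>yard-sale</g></target>"
)
html = ET.tostring(g_to_html(target), encoding="unicode")
print(html)
```

Because the output is serialized from a parsed tree, it cannot come out malformed, which is the property that escaped bpt/ept content does not guarantee.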
Thank you, Doug and Rodolfo, for bringing this issue to light,
I have my own concerns
against <bpt>/<ept> in general and <g> as used in the HTML
profile (although I always considered that <g> was reserved for
enclosing moveable non-translatable codes only).
Consider the following HTML:
<p>Italic texts starts <i>in the middle of first
sentence. Italics ends after the second
sentence.</i></p>
If <g> is used to enclose italicised text, the corresponding representation would be:
and sentence segmentation is not possible at
<source>Italic texts starts <g id='i1' ctype='x-html-i'>in the middle of
first sentence. Italics ends after the second sentence.</g></source>
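To make the segmentation problem concrete, here is a minimal Python sketch (not from the thread). The naive split after the first ". " and the helper name is_well_formed are assumptions for illustration only. It cuts the <g> version of the source at the sentence boundary and checks each half with the standard-library XML parser.

```python
import xml.etree.ElementTree as ET

# The <g>-based source from the example above, joined onto one line.
SOURCE = (
    "<source>Italic texts starts <g id='i1' ctype='x-html-i'>in the middle of "
    "first sentence. Italics ends after the second sentence.</g></source>"
)

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a standalone XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Hypothetical naive segmenter: cut right after the first sentence-ending period.
inner = SOURCE[len("<source>"):-len("</source>")]
cut = inner.index(". ") + 1
segments = [
    f"<source>{inner[:cut]}</source>",
    f"<source>{inner[cut:].lstrip()}</source>",
]

for seg in segments:
    print(is_well_formed(seg), seg)
```

The unsplit source parses, but neither half does: the first keeps an unclosed <g> and the second keeps a stray </g>.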
Retrying with <bpt>/<ept> pairs:
we still have problems splitting the text into two
segments without separating the <bpt> element from its matching <ept>.
<source>Italic texts starts <bpt id="1"><i></bpt>in the middle of
first sentence. Italics ends after the second sentence.<ept id="1"></i></ept></source>
Two elements come to the rescue: <it> and <ph>.
<source>Italic texts starts <it id="1" pos="open"><i></it>in the middle of first sentence.</source>
<source>Italics ends after the second sentence.<it id="1" pos="close"></i></it></source>
I prefer to use <ph> in my filters. This makes my
life a lot easier.
<source>Italic texts starts <ph id="1"><i></ph>in the middle of first sentence.</source>
<source>Italics ends after the second sentence.<ph id="1"></i></ph></source>
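A small Python sketch of how the <ph> variant behaves, under stated assumptions: the HTML tags inside <ph> are written as escaped text (&lt;i&gt;), which is how they would have to appear in a real XLIFF file, and the merge step simply joins segments with one space. The function rebuild_html is hypothetical and is not how any particular filter works.

```python
import xml.etree.ElementTree as ET

# One <source> per sentence. Each HTML tag is escaped text held in a <ph>,
# so every segment is balanced XML on its own.
SEGMENTS = [
    '<source>Italic texts starts <ph id="1">&lt;i&gt;</ph>'
    'in the middle of first sentence.</source>',
    '<source>Italics ends after the second sentence.'
    '<ph id="1">&lt;/i&gt;</ph></source>',
]

def rebuild_html(segments):
    """Hypothetical merge step: splice the codes held by <ph> back into the text."""
    sentences = []
    for seg in segments:
        root = ET.fromstring(seg)      # raises ParseError if a segment is malformed
        parts = [root.text or ""]
        for child in root:
            parts.append(child.text or "")  # native code carried by <ph>
            parts.append(child.tail or "")  # text following the placeholder
        sentences.append("".join(parts))
    return " ".join(sentences)             # single-space join is an assumption

html = rebuild_html(SEGMENTS)
print(html)
```

Both segments parse independently, and the italic pair only becomes markup again once the segments are merged back in order.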