Subject: FW: FW: [dita] indexing question
- From: "Esrig, Bruce (Bruce)" <esrig@lucent.com>
- To: "'dita@lists.oasis-open.org'" <dita@lists.oasis-open.org>
- Date: Mon, 17 Jul 2006 08:59:11 -0400
JoAnn was able to get a very helpful clarification from Rodolfo Raya.

Rodolfo concentrated on clarifying the impact on translation memory of allowing index terms in both the prolog and the content. The index terms in the content are best modeled as inline information. The index terms in the prolog are best modeled as a subflow.

Rodolfo takes the point of view that translation would not raise any issues with ranges across topics.

Best wishes,
Bruce Esrig
On Sat, 2006-07-15 at 15:33 -0600, JoAnn Hackos wrote:
I thought I would forward one of the recent threads regarding indexing in the TC. Do you see any potential problems with regard to translation in these proposals? Please let me know more about the issue with index terms in the prolog pointing to the topic and index terms in block elements.
[Esrig, Bruce (Bruce)] ... Is it possible to get a definition and example of a breaking element and a subflow?
Hi JoAnn,
Let me start with an explanation of element types, classified from the point of view of translation tools.
Consider this XML fragment:
<table>
  <row>
    <col>
      <p>Segment one.</p>
    </col>
    <col>
      <p>Segment two. Second sentence.</p>
      <p>Segment with <b font="Times">bold</b> text.</p>
      <p>Segment with <footnote>some comment</footnote> footnote.</p>
    </col>
    <col>
    </col>
  </row>
</table>
Six segments can be extracted for translation from the example:
1. Segment one.
2. Segment two.
3. Second sentence.
4. Segment with «1»bold«2» text.
5. Segment with «1» footnote.
6. some comment
We can classify the elements present in the fragment as follows:
- Breaking (example: <p>): Elements that contain text fragments that should be analysed as a unit. A new segment should be created whenever this kind of element is found at text extraction time. In the jargon of CAT tool makers, such an element "breaks" the segment being processed and starts a new one.
- Inline (example: <b>): Elements that delimit text fragments that should be analysed as part of the text of the parent element. These elements usually delimit changes in style.
- Subflow (example: <footnote>): Elements that contain text fragments that should be analysed separately. Processing of the enclosing segment does not end; the element is replaced by a marker in the text and its processing is delayed.
- Ignorable (examples: <table>, <row>, <col>): Elements that are not supposed to contain translatable text and can be discarded, except when they appear as children of breaking elements, in which case they should be regarded as "inline".
In the example above, the <p> element is considered "breaking" because it encloses text that should be extracted as a unit. Notice that a <p> element may contain several sentences and may require additional processing based on grammar rules that are independent of XML markup (see items 2 and 3 in the list of segments).
The <b> element is considered "inline" because it does not contain text that needs to be translated on its own; its text belongs to a bigger fragment. The XML markup of inline elements is irrelevant to translators and is replaced by "tags" in the extracted text («1» and «2» in item 4 of the list of segments).
The <footnote> element contains text that can be considered a translation unit on its own. Its content is related to the enclosing text, but it is not part of that text. At extraction time, the content of a "subflow" element is placed in its own segment, and a "tag" is added to the segment that contains the original context to mark the location of the material that has been separated (see items 5 and 6 of the list).
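The extraction rules above can be sketched in a few lines of Python. This is a minimal sketch, not the algorithm of any particular CAT tool: the element sets BREAKING and SUBFLOW and the helpers split_sentences, flatten and extract are hypothetical names, and the "grammar rules" are reduced to a naive regular expression that splits after ".", "!" or "?" followed by whitespace. Run on the fragment above, it should yield the same six segments.

```python
import re
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical classification for the sample fragment (not a real tool's config).
BREAKING = {"p"}
SUBFLOW = {"footnote"}
# Every other element is an ignorable container (table, row, col) at the top
# level, and is treated as inline (like <b>) inside a breaking element.

# Naive "grammar rule": a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def flatten(elem, counter, pending):
    """Return elem's text with inline markup replaced by «n» tags.

    Subflow children are replaced by a single tag and queued in `pending`
    so they can be extracted later as separate segments.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag in SUBFLOW:
            parts.append(f"«{next(counter)}»")
            pending.append(child)
        else:  # inline: keep its text, wrapped in an opening and a closing tag
            parts.append(f"«{next(counter)}»")
            parts.append(flatten(child, counter, pending))
            parts.append(f"«{next(counter)}»")
        parts.append(child.tail or "")
    return "".join(parts)


def extract(root):
    segments = []

    def process_unit(elem):
        pending = []
        segments.extend(split_sentences(flatten(elem, count(1), pending)))
        for sub in pending:  # delayed processing of subflow content
            process_unit(sub)

    def walk(elem):
        if elem.tag in BREAKING:
            process_unit(elem)
        else:  # ignorable container: discard it, but look inside
            for child in elem:
                walk(child)

    walk(root)
    return segments


FRAGMENT = """<table>
  <row>
    <col>
      <p>Segment one.</p>
    </col>
    <col>
      <p>Segment two. Second sentence.</p>
      <p>Segment with <b font="Times">bold</b> text.</p>
      <p>Segment with <footnote>some comment</footnote> footnote.</p>
    </col>
    <col>
    </col>
  </row>
</table>"""

for n, segment in enumerate(extract(ET.fromstring(FRAGMENT)), start=1):
    print(f"{n}. {segment}")
```

Tag numbering restarts for each breaking element, which is why both item 4 and item 5 begin with «1».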
When we were discussing last Monday, I initially believed that <indexterm> was considered an "inline" element. Near the end of the talk, someone clarified that <indexterm> is a "subflow" element.
Let me try to explain with examples what would be wrong if <indexterm> were always treated as an "inline" or a "subflow" element.
<topic>
  <prolog>
    <indexterm>term one</indexterm>
    <indexterm>term two</indexterm>
  </prolog>
  <body>
    <p>Paragraph that contains <indexterm>term one</indexterm>
      and <indexterm>term two</indexterm> inside.</p>
  </body>
</topic>
A) If <indexterm> is treated as "inline", we get these segments after text extraction:
1. «1»term one«2»«3»term two«4»
2. Paragraph that contains «1»term one«2» and «3»term two«4» inside.
In this case, the translation of segment 1 cannot be reused for translating segment 2.
B) If <indexterm> is considered a "subflow" element, we get these segments:
1. term one
2. term two
3. Paragraph that contains «1» and «2» inside.
4. term one
5. term two
In this case, translating segment 3 becomes complicated because the sentence lacks relevant portions.
C) If we treat <indexterm> as "subflow" or "breaking" when it is a child of <prolog> and as "inline" anywhere else, we get these segments:
1. term one
2. term two
3. Paragraph that contains «1»term one«2» and «3»term two«4» inside.
In this case, the translations of segments 1 and 2 can be reused as terminology entries when translating segment 3.
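The difference in reuse between cases A and C can be sketched as two lookups. This sketch assumes a translation memory that returns only exact matches and a glossary checked by plain substring search; real CAT tools also offer fuzzy matches, which are ignored here. The Spanish target texts and the names TM_CASE_A, TERMS_CASE_C, exact_match and term_hits are hypothetical.

```python
# Hypothetical resources produced while translating the <prolog> earlier.
# Case A: both prolog index terms were stored as a single inline segment.
TM_CASE_A = {"«1»term one«2»«3»term two«4»": "«1»término uno«2»«3»término dos«4»"}
# Case C: each prolog index term was its own segment, usable as a glossary entry.
TERMS_CASE_C = {"term one": "término uno", "term two": "término dos"}

PARAGRAPH = "Paragraph that contains «1»term one«2» and «3»term two«4» inside."


def exact_match(memory, segment):
    """Return the stored translation only if the whole segment matches."""
    return memory.get(segment)


def term_hits(glossary, segment):
    """Return glossary entries whose source text occurs inside the segment."""
    return {src: tgt for src, tgt in glossary.items() if src in segment}


print(exact_match(TM_CASE_A, PARAGRAPH))  # None: nothing to reuse in case A
print(term_hits(TERMS_CASE_C, PARAGRAPH))  # both terms are offered in case C
```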
In my opinion, case C) is the best one. If stating that an element should be classified differently according to context is difficult (and I guess it is), then case A) should be considered the more reasonable alternative.
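The three options can be compared with a small extraction sketch in which the class of <indexterm> depends on a policy and, for case C, on the parent element. This is a minimal sketch under stated assumptions, not an existing tool's configuration: classify, extract and the policy names "A", "B" and "C" are hypothetical; the topic is written on one line to avoid whitespace handling; and a container whose children are inline or subflow elements (here <prolog>) is assumed to host one segment, which is dropped if it holds only tags. It reproduces the three segment lists above.

```python
import re
import xml.etree.ElementTree as ET
from itertools import count

# The example topic, written on one line so no whitespace handling is needed.
TOPIC = (
    "<topic><prolog><indexterm>term one</indexterm>"
    "<indexterm>term two</indexterm></prolog>"
    "<body><p>Paragraph that contains <indexterm>term one</indexterm> and "
    "<indexterm>term two</indexterm> inside.</p></body></topic>"
)

TAG = re.compile(r"«\d+»")


def classify(elem, parent, policy):
    """Hypothetical classification of the elements in TOPIC under policy A, B or C."""
    if elem.tag == "p":
        return "breaking"
    if elem.tag == "indexterm":
        if policy == "A":
            return "inline"
        if policy == "B":
            return "subflow"
        # Policy C: subflow directly under <prolog>, inline everywhere else.
        in_prolog = parent is not None and parent.tag == "prolog"
        return "subflow" if in_prolog else "inline"
    return "ignorable"  # topic, prolog, body


def extract(root, policy):
    segments = []

    def flatten(elem, counter, pending):
        parts = [elem.text or ""]
        for child in elem:
            if classify(child, elem, policy) == "subflow":
                parts.append(f"«{next(counter)}»")
                pending.append(child)
            else:  # inline: keep the text between an opening and a closing tag
                parts.append(f"«{next(counter)}»")
                parts.append(flatten(child, counter, pending))
                parts.append(f"«{next(counter)}»")
            parts.append(child.tail or "")
        return "".join(parts)

    def process_unit(elem):
        pending = []
        text = flatten(elem, count(1), pending).strip()
        if TAG.sub("", text).strip():  # skip segments that contain only tags
            segments.append(text)
        for sub in pending:  # subflow content becomes its own segment
            process_unit(sub)

    def walk(elem, parent):
        child_kinds = {classify(c, elem, policy) for c in elem}
        if classify(elem, parent, policy) == "breaking" or child_kinds & {"inline", "subflow"}:
            process_unit(elem)  # <p>, or a container such as <prolog> holding index terms
        else:
            for child in elem:
                walk(child, elem)

    walk(root, None)
    return segments


root = ET.fromstring(TOPIC)
for policy in "ABC":
    print(policy, extract(root, policy))
```

In this sketch the context rule lives entirely in classify; the rest of the extractor is the same for all three policies.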
Finally, the discussion of indexes and page ranges that took place on the main DITA list is irrelevant from a translation point of view. It doesn't matter whether an index covers one or several topics or pages.
Best regards,
Rodolfo
-- The information in this e-mail is intended strictly
for the addressee, without prejudices, as a confidential
document. Should it reach you, not being the addressee, it is
not to be made accessible to any other unauthorised person or
copied, distributed or disclosed to any other third party as
this would constitute an unlawful act under certain
circumstances, unless prior approval is given for its
transmission. The content of this e-mail is solely that of the
sender and not necessarily that of Heartsome.