Subject: RE: FW: FW: [dita] indexing question
- From: "Esrig, Bruce (Bruce)" <esrig@lucent.com>
- To: "'Erik Hennum'" <ehennum@us.ibm.com>, JoAnn Hackos <joann.hackos@comtech-serv.com>
- Date: Tue, 18 Jul 2006 11:20:01 -0400
Yes, quoting from the DITA language guide:
indexterm
An <indexterm> is an index entry. You can nest entries to create multi-level
indexes. The content is not output as part of topic content, only as part of
the index.
When DITA topics are output to XHTML, any indexterm elements in the <keywords>
element are placed in the Web page metadata, in addition to becoming part of
the generated index.
The statement that indexterm content would be part of the flow is inconsistent
with the specification.
So that would mean that indexterm content is never inline and always subflow.
That also eliminates the idea of separate element names depending on context.
As Erik points out, those would not be needed even if the behavior were
context-dependent.
Best wishes,
Bruce
Hi, JoAnn and Bruce:
Regardless of whether an index entry is a point
(as Chris suggests) or a span (an alternative view), an index entry clearly
should never have an impact on flow. An index entry is an annotation on the
content much like a metadata property.
So, I would disagree with the
recommendation to treat the index entry as an inline if there is any
implication of affecting the layout or the parsing of text. An index entry
could appear in the middle of a word -- it shouldn't make any difference in
the processing.
Regarding interpretation of an index entry based on its
container, that applies to an index entry in the prolog. It should be
interpreted based on the topic (which is the effective container of
everything in the prolog). In particular, for an index entry in the prolog,
the index is either a point attached to the start of the topic or a span
covering the entire topic.
Even if the processing of index entries
_were_ different in different contexts, I don't see that this would
necessarily require a different element name so long as the processing
conforms to expectations.
On the range questions, we should keep in
mind concerns about topic reuse. If we embed index start and end entries in
different topics, we run the risk of breaking the range when the start and end
topics are reused independently. That's part of the rationale for putting
ranges in the map.
Thanks,
Erik Hennum
ehennum@us.ibm.com
"JoAnn Hackos" <joann.hackos@comtech-serv.com>
07/18/2006 07:24 AM
Here is the suggestion I received from Bruce in
response to Rodolfo’s clarification. I’m not certain that everyone has seen
it.
JoAnn T. Hackos, PhD
President
Comtech Services,
Inc.
710 Kipling Street, Suite 400
Denver, CO
80215
303-232-7586
joann.hackos@comtech-serv.com
joannhackos Skype
www.comtech-serv.com
From: Esrig, Bruce (Bruce) [mailto:esrig@lucent.com]
Sent: Monday, July 17, 2006 1:23 AM
To: JoAnn Hackos
Subject: RE: FW: [dita] indexing question
Hi JoAnn,
Every two weeks I have a
meeting at 10 Eastern that runs for an hour or two. This is one of those
weeks, and it could be a long meeting. So I suspect I won't be able to attend
the translation SC meeting this week.
1. Thanks to Rodolfo for
explaining "breaking" and the related terms. Is he willing to have his message
posted in part or whole to the main DITA list? May I distribute it within
Lucent?
2. I agree that treating <indexterm> as a
subflow in the prolog and as an inline elsewhere is best among the
alternatives presented. Would the translation SC be opposed to specifying that
<indexterm> is filtered on the way to/from TM in order to distinguish an
<indexterm> that is to be treated as a subflow from an <indexterm>
that is to be treated as an inline? This could be done by creating up to
two artificial elements, <indextermsubflow> and <indexterminline>,
that are used only in the TM processing.
If it were possible to distinguish between subflow and inline uses of
indexterm, then DITA could also offer the following enhancement: add an
attribute to suppress printing in inline contexts, such as
<indexterm print="no">. This takes advantage of the ability to distinguish
between a subflow and an inline. If print="no" is specified, then in an
inline context, the <indexterm> would be treated as a subflow.
3. The translation SC might wish (especially if the filtering proposal is not
feasible) to recommend a special element <indextermprolog>. The default
treatment of <indexterm> would be as an inline, but
<indextermprolog> would be treated as a subflow. Since DITA 1.1 is
expected to be backward compatible, <indextermprolog> could be an
optional alternative to <indexterm> in prolog contexts in DITA 1.1.
Subsequently, <indextermprolog> could become the standard element for
use in prolog contexts. This approach would still leave room for the
print="no" enhancement.
4. I'm delighted that Rodolfo
separates out the issue of multiple-topic ranges. If needed, the translation
SC could still discuss for approval or disapproval the point of view that ...
those groups that want to support index ranges that span multiple processes
will have to take responsibility for ensuring that their translation processes
support it. For example, such groups could extract their index range data in
advance, translate it in advance, and submit the translated data with the main
body of material to be translated.
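The filtering idea in point 2 could be sketched roughly as follows. This is only an illustration in Python, not an existing tool: the function names mark_for_tm and restore_from_tm are hypothetical, the element names <indextermsubflow> and <indexterminline> and the print="no" attribute are the proposals from this message rather than part of DITA, and "in the prolog" is assumed to mean "anywhere below a <prolog> element".

```python
import xml.etree.ElementTree as ET

# Artificial names proposed in this message; they exist only during TM processing.
SUBFLOW_NAME = "indextermsubflow"
INLINE_NAME = "indexterminline"


def mark_for_tm(elem, in_prolog=False):
    """Rename each <indexterm> in place before the content is sent to TM."""
    in_prolog = in_prolog or elem.tag == "prolog"
    if elem.tag == "indexterm":
        # Assumption: the proposed print="no" forces subflow treatment
        # even when the index entry sits in an inline context.
        as_subflow = in_prolog or elem.get("print") == "no"
        elem.tag = SUBFLOW_NAME if as_subflow else INLINE_NAME
    for child in elem:
        mark_for_tm(child, in_prolog)


def restore_from_tm(root):
    """Undo the renaming on the way back from TM."""
    for elem in root.iter():
        if elem.tag in (SUBFLOW_NAME, INLINE_NAME):
            elem.tag = "indexterm"


TOPIC = (
    "<topic><prolog><indexterm>term one</indexterm></prolog>"
    "<body><p>Text <indexterm>term one</indexterm> and "
    '<indexterm print="no">term two</indexterm>.</p></body></topic>'
)

root = ET.fromstring(TOPIC)
mark_for_tm(root)
marked = ET.tostring(root, encoding="unicode")
restore_from_tm(root)
restored = ET.tostring(root, encoding="unicode")
```

Because only element names change, the round trip leaves the topic as it was, which is what would let the artificial elements stay invisible outside the TM processing.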
Best wishes,
Bruce Esrig
-----Original Message-----
From: JoAnn Hackos [mailto:joann.hackos@comtech-serv.com]
Sent: Sunday, July 16, 2006 10:49 PM
To: esrig@lucent.com
Subject: FW: FW: [dita] indexing question
Bruce,
I think you'll find Rodolfo's email clarifies the action of the translation
tools. We're meeting on this tomorrow. Please send me comments if you
cannot attend.
JoAnn
JoAnn T. Hackos, PhD
President
Comtech Services,
Inc.
710 Kipling Street, Suite 400
Denver CO
80215
303-232-7586
joann.hackos@comtech-serv.com
From: Rodolfo M. Raya [mailto:rodolfo@heartsome.net]
Sent: Sunday, July 16, 2006 7:41 AM
To: JoAnn Hackos
Cc: Andrzej Zydron
Subject: Re: FW: [dita] indexing question
On Sat, 2006-07-15 at 15:33 -0600, JoAnn Hackos wrote:
I thought I would forward one of the recent threads regarding indexing in
the TC. Do you see any potential problems with regard to translation in
these proposals? Please let me know more about the issue with index terms
in the prolog pointing to the topic and index terms in block elements. I
believe you referred to the first instance as a breaking element and the
second as a subflow. Is it possible to get a definition and example of a
breaking element and a subflow?
Hi JoAnn,
Let me start with an explanation of element types (classified from a
translation tool's point of view). Consider this XML fragment:
<table>
  <row>
    <col>
      <p>Segment one.</p>
    </col>
    <col>
      <p>Segment two. Second sentence.</p>
      <p>Segment with <b font="Times">bold</b> text.</p>
      <p>Segment with <footnote>some comment</footnote> footnote.</p>
    </col>
    <col>
    </col>
  </row>
</table>
Six segments can be extracted for translation from the example:
1. Segment one.
2. Segment two.
3. Second sentence.
4. Segment with «1»bold«2» text.
5. Segment with «1» footnote.
6. some comment
We can classify the elements present in the fragment as:
Breaking (example: <p>)
  Elements that contain text fragments that should be analysed as a unit. A
  new segment should be created whenever this kind of element is found at
  text extraction time. In CAT tool makers' jargon, it "breaks" the segment
  being processed and starts a new one.
Inline (example: <b>)
  Elements that delimit text fragments that should be analysed as part of
  the text from the parent element. These elements usually delimit changes
  in style.
Subflow (example: <footnote>)
  Elements that contain text fragments that should be analysed separately.
  Processing of the enclosing segment does not end. The element is replaced
  by a marker in the text and its processing is delayed.
Ignorable (examples: <table>, <row>, <col>)
  Elements that are not supposed to contain translatable text and can be
  discarded, except when they appear as children of breaking elements, in
  which case they should be regarded as "inline".
In the example given above, the element <p> is considered a "breaking"
element because it encloses text that should be extracted as a unit. Notice
that a <p> element may contain several sentences and require additional
processing based on grammar rules that are independent of XML markup (see
items 2 and 3 in the list of segments).
The element <b> is considered "inline" because it does not contain text that
needs to be translated on its own. The text from this element is supposed to
belong to a bigger fragment. The XML markup of inline elements is irrelevant
to translators and is replaced by "tags" in the extracted text («1» and «2»
in item 4 of the list of segments).
The element <footnote> contains text that can be considered a translation
unit on its own. Its content is related to the enclosing text, but it isn't
part of the enclosing text. At extraction time the content of a "subflow"
element is placed in its own segment and a "tag" is added in the segment that
contains the original context to mark the location of the material that has
been separated (see items 5 and 6 in the list).
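The four element types described above can be illustrated with a small extraction sketch. It is a simplified Python illustration, not how any particular CAT tool works: the function names are hypothetical, the classification sets cover only the elements in the example fragment, and the sentence splitter is a naive stand-in for the grammar rules mentioned for items 2 and 3.

```python
import re
import xml.etree.ElementTree as ET

# Classification for the example fragment only; every other element
# (<table>, <row>, <col>) is treated as ignorable.
BREAKING = {"p"}
SUBFLOW = {"footnote"}


def flatten(elem, counter, pending):
    """Return the text of elem with inline tags and subflow markers as «n»."""
    parts = [elem.text or ""]
    for child in elem:
        counter[0] += 1
        parts.append(f"«{counter[0]}»")
        if child.tag in SUBFLOW:
            # Subflow: leave one marker behind and process the content later.
            pending.append(child)
        else:
            # Inline (or ignorable inside a breaking element): keep the text
            # in place, wrapped in an opening and a closing tag.
            parts.append(flatten(child, counter, pending))
            counter[0] += 1
            parts.append(f"«{counter[0]}»")
        parts.append(child.tail or "")
    return "".join(parts)


def split_sentences(text):
    """Naive sentence splitter (an assumption, not real segmentation rules)."""
    text = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def extract_segments(root):
    segments = []

    def emit(elem):
        pending = []
        segments.extend(split_sentences(flatten(elem, [0], pending)))
        for sub in pending:  # delayed subflow content gets its own segments
            emit(sub)

    def walk(elem):
        if elem.tag in BREAKING:
            emit(elem)
        else:
            for child in elem:  # ignorable element: descend, discard markup
                walk(child)

    walk(root)
    return segments


FRAGMENT = (
    "<table><row><col><p>Segment one.</p></col><col>"
    "<p>Segment two. Second sentence.</p>"
    '<p>Segment with <b font="Times">bold</b> text.</p>'
    "<p>Segment with <footnote>some comment</footnote> footnote.</p>"
    "</col><col></col></row></table>"
)

segments = extract_segments(ET.fromstring(FRAGMENT))
```

Run against the fragment above, this sketch yields the same six segments as the list, in the same order.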
When we were discussing last Monday, I initially
believed that <indexterm> was considered an "inline" element. Near
the end of the talk someone clarified that <indexterm> is a
"subflow" element.
Let me try to explain with examples what would be wrong if <indexterm> is
always treated as an "inline" or always as a "subflow" element.
<topic>
  <prolog>
    <indexterm>term one</indexterm>
    <indexterm>term two</indexterm>
  </prolog>
  <body>
    <p>Paragraph that contains <indexterm>term one</indexterm> and <indexterm>term two</indexterm> inside.</p>
  </body>
</topic>
A) If <indexterm> is treated as "inline", we get these segments after text
extraction:
1. «1»term one«2»«3»term two«4»
2. Paragraph that contains «1»term one«2» and «3»term two«4» inside.
In this case, the translation of segment 1 cannot be reused for translating
segment 2.
B) If <indexterm> is considered a "subflow" element, we will get these
strings:
1. term one
2. term two
3. Paragraph that contains «1» and «2» inside.
4. term one
5. term two
In this case, translation of segment 3 becomes complicated because the
sentence lacks relevant portions.
C) If we treat <indexterm> as "subflow" or "breaking" when it is a child of
<prolog> and as "inline" anywhere else, we get these strings:
1. term one
2. term two
3. Paragraph that contains «1»term one«2» and «3»term two«4» inside.
In this case, translations of segments 1 and 2 can be reused as terminology
entries when translating segment 3.
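Case C amounts to classifying an element by its parent as well as by its name. A minimal Python sketch of that rule follows. The function names are hypothetical, only the elements from the example topic are handled, and <indexterm> is assumed to be a direct child of <prolog> as in the example.

```python
import xml.etree.ElementTree as ET


def classify(tag, parent_tag):
    """Context-dependent classification for the example topic (case C)."""
    if tag == "p":
        return "breaking"
    if tag == "indexterm":
        # Breaking under <prolog>, inline anywhere else.
        return "breaking" if parent_tag == "prolog" else "inline"
    return "ignorable"


def flatten(elem, counter):
    """Return the text of elem with each child's markup replaced by «n» tags."""
    parts = [elem.text or ""]
    for child in elem:
        counter[0] += 1
        parts.append(f"«{counter[0]}»")
        parts.append(flatten(child, counter))
        counter[0] += 1
        parts.append(f"«{counter[0]}»")
        parts.append(child.tail or "")
    return "".join(parts)


def extract_segments(root):
    segments = []

    def walk(elem, parent_tag):
        if classify(elem.tag, parent_tag) == "breaking":
            # One segment per breaking element, whitespace normalised.
            segments.append(" ".join(flatten(elem, [0]).split()))
        else:
            for child in elem:
                walk(child, elem.tag)

    walk(root, None)
    return segments


TOPIC = (
    "<topic><prolog><indexterm>term one</indexterm>"
    "<indexterm>term two</indexterm></prolog>"
    "<body><p>Paragraph that contains <indexterm>term one</indexterm> and "
    "<indexterm>term two</indexterm> inside.</p></body></topic>"
)

segments = extract_segments(ET.fromstring(TOPIC))
```

The only context the rule needs is the parent element's name, so a tool that already tracks the element stack during extraction would not need a separate element name to apply it.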
In my opinion, case C) is the best one. If stating that an element should be
classified differently according to context is difficult (and I guess it is),
then case A) should be considered the more reasonable alternative.
Finally, the discussion of indexes and page ranges that happened in the main
DITA list is irrelevant from a translation point of view. It doesn't matter
if an index covers one or more topics/pages.
Best regards,
Rodolfo