OASIS Mailing List Archives

docbook-apps message

Subject: Re: [docbook-apps] Auto-generating an index


Hi,

>On Thu, 3 Feb 2011 15:07:51 -0600
>Tom Browder <tom.browder@gmail.com> wrote:
>
>> I have a medium sized document in docbook xml format.  I want to make
>> an index but am too lazy to go through it manually for a first order
>> cut.  I would rather auto-generate one and then fine tune it as I have
>> time.
>> 
>> Before I lay out my proposed method, I don't want to reinvent the
>> wheel: does anyone have any pointers to any auto-indexing tools for
>> DB?
>
>
>Sorry Tom, IMHO it is a long and boring job best suited for a human.
>Think of a couple of the terms you might want to index, then search for
>them. In a fair sized document it is amazing how often they crop up in
>'the wrong' place. 
>  I guess it depends on the quality of the index you want to create.
>
>I found the best support is from a good macro. Highlight the word/phrase
>then use one of a couple of macros to insert primary/secondary term
>markup. Emacs is great for this kind of job

Dave, although I agree that an index is best suited to a human, I think
it is not that black and white. :)

With some care in the markup, it is possible to generate an index
automatically and then fine-tune it afterwards, as Tom requested. I did
this with my own book and it worked well.

However, you need to be very consistent with your markup. For example,
my book mentions a lot of XML tags, and I wanted some of them, but not
all, to appear in the index. The same was true for some functions.

What I did was the following: 

1. First, I decided which elements were applicable. For an XML element
name, it is usually tag (DocBook 5) or sgmltag (DocBook 4). For a
function name, the function element is needed.

2. I created a customization layer for the profiling stylesheet and
inserted template rules that match tag/sgmltag and function. Each rule
just copies the respective element and adds an indexterm with the
corresponding primary and secondary content.

3. The original DocBook document is then "profiled"; the result is the
same document with the indexterms added. This result can then be
transformed into other target formats.
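
As a rough sketch of what such template rules could look like (this is
not the original stylesheet): the example below is a standalone XSLT 1.0
identity transform rather than a customization layer over the DocBook
profiling stylesheet, it assumes DocBook 5 with its namespace, and the
primary terms "tags" and "functions" are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="http://docbook.org/ns/docbook"
    xmlns="http://docbook.org/ns/docbook"
    exclude-result-prefixes="d">

  <!-- Identity rule: copy everything unchanged by default.
       (Stand-in for the copying done by the profiling stylesheet.) -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Element names: copy the tag element, then add an indexterm.
       Skip tags already inside an indexterm to avoid nesting.
       The primary term "tags" is an assumption. -->
  <xsl:template match="d:tag[not(ancestor::d:indexterm)]">
    <xsl:copy-of select="."/>
    <indexterm>
      <primary>tags</primary>
      <secondary><xsl:value-of select="normalize-space(.)"/></secondary>
    </indexterm>
  </xsl:template>

  <!-- Function names: same idea, with an assumed primary term. -->
  <xsl:template match="d:function[not(ancestor::d:indexterm)]">
    <xsl:copy-of select="."/>
    <indexterm>
      <primary>functions</primary>
      <secondary><xsl:value-of select="normalize-space(.)"/></secondary>
    </indexterm>
  </xsl:template>

</xsl:stylesheet>
```

With xsltproc this could be run as
"xsltproc autoindex.xsl book.xml > book-indexed.xml" (both file names
are placeholders). For DocBook 4, the match patterns would drop the d:
prefix and use sgmltag instead of tag, and the default namespace
declaration would be removed.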

Of course, this is just a start and needs to be refined. With this
method, _every_ tag and function element ends up in the index. This is
probably not what you want. For this reason, you need to switch off the
indexterm insertion manually for individual elements. I did this with
condition="noidx".

If needed, you can also invert the logic and add nothing to the index
except for those elements marked with condition="idx". Which approach
is more convenient depends on your document.
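
The condition switch could be expressed as a predicate on the match
pattern. The following is only a sketch under the same assumptions as
any standalone example (DocBook 5 namespace, plain identity transform
instead of the profiling customization layer, element content used
directly as the primary term); it is not the stylesheet from my book.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="http://docbook.org/ns/docbook"
    xmlns="http://docbook.org/ns/docbook"
    exclude-result-prefixes="d">

  <!-- Identity rule: copy everything unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Opt-out: index every tag/function unless it carries
       condition="noidx". Elements with that attribute fall through
       to the identity rule and get no indexterm.
       For the opt-in variant, replace both predicates with
       [@condition='idx']. -->
  <xsl:template match="d:tag[not(@condition='noidx')]
                     | d:function[not(@condition='noidx')]">
    <xsl:copy-of select="."/>
    <indexterm>
      <primary><xsl:value-of select="normalize-space(.)"/></primary>
    </indexterm>
  </xsl:template>

</xsl:stylesheet>
```

In the source document, <tag condition="noidx">para</tag> would then be
copied through unchanged, while a plain <tag>para</tag> would be
followed by an indexterm for "para".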

This was just a quick overview of the process. If needed, I can also
show you my stylesheet modifications.

Tom

