docbook-apps message

Subject: Re: [docbook-apps] Japanese index

From: Tony Graham <tgraham@antenna.co.jp>
To: docbook-apps@lists.oasis-open.org
Date: Tue, 24 Apr 2018 20:53:59 +0100

On 24/04/2018 19:39, Jan Tosovsky wrote:

> Has anybody any experience with generating a Japanese back-of-the-book index
> from DocBook source?


More than 20 years ago.

> I am facing the same issues discussed in this old thread (all entries end up in
> the Symbols section):
> https://lists.oasis-open.org/archives/docbook-apps/200605/msg00063.html
>
> If I understand correctly, indices in Japanese should be grouped
> phonetically:
> https://www.slideshare.net/k16shikano/imybp-light
>
> I've found the promising Kuromoji library: https://github.com/atilika/kuromoji
> I can imagine it could somehow pre-process all index entries and generate
> values for the 'sortas' attribute.


Slide 35 of that deck shows a corner case that a morphological
analyzer could get wrong. (I'm not able to test it myself.)

If you were using Kuromoji, you could concatenate the values of the
'Reading' feature for all of the tokens of an index entry and
use that as the 'sortas' value.
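
A rough sketch of that concatenation step, in Python rather than Java so
that it runs without any dependencies. The token list is hand-written
stand-in data, not real analyzer output, and `sortas_from_tokens` and
`kata_to_hira` are hypothetical helper names. It assumes the analyzer
reports readings in katakana and that "*" (or nothing) marks an unknown
reading; folding to hiragana is optional and only keeps the keys in a
single script.

```python
def kata_to_hira(text):
    """Shift katakana (U+30A1..U+30F6) to the matching hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def sortas_from_tokens(tokens):
    """Concatenate per-token readings into a single sort key.

    tokens: iterable of (surface, reading) pairs. A reading of None or "*"
    (assumed here to mean "unknown") falls back to the surface form.
    """
    parts = []
    for surface, reading in tokens:
        parts.append(surface if reading in (None, "*") else reading)
    return kata_to_hira("".join(parts))


# Hand-written stand-in for the analyzer output for the entry 日本語索引.
tokens = [("日本語", "ニホンゴ"), ("索引", "サクイン")]
print(sortas_from_tokens(tokens))
```

The resulting string would be written to the entry's 'sortas' attribute
before the stylesheets run. Entries that the analyzer misreads would
still need a hand-written 'sortas'.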

> But it is still unclear how to tweak the index code to generate groups from
> non-Latin characters.


I don't know, either.
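
Leaving the stylesheet question open, the grouping key on its own can be
computed from a kana reading. The following is only a sketch under
assumptions: it expects hiragana 'sortas' values, uses the conventional
あ/か/さ/た/な/は/ま/や/ら/わ row headings, and the name `index_group`
and the "Symbols" fallback label are made up for illustration. It does
not touch the DocBook index code.

```python
from bisect import bisect_right

# (first code point of the row, heading), in code-point order. Voiced and
# small kana fall inside their row's range, so が groups under か and
# ゃ under や without any special handling.
ROWS = [
    (0x3041, "あ"), (0x304B, "か"), (0x3055, "さ"), (0x305F, "た"),
    (0x306A, "な"), (0x306F, "は"), (0x307E, "ま"), (0x3083, "や"),
    (0x3089, "ら"), (0x308E, "わ"),
]
ROW_STARTS = [start for start, _ in ROWS]
LAST_KANA = 0x3093  # ん, the last character assigned to the わ row here


def index_group(sortas):
    """Return the row heading for a hiragana sort key, else "Symbols"."""
    if not sortas:
        return "Symbols"
    cp = ord(sortas[0])
    if not (ROW_STARTS[0] <= cp <= LAST_KANA):
        return "Symbols"
    # Find the last row whose starting code point is <= cp.
    return ROWS[bisect_right(ROW_STARTS, cp) - 1][1]


for key in sorted(["さくいん", "にほんご", "がいねん", "XML"]):
    print(index_group(key), key)
```

Sorting by plain code point is a simplification; a real index would more
likely use a Japanese collation.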

> Or are there better ways?


It's probably not what you want to hear, but Antenna House does have a
commercial product for doing DocBook indexes:

https://www.antennahouse.com/antenna1/i18n-index-library/

Regards,


Tony Graham.
--
Senior Architect
XML Division
Antenna House, Inc.
----
Skerries, Ireland
tgraham@antenna.co.jp

Follow-Ups:
- Re: [docbook-apps] Japanese index
  - From: Jirka Kosek <jirka@kosek.cz>

References:
- Japanese index
  - From: "Jan Tosovsky" <j.tosovsky@email.cz>