Subject: Re: [docbook-apps] Japanese index
On 24/04/2018 19:39, Jan Tosovsky wrote:
Does anybody have any experience with generating a Japanese back-of-the-book index from DocBook source?
More than 20 years ago.
I am facing the same issues discussed in this old thread (all entries end up in the Symbols section): https://lists.oasis-open.org/archives/docbook-apps/200605/msg00063.html If I understand correctly, indices in Japanese should be grouped phonetically: https://www.slideshare.net/k16shikano/imybp-light I've found the promising Kuromoji library: https://github.com/atilika/kuromoji I can imagine it could somehow pre-process all index entries and generate values for the 'sortas' attribute.
Slide 35 of those slides shows a corner case that a morphological analyzer could get wrong. (I'm not able to test it myself.) If you were using 'kuromoji', you could concatenate the values of the 'Reading' feature for all of the tokens (morphemes) that make up an index entry and use the result as the 'sortas' value.
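A rough, untested-against-kuromoji sketch of that concatenation step, in Python for brevity. The (surface, reading) pairs below are hand-written stand-ins for whatever the analyzer would return, not real kuromoji output, and the `sort_key` helper and the "*" placeholder for a missing reading are assumptions made for illustration:

```python
# Hypothetical analyzer output: one list of (surface, reading) pairs per
# index entry. In practice these would come from a morphological analyzer
# such as kuromoji; here they are written by hand.
entries = {
    "索引": [("索引", "サクイン")],
    "日本語": [("日本語", "ニホンゴ")],
    "形態素解析": [("形態素", "ケイタイソ"), ("解析", "カイセキ")],
}


def sort_key(tokens):
    """Concatenate the token readings into one string for 'sortas'."""
    parts = []
    for surface, reading in tokens:
        # Assumption: a missing reading shows up as "" or "*";
        # fall back to the surface form in that case.
        parts.append(reading if reading not in ("", "*") else surface)
    return "".join(parts)


sortas = {term: sort_key(tokens) for term, tokens in entries.items()}

# Emit the entries ordered by reading, with the reading as 'sortas'.
for term in sorted(sortas, key=sortas.get):
    print(f'<primary sortas="{sortas[term]}">{term}</primary>')
```

Sorting here is by plain code point order of the katakana readings, which is only an approximation of a proper Japanese collation. Any reading the analyzer gets wrong (such as the corner case on slide 35) would still need to be corrected by hand in the source.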
But it is still unclear how to tweak the index code to generate groups from non-Latin characters.
I don't know, either.
Or are there better ways?
It's probably not what you want to hear, but Antenna House does have a commercial product for doing DocBook indexes: https://www.antennahouse.com/antenna1/i18n-index-library/

Regards,

Tony Graham.
--
Senior Architect
XML Division
Antenna House, Inc.
----
Skerries, Ireland
email@example.com