OASIS Mailing List Archives

docbook-apps message



Subject: Webhelp stemming and search indexer


Hi

I am using the DocBook XSL stylesheets (version 1.78.1) to produce Webhelp, and my documents are being translated into French, Japanese, Korean, and Simplified Chinese.

I have a few questions about configuring the Webhelp search that are not entirely clear to me, even after reading through the Webhelp docs.

(1) The Webhelp XSL templates always output a link to a JavaScript stemmer library. The file name of the linked library is determined by the webhelp.indexer.language parameter, but Webhelp only includes stemmers for the en, fr, and de languages.

Question 1: Is it OK to use the default "en" JavaScript stemmer with non-English locales, or is it better to customize the template that outputs the stemmer link and remove the link for languages that do not have a stemmer?
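For illustration, the customization Question 1 has in mind amounts to the decision below. This is a minimal sketch, not the actual Webhelp template logic: the helper name `stemmer_script_tag` and the `search/stemmers/<lang>_stemmer.js` path are assumptions, and only the en/fr/de list comes from the Webhelp bundle described above.

```python
# Hypothetical helper: emit the JavaScript stemmer <script> link only for
# languages that ship a client-side stemmer (en, fr, de in the Webhelp bundle).
# ASSUMPTION: the "search/stemmers/<lang>_stemmer.js" path is illustrative,
# not copied from the DocBook XSL templates.

BUNDLED_JS_STEMMERS = {"en", "fr", "de"}


def stemmer_script_tag(lang: str) -> str:
    """Return a <script> element for the stemmer, or "" if none is bundled."""
    # Reduce locale forms such as "fr-FR" or "zh_CN" to the base code.
    code = lang.lower().split("-")[0].split("_")[0]
    if code not in BUNDLED_JS_STEMMERS:
        # e.g. ja, ko, zh: omit the link rather than falling back to "en"
        return ""
    return f'<script type="text/javascript" src="search/stemmers/{code}_stemmer.js"></script>'


if __name__ == "__main__":
    for lang in ["en", "fr-FR", "ja", "ko", "zh_CN"]:
        print(lang, "->", stemmer_script_tag(lang) or "(no stemmer link)")
```

The alternative (always linking the "en" stemmer) would simply drop the membership check; which of the two is safer for French, Japanese, Korean, and Chinese output is exactly what the question asks.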

(2) The Java indexer command used in the Webhelp build has the properties webhelp.indexer.language and enable.stemming.

While trying to establish which languages have Java stemmer support, I found the following in the Webhelp docs:

- The section "Adding support for other (non-CJKV) languages" lists the non-CJKV languages that have stemmer support, but does not give their language codes.

- The section "Search indexing" says to look in the build.properties file for the language code, but the build.properties file says to look in the docs.

- The section "New Stemmers" (in the developer docs) appears to give a different list of languages with stemmers, this time with language codes (including "cn", presumably for Chinese).

Question 2: If the enable.stemming property is set to true, is the value of webhelp.indexer.language used to determine which Java stemmer, if any, is used?

Question 3: Is there a definitive list of language codes that the Java indexer expects/accepts/supports for the language?

Question 4: If a language has no Java stemmer, is it best to set the enable.stemming property to "false", or does it not really matter?
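To make Questions 2 to 4 concrete, the sketch below shows the per-language configuration being asked about, assuming the build is driven by Ant (which takes properties as `-Dname=value`). The helper name `indexer_ant_args` is hypothetical, and because the definitive list of Java stemmer languages is the open question, the caller has to supply that set; the example set is a placeholder, not a documented list.

```python
from typing import AbstractSet, List

# Hypothetical helper: build Ant -D arguments for the Webhelp indexer.
# The property names (webhelp.indexer.language, enable.stemming) are the ones
# the Webhelp build exposes; ASSUMPTION: the set of languages with a Java
# stemmer is supplied by the caller, since no definitive list is known.


def indexer_ant_args(lang: str, java_stemmer_langs: AbstractSet[str]) -> List[str]:
    """Return -D arguments, enabling stemming only for languages with a known Java stemmer."""
    stem = lang in java_stemmer_langs
    return [
        f"-Dwebhelp.indexer.language={lang}",
        f"-Denable.stemming={'true' if stem else 'false'}",
    ]


if __name__ == "__main__":
    # Placeholder set; replace with the definitive list once it is confirmed.
    known = {"en", "fr", "de"}
    for lang in ["fr", "ja", "ko"]:
        print(" ".join(indexer_ant_args(lang, known)))
```

Whether explicitly passing `enable.stemming=false` for unsupported languages changes anything, or whether the indexer already ignores stemming when no stemmer exists, is what Question 4 asks.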


Thanks





