OASIS Mailing List Archives



docbook-apps message


Subject: Re: [docbook-apps] Webhelp search trims certain letters from search terms?

On Wed, Mar 9, 2011 at 1:36 PM, Peter Desjardins <peter.desjardins.us@gmail.com> wrote:

I'm producing webhelp output
(http://www.thingbag.net/docbook/gsoc2010/doc/content/index.html) and
I noticed that when I search for the term "nucleus," the webhelp
search function removes the letter s and searches for "nucleu."
"Nucleus" is a commonly used term in my document. I see the same
behavior with the search terms "zeus" and "tutus" ("tutus" becomes "tutu").

Is this a configurable behavior? Is the search function purposely
simplifying my terms?

Hi Peter,

The search runs on the stemmed forms of the query words; that is, it purposely reduces the search terms to their root forms to provide better search support. Link [1] has a short introduction to what a stemmer does and the limitations it has. WebHelp uses the Porter stemmer for English [2] and Snowball stemmers for several other languages [3].
Does it return false results for 'nucleu' when you search for 'nucleus'? We tested the search with stemming, and it worked as expected apart from a few glitches, which are negligible compared to the power it adds!
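To illustrate why "nucleus" turns into "nucleu", here is a minimal Python sketch of only Step 1a of the Porter algorithm (the plural-suffix rules described in [2]) plus a toy index. It is not WebHelp's JavaScript implementation and omits the later Porter steps; the function names, the sample pages, and the assumption that indexed words are stemmed with the same rules as the query are illustrative only.

```python
def porter_step1a(word: str) -> str:
    """Apply only Step 1a of the Porter stemmer (plural suffixes).

    Illustrative sketch: the full algorithm has further steps.
    Rules are checked longest-suffix first, as in Porter's description.
    """
    word = word.lower()
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS, e.g. caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # IES -> I, e.g. ponies -> poni
    if word.endswith("ss"):
        return word        # SS -> SS, e.g. caress -> caress
    if word.endswith("s"):
        return word[:-1]   # S -> (removed), e.g. nucleus -> nucleu
    return word


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each stemmed word to the pages containing it.

    Assumption: the indexer stems words the same way as the query.
    """
    index: dict[str, set[str]] = {}
    for page, text in pages.items():
        for token in text.split():
            stem = porter_step1a(token.strip(".,;:!?\"'()"))
            if stem:  # skip tokens that were only punctuation
                index.setdefault(stem, set()).add(page)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Stem the query, then look the stem up in the stemmed index."""
    return index.get(porter_step1a(query), set())


# Hypothetical sample pages, for illustration only.
PAGES = {
    "cells.html": "The nucleus contains DNA.",
    "ballet.html": "Dancers wear tutus.",
}
index = build_index(PAGES)

print(porter_step1a("nucleus"))  # nucleu
print(porter_step1a("zeus"))     # zeu
print(search(index, "nucleus"))  # {'cells.html'}
print(search(index, "tutu"))     # {'ballet.html'}
```

Under that assumption, "nucleus" in a page and "nucleus" in the query both reduce to "nucleu", so they still match; the truncated stem only looks odd in the search box. The same rule also lets "tutu" find a page that says "tutus".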

[1] http://blog.kasunbg.org/2010/10/javascript-stemmer-for-french-language.html
[2] http://snowball.tartarus.org/algorithms/porter/stemmer.html
[3] http://docbook.sourceforge.net/release/xsl/current/webhelp/docs/content/ch03s02.html



Peter Desjardins


Kasun Gajasinghe,
University of Moratuwa,
Sri Lanka.
Blog: http://blog.kasunbg.org
Twitter: http://twitter.com/kasunbg
