Subject: WebHelp, English stemmer, problems with specific words
- From: "Bort, Paul" <pbort@tmwsystems.com>
- To: "docbook-apps@lists.oasis-open.org" <docbook-apps@lists.oasis-open.org>
- Date: Wed, 11 Jan 2012 01:33:55 +0000
Hi,
I found the conversation about problems with the stemmer used with English at
http://lists.oasis-open.org/archives/docbook-apps/201103/msg00040.html very informative in tracking
down the problem I'm having with the stemmer, which is similar. In my case, the word that isn't
being stemmed correctly is "relay". (It comes out as "relai".) This does break searches: searching
for "relay" in a document that should have six matches returns an error "Your search returned no
results for relai".
The solution that I've implemented locally, and offer below for your consideration, is a list of
words to be stemmed manually. I've tried to follow your coding style but I'm not a serious
JavaScript hacker, so I may have stepped on some toes inadvertently.
Regards,
Paul Bort
Systems Engineer
TMW Systems, Inc.
pbort@tmwsystems.com
----------------------------------
--- en_stemmer.js
+++ en_stemmer.js
@@ -54,6 +54,14 @@
meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$", // [C]VC[V] is m=1
mgr1 = "^(" + C + ")?" + V + C + V + C, // [C]VCVC... is m>1
s_v = "^(" + C + ")?" + v; // vowel in stem
+
+ var exceptionWords = {
+ "relay":"relay",
+ "relaying":"relay",
+ "relays":"relay",
+ "nucleus":"nucleus",
+ "zeus":"zeus"
+ };
return function (w) {
var stem,
@@ -67,6 +75,8 @@
if (w.length < 3) { return w; }
+ if (w in exceptionWords) { return exceptionWords[w]; }
+
firstch = w.substr(0,1);
if (firstch == "y") {
w = firstch.toUpperCase() + w.substr(1);
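
For anyone who wants to try the idea outside en_stemmer.js, here is a minimal sketch of the same exception lookup written as a standalone wrapper. The names withExceptions and toyStemmer are made up for illustration and are not part of WebHelp; toyStemmer only imitates the trailing y-to-i step described above and is not the real Porter stemmer. The sketch uses an own-property check instead of "in", on the assumption that words such as "constructor" should not match properties inherited from Object.prototype.

```javascript
// Hypothetical helper: wraps any stemmer function with a table of
// words whose stems are supplied by hand.
function withExceptions(stemFn, exceptionWords) {
    return function (w) {
        // Own-property check, so a word like "constructor" does not
        // match a property inherited from Object.prototype.
        if (Object.prototype.hasOwnProperty.call(exceptionWords, w)) {
            return exceptionWords[w];
        }
        return stemFn(w);
    };
}

// Stand-in for the real stemmer (assumption): it only turns a
// trailing "y" into "i", which is enough to reproduce "relai".
function toyStemmer(w) {
    return w.replace(/y$/, "i");
}

var stem = withExceptions(toyStemmer, {
    "relay": "relay",
    "relaying": "relay",
    "relays": "relay"
});
```

With this wrapper, stem("relay") and stem("relaying") both give "relay", while a word that is not in the table still goes through the underlying stemmer.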