docbook-apps message



Subject: Re: [docbook-apps] sorting issue in Arabic glossary (glossary.sort)


Hey, just wanted to follow up.

We rechecked the results of glossary sorting in Arabic, and it is working after all. Thanks for your support!


From: Bob Stayton <bobs@sagehill.net>
To: Cosmin D <vcdanciu@yahoo.com>; DocBook Apps <docbook-apps@lists.oasis-open.org>
Sent: Tue, December 21, 2010 10:05:40 PM
Subject: Re: [docbook-apps] sorting issue in Arabic glossary (glossary.sort)

I'm not clear why that would happen.  I can tell you how the stylesheet applies sorting, though:
 
<xsl:apply-templates select="$entries" mode="glossary.as.list">
   <xsl:sort lang="{$language}"
             select="normalize-space(
                       translate(
                         concat(@sortas,
                                glossterm[not(parent::glossentry/@sortas) or
                                              parent::glossentry/@sortas = '']),
                         &lowercase;,
                         &uppercase;))"/>
</xsl:apply-templates>
It uses the XSLT xsl:sort instruction, which applies the sorting algorithm of the XSLT processor.  So the first thing I would try is switching to a different XSLT processor.  Saxon and Xalan both use the Java sort classes, but xsltproc would use something different.
 
The "lang" attribute takes its value from the $language variable, which comes from calling the template named "l10n.language" on the glossary element. That gets the lang or xml:lang value from the document, based on the closest ancestor with such an attribute (usually the document's root element).
 
The "select" attribute determines what it sorts on.  The complex construction handles two features: honoring @sortas attributes, and sorting uppercase and lowercase letters together.
 
The "concat" function joins the @sortas attribute, if any, with the glossterm, which is selected only when @sortas is missing or empty.  Thus if @sortas is present and not empty, then the concat returns @sortas; otherwise, it returns the text value of the glossterm.  Do you have any @sortas attributes in your glossary?
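
To illustrate that selection logic outside XSLT, here is a minimal Python sketch. It is not the stylesheet's code: the function name sort_source is hypothetical, and a missing attribute is modeled as None.

```python
from typing import Optional


def sort_source(sortas: Optional[str], glossterm: str) -> str:
    """Mimic concat(@sortas, glossterm[@sortas missing or empty])."""
    # A missing attribute contributes an empty string to concat().
    attr = sortas or ""
    # The glossterm is selected only when @sortas is missing or empty.
    term = glossterm if attr == "" else ""
    return attr + term


print(sort_source("Zebra", "Apple"))  # @sortas wins: Zebra
print(sort_source("", "Apple"))       # empty @sortas: Apple
print(sort_source(None, "Apple"))     # no @sortas: Apple
```

So a glossentry with a non-empty sortas is sorted purely by that attribute, regardless of what its glossterm says.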
 
To mix letter cases, the translate function tries to convert all entries to uppercase, using entities to indicate the letter arguments for translate().  Those letter arguments are filled in from two entries in the gentext file in the "common" directory (common/ar.xml in this case) whose names are "normalize.sort.input" and "normalize.sort.output".  This may be where it is breaking, as the entries for those two gentext items in ar.xml have no Arabic characters. They are just copied from the Latin-script languages.  But that would effectively mean that the translate function makes no changes to the Arabic letters, and hence no changes to the sort order.
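
A small Python sketch of how XPath translate() behaves may make that last point concrete. The LOWER and UPPER strings here are hypothetical ASCII-only stand-ins, not the actual contents of the gentext entries, and xpath_translate is an illustrative name.

```python
def xpath_translate(text: str, src: str, dst: str) -> str:
    """Emulate XPath 1.0 translate(text, src, dst)."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) in table:
            continue  # the first occurrence of a character in src wins
        # Characters in src with no counterpart in dst are deleted.
        table[ord(ch)] = dst[i] if i < len(dst) else None
    return text.translate(table)


# Hypothetical Latin-only maps (assumption, for illustration only).
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(xpath_translate("Glossary", LOWER, UPPER))  # GLOSSARY

# A string of Arabic letters: none of them appear in LOWER,
# so translate() returns the string unchanged.
arabic = "\u0645\u0635\u0637\u0644\u062d"
print(xpath_translate(arabic, LOWER, UPPER) == arabic)  # True
```

With maps that contain only Latin letters, Arabic text passes through untouched, so the translate step alone should not reorder Arabic entries.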
 
And the normalize-space() function strips leading and trailing whitespace (and collapses internal runs of whitespace to a single space), so stray spaces cannot mess up the sort order.
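
As a rough illustration of why that matters, the following Python sketch emulates normalize-space() and sorts a few made-up terms with and without it. Note that Python's sorted() compares by code point here, which is an assumption of this sketch and not the collation an XSLT processor would apply.

```python
import re


def normalize_space(text: str) -> str:
    """Emulate XPath 1.0 normalize-space() on a string."""
    # XPath whitespace is space, tab, carriage return, and line feed.
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    return collapsed.strip(" ")


# Made-up terms; the first has a stray leading space.
terms = [" beta", "alpha", "gamma  ray"]

# Without normalization, the leading space sorts before any letter.
print(sorted(terms))  # [' beta', 'alpha', 'gamma  ray']

# With normalization, the terms sort by their visible text.
print(sorted(normalize_space(t) for t in terms))  # ['alpha', 'beta', 'gamma ray']
```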
 
I hope this provides you with some clues to figure out why you are getting your results.
 
Bob Stayton
Sagehill Enterprises
bobs@sagehill.net
 
 
----- Original Message -----
From: Cosmin D
To: docbook-apps@lists.oasis-open.org
Sent: Tuesday, December 21, 2010 4:20 AM
Subject: [docbook-apps] sorting issue in Arabic glossary (glossary.sort)

Hi,

I've recently found the "glossary.sort" parameter and started testing it. It seems to be working well with Latin scripts (I've tested German, Spanish and French), and I've also received confirmation that it works in Russian.

In Arabic, however, it behaves rather strangely. The terms in our glossary (with sorting enabled) are somehow separated into two distinct groups, and the terms are sorted correctly only within each group.

This is how the Arabic sorting would look in Latin script: (a abc bca bdb cab cba) (ab acc bcd ccc dca). This is a random example, so don't try to find a rule in the given ordering. We haven't checked whether a specific rule determines which terms are placed in the first or the second group.

Has anyone encountered a similar problem in Arabic, and maybe in Chinese scripts (we haven't tested Chinese yet)? Does anyone know of known issues with the "glossary.sort" parameter?

Thank you very much!
Cosmin




