From: Bob Stayton <bobs@sagehill.net>
To: Cosmin D <vcdanciu@yahoo.com>; DocBook Apps <docbook-apps@lists.oasis-open.org>
Sent: Tue, December 21, 2010 10:05:40 PM
Subject: Re: [docbook-apps] sorting issue in Arabic glossary (glossary.sort)
I'm not clear why that would happen. I can
tell you how the stylesheet applies sorting, though:
  <xsl:apply-templates select="$entries" mode="glossary.as.list">
    <xsl:sort lang="{$language}"
              select="normalize-space(
                        translate(
                          concat(@sortas,
                                 glossterm[not(parent::glossentry/@sortas) or
                                           parent::glossentry/@sortas = '']),
                          &lowercase;,
                          &uppercase;))"/>
  </xsl:apply-templates>
It uses the XSLT xsl:sort
instruction, which applies the sorting algorithm of the XSLT
processor. So the first thing I would try is changing the XSLT
processor. Saxon and Xalan both use the Java sort classes, but xsltproc
would use something different.
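
As a rough sketch of that experiment, a small standalone stylesheet could sort the glossary terms with nothing but xsl:sort, so that the same file can be run through each processor and the output compared. This is not part of the DocBook stylesheets: the file name sort-test.xsl is made up, the language code "ar" is an assumption, and local-name() is used only so that the same test works whether or not the document is in the DocBook 5 namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sort-test.xsl: hypothetical diagnostic, not part of DocBook XSL -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Language code handed to xsl:sort; "ar" is assumed here -->
  <xsl:param name="language" select="'ar'"/>

  <xsl:template match="/">
    <!-- Sort every glossentry on the text of its glossterm only,
         leaving out the sortas and case handling -->
    <xsl:for-each select="//*[local-name() = 'glossentry']">
      <xsl:sort lang="{$language}"
                select="normalize-space(*[local-name() = 'glossterm'])"/>
      <xsl:value-of select="normalize-space(*[local-name() = 'glossterm'])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If the two groups still show up with this stripped-down sort, the processor's own collation for the language is the likely cause rather than the rest of the select expression.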
The value of the "lang" attribute, $language, comes from calling the template
named "l10n.language" on the glossary element. That gets the lang or xml:lang
value from the document, based on the closest ancestor with such an attribute
(usually the document's root element).
The "select" attribute determines what it sorts
on. The complex construction takes care of two things: honoring
@sortas attributes, and mixing uppercase and lowercase.
The "concat" function combines the @sortas attribute
with the glossterm, but the glossterm is selected only when @sortas is
missing or empty. Thus if @sortas is present and not
empty, then the concat returns @sortas; otherwise, it returns the text value of
the glossterm. Do you have any @sortas attributes in your
glossary?
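
For illustration, here is a made-up glossary fragment covering the three cases that the concat() expression distinguishes. The entries and their text are hypothetical, and the comments describe what concat() returns before translate() and normalize-space() are applied.

```xml
<glossary>
  <title>Hypothetical glossary</title>

  <!-- sortas present and not empty: the predicate on glossterm is false,
       so concat() returns just the sortas value, "alpha" -->
  <glossentry sortas="alpha">
    <glossterm>The Alpha Term</glossterm>
    <glossdef><para>Sorted on its sortas value.</para></glossdef>
  </glossentry>

  <!-- no sortas: concat() returns the glossterm text, "Beta" -->
  <glossentry>
    <glossterm>Beta</glossterm>
    <glossdef><para>Sorted on its glossterm.</para></glossdef>
  </glossentry>

  <!-- empty sortas: handled like a missing one, so concat()
       again returns the glossterm text, "Gamma" -->
  <glossentry sortas="">
    <glossterm>Gamma</glossterm>
    <glossdef><para>Sorted on its glossterm.</para></glossdef>
  </glossentry>
</glossary>
```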
To mix letter cases, the translate function tries
to convert all entries to uppercase, using entities to indicate the letter
arguments for translate(). Those letter arguments are filled in from two
entries in the gentext file in the "common" directory (common/ar.xml in
this case) whose names are "normalize.sort.input" and
"normalize.sort.output". This may be where it is breaking, as the entries
for those two gentext items in ar.xml have no Arabic characters. They are just
copied from the Latin-script languages. But that would effectively mean
that the translate function makes no changes to the letters, and hence no
changes to the sort order.
And the normalize-space() function removes any
leading and trailing whitespace (and collapses runs of whitespace inside
the string) that could mess up the sort order.
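
To see what key each entry actually ends up with, one option is a throwaway stylesheet that prints the result of the same expression instead of sorting on it. The sketch below makes several assumptions: the file is hypothetical and not part of DocBook XSL, the two variables are plain ASCII stand-ins for the &lowercase; and &uppercase; entities (the real values may differ), and the element names are un-namespaced as in DocBook 4.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical diagnostic: prints the sort key for each glossentry -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- ASCII-only stand-ins for the lowercase/uppercase entities (assumption) -->
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:for-each select="//glossentry">
      <!-- Same construction as the select attribute of xsl:sort -->
      <xsl:variable name="key"
                    select="normalize-space(
                              translate(
                                concat(@sortas,
                                       glossterm[not(parent::glossentry/@sortas) or
                                                 parent::glossentry/@sortas = '']),
                                $lower,
                                $upper))"/>
      <!-- Print the key length, a tab, then the key itself -->
      <xsl:value-of select="string-length($key)"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="$key"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Printing the length next to each key is just a convenience: a length that does not match the visible text would hint at characters in the key that are not obvious on screen.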
I hope this provides you with some clues to figure
out why you are getting your results.
----- Original Message -----
Sent: Tuesday, December 21, 2010 4:20 AM
Subject: [docbook-apps] sorting issue in Arabic glossary (glossary.sort)
Hi,
I've recently found the "glossary.sort" parameter and started
testing it. It seems to be working well with Latin scripts (I've tested
German, Spanish and French), and I've also received confirmation that it works
in Russian.
In Arabic, however, it behaves rather strangely.
The terms in our glossary (with sorting enabled) are somehow
split into two distinct groups, and the terms are sorted correctly
only within each group.
This is what the Arabic sorting would look like if it were transposed to
Latin script: (a abc bca bdb cab cba) (ab acc bcd ccc dca). This is a made-up
example, so don't try to find a rule in the given ordering. We haven't checked
whether there's a specific rule that decides which terms are placed in the
first or the second group.
Has anyone encountered a similar problem in
Arabic, or maybe in Chinese (I haven't tested Chinese yet)? Does
anyone know of any known issues with the "glossary.sort" parameter?
Thank you very much!
Cosmin