Subject: Re: [office] Proposal for Spreadsheets: New sort option "natural sort"
From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
To: office@lists.oasis-open.org
Date: Fri, 02 Feb 2007 19:00:18 +0100
Hi,

The following illustrates how two strings shall be compared under the 
natural sort algorithm.

Step 1. First of all, the two strings are compared by using the normal 
string comparison algorithm to ensure that they are not equal. If they 
are, the function will return immediately with equality.

Step 2. Next, each of the two strings is divided into three parts:

  1. Prefix substring
  2. Numeric substring
  3. Suffix substring

The prefix substring is determined by locating the first occurrence of a
digit character; the substring from the very first character through the
character preceding the first digit is considered the prefix. Now, if
the first digit happens to be the first character of the whole string, 
the prefix substring becomes empty. If there is no digit in either one 
of the compared strings, the natural sort process will end and the 
normal string comparison will be performed instead. Digit detection here
is locale-aware and is therefore not limited to ASCII digits. A
decimal separator may also be considered a digit so that real numbers 
can also be supported if the appropriate conditions are met (see "Note" 
below).
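
The three-way split can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the name split_natural is hypothetical, str.isdecimal() stands in for the locale-aware digit test, and decimal separators are ignored here.

```python
def split_natural(s):
    """Split s into (prefix, numeric, suffix).

    Returns None when s contains no digit, which is the case where the
    algorithm falls back to the normal string comparison.
    Assumption: str.isdecimal() approximates the locale-aware digit test.
    """
    # Index of the first digit character, or None if there is none.
    start = next((i for i, ch in enumerate(s) if ch.isdecimal()), None)
    if start is None:
        return None
    # Advance to the first non-digit character after the first digit.
    end = start
    while end < len(s) and s[end].isdecimal():
        end += 1
    return s[:start], s[start:end], s[end:]
```

For a string that begins with a digit, such as "10abc", the prefix comes back empty.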

Step 3. After the prefix substring is extracted from both of the 
compared strings, a normal string comparison is performed on the 
extracted prefixes. If they differ, the result is returned and the 
process will end. If they are equal, it will proceed to the next step of 
numeric string comparison.

Step 4. In this step, the numeric substring is determined by locating 
the first occurrence of a non-digit character after the first digit 
character; the substring from the first digit character through the 
character preceding the first non-digit is considered the numeric 
substring. This substring is then converted into a double-precision 
variable. This step is performed on both of the compared strings, and 
the converted values are compared by simple numeric comparison. If these 
values differ, then the result will be returned and the process will 
end. If they are equal to one another, then the process will proceed to 
the next step.

Step 5. After the numeric comparison returns equality, the suffix 
substring, which is simply the rest of the string that occurs after the 
last digit of the numeric substring, will be extracted. This suffix 
substring will then replace the original string, and the whole process 
will repeat (i.e. back to Step 1).
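
Steps 1 to 5 taken together can be sketched as a comparison function in Python. This is a rough illustration under several assumptions: the names natural_compare, _split and _codepoint_compare are hypothetical, the default "normal string comparison" is a plain code point comparison (a locale-aware one such as locale.strcoll could be passed as collate), digits are detected with str.isdecimal(), and decimal separators are not treated as digits.

```python
from functools import cmp_to_key


def _codepoint_compare(a, b):
    # Stand-in for the locale-specific "normal string comparison".
    return (a > b) - (a < b)


def _split(s):
    """Return (prefix, numeric, suffix), or None if s has no digit."""
    start = next((i for i, ch in enumerate(s) if ch.isdecimal()), None)
    if start is None:
        return None
    end = start
    while end < len(s) and s[end].isdecimal():
        end += 1
    return s[:start], s[start:end], s[end:]


def natural_compare(a, b, collate=_codepoint_compare):
    """Return -1, 0 or 1 following Steps 1 to 5."""
    while True:
        # Step 1: equal strings compare equal immediately.
        if collate(a, b) == 0:
            return 0
        # Step 2: split both strings; without a digit in one of them,
        # fall back to the normal string comparison.
        parts_a, parts_b = _split(a), _split(b)
        if parts_a is None or parts_b is None:
            c = collate(a, b)
            return (c > 0) - (c < 0)
        # Step 3: compare the prefixes.
        c = collate(parts_a[0], parts_b[0])
        if c != 0:
            return (c > 0) - (c < 0)
        # Step 4: compare the numeric substrings as doubles.
        num_a, num_b = float(parts_a[1]), float(parts_b[1])
        if num_a != num_b:
            return -1 if num_a < num_b else 1
        # Step 5: continue with the suffixes (back to Step 1).
        a, b = parts_a[2], parts_b[2]
```

With this sketch, sorted(values, key=cmp_to_key(natural_compare)) orders "A2" before "A10". Because each pass compares one numeric substring and then continues with the suffixes, multi-level values are compared level by level, so "A1.2" sorts before "A1.10" as long as the separator is not treated as part of the number.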

This sorting process is illustrated in the picture below. Note that the 
term "normal string comparison" repeatedly mentioned in the algorithm 
description refers to a locale-specific string comparison; therefore the 
term does not refer to a simple ASCII string comparison. This locale 
setting is either explicitly given by the table:language and 
table:country attributes, or the default locale when the language option 
is not explicitly specified (current behavior).


Note: Treatment of decimal separators: The treatment of a decimal
separator is context-dependent, that is, when a decimal separator occurs 
adjacent to one or two digit characters, it is considered a digit 
character as long as it's the only occurrence in that given numeric 
substring. In other words, a second occurrence of a decimal separator in 
any numeric substring is treated as a non-digit character; therefore the 
character immediately preceding the separator becomes the last character 
of the numeric substring, while the separator itself becomes the first 
character of the suffix substring.
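
A small Python sketch of this rule, for illustration only. The name numeric_end and the sep parameter are hypothetical, the separator would really come from the locale, and the sketch assumes the scan starts on a digit, so a leading separator (as in ".5") is not covered.

```python
def numeric_end(s, start, sep="."):
    """Return the index just past the numeric substring starting at s[start].

    Assumption: s[start] is a digit. The first separator inside the run is
    accepted as part of the number; a second one ends the run and becomes
    the first character of the suffix.
    """
    seen_sep = False
    i = start
    while i < len(s):
        ch = s[i]
        if ch.isdecimal():
            i += 1
        elif ch == sep and not seen_sep:
            # First separator in this numeric substring: treat as a digit.
            seen_sep = True
            i += 1
        else:
            # Non-digit, or a second separator: the numeric substring ends.
            break
    return i
```

For "1.2.3" the numeric substring is "1.2" and the suffix is ".3". For "A1.10", scanning from the first digit yields "1.10", which converts to 1.1.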


Best regards

Michael

robert_weir@us.ibm.com wrote:
> 
> Interesting idea.
> 
> How far do we take it?  
> 
> For example do we allow multiple levels, as in:
> 
> A1.1, A1.2, A1.10, ... , A19.1, A20.3, etc.
> 
> -Rob
> 
> Michael.Brauer@Sun.COM wrote on 01/19/2007 05:26:03 AM:
> 
>  > Dear TC members,
>  >
>  > this is a proposal for a new attribute of the <table:sort> element:
>  >
>  > The attribute "table:natural-sort" specifies how string values are sorted.
>  > If the attribute's value is "true", string-prefixed numbers will be sorted
>  > in a "natural", number-aware way, i.e. A1, A2, A3, ... , A19, A20,
>  > instead of the normal, alpha-numeric behavior, i.e.
>  > A1, A10, A11, A12, ... ,A19, A2, A20, A3, A4, ... , A8, A9.
>  >
>  > <define name="table-sort-attlist" combine="interleave">
>  >      <optional>
>  >          <attribute name="table:natural-sort" a:defaultValue="false">
>  >              <ref name="boolean"/>
>  >          </attribute>
>  >      </optional>
>  > </define>
>  >
>  > Best regards
>  >
>  > Michael
>  > --
>  > Michael Brauer, Technical Architect Software Engineering
>  > StarOffice/OpenOffice.org
>  > Sun Microsystems GmbH             Nagelsweg 55
>  > D-20097 Hamburg, Germany          michael.brauer@sun.com
>  > http://sun.com/staroffice         +49 40 23646 500
>  > http://blogs.sun.com/GullFOSS
>  >


-- 
Michael Brauer, Technical Architect Software Engineering
StarOffice/OpenOffice.org
Sun Microsystems GmbH             Nagelsweg 55
D-20097 Hamburg, Germany          michael.brauer@sun.com
http://sun.com/staroffice         +49 40 23646 500
http://blogs.sun.com/GullFOSS