OASIS Mailing List Archives

# office message


Subject: Re: [office] Proposal for Spreadsheets: New sort option "natural sort"

• From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
• To: office@lists.oasis-open.org
• Date: Fri, 02 Feb 2007 19:00:18 +0100

```
Hi,

The following illustrates how two strings shall be compared under the
natural sort algorithm.

Step 1. First, the two strings are compared using the normal string
comparison algorithm to check whether they are equal. If they are, the
function returns immediately with equality.

Step 2. Next, each of the two strings is divided into three parts:

  1. Prefix substring
  2. Numeric substring
  3. Suffix substring

The prefix substring is determined by locating the first occurrence of a
digit character; the substring from the very first character through the
character preceding the first digit is considered the prefix. If the
first digit is the first character of the whole string, the prefix
substring is empty. If there is no digit in either one of the compared
strings, the natural sort process ends and the normal string comparison
is performed instead. Digit detection is locale-aware and therefore not
limited to ASCII digits. A decimal separator may also be considered a
digit so that real numbers can be supported when the appropriate
conditions are met (see "Note" below).

Step 3. After the prefix substring has been extracted from both compared
strings, a normal string comparison is performed on the two prefixes. If
they differ, the result is returned and the process ends. If they are
equal, the process proceeds to the next step, the numeric comparison.

Step 4. The numeric substring is determined by locating the first
occurrence of a non-digit character after the first digit character; the
substring from the first digit character through the character preceding
the first non-digit is the numeric substring. This substring is then
converted into a double-precision floating-point value. This step is
performed on both compared strings, and the two values are compared
numerically. If they differ, the result is returned and the process
ends. If they are equal, the process proceeds to the next step.

Step 5. Once the numeric comparison returns equality, the suffix
substring, which is simply the rest of the string after the last digit of
the numeric substring, is extracted. This suffix substring then replaces
the original string, and the whole process repeats (i.e. back to Step 1).

This sorting process is illustrated in the picture below. Note that the
term "normal string comparison", used repeatedly in the algorithm
description, refers to a locale-specific string comparison, not a simple
ASCII string comparison. The locale is either given explicitly by the
table:language and table:country attributes or, when no language is
specified, is the default locale (current behavior).

Note: Treatment of decimal separators: The treatment of a decimal
separator is context-dependent. When a decimal separator occurs adjacent
to one or two digit characters, it is considered a digit character as
long as it is the only occurrence in that numeric substring. In other
words, a second occurrence of a decimal separator in any numeric
substring is treated as a non-digit character: the character immediately
preceding that separator becomes the last character of the numeric
substring, and the separator itself becomes the first character of the
suffix substring.

Best regards

Michael

robert_weir@us.ibm.com wrote:
>
> Interesting idea.
>
> How far do we take it?
>
> For example do we allow multiple levels, as in:
>
> A1.1, A1.2, A1.10, ... , A19.1, A20.3, etc.
>
> -Rob
>
> Michael.Brauer@Sun.COM wrote on 01/19/2007 05:26:03 AM:
>
>  > Dear TC members,
>  >
>  > this is a proposal for a new attribute of the <table:sort> element:
>  >
>  > The attribute "table:natural-sort" specifies how string values are sorted.
>  > If the attribute's value is "true", string-prefixed numbers will be sorted
>  > in a "natural", number-aware way, i.e. A1, A2, A3, ... , A19, A20,
>  > instead of the normal, alpha-numeric behavior, i.e.
>  > A1, A10, A11, A12, ... , A19, A2, A20, A3, A4, ... , A8, A9.
>  >
>  > <define name="table-sort-attlist" combine="interleave">
>  >      <optional>
>  >          <attribute name="table:natural-sort" a:defaultValue="false">
>  >              <ref name="boolean"/>
>  >          </attribute>
>  >      </optional>
>  > </define>
>  >
>  > Best regards
>  >
>  > Michael
>  > --
>  > Michael Brauer, Technical Architect Software Engineering
>  > StarOffice/OpenOffice.org
>  > Sun Microsystems GmbH             Nagelsweg 55
>  > D-20097 Hamburg, Germany          michael.brauer@sun.com
>  > http://sun.com/staroffice         +49 40 23646 500
>  > http://blogs.sun.com/GullFOSS
>  >

--
Michael Brauer, Technical Architect Software Engineering
StarOffice/OpenOffice.org
Sun Microsystems GmbH             Nagelsweg 55
D-20097 Hamburg, Germany          michael.brauer@sun.com
http://sun.com/staroffice         +49 40 23646 500
http://blogs.sun.com/GullFOSS

```
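The five steps and the decimal-separator note in the message above can be sketched in Python roughly as follows. This is a minimal, hypothetical sketch, not the StarOffice/OpenOffice.org implementation. It rests on these assumptions:

- The names `natural_compare`, `_split`, `collate_key` and `decimal_sep` are illustrative only.
- "Normal string comparison" defaults to plain code-point order rather than a locale-specific collation.
- The decimal separator is a fixed single character (`"."` by default) instead of being taken from the locale.
- A numeric substring can only start at a digit, not at a separator.
- The Step 2 fallback is read as applying whenever either string lacks a digit.

```python
import unicodedata
from functools import cmp_to_key
from typing import Callable, Optional, Tuple


def _cmp(x, y) -> int:
    """Three-way comparison: -1 if x < y, 0 if equal, 1 if x > y."""
    return (x > y) - (x < y)


def _has_digit(s: str) -> bool:
    return any(ch.isdecimal() for ch in s)


def _split(s: str, decimal_sep: Optional[str]) -> Tuple[str, float, str]:
    """Split s into (prefix, numeric value, suffix); s must contain a digit."""
    # Step 2: the prefix ends just before the first digit (any Unicode Nd digit).
    start = next(i for i, ch in enumerate(s) if ch.isdecimal())
    end, seen_sep, chars = start, False, []
    # Step 4: the numeric substring runs until the first non-digit character.
    while end < len(s):
        ch = s[end]
        if ch.isdecimal():
            chars.append(str(unicodedata.decimal(ch)))  # normalize to ASCII 0-9
        elif decimal_sep is not None and ch == decimal_sep and not seen_sep:
            # Note: only the first separator in a numeric substring counts as a
            # digit; a second one ends the number and starts the suffix.
            seen_sep = True
            chars.append(".")
        else:
            break
        end += 1
    return s[:start], float("".join(chars)), s[end:]


def natural_compare(a: str, b: str,
                    collate_key: Callable[[str], str] = lambda s: s,
                    decimal_sep: Optional[str] = ".") -> int:
    """Return -1, 0 or 1, comparing a and b as described in Steps 1-5."""
    def normal(x: str, y: str) -> int:
        # Stand-in for the locale-specific "normal string comparison".
        return _cmp(collate_key(x), collate_key(y))

    while True:
        # Step 1: strings that compare equal are equal.
        if normal(a, b) == 0:
            return 0
        # Step 2 (assumed reading): if either string lacks a digit,
        # fall back to the normal string comparison.
        if not (_has_digit(a) and _has_digit(b)):
            return normal(a, b)
        prefix_a, num_a, suffix_a = _split(a, decimal_sep)
        prefix_b, num_b, suffix_b = _split(b, decimal_sep)
        # Step 3: compare the prefixes.
        result = normal(prefix_a, prefix_b)
        if result != 0:
            return result
        # Step 4: compare the numeric substrings as doubles.
        result = _cmp(num_a, num_b)
        if result != 0:
            return result
        # Step 5: repeat the whole process on the suffixes.
        a, b = suffix_a, suffix_b


cells = ["A10", "A2", "A1", "A20", "A3", "A19"]
print(sorted(cells, key=cmp_to_key(natural_compare)))
# ['A1', 'A2', 'A3', 'A10', 'A19', 'A20']
```

Each pass consumes at least one digit from both strings, so the loop always terminates.

Under the decimal-separator rule as sketched here, `natural_compare("A1.1", "A1.10")` returns 0, because both numeric substrings convert to the double 1.1. Passing `decimal_sep=None` disables the rule, and the same call returns -1. That setting orders Rob's multi-level example as A1.1, A1.2, A1.10, A19.1, A20.3. For locale-specific collation, `locale.strxfrm` could be passed as `collate_key` after a suitable `locale.setlocale(locale.LC_COLLATE, ...)` call.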
