OASIS Mailing List Archives

office message


Subject: Proposal for Spreadsheets: New sort option "natural sort" (updated)


Hi,

below is the revised proposal for the "natural-sort" attribute of the
<table:sort> element:

The attribute "table:embedded-number-behavior" specifies how string
values that contain digits are sorted. If the attribute's value is
"integer" or "double", string-prefixed numbers will be sorted
in a "natural", number-aware way, i.e. A1, A2, A3, ... , A19, A20,
instead of the normal, alpha-numeric behavior, i.e. A1, A10, A11, A12,
... , A19, A2, A20, A3, A4, ... , A8, A9.
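
To make the difference between the two orders concrete, here is a small
Python sketch. It is only an illustration, not part of the proposal:
natural_key is a hypothetical helper, and plain code-point ordering
stands in for the locale-aware "alpha-numeric" comparison.

```python
import re

labels = ["A1", "A10", "A11", "A2", "A20", "A3", "A9"]

# Plain code-point ordering, standing in for the "alpha-numeric" behavior.
alpha_numeric = sorted(labels)


def natural_key(s):
    # Hypothetical helper: split off the first run of digits and compare it
    # as a number instead of character by character.
    m = re.match(r"(\D*)(\d+)(.*)", s, re.S)
    return (m.group(1), int(m.group(2)), m.group(3)) if m else (s, -1, "")


natural = sorted(labels, key=natural_key)

print(alpha_numeric)  # A1, A10, A11, A2, A20, A3, A9
print(natural)        # A1, A2, A3, A9, A10, A11, A20
```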

<define name="table-sort-attlist" combine="interleave">
    <optional>
        <attribute name="table:embedded-number-behavior"
                   a:defaultValue="alpha-numeric">
            <choice>
                <value>alpha-numeric</value>
                <value>integer</value>
                <value>double</value>
            </choice>
        </attribute>
    </optional>
</define>


The following illustrates how two strings shall be compared if the
attribute value is "integer" or "double".

Step 1. First of all, the two strings are compared by using the normal
string comparison algorithm to check whether they are equal. If they
are, the function returns immediately with equality.

Step 2. Next, each of the two strings is divided into three parts:

  1. Prefix substring
  2. Numeric substring
  3. Suffix substring

The prefix substring is determined by locating the first occurrence of
a digit character; the substring from the very first character through
the character preceding the first digit is considered the prefix. If
the first digit happens to be the first character of the whole string,
the prefix substring is empty. If there is no digit in either one
of the compared strings, the natural sort process will end and the
normal string comparison will be performed instead. Digit detection is
locale-aware, and therefore is not limited to ASCII digits. If
the attribute value is "double", a decimal separator is
considered a digit so that real numbers are supported if the appropriate
conditions are met (see "Note" below).
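
As a rough sketch of the prefix extraction for the "integer" case, one
could write something like the following in Python. The function name
split_prefix is made up for this example, and str.isdecimal() is only
an approximation of locale-aware digit detection.

```python
def split_prefix(s):
    """Return (prefix, rest) with rest starting at the first digit, or None."""
    for i, ch in enumerate(s):
        # str.isdecimal() also accepts non-ASCII decimal digits,
        # e.g. Arabic-Indic ones.
        if ch.isdecimal():
            return s[:i], s[i:]
    # No digit found: the caller falls back to normal string comparison.
    return None


print(split_prefix("A10b"))  # ('A', '10b')
print(split_prefix("10b"))   # ('', '10b')
print(split_prefix("abc"))   # None
```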

Step 3. After the prefix substring is extracted from both of the
compared strings, a normal string comparison is performed on the
extracted prefixes. If they differ, the result is returned and the
process will end. If they are equal, it will proceed to the next step of
numeric string comparison.

Step 4. In this step, the numeric substring is determined by locating
the first occurrence of a non-digit character after the first digit
character; the substring from the first digit character through the
character preceding the first non-digit is considered the numeric
substring. This substring is then converted into a double-precision
variable. This step is performed on both of the compared strings, and
the converted values are compared by simple numeric comparison. If these
values differ, then the result will be returned and the process will
end. If they are equal to one another, then the process will proceed to
the next step.

Step 5. After the numeric comparison returns equality, the suffix
substring, which is simply the rest of the string that occurs after the
last digit of the numeric substring, will be extracted. This suffix
substring will then replace the original string, and the whole process
will repeat (i.e. back to Step 1).
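
Steps 1 to 5 can be sketched in Python roughly as follows, for the
"integer" case only. This is an illustration under assumptions, not a
reference implementation: the names cmp, split_parts and
natural_compare are invented here, plain code-point comparison stands
in for the locale-specific "normal string comparison", and "no digit in
either one of the compared strings" is read as "at least one of the two
strings has no digit".

```python
from functools import cmp_to_key


def cmp(a, b):
    # Stand-in for the locale-specific "normal string comparison".
    return (a > b) - (a < b)


def split_parts(s):
    """Split s into (prefix, numeric, suffix); None if s has no digit."""
    start = next((i for i, ch in enumerate(s) if ch.isdecimal()), None)
    if start is None:
        return None
    end = start
    while end < len(s) and s[end].isdecimal():
        end += 1
    return s[:start], s[start:end], s[end:]


def natural_compare(a, b):
    while True:
        if a == b:                          # Step 1: equal strings
            return 0
        pa, pb = split_parts(a), split_parts(b)
        if pa is None or pb is None:        # Step 2: no digit, fall back
            return cmp(a, b)
        if pa[0] != pb[0]:                  # Step 3: compare prefixes
            return cmp(pa[0], pb[0])
        na, nb = float(pa[1]), float(pb[1])  # Step 4: compare as doubles
        if na != nb:
            return cmp(na, nb)
        a, b = pa[2], pb[2]                 # Step 5: repeat on the suffixes


items = ["A10", "A2", "A1", "A20", "A3"]
print(sorted(items, key=cmp_to_key(natural_compare)))
```

Each pass removes at least one digit from both strings, so the loop
always terminates. Note that strings such as "A01" and "A1" compare as
equal under this reading of the steps.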

This sorting process is illustrated in the picture below. Note that the
term "normal string comparison" used in the algorithm description
refers to a locale-specific string comparison, not to a simple ASCII
string comparison. The locale is either given explicitly by the
table:language and table:country attributes or, when no language is
specified, is the default locale.

Note: Treatment of decimal separators: If the attribute value is
"integer", then a decimal separator is not considered a
digit. If the attribute value is "double", the treatment of a
decimal separator is context-dependent, that is, when a decimal
separator occurs adjacent to one or two digit characters, it is
considered a digit character as long as it's the only occurrence in that
given numeric substring. In other words, a second occurrence of a
decimal separator in any numeric substring is treated as a non-digit
character; therefore the character immediately preceding the separator
becomes the last character of the numeric substring, while the separator
itself becomes the first character of the suffix substring.
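
The decimal separator rule could be sketched in Python along these
lines. Again this is only one possible reading of the note: the names
split_double and to_number are hypothetical, and the separator is
passed in as the parameter sep here, whereas a real implementation
would presumably take it from the locale.

```python
def split_double(s, sep="."):
    """Split s into (prefix, numeric, suffix), allowing one separator."""
    n = len(s)

    def digit(i):
        return 0 <= i < n and s[i].isdecimal()

    # The numeric substring starts at the first digit, or at a separator
    # that is directly followed by a digit.
    start = next((i for i in range(n)
                  if digit(i) or (s[i] == sep and digit(i + 1))), None)
    if start is None:
        return None

    end, seen_sep = start, False
    while end < n:
        if digit(end):
            end += 1
        elif (s[end] == sep and not seen_sep
              and (digit(end - 1) or digit(end + 1))):
            seen_sep = True  # first separator next to a digit counts
            end += 1
        else:
            break            # a second separator starts the suffix
    return s[:start], s[start:end], s[end:]


def to_number(numeric, sep="."):
    # Normalise the separator so float() can parse the substring.
    return float(numeric.replace(sep, "."))


print(split_double("A1.5b"))   # ('A', '1.5', 'b')
print(split_double("v1.2.3"))  # ('v', '1.2', '.3')
```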


Best regards

Michael
-- 
Michael Brauer, Technical Architect Software Engineering
StarOffice/OpenOffice.org
Sun Microsystems GmbH             Nagelsweg 55
D-20097 Hamburg, Germany          michael.brauer@sun.com
http://sun.com/staroffice         +49 40 23646 500
http://blogs.sun.com/GullFOSS



