Subject: Re: [office] Proposal for Spreadsheets: New sort option "natural sort"
Hi,

The following illustrates how two strings shall be compared under the natural sort algorithm.

Step 1. First of all, the two strings are compared by using the normal string comparison algorithm to ensure that they are not equal. If they are, the function will return immediately with equality.

Step 2. Next, each of the two strings is divided into three parts:

1. Prefix substring
2. Numeric substring
3. Suffix substring

The prefix substring is determined by locating the first occurrence of a digit character; the substring from the very first character through the character preceding the first digit is considered the prefix. Now, if the first digit happens to be the first character of the whole string, the prefix substring becomes empty. If there is no digit in either one of the compared strings, the natural sort process will end and the normal string comparison will be performed instead.

The digit determined herein is locale-aware, and therefore is not limited to ASCII digits. A decimal separator may also be considered a digit so that real numbers can also be supported if the appropriate conditions are met (see "Note" below).

Step 3. After the prefix substring is extracted from both of the compared strings, a normal string comparison is performed on the extracted prefixes. If they differ, the result is returned and the process will end. If they are equal, it will proceed to the next step of numeric string comparison.

Step 4. In this step, the numeric substring is determined by locating the first occurrence of a non-digit character after the first digit character; the substring from the first digit character through the character preceding the first non-digit is considered the numeric substring. This substring is then converted into a double-precision variable. This step is performed on both of the compared strings, and the converted values are compared by simple numeric comparison. If these values differ, then the result will be returned and the process will end.
If they are equal to one another, then the process will proceed to the next step.

Step 5. After the numeric comparison returns equality, the suffix substring, which is simply the rest of the string that occurs after the last digit of the numeric substring, will be extracted. This suffix substring will then replace the original string, and the whole process will repeat (i.e. back to Step 1).

This sorting process is illustrated in the picture below.

Note that the term "normal string comparison" repeatedly mentioned in the algorithm description refers to a locale-specific string comparison; therefore the term does not refer to a simple ASCII string comparison. This locale setting is either explicitly given by the table:language and table:country attributes, or the default locale when the language option is not explicitly specified (current behavior).

Note: Treatment of decimal separators:

The treatment of a decimal separator is context-dependent, that is, when a decimal separator occurs adjacent to one or two digit characters, it is considered a digit character as long as it's the only occurrence in that given numeric substring. In other words, a second occurrence of a decimal separator in any numeric substring is treated as a non-digit character; therefore the character immediately preceding the separator becomes the last character of the numeric substring, while the separator itself becomes the first character of the suffix substring.

Best regards

Michael

robert_weir@us.ibm.com wrote:
>
> Interesting idea.
>
> How far do we take it?
>
> For example do we allow multiple levels, as in:
>
> A1.1, A1.2, A1.10, ... , A19.1, A20.3, etc.
>
> -Rob
>
> Michael.Brauer@Sun.COM wrote on 01/19/2007 05:26:03 AM:
>
> > Dear TC members,
> >
> > this is a proposal for a new attribute of the <table:sort> element:
> >
> > The attribute "table:natural-sort" specifies how string values are sorted.
> > If the attribute's value is "true", string-prefixed numbers will be sorted
> > in a "natural", number-aware way, i.e. A1, A2, A3, ... , A19, A20,
> > instead of the normal, alpha-numeric behavior, i.e.
> > A1, A10, A11, A12, ... ,A19, A2, A20, A3, A4, ... , A8, A9.
> >
> > <define name="table-sort-attlist" combine="interleave">
> >   <optional>
> >     <attribute name="table:natural-sort" a:defaultValue="false">
> >       <ref name="boolean"/>
> >     </attribute>
> >   </optional>
> > </define>
> >
> > Best regards
> >
> > Michael
> > --
> > Michael Brauer, Technical Architect Software Engineering
> > StarOffice/OpenOffice.org
> > Sun Microsystems GmbH Nagelsweg 55
> > D-20097 Hamburg, Germany michael.brauer@sun.com
> > http://sun.com/staroffice +49 40 23646 500
> > http://blogs.sun.com/GullFOSS
> >

--
Michael Brauer, Technical Architect Software Engineering
StarOffice/OpenOffice.org
Sun Microsystems GmbH Nagelsweg 55
D-20097 Hamburg, Germany michael.brauer@sun.com
http://sun.com/staroffice +49 40 23646 500
http://blogs.sun.com/GullFOSS
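
The five comparison steps described in the message above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the proposal's normative definition or any application's actual code. The names `split_natural`, `natural_compare`, `decimal_sep` and `collate` are hypothetical. `locale.strcoll` stands in for the "normal string comparison", and `str.isdecimal` stands in for the locale-aware digit test. The phrase "no digit in either one of the compared strings" is read here as "at least one of the two strings has no digit". As a simplification, a decimal separator is accepted only when it sits between two digits, and decimal handling is off unless `decimal_sep` is given.

```python
import locale
from functools import cmp_to_key


def split_natural(s, decimal_sep=None):
    """Split s into (prefix, number, suffix); number is None if s has no digit."""
    start = next((i for i, ch in enumerate(s) if ch.isdecimal()), None)
    if start is None:
        return s, None, ""
    end = start
    seen_sep = False
    while end < len(s):
        ch = s[end]
        if ch.isdecimal():
            end += 1
        elif (decimal_sep is not None and ch == decimal_sep and not seen_sep
              and end + 1 < len(s) and s[end + 1].isdecimal()):
            # Assumption: only the first separator, placed between two digits,
            # is treated as part of the numeric substring.
            seen_sep = True
            end += 1
        else:
            break
    text = s[start:end]
    if decimal_sep is not None:
        text = text.replace(decimal_sep, ".")
    return s[:start], float(text), s[end:]


def _sign(c):
    return (c > 0) - (c < 0)


def natural_compare(a, b, decimal_sep=None, collate=locale.strcoll):
    """Return -1, 0 or 1 following Steps 1 to 5 of the description."""
    while True:
        # Step 1: equal strings compare equal immediately.
        if collate(a, b) == 0:
            return 0
        # Step 2: split both strings; fall back if a string has no digit.
        prefix_a, num_a, suffix_a = split_natural(a, decimal_sep)
        prefix_b, num_b, suffix_b = split_natural(b, decimal_sep)
        if num_a is None or num_b is None:
            return _sign(collate(a, b))
        # Step 3: compare the prefixes with the normal string comparison.
        c = collate(prefix_a, prefix_b)
        if c != 0:
            return _sign(c)
        # Step 4: compare the numeric substrings as floating-point values.
        if num_a != num_b:
            return -1 if num_a < num_b else 1
        # Step 5: continue with the suffixes, which are strictly shorter.
        a, b = suffix_a, suffix_b


cells = ["A10", "A2", "A1", "A19", "A20", "A3"]
print(sorted(cells, key=cmp_to_key(natural_compare)))

levels = ["A1.10", "A1.2", "A1.1", "A20.3", "A19.1"]
print(sorted(levels, key=cmp_to_key(natural_compare)))
```

With decimal handling left off, the multi-level values from Rob's question are ordered level by level, because each repetition of Step 5 treats the text after the first number as a new string. Passing `decimal_sep="."` instead makes "1.10" and "1.1" the same number, so whether the separator is treated as a digit changes the result for such data. Note also that strings such as "A01" and "A1" compare as equal under these steps.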