
Subject: Re: [office] Proposal for Spreadsheets: New sort option "natural sort"

From: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>
To: David Faure <faure@kde.org>
Date: Mon, 05 Feb 2007 11:24:36 +0100

Hi David,

I think we have conflicting requirements here: either you want to be able 
to sort floating-point numbers, in which case you need to interpret the 
decimal delimiter, or you want to be able to sort version numbers, in 
which case you must not interpret the decimal delimiter.

What about resolving this conflict by having two options (or three, if 
we include the default character-code-based sorting) instead of one, 
"natural-integer" and "natural-float", where the first one sorts only 
integer values, while the second one sorts floating-point values?
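To make the two proposed options concrete, here is a minimal Python sketch. The key functions `natural_integer_key` and `natural_float_key` are hypothetical names, not part of the proposal text. The sketch assumes "." is the decimal separator and compares the non-numeric text by plain character code, which is a simplification.

```python
import re

# Assumption: "." is the decimal separator (locale-dependent in practice).
_INT_RUN = re.compile(r"(\d+)")
_FLOAT_RUN = re.compile(r"(\d+(?:\.\d+)?)")


def _key(s, pattern, convert):
    # re.split with one capturing group alternates text and number chunks:
    # even indices are text, odd indices are numeric runs, so two keys
    # always compare like-typed items position by position.
    parts = pattern.split(s)
    return [convert(p) if i % 2 else p for i, p in enumerate(parts)]


def natural_integer_key(s):
    # Hypothetical "natural-integer": only digit runs are numbers,
    # so "." is treated as ordinary text.
    return _key(s, _INT_RUN, int)


def natural_float_key(s):
    # Hypothetical "natural-float": a digit run with an optional
    # ".digits" fraction is parsed as a float.
    return _key(s, _FLOAT_RUN, float)


versions = ["1.20", "1.3", "1.10"]
print(sorted(versions, key=natural_integer_key))  # ['1.3', '1.10', '1.20']
print(sorted(versions, key=natural_float_key))    # ['1.10', '1.20', '1.3']
```

With the integer option, "1.3" sorts before "1.10" and "1.20", as version numbers should; with the float option, the same strings sort as the decimals 1.1 < 1.2 < 1.3. The float variant also splits "1.2.3.4" into the numbers 1.2 and 3.4, the behaviour Martin Pool points out in the quoted comment.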

Michael

David Faure wrote:
> On Friday 02 February 2007, Michael Brauer - Sun Germany - ham02 - Hamburg wrote:
>> Step 4. In this step, the numeric substring is determined by locating 
>> the first occurrence of a non-digit character after the first digit 
>> character; the substring from the first digit character through the 
>> character preceding the first non-digit is considered the numeric 
>> substring. This substring is then converted into a double-precision 
>> variable. This step is performed on both of the compared strings, and 
>> the converted values are compared by simple numeric comparison. If these 
>> values differ, then the result will be returned and the process will 
>> end. If they are equal to one another, then the process will proceed to 
>> the next step.
> 
> Here's a comment by Martin Pool, who implemented "natural sorting" in KDE.
> 
> "
> If I'm reading this correctly, that means that "1.3" > "1.20", in a
> locale where "." is the decimal separator.  In typical software version
> strings that's not correct, and that was the case I was originally
> trying to handle, and also apparently the case Robert Weir describes.
> Obviously sometimes sorting as floats is best but I suggest that when
> numbers are intermixed with non-digits the other algorithm is better.
> That is, to basically follow this algorithm but just treat the decimal
> separator as non-numeric.
> 
> Also, conversion to double might give unexpected results if there are
> very long runs of digits (barcodes?)  I'm not sure if that is a concern.
> 
> Also it seems rather odd that 1.2.3.4 will be sorted as (1.2, 3.4)... 
> "
> 
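Martin Pool's concern about long digit runs can be checked with a small sketch of Step 4. It takes the quoted text literally (the numeric substring is a run of digits only) and uses a hypothetical `step4_compare` helper. The earlier steps of the algorithm are not modelled, so the sketch simply raises an error when a string has no digits. Passing `convert=float` mimics the double-precision conversion; `convert=int` is an exact alternative, assumed here only for comparison.

```python
import re

# Step 4, read literally: the numeric substring starts at the first digit
# and ends before the first following non-digit.
_FIRST_DIGITS = re.compile(r"\d+")


def numeric_substring(s):
    m = _FIRST_DIGITS.search(s)
    return m.group(0) if m else None


def step4_compare(a, b, convert=float):
    """Return -1, 0 or 1 by comparing the numeric substrings of a and b.

    convert=float mirrors the proposal's double-precision conversion;
    convert=int keeps arbitrarily long digit runs exact.
    """
    x, y = numeric_substring(a), numeric_substring(b)
    if x is None or y is None:
        # Earlier steps of the algorithm are not modelled in this sketch.
        raise ValueError("both strings need a digit run for this step")
    vx, vy = convert(x), convert(y)
    return (vx > vy) - (vx < vy)


a = "item 9007199254740992"
b = "item 9007199254740993"
print(step4_compare(a, b))       # 0: both round to the same double
print(step4_compare(a, b, int))  # -1: the exact integers still differ
```

Two 16-digit codes that differ only in the last digit compare as equal after conversion to a double, because 9007199254740993 (2^53 + 1) is not representable and rounds to 9007199254740992. Python's arbitrary-precision `int` keeps them distinct.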


-- 
Michael Brauer, Technical Architect Software Engineering
StarOffice/OpenOffice.org
Sun Microsystems GmbH             Nagelsweg 55
D-20097 Hamburg, Germany          michael.brauer@sun.com
http://sun.com/staroffice         +49 40 23646 500
http://blogs.sun.com/GullFOSS
