office-comment message

Subject: Integer Overflow and number Types in ODF
From: Leonard Mada <discoleo@gmx.net>
To: office-comment@lists.oasis-open.org
Date: Sun, 06 Jul 2008 23:48:07 +0300
Dear TC-Members,

while examining recently an OOo Calc issue, I came across a major 
ODF-shortcoming.

A recent bug posted on the OOo Bugzila documenting a miscalculation in 
OOo Calc (as well as in MS Excel and probably other spreadsheet 
programs, see: http://www.openoffice.org/issues/show_bug.cgi?id=91358), 
prompted me to revisit old notes and articles on floating point 
calculations.

BUG DESCRIPTION
================
Basically, the bug goes like this:
=29513736*92842033 yields: 2740115251665290.
This is plainly wrong. However, the user does NOT get any notification 
that an integer overflow did occur.

I began therefore to investigate the ODF-specification and revisit some 
older articles. An excerpt from the ODF section 4.2 is listed below:

> 4.2 Number
> A number is simply a numeric value such as 0, -4.5, or $1000. Numbers 
> shall be able to represent fractional values (they shall not be 
> limited to only integers). The "number" type may be displayed in many 
> different formats, including date, time, percentage, and currency.
>
> Typical implementations implement numbers as 64-bit IEEE floating 
> point values and use the CPU's floating-point instructions where 
> available (so intermediate values may be represented using more than 
> 64 bits). However, implementations have great freedom in how they 
> implement Number, and may use representations with a fixed bit length 
> or a variable bit length. A cell with a constant numeric value has the 
> number type.

The IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 
754-1985) implements various formats for representing floating-point 
values. ODF seems to omit any clear specification in this regard, and 
also seems to imply that only one format should be used.

For an application that deals primarily with numbers and computations 
and also is deeply rooted into the financial market, the previous 
paragraph is quite shallow and confuse and overlooks a lot of critical 
issues. While absolute constraints might not be indicated, because 
ODF-compliant applications might run on various platforms with different 
support for floating point-operations, the preceding paragraph is much 
too vague and basically ALL formulas (despite the TC's claim otherwise) 
have pretty undefined behaviour in the absence of a more precise 
definition of numbers and numerical operations.

A revision to the IEEE 754 standard is going on (actually it was 
approved in June 2008), and there is also the IEEE 854 standard (IEEE 
Standard for Radix-Independent Floating-Point Arithmetic).

Now, both these standards are quite old (1985). ODF should adopt 
bindingly some of the recommendations of these standards.

Otherwise, this might end as the maiden flight of the Arianne 5 rocket:
http://www.ima.umn.edu/~arnold/disasters/ariane.html
or the failed Patriot-rocket:
http://www.ima.umn.edu/~arnold/disasters/patriot.html
To name but a few floating-point miscalculations.

Especially, ODF should explicitly define:
1.) both binary floating point as well as integer (and optimally also 
decimal) numbers
2.) access to the floating-point exceptions

1.) INTEGERS / DECIMAL
====================
Financial calculations need a different precision (only 2-digits for 
most currencies when presenting the end results, although some 
intermediate operations might benefit from a far greater precision) and 
accuracy (the subtraction 0.3 - 0.2 - 0.1 is always 0). This was also 
the reason I insist on implementing type checking and *units* in 
spreadsheets (currency being one of these units and a unit that behaves 
slightly different than ordinary *binary*-floating point numbers).

2.) EXCEPTIONS
=============
=29513736*92842033 generates an overflow during integer-operations. An 
unsuspected user has NO chance to detect that the resulting floating 
point number is plainly wrong. However, IF both numbers were implemented 
as 64-bit integers, then the result of the multiplication would generate 
an overflow error and could be easily trapped by the program to display 
a warning to the unsuspecting user that the float-result is inaccurate 
due to this overflow (integer -> float intrinsic conversion)

3.) While UNDERFLOW exceptions can be generally ignored (except when 
used subsequently in other operations like exponentiation), 
OVERFLOW-exceptions have most often some dire consequences.

Therefore, I would like to have the ability as a user to detect 
specifically such overflows, especially on integer-operations. 
Therefore, I definitely support implementing integers as well as 
binary-floating point numbers in spreadsheets as separate number types. 
Any integer representable as an integer, shall be stored and used as an 
integer. Overflow shall be trapped. Operations that need/generate a 
floating point shall implicitly convert the integer to float (and, on 
specific occasions - warn the user about the loss in accuracy). [see P.S.]

4.) A spreadsheet used in particular numerical-analysis domains, might 
want to implement also radix-independent floating point arithmetic (aka 
decimal floating point, dFP). I would like to have the spreadsheet 
functions extended to dFP arithmetic (in an extended category). I do 
agree that - in the absence of hardware support - such functions will be 
slightly slower. However, a user that needs this functionality, will 
specifically access these extra dFP-functions (e.g. a DSUM) when he 
desires a more accurate result. [These functions would be an addition to 
those mentioned in a previous post dealing with fast - but less accurate 
- algorithms.]

Engineers and mathematicians would welcome such a feature.

For a lot more details, please visit Prof. Kahan's website:
http://www.cs.berkeley.edu/~wkahan/

I especially recommend the following articles:

1.) How Futile are Mindless Assessments of Roundoff in Floating-Point 
Computation?
http://www.cs.berkeley.edu/~wkahan/Mindless.pdf

2.) How JAVA's Floating-Point Hurts Everyone Everywhere
http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
[Especially the sections on Exceptions, the *Prevalent Misconceptions 
about Floating-Point Arithmetic* and the *Extra Precision as a Way to 
Conserve Interest Rates’ Monotonicity* sections.]

3.) http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

It is worth reading the articles a number of times: I found new 
interesting details I missed during my first reading.

I would like to end with one quote from the previous article - in a 
slightly different perspective:
Existing spreadsheets "have withstood the Test of Time, not passed it."

I hope that the ODF-committee will significantly improve the 
ODF-specification in order to address its many shortcomings in the 
fields of numeric calculations.

Anticipating some specific criticism, let me state that standardising 
existing shortcomings is not a firm basis for a solid standard (this 
pseudo-quote is slightly adapted from a Monty Python quote).

I am also looking forward to see support for the double-extended data type.

Sincerely,

Leonard


P.S.
ODF should explicitly permit and define numeric INTEGER types and 
operations.These would be done mostly abstracted from the user.

1.) Any integer that can be represented as an integer (i.e. NO 
overflow), shall be stored and handled as an integer.

2.) Any operation preserving integer type (addition, subtraction, 
multiplication, and exponentiation using integer exponents) shall 
preserve the integer type. If the result cannot be stored as an integer, 
it should be cast to a floating point number and the user shall be 
explicitly warned of the outcome.
[NOTE: integer-exponent exponentiation shall preserve the integer type, 
even though the algorithm might be based on floating points. Therefore, 
a fp -> integer cast might be necessary with nearest neighbor rounding. 
This should work fine for small exponents that generate a result that 
can still be stored as an integer.]

3.) any operation not preserving integers (like most divisions, 
operations with non-integers, other exponentiation, ...), shall 
intrinsically convert the result to floating point numbers *without* an 
explicit user warning. (The "without" is debatable, but would be in the 
spirit of existing spreadsheets; for users needing more accurate 
results, the decimal floating point extension would offer that benefit.)
Follow-Ups:
- Re: [office-comment] Integer Overflow and number Types in ODF
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>