[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Integer Overflow and number Types in ODF
Dear TC-Members, while examining recently an OOo Calc issue, I came across a major ODF-shortcoming. A recent bug posted on the OOo Bugzila documenting a miscalculation in OOo Calc (as well as in MS Excel and probably other spreadsheet programs, see: http://www.openoffice.org/issues/show_bug.cgi?id=91358), prompted me to revisit old notes and articles on floating point calculations. BUG DESCRIPTION ================ Basically, the bug goes like this: =29513736*92842033 yields: 2740115251665290. This is plainly wrong. However, the user does NOT get any notification that an integer overflow did occur. I began therefore to investigate the ODF-specification and revisit some older articles. An excerpt from the ODF section 4.2 is listed below: > 4.2 Number > A number is simply a numeric value such as 0, -4.5, or $1000. Numbers > shall be able to represent fractional values (they shall not be > limited to only integers). The "number" type may be displayed in many > different formats, including date, time, percentage, and currency. > > Typical implementations implement numbers as 64-bit IEEE floating > point values and use the CPU's floating-point instructions where > available (so intermediate values may be represented using more than > 64 bits). However, implementations have great freedom in how they > implement Number, and may use representations with a fixed bit length > or a variable bit length. A cell with a constant numeric value has the > number type. The IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985) implements various formats for representing floating-point values. ODF seems to omit any clear specification in this regard, and also seems to imply that only one format should be used. For an application that deals primarily with numbers and computations and also is deeply rooted into the financial market, the previous paragraph is quite shallow and confuse and overlooks a lot of critical issues. While absolute constraints might not be indicated, because ODF-compliant applications might run on various platforms with different support for floating point-operations, the preceding paragraph is much too vague and basically ALL formulas (despite the TC's claim otherwise) have pretty undefined behaviour in the absence of a more precise definition of numbers and numerical operations. A revision to the IEEE 754 standard is going on (actually it was approved in June 2008), and there is also the IEEE 854 standard (IEEE Standard for Radix-Independent Floating-Point Arithmetic). Now, both these standards are quite old (1985). ODF should adopt bindingly some of the recommendations of these standards. Otherwise, this might end as the maiden flight of the Arianne 5 rocket: http://www.ima.umn.edu/~arnold/disasters/ariane.html or the failed Patriot-rocket: http://www.ima.umn.edu/~arnold/disasters/patriot.html To name but a few floating-point miscalculations. Especially, ODF should explicitly define: 1.) both binary floating point as well as integer (and optimally also decimal) numbers 2.) access to the floating-point exceptions 1.) INTEGERS / DECIMAL ==================== Financial calculations need a different precision (only 2-digits for most currencies when presenting the end results, although some intermediate operations might benefit from a far greater precision) and accuracy (the subtraction 0.3 - 0.2 - 0.1 is always 0). This was also the reason I insist on implementing type checking and *units* in spreadsheets (currency being one of these units and a unit that behaves slightly different than ordinary *binary*-floating point numbers). 2.) EXCEPTIONS ============= =29513736*92842033 generates an overflow during integer-operations. An unsuspected user has NO chance to detect that the resulting floating point number is plainly wrong. However, IF both numbers were implemented as 64-bit integers, then the result of the multiplication would generate an overflow error and could be easily trapped by the program to display a warning to the unsuspecting user that the float-result is inaccurate due to this overflow (integer -> float intrinsic conversion) 3.) While UNDERFLOW exceptions can be generally ignored (except when used subsequently in other operations like exponentiation), OVERFLOW-exceptions have most often some dire consequences. Therefore, I would like to have the ability as a user to detect specifically such overflows, especially on integer-operations. Therefore, I definitely support implementing integers as well as binary-floating point numbers in spreadsheets as separate number types. Any integer representable as an integer, shall be stored and used as an integer. Overflow shall be trapped. Operations that need/generate a floating point shall implicitly convert the integer to float (and, on specific occasions - warn the user about the loss in accuracy). [see P.S.] 4.) A spreadsheet used in particular numerical-analysis domains, might want to implement also radix-independent floating point arithmetic (aka decimal floating point, dFP). I would like to have the spreadsheet functions extended to dFP arithmetic (in an extended category). I do agree that - in the absence of hardware support - such functions will be slightly slower. However, a user that needs this functionality, will specifically access these extra dFP-functions (e.g. a DSUM) when he desires a more accurate result. [These functions would be an addition to those mentioned in a previous post dealing with fast - but less accurate - algorithms.] Engineers and mathematicians would welcome such a feature. For a lot more details, please visit Prof. Kahan's website: http://www.cs.berkeley.edu/~wkahan/ I especially recommend the following articles: 1.) How Futile are Mindless Assessments of Roundoff in Floating-Point Computation? http://www.cs.berkeley.edu/~wkahan/Mindless.pdf 2.) How JAVA's Floating-Point Hurts Everyone Everywhere http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf [Especially the sections on Exceptions, the *Prevalent Misconceptions about Floating-Point Arithmetic* and the *Extra Precision as a Way to Conserve Interest Rates’ Monotonicity* sections.] 3.) http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF It is worth reading the articles a number of times: I found new interesting details I missed during my first reading. I would like to end with one quote from the previous article - in a slightly different perspective: Existing spreadsheets "have withstood the Test of Time, not passed it." I hope that the ODF-committee will significantly improve the ODF-specification in order to address its many shortcomings in the fields of numeric calculations. Anticipating some specific criticism, let me state that standardising existing shortcomings is not a firm basis for a solid standard (this pseudo-quote is slightly adapted from a Monty Python quote). I am also looking forward to see support for the double-extended data type. Sincerely, Leonard P.S. ODF should explicitly permit and define numeric INTEGER types and operations.These would be done mostly abstracted from the user. 1.) Any integer that can be represented as an integer (i.e. NO overflow), shall be stored and handled as an integer. 2.) Any operation preserving integer type (addition, subtraction, multiplication, and exponentiation using integer exponents) shall preserve the integer type. If the result cannot be stored as an integer, it should be cast to a floating point number and the user shall be explicitly warned of the outcome. [NOTE: integer-exponent exponentiation shall preserve the integer type, even though the algorithm might be based on floating points. Therefore, a fp -> integer cast might be necessary with nearest neighbor rounding. This should work fine for small exponents that generate a result that can still be stored as an integer.] 3.) any operation not preserving integers (like most divisions, operations with non-integers, other exponentiation, ...), shall intrinsically convert the result to floating point numbers *without* an explicit user warning. (The "without" is debatable, but would be in the spirit of existing spreadsheets; for users needing more accurate results, the decimal floating point extension would offer that benefit.)
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]