office-comment message

Subject: Re: [office-comment] Integer Overflow and number Types in ODF
From: Leonard Mada <discoleo@gmx.net>
To: office-comment@lists.oasis-open.org
Date: Mon, 07 Jul 2008 20:50:26 +0300
Hello David, Hello everyone,

I tend to disagree very strongly - but I have strong arguments, too. Old 
spreadsheet models worked *maybe* fine in the last 20 years (more on 
this later, see ** comment). Lets try to analyse however the present and 
the future:

There are 2 big companies on this TC (actually more than 2):
Sun Microsystems and IBM (and Google, and ...)

Lets ask their representatives a simple question:
What are the revenues of your company?

Taking a look at Jonathan's blog, I anticipate a revenue of ~10 billions 
in 2007 (fiscal years are slightly differently defined, but lets ignore 
this fact).

Now, for storing a number of 10 billions, you need 10 digits. Being 
accurate to cent level, you need an additional 2 digits. This makes a 
total of 12 digits. Damn close to the accuracy limit of 15 digits.

IF by chance, this number is multiplied with 1000 (for a currency 
conversion, or any conversion, where the user first uses a 
multiplication), you loose accuracy, and you can loose it pretty fast. 
[* see  comment]

Now, lets look at this issue into some other perspective:

1.) some companies operate revenues 1, 2 and even >3 orders of magnitude 
bigger than the one mentioned (think of financial institutions - a big 
bank, the stock exchange, government, ...)

2.) some currencies are weaker than the dollar, maybe 10 times, or even 
1,000 times weaker, so you end with values significantly bigger than the 
ones mentioned

3.) precision needs to be sometimes more than 2 decimal places, e.g. 
when working with exchange rates

A precision of 15-digits is inappropriate in all these instance. 
Auditors want every cent to be accounted for! Yet, the accuracy loss 
would be even in the thousands of dollars, and sometimes even higher. 
Added over time, the complete catastrophe.

The problem is amplified by other factors as well. Spreadsheets become 
larger and more complex. I recently experienced the limits of OOo Calc. 
I was absolutely sure that the row limit did increase to > 65535, yet I 
came across a document that had more rows (and it did confirm me that 
OOo Calc still doesn't support more rows). The real issue is: *I 
stumbled upon a large spreadsheet!*

This wasn't supposedly an issue 20 years back. This is an issue today. 
Errors add up, and sometimes pretty fast. Adding up 70,000 rows will 
generate sometimes large numbers. Intermediate results might get easily 
broken.

Did Lotus 1-2-3 support 70,000 rows? Did it support 200,000 rows? Did it 
compute a regression on 1,000,000 rows? Did it apply complex models to 
forecast the needs in the health care system over the next 10 years?

I doubt it.

Now, health care is something I have knowledge about. The health 
expenditures in the US for 2006 were $ 2,105.5 billions, that is $ 2.1 
trillion - this is 13 digits long (ignoring any decimals). Working with 
such numbers is a real challenge for every 15-digit arithmetic. ***

For the projections in the US during 2007-2017, please see:
http://www.cms.hhs.gov/NationalHealthExpendData/03_NationalHealthAccountsProjected.asp#TopOfPage
http://www.cms.hhs.gov/NationalHealthExpendData/Downloads/proj2007.pdf

I hope therefore that spreadsheets *urgently* adapt to this situation.

I worked up to last year for a national (governmental) institution. 
During that time, a custom (proprietary) solution was implemented (worth 
some 6-8 million dollars), yet - although everything was covered in the 
program - some data (for very specific analysis) did still get exported 
to a spreadsheet. Fate or irony?

So, in the end, unwillingly, you end with the spreadsheet limitations. 
Most of us do NOT even anticipate the places where a spreadsheet might 
get used. Although their use in professional settings is dwindling, some 
unfortunate result might still get computed using a spreadsheet.

Sincerely,

Leonard


* Conversion factors might be larger than 1, but even factors around 1 
or smaller than one might easily generate an overflow, because during a 
conversion, users might easily first multiply the result with a large 
number, thereby generating the integer overflow, and later dividing it 
to normalize the result.

** emphasis during spreadsheet auditing was on "user-generated" errors, 
i.e. formula errors and data errors (although most errors still stem 
from spreadsheet limitations ;) ); I did not verify the accuracy of the 
computer arithmetic calculations. Therefore, there is NO proof 
whatsoever, that spreadsheets were fine during the past 20 years. If 
more carefully examined, I suspect a lot more errors to surface.

*** please note, such results get often created by adding up a lot of 
tiny values: consider e.g. the stock exchange: every transaction is of 
limited value, but adding them all up over a trade day, will generate a 
BIG SUM; and stock exchanges might want to have a lot of decimals, too.



David A. Wheeler wrote:
> Leonard Mada:
>   
>> while examining recently an OOo Calc issue, I came across a major 
>> ODF-shortcoming.
>>     
>
> This isn't major.  For nearly all uses, this is irrelevant, and it
> would be foolish for a user to depend on these kinds of precisions
> when exchanging documents among spreadsheets because NO
> spreadsheet implements what you're requesting.
> In engineering, you're lucky to have two significant digits.
> In budgeting, I know of no one that uses spreadsheets for
> such a truly large range and actually cares; generally they
> get rounded to "human-readable" results and again these
> extreme precisions are unneeded.
>
> I strongly encourage experimentation and improvement;
> innovation is a wonderful thing.  But this is a standards committee.
> We are defining standards for what already exists.  We want to know
> about innovations so that we can avoid accidentally forbidding them,
> but standards bodies are simply not the right forum to add
> brand-new untested requirements to an established market.
> Create a spreadsheet implementation with these great new
> precisions, and if enough people use it, perhaps we could
> define a niche extension for that group.
>
>   
>> A recent bug posted on the OOo Bugzila documenting a miscalculation in 
>> OOo Calc (as well as in MS Excel and probably other spreadsheet 
>> programs, see: http://www.openoffice.org/issues/show_bug.cgi?id=91358), 
>> prompted me to revisit old notes and articles on floating point 
>> calculations.
>>
>> BUG DESCRIPTION
>> ================
>> Basically, the bug goes like this:
>> =29513736*92842033 yields: 2740115251665290.
>> This is plainly wrong.
>>     
>
> No, that's not a bug.  That is obviously acceptable behavior,
> and the spec even makes it clear that this is acceptable.
>
> All existing spreadsheet implementations use 32 or 64 bit
> binary floating-point numbers; in certain cases some MAY
> use another (80-bit) but predicting when that occurs is
> quite tricky.  An implementation is free to
> do better, and we've carefully spec'ed it to permit better,
> but it would be unwise to expect better.
>
> The specification does not forbid implementing exact integers,
> so the specification does not limit future improvements.
>
>   
>> However, the user does NOT get any notification 
>> that an integer overflow did occur.
>>     
>
> That is by design.  Users of scientific calculators don't get any
> such notifications either, but most scientific calculators
> do integer overflow too.
>
>   
>> The IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 
>> 754-1985) implements various formats for representing floating-point 
>> values. ODF seems to omit any clear specification in this regard, and 
>> also seems to imply that only one format should be used.
>>     
>
> If you can point to a place where it _requires_ one format, let me know;
> that should be removed.
>
> We can't reasonably reference the binary format; at no time do we
> use a binary format.  And I don't think we can actually mandate the
> exact IEEE rules for floating-point operations, either.
> IIRC, Intel chips cannot follow the rules exactly
> without a severe performance penalty, so I believe all actual
> spreadsheet implementations intentionally do NOT follow them.
> So there is nothing left to mandate.
>
> There is no point in creating a beautiful specification that will not
> be used.
>
> --- David A. Wheeler
>
>
Follow-Ups:
- Re: [office-comment] Integer Overflow and number Types in ODF
  - From: Sheldon Britton <sab_orion@verizon.net>
References:
- Integer Overflow and number Types in ODF
  - From: Leonard Mada <discoleo@gmx.net>
- Re: [office-comment] Integer Overflow and number Types in ODF
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>