[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: Data Grid Size element proposal
Dear Andreas,
I will follow 2 approaches to explain this again and add an insight into
the future:
1.) Reductio ad absurdum
If the application knows how big the spreadsheets is, then references
outside this range are either errors or wrap arounds for names (due to
poor design both for the spreadsheet and current ODF - but still can be
handled because we know the size of the spreadsheet). Thus I do not need
the MAX_GRID_SIZE.
qed
2.) But even considering the poor ODF-design, the spreadsheet size is a
far better parameter where to wrap around: obviously at the boundaries
of the spreadsheet, not at an abstract MAX_GRID_SIZE. And I add my
suggestion to raise an error if cells get referenced outside this range.
A significant number of errors would be caught. This is an option that
actually structures the spreadsheet.
[Of course, it becomes necessary to define the actual spreadsheet size.
Simply put: A1: up to the column and the row that contain some data or
some formula, or an array formula extending up to there. Strictly
speaking, an image outside this area is NOT extending the spreadsheet up
to there. Neither does a reference to an empty cell outside this area
extend the spreadsheet.]
3.) I do agree that more often than not, the application can determine
the size of the spreadsheet. But I would not bet that this is well
defined in every case. It is able to compute this size precisely and
unmistakenly, only IF the spreadsheet was created in the same session
and formulas were added sequentially. [In the above example, the
spreadsheet size would go up to column IV, and therefore it would wrap
around to cover IU1 and IV1, without any need to MAX_GRID_SIZE. If IV1
was undefined, than most certainly that would be an error, and I want it
to be reported as such, not silently wrap around at an application
defined limit. How often did anyone on this list need exactly 256
columns? I did never use or see a spreadsheet with exactly 256 columns!
This reminds me of another issue.]
This specified size becomes more relevant with array calculations, and I
hope that spreadsheets move that way. Also, as of currently ODF does NOT
support whole column / rows references - but this may well change in the
future.
Lets say, we write the following array formula inside cell A1:
{ = SOME_ARRAY_FUNCTION(B:B, C:C) }
where SOME_ARRAY_FUNCTION will compute an array of same size as its input.
If we had first the data within columns B and C, then that function
would cover only as much data is in B and C, and would NOT make any
sense beyond that range. However, IF a user opens this spreadsheet, it
becomes almost impossible to determine the true size of the spreadsheet
(while in this trivial example it may be trivial, a fairly complex
spreadsheet is unlikely to yield a correct size). This is the reason why
the true size of the spreadsheet needs to be tracked. [As I said, I hope
to see more array functionality in the future, and tracking the
spreadsheet size won't be trivial any more.]
I still could create spreadsheets without having actually any data written.
Lets say A and B are empty, and we write in column C: { =
SOME_ARRAY_FUNCTION(A:A; B:B) }
Obviously, the spreadsheet size is 1 row and 3 columns, because the
range is delimited by cell C1. Writing the next formula will also give a
completely defined state. If we wrote something within columns A or B,
then the rows would be extended up to that point. If we inputted some
data outside of columns A, B and C, then the spreadsheet would extend up
to that cell. Therefore it is perfectly defined. Every new input will
redefine the size of the spreadsheet, and the program can easily track
this size.
However, if the completed spreadsheet is re-opened, the size might be
quite ambiguous and non-determinable (because we do not have the
sequence the formulas were added).
This is relevant especially in the following high-end scenario: a
professional spreadsheet designer creates the logic for a new
spreadsheet, without having actually any data written in the
spreadsheet. He will use array formulas that will automatically adapt
for every new data entry. Array formulas are used so that the end-users
will just enter the data. They will never ever have to update formulas.
[I hope to see this one day.]
When the end user opens the final product, it might be very ambiguous
what the size of the spreadsheet is. There are a lot of array formulas
referencing each other, and every formula has an (as of yet) undefined size.
Also, if opening 1 million rows brings a severe penalty, I would want
that my application allocates only as much columns/rows as really
needed. But I want it to be still able to open the relatively rare case
with 1 million rows without iterating twice through the document.
I hope this sheds more light on this issue.
Sincerely,
Leonard
P.S. some rants are inline
Andreas J. Guelzow wrote:
> On 11/28/2008, "Leonard Mada" <discoleo@gmx.net> wrote:
>
>
>> Dear TC Members,
>>
>> while reading the discussion, I have to express my dismay to the
>> proposal and more broadly to the various spreadsheet concepts. It seems
>> little has been learned from past mistakes.
>>
>> Warren Turkal wrote:
>> [...]
>>
>>> This doesn't seem useful to me. If I never address anything in those
>>> cells that are outside my range, I'd say it's pretty likely I don't
>>> care about them, and just knowing the max row/col size of the saving
>>> application doesn't tell you that.
>>>
>
>
> That's not simply not true: If I know the grid size of teh saving
> application I can determine whether ever any cell outside the grid was
> accessed or the "wrapping" feature was used.
>
> If I don't know the grid size I cannot do that.
>
Well, of course you can. If you know the size of the spreadsheet, than
you can't access anything beyond those boundaries. [Even when accepting
the flawed wrap-around concept. Names referencing something outside this
range would be wrapped - but I hope this gets deprecated. But even in
this case, spreadsheet size is a more robust way to handle wrapping.]
>> [see http://lists.oasis-open.org/archives/office/200811/msg00138.html]
>>
>> I must strongly back up Warren on this one.
>>
>> Lets say it this way: suppose I have a spreadsheet program WITHOUT any
>> grid-size limit (because it implements a very clever iterative mechanism
>> - making it virtually infinitely wide). I create a 2 columns by 2 rows
>> spreadsheet. BUT this one is virtually non-openable in any other
>> program!
>>
>
> Why would it not be openable?
>
Well, because GRID_SIZE = Infinity, and I wish to quote one of your posts:
> Of course we could tell the user that "hey there are formulas that
> appear to require 319 columns and 1231124 rows" after we have parsed the
> whole file and all of its formulas. I believe it would be desirable to
> tell the user that the sheet was created in a program with 400 columns
> and 2000000 rows so that there might be references or content outside
> the range before we start reading such a big file in its entirety.
Therefore, I believe that every new Excel file will raise this message.
Considering the widespread use of Excel, this is a bad omen for the
alternative spreadsheets, at least from a UX-perspective. Users hate
such annoying messages quite a lot.
And IF you still open the file, then WHY the message in the first place?
You could post it after opening the file anyway. Again, actual
spreadsheet size is a far better approach: it gives the actual size, and
so the application HAS to support this size in order to open the
spreadsheet.
>> This is again wrong logic. The application does NOT want to know the
>> MAX_GRID_SIZE, BUT the *actual spreadsheet size*!
>>
>
> The application can easily figure out the actual spread sheet size. It
> cannot figure out the maximum size which may affect the evaluation of
> formulas.
>
Do you often use memory references past the allocated memory when
writing code for gnumeric?
I am just wondering: is it a good practice?
Should we follow this practice when coding spreadsheets? What is your
recommendation?
I hope the questions can be answered by anyone on this list. As far as I
know, everyone on this list has at least some coding experience.
For a more constructive discussion on spreadsheet size, see the top of
this post. I presented evidence that formulas should be evaluated only
within the defined spreadsheet size. Anything referencing a cell outside
this size should raise an error. Wrapping (while bad), would be
excellently covered by spreadsheet size, as it defines a *sensible*
wrap-region, not an arbitrary one. And I gave evidence that spreadsheet
size might not be trivial to determine, especially in the future when
using array formulas.
>> So IF the TC is clever it will follow Apple's approach, and limit the
>> spreadsheet size to its actual size, and not limit the application.
>>
>
> Nobody is suggesting to limit any application. Where do you get that idea?
>
> Andreas Guelzow
>
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]