[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

*Subject*: **Re: Data Grid Size element proposal**

*From*:**Leonard Mada <discoleo@gmx.net>***To*: office-comment@lists.oasis-open.org, "Andreas J. Guelzow" <aguelzow@math.concordia.ab.ca>*Date*: Sat, 29 Nov 2008 03:09:25 +0200

Dear Andreas, I will follow 2 approaches to explain this again and add an insight into the future: 1.) Reductio ad absurdum If the application knows how big the spreadsheets is, then references outside this range are either errors or wrap arounds for names (due to poor design both for the spreadsheet and current ODF - but still can be handled because we know the size of the spreadsheet). Thus I do not need the MAX_GRID_SIZE. qed 2.) But even considering the poor ODF-design, the spreadsheet size is a far better parameter where to wrap around: obviously at the boundaries of the spreadsheet, not at an abstract MAX_GRID_SIZE. And I add my suggestion to raise an error if cells get referenced outside this range. A significant number of errors would be caught. This is an option that actually structures the spreadsheet. [Of course, it becomes necessary to define the actual spreadsheet size. Simply put: A1: up to the column and the row that contain some data or some formula, or an array formula extending up to there. Strictly speaking, an image outside this area is NOT extending the spreadsheet up to there. Neither does a reference to an empty cell outside this area extend the spreadsheet.] 3.) I do agree that more often than not, the application can determine the size of the spreadsheet. But I would not bet that this is well defined in every case. It is able to compute this size precisely and unmistakenly, only IF the spreadsheet was created in the same session and formulas were added sequentially. [In the above example, the spreadsheet size would go up to column IV, and therefore it would wrap around to cover IU1 and IV1, without any need to MAX_GRID_SIZE. If IV1 was undefined, than most certainly that would be an error, and I want it to be reported as such, not silently wrap around at an application defined limit. How often did anyone on this list need exactly 256 columns? I did never use or see a spreadsheet with exactly 256 columns! This reminds me of another issue.] This specified size becomes more relevant with array calculations, and I hope that spreadsheets move that way. Also, as of currently ODF does NOT support whole column / rows references - but this may well change in the future. Lets say, we write the following array formula inside cell A1: { = SOME_ARRAY_FUNCTION(B:B, C:C) } where SOME_ARRAY_FUNCTION will compute an array of same size as its input. If we had first the data within columns B and C, then that function would cover only as much data is in B and C, and would NOT make any sense beyond that range. However, IF a user opens this spreadsheet, it becomes almost impossible to determine the true size of the spreadsheet (while in this trivial example it may be trivial, a fairly complex spreadsheet is unlikely to yield a correct size). This is the reason why the true size of the spreadsheet needs to be tracked. [As I said, I hope to see more array functionality in the future, and tracking the spreadsheet size won't be trivial any more.] I still could create spreadsheets without having actually any data written. Lets say A and B are empty, and we write in column C: { = SOME_ARRAY_FUNCTION(A:A; B:B) } Obviously, the spreadsheet size is 1 row and 3 columns, because the range is delimited by cell C1. Writing the next formula will also give a completely defined state. If we wrote something within columns A or B, then the rows would be extended up to that point. If we inputted some data outside of columns A, B and C, then the spreadsheet would extend up to that cell. Therefore it is perfectly defined. Every new input will redefine the size of the spreadsheet, and the program can easily track this size. However, if the completed spreadsheet is re-opened, the size might be quite ambiguous and non-determinable (because we do not have the sequence the formulas were added). This is relevant especially in the following high-end scenario: a professional spreadsheet designer creates the logic for a new spreadsheet, without having actually any data written in the spreadsheet. He will use array formulas that will automatically adapt for every new data entry. Array formulas are used so that the end-users will just enter the data. They will never ever have to update formulas. [I hope to see this one day.] When the end user opens the final product, it might be very ambiguous what the size of the spreadsheet is. There are a lot of array formulas referencing each other, and every formula has an (as of yet) undefined size. Also, if opening 1 million rows brings a severe penalty, I would want that my application allocates only as much columns/rows as really needed. But I want it to be still able to open the relatively rare case with 1 million rows without iterating twice through the document. I hope this sheds more light on this issue. Sincerely, Leonard P.S. some rants are inline Andreas J. Guelzow wrote: > On 11/28/2008, "Leonard Mada" <discoleo@gmx.net> wrote: > > >> Dear TC Members, >> >> while reading the discussion, I have to express my dismay to the >> proposal and more broadly to the various spreadsheet concepts. It seems >> little has been learned from past mistakes. >> >> Warren Turkal wrote: >> [...] >> >>> This doesn't seem useful to me. If I never address anything in those >>> cells that are outside my range, I'd say it's pretty likely I don't >>> care about them, and just knowing the max row/col size of the saving >>> application doesn't tell you that. >>> > > > That's not simply not true: If I know the grid size of teh saving > application I can determine whether ever any cell outside the grid was > accessed or the "wrapping" feature was used. > > If I don't know the grid size I cannot do that. > Well, of course you can. If you know the size of the spreadsheet, than you can't access anything beyond those boundaries. [Even when accepting the flawed wrap-around concept. Names referencing something outside this range would be wrapped - but I hope this gets deprecated. But even in this case, spreadsheet size is a more robust way to handle wrapping.] >> [see http://lists.oasis-open.org/archives/office/200811/msg00138.html] >> >> I must strongly back up Warren on this one. >> >> Lets say it this way: suppose I have a spreadsheet program WITHOUT any >> grid-size limit (because it implements a very clever iterative mechanism >> - making it virtually infinitely wide). I create a 2 columns by 2 rows >> spreadsheet. BUT this one is virtually non-openable in any other >> program! >> > > Why would it not be openable? > Well, because GRID_SIZE = Infinity, and I wish to quote one of your posts: > Of course we could tell the user that "hey there are formulas that > appear to require 319 columns and 1231124 rows" after we have parsed the > whole file and all of its formulas. I believe it would be desirable to > tell the user that the sheet was created in a program with 400 columns > and 2000000 rows so that there might be references or content outside > the range before we start reading such a big file in its entirety. Therefore, I believe that every new Excel file will raise this message. Considering the widespread use of Excel, this is a bad omen for the alternative spreadsheets, at least from a UX-perspective. Users hate such annoying messages quite a lot. And IF you still open the file, then WHY the message in the first place? You could post it after opening the file anyway. Again, actual spreadsheet size is a far better approach: it gives the actual size, and so the application HAS to support this size in order to open the spreadsheet. >> This is again wrong logic. The application does NOT want to know the >> MAX_GRID_SIZE, BUT the *actual spreadsheet size*! >> > > The application can easily figure out the actual spread sheet size. It > cannot figure out the maximum size which may affect the evaluation of > formulas. > Do you often use memory references past the allocated memory when writing code for gnumeric? I am just wondering: is it a good practice? Should we follow this practice when coding spreadsheets? What is your recommendation? I hope the questions can be answered by anyone on this list. As far as I know, everyone on this list has at least some coding experience. For a more constructive discussion on spreadsheet size, see the top of this post. I presented evidence that formulas should be evaluated only within the defined spreadsheet size. Anything referencing a cell outside this size should raise an error. Wrapping (while bad), would be excellently covered by spreadsheet size, as it defines a *sensible* wrap-region, not an arbitrary one. And I gave evidence that spreadsheet size might not be trivial to determine, especially in the future when using array formulas. >> So IF the TC is clever it will follow Apple's approach, and limit the >> spreadsheet size to its actual size, and not limit the application. >> > > Nobody is suggesting to limit any application. Where do you get that idea? > > Andreas Guelzow >

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]