[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Extending Named Ranges: DataFrames
Hello list members, One very useful feature in Spreadsheets are the named ranges. They permit abstraction from the physical columns/rows and so generate more structured spreadsheets - reducing therefore the error rate. I already proposed one enhancement to improve the named ranges (using Namespaces), and I will propose now a 2nd enhancement going in a slightly different direction. TOC ==== 1. USE CASE 2. PROBLEM 3. SOLUTION 4. COMPETITIVE ANALYSIS 5. IMPLEMENTATION DETAIL USE CASE ========= Lets imagine a typical use case: a user wants to compare 2 lists (or import some info from a 2nd list). He most probably will use one of the LOOKUP() funcitons, e.g. = VLOOKUP(A1; 'other_sheet'.'some_range'; 2; FALSE) Now, instead of the explicit 'other_sheet'.'some_range' it is wise to use a named range, e.g. 'my_range'. The problem is, except for the most trivial of situations, you do not need the 2nd row in the range, but rather a very far one, e.g. there are 12 months in a year, and intermediate calculations might get you to column 24 or 36, or something completely different. PROBLEM ======== Selecting column number 'x' in a named range is counterintuitive, difficult to perform and error prone. The columns are identified as 'A', 'B', ... and not as '1', '2', ... and even then, users will usually add a more accurate column identifier/name (e.g. 'Jan', 'Feb', ...). SOLUTION ========= Allow data frames. A data frame is like a named range, but one is allowed to name the individual vectors inside the range, e.g. lets say 'my_range' has 12 columns for the 12 months, then I would name every column in the data frame as: 'Jan', ... 'Dec'. The formula would then become: = VLOOKUP(A1; 'my_range'; 'my_range$Dec'; FALSE) So, instead of selecting '12', we would use the suitable name: 'my_range$Dec', or, in this case - because it is obvious that the data frame is 'my_range', simply just 'Dec'. = VLOOKUP(A1; 'my_range'; 'Dec'; FALSE) COMPETITIVE ANALYSIS ===================== R (http://cran.R-project.org, and more broadly the S++ language) does have the data frame data type. This is one of the most useful data types. [In R, one can always use the shortcut like 'Dec' directly by specifying the data frame with e.g. data='my_range' when passing the arguments to a function.] IMPLEMENTATION DETAIL ====================== When defining a named range, it should be possible to specify IF the first row or column are descriptive. In this case: - a data frame is created - the actual data range shall exclude this row/column - the names of the individual vectors are derived from this row/column - vectors without a name will be assigned automatically a number Knowing the enormous potential of data frames, I am looking forward to see data frames implemented in spreadsheets. Although the time is too short for the ODF 1.2 window, I hope that it will make it in ODF 2.0. Sincerely, Leonard Mada ========================================= Some examples in R: Commands: my.frame<-data.frame(x=c(1,2,3), y=c(2,2,6)) my.frame Result: x y 1 1 2 2 2 2 3 3 6 Commands: x.lm<-lm(x~y,data=my.frame) x.lm Result: Call: lm(formula = x ~ y, data = my.frame) Coefficients: (Intercept) y 0.750 0.375 =============================================
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]