OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

office-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Extending Named Ranges: DataFrames


Hello list members,

One very useful feature in Spreadsheets are the named ranges. They 
permit abstraction from the physical columns/rows and so generate more 
structured spreadsheets - reducing therefore the error rate.

I already proposed one enhancement to improve the named ranges (using 
Namespaces), and I will propose now a 2nd enhancement going in a 
slightly different direction.


TOC
====
  1. USE CASE
  2. PROBLEM
  3. SOLUTION
  4. COMPETITIVE ANALYSIS
  5. IMPLEMENTATION DETAIL


USE CASE
=========
Lets imagine a typical use case: a user wants to compare 2 lists (or 
import some info from a 2nd list).

He most probably will use one of the LOOKUP() funcitons, e.g.
 = VLOOKUP(A1; 'other_sheet'.'some_range'; 2; FALSE)

Now, instead of the explicit 'other_sheet'.'some_range' it is wise to 
use a named range, e.g. 'my_range'.

The problem is, except for the most trivial of situations, you do not 
need the 2nd row in the range, but rather a very far one, e.g. there are 
12 months in a year, and intermediate calculations might get you to 
column 24 or 36, or something completely different.

PROBLEM
========
Selecting column number 'x' in a named range is counterintuitive, 
difficult to perform and error prone. The columns are identified as 'A', 
'B', ... and not as '1', '2', ... and even then, users will usually add 
a more accurate column identifier/name (e.g. 'Jan', 'Feb', ...).

SOLUTION
=========
Allow data frames.

A data frame is like a named range, but one is allowed to name the 
individual vectors inside the range, e.g. lets say 'my_range' has 12 
columns for the 12 months, then I would name every column in the data 
frame as: 'Jan', ... 'Dec'.

The formula would then become:
 = VLOOKUP(A1; 'my_range'; 'my_range$Dec'; FALSE)

So, instead of selecting '12', we would use the suitable name: 
'my_range$Dec', or, in this case - because it is obvious that the data 
frame is 'my_range', simply just 'Dec'.
 = VLOOKUP(A1; 'my_range'; 'Dec'; FALSE)


COMPETITIVE ANALYSIS
=====================
R (http://cran.R-project.org, and more broadly the S++ language) does 
have the data frame data type. This is one of the most useful data types.

[In R, one can always use the shortcut like 'Dec' directly by specifying 
the data frame with e.g. data='my_range' when passing the arguments to a 
function.]


IMPLEMENTATION DETAIL
======================
When defining a named range, it should be possible to specify IF the 
first row or column are descriptive. In this case:
  - a data frame is created
  - the actual data range shall exclude this row/column
  - the names of the individual vectors are derived from this row/column
  - vectors without a name will be assigned automatically a number


Knowing the enormous potential of data frames, I am looking forward to 
see data frames implemented in spreadsheets. Although the time is too 
short for the ODF 1.2 window, I hope that it will make it in ODF 2.0.

Sincerely,

Leonard Mada



=========================================
Some examples in R:

Commands:
my.frame<-data.frame(x=c(1,2,3), y=c(2,2,6))
my.frame

Result:
   x  y
1 1 2
2 2 2
3 3 6

Commands:
x.lm<-lm(x~y,data=my.frame)
x.lm

Result:
Call:
lm(formula = x ~ y, data = my.frame)

Coefficients:
(Intercept)            y 
      0.750        0.375
=============================================


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]