OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

office-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [office-comment] Extending Named Ranges: DataFrames


Hello everyone,

I just noticed a small glitch.

The language used in R is S+ (not S++). I got astray with all those X++ 
languages. ;)

Now, I would like to point to some additional benefits when using data 
frames.

a.) it is easier to use an informative name, instead of an ambiguous 
number, e.g. within the different LOOKUP() functions

b.) abstracting the row/column enhances the structure of the spreadsheet 
and reduces the error rate

c.) *automatic update of the data frame*
When inserting a new column/row / deleting a column/row before the used 
one, every explicit formula needs to be updated, which is very error 
prone. However, the data frame will automatically adapt to this new 
data, as the column/row header is automatically shifted and so to 
continue my previous example, 'my_range$Dec' will always point to the 
column named 'Dec', even IF this column is shifted to the left or right. 
No user action is needed to correct this error.

All this features strongly favour the use of data frames and these are 
the reasons why data frames are preferred in R.

Sincerely,

Leonard


Leonard Mada wrote:
> Hello list members,
>
> One very useful feature in Spreadsheets are the named ranges. They 
> permit abstraction from the physical columns/rows and so generate more 
> structured spreadsheets - reducing therefore the error rate.
>
> I already proposed one enhancement to improve the named ranges (using 
> Namespaces), and I will propose now a 2nd enhancement going in a 
> slightly different direction.
>
>
> TOC
> ====
>  1. USE CASE
>  2. PROBLEM
>  3. SOLUTION
>  4. COMPETITIVE ANALYSIS
>  5. IMPLEMENTATION DETAIL
>
>
> USE CASE
> =========
> Lets imagine a typical use case: a user wants to compare 2 lists (or 
> import some info from a 2nd list).
>
> He most probably will use one of the LOOKUP() funcitons, e.g.
> = VLOOKUP(A1; 'other_sheet'.'some_range'; 2; FALSE)
>
> Now, instead of the explicit 'other_sheet'.'some_range' it is wise to 
> use a named range, e.g. 'my_range'.
>
> The problem is, except for the most trivial of situations, you do not 
> need the 2nd row in the range, but rather a very far one, e.g. there 
> are 12 months in a year, and intermediate calculations might get you 
> to column 24 or 36, or something completely different.
>
> PROBLEM
> ========
> Selecting column number 'x' in a named range is counterintuitive, 
> difficult to perform and error prone. The columns are identified as 
> 'A', 'B', ... and not as '1', '2', ... and even then, users will 
> usually add a more accurate column identifier/name (e.g. 'Jan', 'Feb', 
> ...).
>
> SOLUTION
> =========
> Allow data frames.
>
> A data frame is like a named range, but one is allowed to name the 
> individual vectors inside the range, e.g. lets say 'my_range' has 12 
> columns for the 12 months, then I would name every column in the data 
> frame as: 'Jan', ... 'Dec'.
>
> The formula would then become:
> = VLOOKUP(A1; 'my_range'; 'my_range$Dec'; FALSE)
>
> So, instead of selecting '12', we would use the suitable name: 
> 'my_range$Dec', or, in this case - because it is obvious that the data 
> frame is 'my_range', simply just 'Dec'.
> = VLOOKUP(A1; 'my_range'; 'Dec'; FALSE)
>
>
> COMPETITIVE ANALYSIS
> =====================
> R (http://cran.R-project.org, and more broadly the S++ language) does 
> have the data frame data type. This is one of the most useful data types.
>
> [In R, one can always use the shortcut like 'Dec' directly by 
> specifying the data frame with e.g. data='my_range' when passing the 
> arguments to a function.]
>
>
> IMPLEMENTATION DETAIL
> ======================
> When defining a named range, it should be possible to specify IF the 
> first row or column are descriptive. In this case:
>  - a data frame is created
>  - the actual data range shall exclude this row/column
>  - the names of the individual vectors are derived from this row/column
>  - vectors without a name will be assigned automatically a number
>
>
> Knowing the enormous potential of data frames, I am looking forward to 
> see data frames implemented in spreadsheets. Although the time is too 
> short for the ODF 1.2 window, I hope that it will make it in ODF 2.0.
>
> Sincerely,
>
> Leonard Mada
>
>
>
> =========================================
> Some examples in R:
>
> Commands:
> my.frame<-data.frame(x=c(1,2,3), y=c(2,2,6))
> my.frame
>
> Result:
>   x  y
> 1 1 2
> 2 2 2
> 3 3 6
>
> Commands:
> x.lm<-lm(x~y,data=my.frame)
> x.lm
>
> Result:
> Call:
> lm(formula = x ~ y, data = my.frame)
>
> Coefficients:
> (Intercept)            y      0.750        0.375
> =============================================
>



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]