Subject: Re: [office-comment] DISTINCT Values
Hi Patrick,
Patrick Durusau wrote:
> ...
>
> What we need are details. Use case scenarios are useful but only up to
> a point.
>
> For example, you mention the R as factor () below.
Maybe I did not explain clearly what I meant.
In simple words, I wanted an *enhanced function corresponding* to
*pivot tables*. Maybe that makes it easier to understand. Current
implementations of pivot tables seem quite weak to me, and they are NOT
functions. I therefore want:
- something more advanced
- something easily expandable and flexible
- something defined as spreadsheet functions
The function DISTINCT() was meant as the first step in this process.
It would generate the groups of data, i.e. make the categories. These
categories would behave like factors (in R, and in statistics
generally, a categorical variable is called a factor and its possible
values are called levels). Further functions would follow to generate
the various reports (these would involve extensive vector operations);
factors are used extensively in vector/matrix operations.
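As a rough illustration only, the two steps could be sketched in Python as follows. The names `distinct` and `sum_by_category` are hypothetical and belong to no spreadsheet or to the OpenDocument formula work. The sketch assumes that DISTINCT() returns each value once, in order of first appearance, which is only one possible definition.

```python
def distinct(values):
    """Return each value once, in order of first appearance (assumed semantics)."""
    seen = set()
    result = []
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return result


def sum_by_category(categories, amounts):
    """Hypothetical report function: sum the amounts belonging to each category."""
    totals = {c: 0 for c in distinct(categories)}
    for c, a in zip(categories, amounts):
        totals[c] += a
    return totals


regions = ["North", "South", "North", "East", "South"]
sales = [10, 20, 5, 7, 3]

print(distinct(regions))                # the categories
print(sum_by_category(regions, sales))  # one simple "report" built on them
```

The first function makes the categories; the second is one example of a report built on top of them, in the way a pivot table would be.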
> Recalling that OpenDocument is an *interchange* format, how do we deal
> with the following issue?
>
>> Factors are currently implemented using an integer array to specify
>> the actual levels and a second array of names that are mapped to the
>> integers. Rather unfortunately users often make use of the
>> implementation in order to make some calculations easier. This,
>> however, is an implementation issue and is not guaranteed to hold in
>> all implementations of R. (Section 2.3.1 Factors, R Language Definition)
## THIS IS A SIDE NOTE
- the previous WARNING is irrelevant both to ODF and to R users who
  stick to the S+ standard
- for someone working with factors, it is irrelevant how factors are
  *INTERNALLY* stored in R
- 'is.factor()' will ALWAYS return TRUE for a factor object,
  irrespective of its internal storage ('as.factor()' coerces its
  argument to a factor)
- internally (in R), factors are currently stored in a way that uses
  integers
- THIS data structure should however NEVER be known to or assumed by
  users, and therefore it should NEVER be relied upon (R being open
  source, you can of course look up the details)
- these are hidden details (that's why you declare members 'private'
  and 'protected' in C++ classes: to hide the implementation)
- however, there are obviously users who make use of this and, even
  worse, perform mathematical calculations with factors (it makes NO
  sense to compare a level "A" mathematically with a level "B" or
  with an integer, BUT some do exactly that)
## END SIDE NOTE
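The point about hidden storage can be sketched in Python. This is an illustration under assumptions, not R's actual implementation: the class `Factor`, its methods, and the helper `is_factor` are hypothetical names, and the integer codes are only one possible internal representation.

```python
class Factor:
    """A categorical variable whose storage is deliberately hidden."""

    def __init__(self, values):
        levels = sorted(set(values))
        index = {level: i for i, level in enumerate(levels)}
        self._levels = levels
        # Implementation detail: integer codes. Callers must not rely on it.
        self._codes = [index[v] for v in values]

    def levels(self):
        """The distinct categories, as the user sees them."""
        return list(self._levels)

    def values(self):
        """The original observations, reconstructed from the hidden codes."""
        return [self._levels[c] for c in self._codes]

    def counts(self):
        """Number of observations per level."""
        counts = {level: 0 for level in self._levels}
        for c in self._codes:
            counts[self._levels[c]] += 1
        return counts


def is_factor(obj):
    """True for any Factor, however it happens to be stored."""
    return isinstance(obj, Factor)


f = Factor(["B", "A", "B", "C"])
print(is_factor(f), f.levels(), f.counts())
```

Code that uses only `levels()`, `values()` and `counts()` keeps working if the integer codes are later replaced by some other storage. Code that reads `_codes` directly does not.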
> We do not specify implementation details so it is possible for an "as
> factor()" function to work differently depending upon implementation
> details.
>
> Having a function defined by a standard work differently is a bad thing.
## SIDE NOTE
- 'is.factor()' and 'as.factor()' WILL work as expected in R, even in
  the future
- only users who interpret the result as an integer are affected, and
  I fully support that: they should never have assumed that factors
  are stored as integers
- *A factor may be purely nominal or may have ordered categories*!!!
  NO mention of integers.
## END SIDE NOTE
CONCLUSIONS
============
Indeed, spreadsheets should have functions that assign data to
*categories*. DISTINCT() was supposed to do so. These categories would
then behave like the factors described above. Pivot Tables (aka Data
Tables) currently do similar things, though I wanted something more
advanced. And I wanted a function.
I hope this explanation clarifies some of the issues.
Sincerely,
Leonard
> I don't know whether that would actually change the result of a
> function or not but it is an example of the level of detail that is
> necessary to consider when defining a function in a standard.
>
> I suspect it would be possible to define "as factor()" such that it
> had a standardized result and if someone allowed use based on
> implementation details they would be non-conformant. I say that not
> having looked at the details. And by details I do not mean use or test
> cases but a formal definition of the function.
>
> I know the formula SC has a number of functions that still need some
> work so maybe we need a rule that welcomes new function proposals but
> grants priority to requests accompanied by work on functions already
> accepted for standardization.
>
> David, what say you?
>
> Hope you are having a great weekend!
>
> Patrick