
*Subject*: **Re: [office-comment] DISTINCT Values**

*From*: **Leonard Mada <discoleo@gmx.net>**

*To*: office-comment@lists.oasis-open.org

*Date*: Sat, 09 Jun 2007 22:39:15 +0300

Hi Patrick,

Patrick Durusau wrote:
> ...
>
> What we need are details. Use case scenarios are useful but only up to
> a point.
>
> For example, you mention the R as factor () below.

Maybe I was not able to explain clearly what I meant. In simple words, I wanted an *enhanced function corresponding* to the *pivot tables*. Well, maybe it is now easier to understand.

Current implementations of pivot tables seem quite weak to me. And they are NOT functions. I therefore want:
- something more advanced
- easily expandable / flexible
- and defined as spreadsheet functions

The function DISTINCT() was meant as the first step in this process. It would generate the groups of data, i.e. make the categories. Indeed, these categories would behave like factors (in R, and generally in statistics, these are called factors, and the categories themselves are the levels of a variable). Further functions would then follow to generate the various reports (these would imply extensive vector operations). Indeed, factors are used extensively in vector/matrix operations.

> Recalling that OpenDocument is an *interchange* format, how do we deal
> with the following issue?
>
>> Factors are currently implemented using an integer array to specify
>> the actual levels and a second array of names that are mapped to the
>> integers. Rather unfortunately users often make use of the
>> implementation in order to make some calculations easier. This,
>> however, is an implementation issue and is not guaranteed to hold in
>> all implementations of R.

(Section 2.3.1 Factors, R Language Definition)

## THIS IS A SIDE NOTE
- the previous WARNING is irrelevant both to ODF and to R users who stick to the S+ standard
- for someone working with factors, it is irrelevant how factors are stored *INTERNALLY* in R
- 'is.factor()' will ALWAYS return TRUE for a factor object, irrespective of its internal storage ('as.factor()' coerces its argument to a factor)
- internally (in R), factors are currently stored in a way that uses integers
- THIS data structure should, however, NEVER be known to or assumed by users, and therefore it should NEVER be used (since R is open source, you can of course look up the details)
- these are hidden methods (that's why you declare members 'private' and 'protected' in C++ classes: to hide the implementation)
- however, obviously, some users do rely on it and, even worse, perform mathematical calculations with factors (it makes NO sense to compare level "A" mathematically with level "B", or with an integer, BUT some do exactly that)
## END SIDE NOTE

> We do not specify implementation details so it is possible for an "as
> factor()" function to work differently depending upon implementation
> details.
>
> Having a function defined by a standard work differently is a bad thing.

## SIDE NOTE
- 'is.factor()' and 'as.factor()' WILL keep working as expected in R, even in the future
- only users who interpret the result as an integer are affected, and I fully support this: they should never have assumed that factors are stored as integers
- *A factor may be purely nominal or may have ordered categories*!!! NO mention of integers.
## END SIDE NOTE

CONCLUSIONS
============
Indeed, spreadsheets should have functions that assign data to *categories*. DISTINCT() was supposed to do so. These categories would then behave like the factors described above. Pivot Tables (aka Data Tables) currently do similar things, though I wanted something more advanced. And I wanted a function.
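To make the proposal more concrete, here is a minimal Python sketch of the three pieces described above: a DISTINCT() step that finds the categories, a factor that assigns every value to one category, and a report function that aggregates data per category, much as a pivot table does. All names (`distinct`, `Factor`, `tabulate`) are hypothetical and are not defined by ODF, OpenFormula or R; the sketch also assumes levels are listed in order of first appearance, which a real specification might define differently (e.g. sorted). In line with the side notes, the integer codes stay private and the report works only with level names.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def distinct(values: Sequence[Hashable]) -> List[Hashable]:
    """Hypothetical DISTINCT(): unique values in order of first appearance."""
    # dict keeps insertion order (Python 3.7+), so first occurrences win.
    return list(dict.fromkeys(values))


class Factor:
    """Hypothetical factor: assigns every value to one category (level).

    The integer codes are a private detail; callers only see levels and
    labels, so the internal storage could change without affecting them.
    """

    def __init__(self, values: Sequence[Hashable],
                 levels: Optional[Sequence[Hashable]] = None) -> None:
        # Nominal factor: levels are derived with distinct().
        # Ordered factor (assumption): the caller declares levels and their order.
        self.ordered = levels is not None
        self._levels = list(levels) if levels is not None else distinct(values)
        if len(set(self._levels)) != len(self._levels):
            raise ValueError("levels must be unique")
        position = {level: i for i, level in enumerate(self._levels)}
        self._codes: List[int] = []
        for v in values:
            if v not in position:
                raise ValueError(f"{v!r} is not one of the declared levels")
            self._codes.append(position[v])

    def __len__(self) -> int:
        return len(self._codes)

    def levels(self) -> List[Hashable]:
        return list(self._levels)

    def labels(self) -> List[Hashable]:
        # Decode back to level names; the codes never leave the class.
        return [self._levels[c] for c in self._codes]


def tabulate(factor: Factor, data: Sequence[float]) -> Dict[Hashable, float]:
    """Hypothetical pivot-style report: sum of `data` for each level."""
    if len(factor) != len(data):
        raise ValueError("factor and data must have the same length")
    totals: Dict[Hashable, float] = {level: 0.0 for level in factor.levels()}
    for level, x in zip(factor.labels(), data):
        totals[level] += x
    return totals


if __name__ == "__main__":
    region = ["North", "South", "North", "East", "South"]
    sales = [10, 20, 5, 7, 3]
    f = Factor(region)
    print(f.levels())           # ['North', 'South', 'East']
    print(tabulate(f, sales))   # {'North': 15.0, 'South': 23.0, 'East': 7.0}
    size = Factor(["mid", "low"], levels=["low", "mid", "high"])
    print(size.ordered, size.levels())  # True ['low', 'mid', 'high']
```

Because `tabulate()` only uses `levels()` and `labels()`, an implementation could store factors without integers at all and every report would give the same result, which is the property a standardized function definition would need.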
Hope this explanation clarifies some of the issues.

Sincerely,

Leonard

> I don't know whether that would actually change the result of a
> function or not but it is an example of the level of detail that is
> necessary to consider when defining a function in a standard.
>
> I suspect it would be possible to define "as factor()" such that it
> had a standardized result and if someone allowed use based on
> implementation details they would be non-conformant. I say that not
> having looked at the details. And by details I do not mean use or test
> cases but a formal definition of the function.
>
> I know the formula SC has a number of functions that still need some
> work so maybe we need a rule that welcomes new function proposals but
> grants priority to requests accompanied by work on functions already
> accepted for standardization.
>
> David, what say you?
>
> Hope you are having a great weekend!
>
> Patrick

**References**:

- **DISTINCT Values** *From:* Leonard Mada <discoleo@gmx.net>
- **Re: [office-comment] DISTINCT Values** *From:* Patrick Durusau <patrick@durusau.net>
- **Re: [office-comment] DISTINCT Values** *From:* Leonard Mada <discoleo@gmx.net>
- **Re: [office-comment] DISTINCT Values** *From:* Patrick Durusau <patrick@durusau.net>