[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

*Subject*: **Re: [office-comment] Extending LOOKUP() Functions**

*From*:**Leonard Mada <discoleo@gmx.net>***To*: office-comment@lists.oasis-open.org*Date*: Wed, 13 Aug 2008 22:53:26 +0300

Hello Eike, hello everyone, There is a small, *but essential difference* [although my first post might have not captured this clearly]: Eike Rathke wrote: >> ... >> Instead of repeating the formula numerous times, it makes sense to use >> this formula just once and apply it as an array formula: >> {= VLOOKUP( '_array_to_be_searched_' ; 'my_range'; 2; FALSE) } >> > > This is not specific to any [HV]LOOKUP function and already covered. See > section 3.2 Non-Scalar Evaluation (aka 'Array expressions') subsection > 2.2) Calculations with non-scalar inputs are a generalization of (2.1). > | When evaluating a formula in 'matrix' mode, and a non-scalar value is > | passed to a function argument that expects a scalar, the function is > | evaluated multiple times, Now, this is the culprit: evaluating *multiple times*. My idea is to make the [HV]LOOKUP() functions more efficient. My interpretation of that paragraph is as follows: the corresponding function will be called again and again for every value in the first range. [This is how I interpret existing array calculations.] Of course, a smart caching algorithm might speed things up, but a great burden is put on the implementer to do it right, and it still won't be perfect. On the other hand, a "specific array function" can be optimized much better. It is a single function working on the array. The other method is NOT a single function and is a spreadsheet "artifact". [This might have important implications like collateral effects and the like: I imagine that there is NO requirement that a classical array function needs to compute at once, or in the particular order of the first array. So, computing the value for VLOOKUP( A10; ...) first, then VLOOKUP(A3; ...) then VLOOKUP(A22; ...) is perfectly valid and allowed.] Doing a []LOOKUP() with n-values on an m-rows table will behave like O(n*m). The full array version could be implemented like: - rank firstly the first array - search sorted array-values in the m-rows table -- worst case: [HV]LOOKUP( _array_ , _unsorted_array_ , x , FALSE ) => rank-sort: O(m * log(m)) -- is NOT needed with [HV]LOOKUP(..., TRUE ) Correlating elements from one ranked array to the sorted array should proceed much faster than the blind []LOOKUP(). Also, this function will perform all calculations internally. Side effects shall not exist (even though internally the function will access the array data non-linearly). Sincerely, Leonard P.S. I know that the notation of my function proposal is basically the same as the notation of an ordinary array calculation. I still have some doubts what the best notation would be. = [HV]LOOKUP( _array_ ; _range_ ; n ; TRUE/FALSE ) is another option, BUT this is truly an array function. > iterating over the non-scalar input(s) and > | putting the function result into a matrix at the position corresponding > | to the input. > > Following your example above, the matrix/array formula would be > > {= VLOOKUP(A1:A3; 'my_range'; 2; FALSE)} > > entered over an array of 3 rows. Which btw is already implemented in > OpenOffice.org > > Eike > >

**References**:**Re: [office-comment] Extending LOOKUP() Functions***From:*Eike Rathke <erack@sun.com>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]