
Subject: Re: [office-comment] Extending LOOKUP() Functions
From: Leonard Mada <discoleo@gmx.net>
To: office-comment@lists.oasis-open.org
Date: Wed, 13 Aug 2008 22:53:26 +0300

Hello Eike, hello everyone,

There is a small *but essential difference* [although my first post 
might not have captured this clearly]:


Eike Rathke wrote:
>> ...
>> Instead of repeating the formula numerous times, it makes sense to use  
>> this formula just once and apply it as an array formula:
>>  {= VLOOKUP( '_array_to_be_searched_' ; 'my_range'; 2; FALSE) }
>>     
>
> This is not specific to any [HV]LOOKUP function and already covered. See
> section 3.2 Non-Scalar Evaluation (aka 'Array expressions') subsection
> 2.2) Calculations with non-scalar inputs are a generalization of (2.1).
> | When evaluating a formula in 'matrix' mode, and a non-scalar value is
> | passed to a function argument that expects a scalar, the function is
> | evaluated multiple times,

Now, this is the culprit:  evaluating *multiple times*.

My idea is to make the [HV]LOOKUP() functions more efficient.

My interpretation of that paragraph is as follows: the corresponding 
function will be called again and again for every value in the first 
range. [This is how I interpret existing array calculations.] Of course, 
a smart caching algorithm might speed things up, but a great burden is 
put on the implementer to do it right, and it still won't be perfect.

On the other hand, a "specific array function" can be optimized much 
better. It is a single function working on the array. The other method 
is NOT a single function; it is a spreadsheet "artifact". [This might 
have important implications, such as side effects: I imagine that there 
is NO requirement that a classical array function compute everything at 
once, or in the particular order of the first array. So, computing the 
value for VLOOKUP( A10; ...) first, then VLOOKUP(A3; ...), then 
VLOOKUP(A22; ...) is perfectly valid and allowed.]

Doing a []LOOKUP() with n values on an m-row table will behave like O(n*m).
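
To make the O(n*m) behaviour concrete, here is a minimal Python sketch of the "evaluate multiple times" reading, assuming an unsorted table and an exact match (the FALSE mode). The names vlookup_exact and array_vlookup_naive are hypothetical and only illustrate the idea; they are not taken from any spreadsheet implementation, and None merely stands in for #N/A.

```python
def vlookup_exact(key, table, col):
    """Linear exact-match search, like VLOOKUP(key; table; col; FALSE).

    `table` is a list of rows; `col` is 1-based, as in the spreadsheet.
    Returns None (a stand-in for #N/A) when the key is not found.
    """
    for row in table:
        if row[0] == key:
            return row[col - 1]
    return None


def array_vlookup_naive(keys, table, col):
    # One independent scan of the table per key: with n keys and m rows
    # this is up to n * m comparisons in the worst case.
    return [vlookup_exact(k, table, col) for k in keys]
```

Each key triggers its own scan of the table, and no work is shared between the n calls.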

The full array version could be implemented like this:
 - first, rank (sort) the first array
 - then search the sorted array values in the m-row table
   -- worst case: [HV]LOOKUP( _array_ ; _unsorted_array_ ; x ; FALSE )
       => the table must be rank-sorted too: O(m * log(m))
   -- this sort is NOT needed with [HV]LOOKUP(...; TRUE )

Correlating elements from one ranked array to the sorted array should 
proceed much faster than the blind []LOOKUP(). Also, this function will 
perform all calculations internally. There should be no side effects 
(even though internally the function will access the array data 
non-linearly).

Sincerely,

Leonard

P.S. I know that the notation of my function proposal is basically the 
same as the notation of an ordinary array calculation. I still have some 
doubts about what the best notation would be.

= [HV]LOOKUP( _array_ ; _range_ ; n ; TRUE/FALSE )

is another option, BUT this is truly an array function.


> iterating over the non-scalar input(s) and
> | putting the function result into a matrix at the position corresponding
> | to the input.
>
> Following your example above, the matrix/array formula would be
>
> {= VLOOKUP(A1:A3; 'my_range'; 2; FALSE)}
>
> entered over an array of 3 rows. Which btw is already implemented in
> OpenOffice.org
>
>   Eike
>
>
References:
- Re: [office-comment] Extending LOOKUP() Functions
  - From: Eike Rathke <erack@sun.com>