OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

# office-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Re: [office-comment] Extending LOOKUP() Functions

• To: office-comment@lists.oasis-open.org
• Date: Wed, 13 Aug 2008 22:53:26 +0300

```Hello Eike, hello everyone,

There is a small, *but essential difference* [although my first post
might have not captured this clearly]:

Eike Rathke wrote:
>> ...
>> Instead of repeating the formula numerous times, it makes sense to use
>> this formula just once and apply it as an array formula:
>>  {= VLOOKUP( '_array_to_be_searched_' ; 'my_range'; 2; FALSE) }
>>
>
> This is not specific to any [HV]LOOKUP function and already covered. See
> section 3.2 Non-Scalar Evaluation (aka 'Array expressions') subsection
> 2.2) Calculations with non-scalar inputs are a generalization of (2.1).
> | When evaluating a formula in 'matrix' mode, and a non-scalar value is
> | passed to a function argument that expects a scalar, the function is
> | evaluated multiple times,

Now, this is the culprit:  evaluating *multiple times*.

My idea is to make the [HV]LOOKUP() functions more efficient.

My interpretation of that paragraph is as follows: the corresponding
function will be called again and again for every value in the first
range. [This is how I interpret existing array calculations.] Of course,
a smart caching algorithm might speed things up, but a great burden is
put on the implementer to do it right, and it still won't be perfect.

On the other hand, a "specific array function" can be optimized much
better. It is a single function working on the array. The other method
is NOT a single function and is a spreadsheet "artifact". [This might
have important implications like collateral effects and the like: I
imagine that there is NO requirement that a classical array function
needs to compute at once, or in the particular order of the first array.
So, computing the value for VLOOKUP( A10; ...) first, then VLOOKUP(A3;
...) then VLOOKUP(A22; ...) is perfectly valid and allowed.]

Doing a []LOOKUP() with n-values on an m-rows table will behave like O(n*m).

The full array version could be implemented like:
- rank firstly the first array
- search sorted array-values in the m-rows table
-- worst case: [HV]LOOKUP( _array_ , _unsorted_array_ , x , FALSE )
=> rank-sort: O(m * log(m))
-- is NOT needed with [HV]LOOKUP(..., TRUE )

Correlating elements from one ranked array to the sorted array should
proceed much faster than the blind []LOOKUP(). Also, this function will
perform all calculations internally. Side effects shall not exist (even
though internally the function will access the array data non-linearly).

Sincerely,

Leonard

P.S. I know that the notation of my function proposal is basically the
same as the notation of an ordinary array calculation. I still have some
doubts what the best notation would be.

= [HV]LOOKUP( _array_ ; _range_ ; n ; TRUE/FALSE )

is another option, BUT this is truly an array function.

> iterating over the non-scalar input(s) and
> | putting the function result into a matrix at the position corresponding
> | to the input.
>
> Following your example above, the matrix/array formula would be
>
> {= VLOOKUP(A1:A3; 'my_range'; 2; FALSE)}
>
> entered over an array of 3 rows. Which btw is already implemented in
> OpenOffice.org
>
>   Eike
>
>

```

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]