office-comment message

Subject: Re: [office-comment] Re: Very Weak String Support in ODF
From: Patrick Durusau <patrick@durusau.net>
To: Leonard Mada <discoleo@gmx.net>
Date: Sat, 09 Jun 2007 06:15:50 -0400
Leonard,

Leonard Mada wrote:

> David A. Wheeler wrote:
>
>> I'm not fundamentally opposed to adding a few string functions 
>> (primarily to the "Large" group), but generally we've only been 
>> including functions which are ALREADY implemented in at least ONE 
>> spreadsheet application (Excel, Gnumeric, Corel Word Perfect Quattro 
>> Pro, KSpread, Lotus 1-2-3, etc.).  There's no end to the number of 
>> functions that COULD exist, and the lack of presence in ANY 
>> implementation is a sign that perhaps this isn't as widespread a need 
>> as you'd think. Spreadsheets tend to not be used for a lot of text 
>> processing (though which is the cause and which is the effect could 
>> be argued, I guess).
>>   
>
>
> I like to correct this.
>
> Spreadsheets are *THE MOST* used input medium that researchers use in 
> biomedical and life-sciences. This is true both of my country and of 
> ALL western countries I am aware of. Actually, spreadsheet use will 
> likely increase in the future, as more data gets digital. My position 
> permits me to fairly accurately predict this.
>
Ok, then there should be some listing or a means to derive such a 
listing of the functions that are in actual use. Yes?

In other words, simply consulting a string function textbook isn't going 
to be much help in determining what should be in or out of a formula 
standard.

For example, if there were a listing, by frequency of use, of the string 
functions used by genome researchers, then an attempt could be made to 
add some portion of those to a standard.
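
As a rough sketch of how such a listing might be derived, assuming the 
formulas have already been pulled out of the spreadsheets as plain 
strings: the sample formulas and the count_functions helper below are 
purely illustrative, not real usage data or any existing tool.

```python
import re
from collections import Counter

# Hypothetical sample formulas; real input would come from actual spreadsheets.
SAMPLE_FORMULAS = [
    '=LEFT(A1;3)',
    '=CONCATENATE(LEFT(A2;1);MID(B2;2;4))',
    '=SUM(C1:C10)',
    '=LEN(TRIM(A3))',
]

# A function call is approximated as an identifier immediately followed by "(".
# This is a simplification: text inside quoted string literals is not excluded.
FUNCTION_CALL = re.compile(r'([A-Za-z][A-Za-z0-9_.]*)\s*\(')


def count_functions(formulas):
    """Return a Counter of upper-cased function names across all formulas."""
    counts = Counter()
    for formula in formulas:
        counts.update(name.upper() for name in FUNCTION_CALL.findall(formula))
    return counts


if __name__ == "__main__":
    # Print function names from most to least frequently used.
    for name, n in count_functions(SAMPLE_FORMULAS).most_common():
        print(name, n)
```

A count like this only shows which functions already in use are popular; 
it cannot show demand for functions that do not exist yet, so it would 
be one input to the discussion rather than the whole case.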

But realize that standardizing a string function that does not have 
generally accepted semantics would probably be a bad idea. That is to 
say, we should not standardize a function that is going to give some 
people an expected result but mislead others as to the actual result 
of the function. It is always possible that someone will get an 
unexpected result, but that should be the exception rather than the 
rule. I think we should avoid taking sides where there is no clear 
consensus on the semantics of a particular function.

You are probably aware of all the variations in regex syntaxes, 
including the choices made in XML Schema that are inconsistent with most 
other regex languages. That sort of variance doesn't help anyone.
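
To illustrate one such difference (my choice of example, there are 
others): XML Schema regular expressions are implicitly anchored at both 
ends, so a pattern must match the whole value, whereas Perl-style 
engines look for a match anywhere in the text. The sketch below only 
approximates that contrast using Python's own re module, which is not 
an XML Schema regex engine; the two helper names are made up for this 
example.

```python
import re


def perl_style_match(pattern, text):
    """Unanchored search, as in most Perl-derived regex engines."""
    return re.search(pattern, text) is not None


def xsd_style_match(pattern, text):
    """Whole-string match, approximating XML Schema's implicit anchoring."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    pattern = r'[0-9]+'
    for text in ('123', 'abc123'):
        # Same pattern, same text, but the two conventions can disagree.
        print(text, perl_style_match(pattern, text), xsd_style_match(pattern, text))
```

The same pattern accepts "abc123" under one convention and rejects it 
under the other, which is exactly the kind of surprise a formula 
standard would want to avoid.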

So, I think yes, some string functions might attract enough support to 
be included but:

1. There needs to be some showing of usage so we can judge between the 
essential or popular vs. "one person uses this sometimes" (there is 
effort involved in adding this sort of thing to a standard), and

2. There needs to be as much information as possible on how to define 
the function, including references to where more information can be 
found about the suggested function.

Since I was trained as a text critic I am not unsympathetic to the need 
for string functions but I also think that the more detailed and helpful 
a request is in that regard the more likely it is to attract the 
interest of the committee.

> Of course, custom solutions could replace spreadsheets, BUT then only 
> because spreadsheets did NOT correct the primary design flaws.
>
> So, the current stance prohibits the effective use of spreadsheets in 
> a big segment of users. The life sciences are NOT the only one 
> affected, actually, a much broader segment of researchers are struck 
> with these shortcomings, and even commercial businesses. I am working 
> in a governmental office and some 30-40% of ALL spreadsheets, done by 
> ALL employees, contain significant portions of text. (I mean with text 
> that is analysable, NOT just labels or descriptive text!)
>
>> Also, there's always a risk that "committee invention" will have 
>> implementation or usage problems.  Standards are generally better 
>> when they stick with what's already in use, and make sure that 
>> they're extensible for future experimentation.  Yes, such functions 
>> certainly exist in other languages (so the risk is lower), but 
>> there's still a risk that a definition would have a hidden problem 
>> unless at least ONE supplier has implemented it with the other 
>> functions that ARE defined.  #3 in particular presupposes support for 
>> arrays of variable size, which certainly NOT all implementations 
>> support (nor do they need to for many use cases).  #1 and #2 are easy 
>> enough, though I'd like to know what to name #1.
>>   
>
>
> STANDARDS should NOT depend on the implementation. That would be a 
> poor standard. It usually should be the other way round. ;-)
>
Well, actually there are two quite legitimate positions in that regard.

One position, obviously the one you prefer, is that standards should be 
out in front of practice. Examples of that include the processor 
standards that are fixed years in advance of actual design and 
production of microprocessors.

The other, at least as well represented as the first, is that standards 
codify existing practice so that everyone does some activity the same 
way. Usually, after a variety of methods have been tried with varying 
success, one emerges as a de facto standard with some variation, and a 
standard is made to fix all of the details to enhance interoperability.

OpenDocument is something of a mix of the two. There are innovations 
forthcoming in metadata support, for example, but as I understand the 
formula work (I am not actually a participant in that SC) it is a 
question of imposing some minimal order on the chaos that is the realm 
of formulas. Note that I said *minimal* order. There was no attempt to 
ferret out every possible formula or function.

It might be helpful to remember that with few exceptions most of the 
people working on OpenDocument have day jobs that are not primarily 
related to standards work. So the effort doesn't go around hunting for 
things to work on. ;-) On the other hand, suggestions, particularly 
those with enough detail to both justify and assist in adding 
something to the standard, are very likely to attract favorable attention.

Hope you are having a great weekend!

Patrick

-- 
Patrick Durusau
Patrick@Durusau.net
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005

Topic Maps: Human, not artificial, intelligence at work!
References:
- Re: Very Weak String Support in ODF
  - From: Leonard Mada <discoleo@gmx.net>