office-formula message

Subject: Re: [office-formula] New draft of openformula specification

From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: robert_weir@us.ibm.com
Date: Fri, 14 Jul 2006 18:12:51 -0400 (EDT)

I said:
> > * For case-sensitive=false, I've claimed that it should work as if
> >   everything was converted to lowercase first.  But is that what
> >   should take place?
> >   How should comparing with "_" be handled, for example?
> >   Also - is what is considered "lower case" locale-sensitive? I
> >   suspect it is...!
> 
> I assume you are talking about 6.3.8 and how text is compared?
> 
> You could phrase it as "if all alphabetic characters are converted to
> lowercase..."  That removes the worry of punctuation and other characters.
> There might also be something in the Unicode spec we can reference that
> gives a better definition of "lower case".
> 
> But that doesn't get you out of the water.
> 
> As you note, there are global sensitivities in case conversions.  In some
> cases, case conversions are ambiguous in one direction.  For example,
> Greek has two lower case sigmas, one used only for sigma as the final
> letter in a word.  They both convert to the same upper case sigma.  So you
> can imagine the fun when you need to convert the capital sigma to
> lowercase -- you get a different answer depending on position.

Good point. It's even worse than that; I've studied a little (Koine) Greek.
Traditionally, one lower-case sigma character is used at the end of a word,
and the other character everywhere else, but in fact for certain compound words
(etc.) the "terminal" lowercase sigma is sometimes used in the MIDDLE
of a word in Greek, not JUST at the end.  This is
not a problem for Unicode assignment, since the two simply have different codes,
but you're right, that makes comparisons... umm... "interesting".
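
To make the sigma problem concrete, here is a small Python sketch (Python
is only my choice for illustration; nothing here is spec text). It shows
that both lowercase sigmas upper-case to the same letter, so a round trip
through upper case cannot recover which one you started with. The
position-dependent lowercasing shown for the whole word is what the CPython
releases I know of do; other implementations may differ, so treat that part
as an assumption.

```python
# U+03C3 is the ordinary lowercase sigma, U+03C2 the "final" form,
# and U+03A3 the single capital sigma they both map to.
MEDIAL_SIGMA = "\u03c3"
FINAL_SIGMA = "\u03c2"
CAPITAL_SIGMA = "\u03a3"

# Upper-casing loses the distinction between the two lowercase forms.
upper_forms = (MEDIAL_SIGMA.upper(), FINAL_SIGMA.upper())

# Lower-casing an isolated capital sigma gives the ordinary form...
isolated_lower = CAPITAL_SIGMA.lower()

# ...but at the end of a word the answer depends on position.
# The word below is capital omicron, delta, omicron, sigma.
word_upper = "\u039f\u0394\u039f" + CAPITAL_SIGMA
word_lower = word_upper.lower()

# Unicode case folding sidesteps the ambiguity by mapping the final
# form onto the ordinary one, so folded strings compare equal.
folded_equal = FINAL_SIGMA.casefold() == MEDIAL_SIGMA.casefold()

print(upper_forms, isolated_lower, word_lower, folded_equal)
```

The last line is the interesting one for comparisons: folding, rather than
lowercasing, is one way to avoid the position-dependent answer.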

I'm going to back off the comparison description slightly, and insert a TODO
asking for help.  Can anyone give me good text on how comparison should
work in spreadsheets in a modern international world when it's NOT case-sensitive?
I've put in SOME text that makes sense, but help is wanted.
I'll release a new version of the spec in just a moment.
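
As a rough illustration of why the "_" question matters, the Python sketch
below compares strings by code point after converting to lowercase versus
after converting to uppercase. The helper names are made up for this
example and are not from the spec. Because "_" (0x5F) sits between "Z"
(0x5A) and "a" (0x61), the two approaches order "_" and a letter in
opposite ways.

```python
def compare_via_lower(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing code points after lowercasing."""
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)


def compare_via_upper(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing code points after uppercasing."""
    a, b = a.upper(), b.upper()
    return (a > b) - (a < b)


# "_" sorts before "a" when everything is lowercased...
lower_result = compare_via_lower("_", "a")   # -1
# ...but after "A" when everything is uppercased.
upper_result = compare_via_upper("_", "a")   # 1

# Equality is unaffected; only the ordering changes.
equal_lower = compare_via_lower("Total", "TOTAL")
equal_upper = compare_via_upper("Total", "TOTAL")

print(lower_result, upper_result, equal_lower, equal_upper)
```

So "as if converted to lowercase" and "as if converted to uppercase" agree
on equality for plain ASCII but not on ordering, which is why the text needs
to pick one (or define ordering some other way).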

> So, four choices on string compares:
> 
> 1) lexical
> 2) collation 
> 3) implementation defined
> 4) make it a mode, like "case-sensitive"
> 
> If it was up to me, I'd get rid of the 'case-sensitive' mode and do it all 
> via lexical compares.  We already have an UPPER() and LOWER() function, so 
> anyone who wants a case-insensitive comparison already has these tools at 
> hand. 

These are not so accessible for use as database criteria, though.
And many people depend on spreadsheets which are not case-sensitive.
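
To show what I mean about criteria, here is a hypothetical Python sketch of
matching a column against a single criterion value. The function names and
the case_sensitive flag are assumptions for illustration only, and using
NFC normalization plus Unicode case folding is just one possible definition
of "not case-sensitive", not what the spec says. The point is that the
criterion is a plain value, so there is no natural place for the user to
wrap the cell side in LOWER().

```python
import unicodedata


def fold(text: str) -> str:
    """Normalize to NFC, then apply Unicode case folding."""
    return unicodedata.normalize("NFC", text).casefold()


def matches(cell: str, criterion: str, case_sensitive: bool = False) -> bool:
    """Hypothetical equality test for a database-style criterion."""
    if case_sensitive:
        return cell == criterion
    return fold(cell) == fold(criterion)


rows = ["Apple", "apple", "APPLE", "Apples"]

# With the mode off, all three spellings of "apple" match.
insensitive_hits = [r for r in rows if matches(r, "apple")]

# With the mode on, only the exact spelling matches.
sensitive_hits = [r for r in rows if matches(r, "apple", case_sensitive=True)]

print(insensitive_hits)
print(sensitive_hits)
```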


--- David A. Wheeler

Follow-Ups:
- Re: [office-formula] New draft of openformula specification
  - From: robert_weir@us.ibm.com

References:
- New draft of openformula specification
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>
- Re: [office-formula] New draft of openformula specification
  - From: robert_weir@us.ibm.com