office-formula message

Subject: Re: [office-formula] New draft of openformula specification

From: robert_weir@us.ibm.com
To: office-formula@lists.oasis-open.org
Date: Fri, 14 Jul 2006 16:23:20 -0400

"David A. Wheeler" <dwheeler@dwheeler.com> wrote on 07/13/2006 10:59:41 PM: > * For case-sensitive=false, I've claimed that it should work as if everything > was converted to lowercase first. But is that what should take place? > How should comparing with "_" be handled, for example? > Also - is what is considered "lower case" locale-sensitive? I > suspect it is...!
I assume you are talking about 6.3.8 and how text is compared?

You could phrase it as "if all alphabetic characters are converted to lowercase..." That removes the worry of punctuation and other characters. There might also be something in the Unicode spec we can reference that gives a better definition of "lower case".

But that doesn't get you out of the woods.

As you note, there are locale sensitivities in case conversions. In some cases, case conversions are ambiguous in one direction. For example, Greek has two lowercase sigmas, one used only when sigma is the final letter of a word. They both convert to the same uppercase sigma. So you can imagine the fun when you need to convert the capital sigma to lowercase -- you get a different answer depending on position.
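To make the sigma problem concrete, here is a small Python sketch. It uses Python's built-in str.upper() and str.lower() only as a stand-in for whatever case conversion an implementation might use; nothing here comes from the draft spec, and the variable names are just for illustration.

```python
# Illustration only: Python's built-in case mapping stands in for
# whatever conversion a spreadsheet implementation would use.
medial_sigma = "\u03c3"   # lowercase sigma used inside a word
final_sigma = "\u03c2"    # lowercase sigma used at the end of a word
capital_sigma = "\u03a3"  # the single uppercase sigma

# Both lowercase forms convert to the same uppercase letter.
upper_of_medial = medial_sigma.upper()
upper_of_final = final_sigma.upper()

# Going the other way depends on position: at the end of a word the
# capital sigma becomes the final form, on its own it becomes the
# medial form.
word = "\u039f\u0394\u039f\u03a3"  # four capitals ending in sigma
lowered_word = word.lower()
lowered_alone = capital_sigma.lower()

print(upper_of_medial == upper_of_final)  # True
print(lowered_word[-1] == final_sigma)    # True
print(lowered_alone == medial_sigma)      # True
```

So "convert to lowercase first" is not a simple per-character mapping, at least not in this implementation, and a round trip through uppercase can change the text.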

Keep in mind that there are plain lexical string comparisons (character-by-character comparison according to the numeric value of each character in some character set), and then there are locale-sensitive comparisons, sometimes called collations. These take into account cases where the preferred sort order is not the same as the lexical order. For example, in German phone book ordering, an O-umlaut sorts with "Oe", and the eszett, which looks like a lowercase Greek beta, sorts with "ss". A lexical sort will not give users what they would expect to see in a phone book or a dictionary. So I think we need to say exactly what we want when we compare strings. Even when case-sensitive, we need to say whether we are doing a lexical comparison or a collation. There is something called the Unicode Collation Algorithm, so we could reference that.
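A rough Python sketch of the difference, for illustration. The phone_book_key function is hypothetical and deliberately simplified (it expands umlauts and the eszett and ignores case); it is not the Unicode Collation Algorithm and not anything proposed for the spec.

```python
# Illustration only: compare a code-point (lexical) sort with a toy
# German phone-book style ordering.
words = ["Zebra", "Ofen", "Öl", "Strasse", "Straße", "Stuhl"]

# Python compares strings code point by code point, so a plain sort
# is a lexical sort.
lexical = sorted(words)


def phone_book_key(word):
    """Hypothetical, simplified sort key: expand umlauts and eszett."""
    expansions = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
    return "".join(expansions.get(ch, ch) for ch in word.lower())


collated = sorted(words, key=phone_book_key)

print(lexical)   # ['Ofen', 'Strasse', 'Straße', 'Stuhl', 'Zebra', 'Öl']
print(collated)  # ['Öl', 'Ofen', 'Strasse', 'Straße', 'Stuhl', 'Zebra']
```

The lexical sort pushes "Öl" past "Zebra" because the O-umlaut has a higher code point than any unaccented Latin letter, which is not where a German reader would look for it.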

So, four choices on string compares:

1) lexical
2) collation
3) implementation defined
4) make it a mode, like "case-sensitive"

If it were up to me, I'd get rid of the 'case-sensitive' mode and do it all via lexical compares. We already have UPPER() and LOWER() functions, so anyone who wants a case-insensitive comparison already has these tools at hand.
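As a sketch of what that would look like, here it is in Python. The names lexical_compare and compare_ignoring_case are made up for this example, and str.lower() is only a stand-in for the spreadsheet LOWER() function, which may not behave identically.

```python
# Illustration only: a single lexical comparison, with case-insensitive
# behavior obtained by lowercasing the operands first.

def lexical_compare(a, b):
    """Return -1, 0, or 1, comparing a and b code point by code point."""
    return (a > b) - (a < b)


def compare_ignoring_case(a, b):
    """What a user would get by wrapping both operands in LOWER()."""
    return lexical_compare(a.lower(), b.lower())


print(lexical_compare("apple", "Apple"))        # 1
print(compare_ignoring_case("apple", "Apple"))  # 0
print(lexical_compare("Zebra", "apple"))        # -1
```

The comparison itself then has exactly one definition, and any case folding is an explicit choice made in the formula.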

-Rob

Follow-Ups:
- Re: [office-formula] New draft of openformula specification
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>

References:
- New draft of openformula specification
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>