office-formula message
Subject: Re: [office-formula] New draft of openformula specification
- From: robert_weir@us.ibm.com
- To: office-formula@lists.oasis-open.org
- Date: Fri, 14 Jul 2006 16:23:20 -0400
"David A. Wheeler" <dwheeler@dwheeler.com>
wrote on 07/13/2006 10:59:41 PM:
> * For case-sensitive=false, I've claimed that it should work as if everything
> was converted to lowercase first. But is that what should take place?
> How should comparing with "_" be handled, for example?
> Also - is what is considered "lower case" locale-sensitive? I
> suspect it is...!
I assume you are talking about 6.3.8 and how text is compared?
You could phrase it as "if all alphabetic characters are converted to
lowercase...". That removes the worry about punctuation and other
characters. There might also be something in the Unicode spec we can
reference that gives a better definition of "lower case".
But that doesn't get you out of the woods.
As you note, case conversions are locale-sensitive. In some cases
they are also ambiguous in one direction. For example, Greek has two
lowercase sigmas, one of which is used only when sigma is the final
letter in a word. They both convert to the same uppercase sigma. So you
can imagine the fun when you need to convert the capital sigma to
lowercase: you get a different answer depending on position.
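A minimal Python sketch of the sigma round trip, for illustration only. Python is an arbitrary choice here and the variable names are made up for the example; the behaviour shown is that of recent CPython versions, whose str.lower() applies the final-sigma rule. Other implementations may well differ, which is the point.

```python
# The three sigma code points discussed above.
CAPITAL_SIGMA = "\u03a3"  # Σ
FINAL_SIGMA = "\u03c2"    # ς, used only at the end of a word
SIGMA = "\u03c3"          # σ, used everywhere else

# A sample word containing both lowercase forms: σοφός
word = "\u03c3\u03bf\u03c6\u03cc\u03c2"

# Uppercasing maps both lowercase sigmas to the same capital.
upper = word.upper()

# Lowercasing has to look at position to pick the right sigma.
round_trip = upper.lower()

# A capital sigma on its own has no "final" context at all.
isolated = CAPITAL_SIGMA.lower()

print(upper, round_trip, isolated)
```

The capital-to-lowercase mapping here is a function of the surrounding text, not of the character alone, so "convert everything to lowercase first" needs a more precise definition before it can be used in a comparison rule.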
Keep in mind that there are plain lexical string comparisons
(character by character, according to the numeric value of each
character in some character set), and then there are locale-sensitive
comparisons, sometimes called collations. These take into account
cases where the preferred sort order is not the same as the lexical
order. For example, in German an O-umlaut can sort as "Oe", and the
eszett, which looks like a lowercase Greek beta, sorts as "ss".
A lexical sort will not give users what they would expect to see in a
phone book or a dictionary. So I think we need to say exactly what we
want when we compare strings. Even when case-sensitive, we need to say
whether we are doing a lexical comparison or a collation. There is
something called the Unicode Collation Algorithm, so we could
reference that.
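The difference between the two orderings can be sketched in a few lines of Python. The expansion table and the function names below are hypothetical, invented for this example; this is a toy key loosely modelled on the German cases mentioned above, not an implementation of the Unicode Collation Algorithm.

```python
# Toy expansion table (an assumption for illustration, not a standard).
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def lexical_key(text):
    """Sort key for a plain lexical compare: the raw code point values."""
    return [ord(ch) for ch in text]


def toy_german_key(text):
    """Sort key that expands umlauts and eszett before comparing."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text.lower())


names = ["Zebra", "Öl", "Ofen"]

# Code point order puts "Ö" (U+00D6) after "Z" (U+005A).
lexical = sorted(names, key=lexical_key)

# The toy collation treats "Öl" as "oel", which sorts before "ofen".
collated = sorted(names, key=toy_german_key)

print(lexical)   # ['Ofen', 'Zebra', 'Öl']
print(collated)  # ['Öl', 'Ofen', 'Zebra']
```

A real collation also needs tie-breaking rules and per-locale tailoring, which is why referencing an existing algorithm is more attractive than defining one in the spec.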
So, four choices on string compares:
1) lexical
2) collation
3) implementation defined
4) make it a mode, like "case-sensitive"
If it were up to me, I'd get rid of the 'case-sensitive' mode and do
it all via lexical compares. We already have UPPER() and LOWER()
functions, so anyone who wants a case-insensitive comparison already
has the tools at hand.
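As a rough sketch of that last option, here is how a purely lexical compare plus an explicit lowercase step might look in Python. The function names are hypothetical, and Python's str.lower() merely stands in for the spreadsheet LOWER() function.

```python
def lexical_compare(left, right):
    """Return -1, 0, or 1 by comparing code points left to right."""
    return (left > right) - (left < right)


def compare_ignoring_case(left, right):
    """Case-insensitive compare built from lowercasing plus a lexical compare."""
    return lexical_compare(left.lower(), right.lower())


# Lexically, lowercase letters come after uppercase ones.
print(lexical_compare("abc", "ABC"))        # 1
print(compare_ignoring_case("abc", "ABC"))  # 0

# "_" (code point 95) sits between "Z" (90) and "a" (97), so its position
# relative to letters depends on which case the letters are folded to.
print(compare_ignoring_case("_", "A"))              # -1
print(lexical_compare("_".upper(), "A".upper()))    # 1
```

The "_" case shows that folding to lowercase and folding to uppercase give different orderings for some punctuation, so a spec that keeps a case-insensitive mode would need to say which fold it means.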
-Rob