[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

*Subject*: **Semantics**

*From*:**"David A. Wheeler" <dwheeler@dwheeler.com>***To*: office-formula@lists.oasis-open.org*Date*: Mon, 27 Feb 2006 20:15:17 -0500

Okay, here's an email specifically about semantics; please reply to this email to discuss semantics. That way, those of us with threading email clients can follow the discussion. (Those of you without a threading client... good luck :-) ). Tomas Mecir wrote: >> * Semantics. How strongly should we constrain semantics, and how should we determine them? >> > Constraints should be high - that's what this is about. If things are > too loose, there's no interoperability, because results differ. > I guess I'm thinking we should "constrain semantics as much as we can... and no more". The real test should be, "will typical spreadsheets with formulas interchange correctly?" There may be some areas where we have to leave "wiggle room" in the standard, because we can't reach consensus, but as long as USERS can exchange their spreadsheets, that's not ideal but okay. But those cases should be as rare as we can make them. Generally spreadsheets agree on nearly all semantics anyway. If you SUM a range, EVERYBODY skips the text values in the range. If you add two numbers, you unsurprisingly get the total of the two numbers ;-). And so on . > Lowering constraints as low levels is fine, but the highest level > should be as strict as reasonably possible. > Yes, I think that makes sense. Where this gets awkward is in type handling and type conversions, and some examples will help: * Is there a DISTINGUISHED logical type? Excel, Gnumeric, and SheetToGo treat True/False as NOT a number, and thus SKIP them in SUM, etc. (SUM takes a list of NumberSequences, and NumberSequences skip non-numbers... since Logical is NOT a number, they are SKIPPED). OOo2, Lotus 1-2-3, Quattro Pro treat them as just another number, and INCLUDE them in SUM. * What happens if Number is needed, but Text or Reference-to-Text is provided? E.G., operator + takes two Number arguments. If given a non-number, everyone does something different (and here are my comments on their side-effects): + Excel/Gnumeric/SheetToGo: Text and reference-to-text is taken, and it tries to convert to a number. It returns the converted number, else an Error. Basically, if given text or a reference to text, but the operator/function requires type Number, they automatically call the function VALUE() on the text. So "3"+3 is 6, and if A1 has "3", then A1+3 is 6. "ab"+3 is an Error. + Lotus 1-2-3 and Quattro Pro: Text and reference-to-text is ALWAYS 0. So "3"+3 is 3, as is A1+3 and "ab"+3. This is VERY consistent; it's also consistent when you compare it to SUM's semantics. But it's confusing to naive users who don't understand that "3" and 3 are different, and it's lethal to those who still type "l" for 1 and "O" for 0 (it causes silent errors). I'm not sure it's REALLY consistent with NumberSequence semantics; I suspect that AVERAGE shows that NumberSequence SKIPS the text, instead of considering text to be 0. So the average of "5l" isn't the same as "51". + OpenOffice.org 2: Inline text is taken and a conversion is attempted, returning the number (if successful) or Error (if not). If it's a REFERENCE to a text value, then 0 is ALWAYS returned. So "3"+3 is 6, but A1+3 is 3 if A1 contains "3" (!). This inconsistency between "3"+3 and A1+3 is very odd to me. The OOo2 folks' argument is that this is more consistent with SUM. + KSpread 1.4: Always an Error. The problem here is that naive users don't understand this.. naive users generally are clueless about data types (data types are NOT documented well by ANY implementation). + KSpread 1.5-beta: Text and reference-to-text is taken, and conversion attempted; if success, returns converted number, else returns 0. The risk of this semantic is that again, conversions "seem" okay, but failure go unnoticed. The value "510" works fine, but "5l0" is silently 0 (remember, many older typists were TRAINED to type l for 1 and O for 0, and you REALLY don't want to be surprised by their results). (VALUE() is the portable string-to-number function... EVERYONE supports that function to convert a Text value into a Number value.) The problem with MANDATING these details at lower levels is that they typically don't matter. Most people use Numbers where they plan to do calculations, so handling type mixing simply doesn't occur often in practice. But in the rare cases it DOES occur, there's a risk of silent problems. OpenFormula allowed many of these at lower levels, and picked a specific semantic (the most common one, Excel/Gnumeric/SheetToGo) at level 3. That way, we didn't impose a requirement where it appeared to be unnecessary for most, but for those who wanted semantics to be very specific, that was available. At the lower levels, if you're worried about text-to-number conversions, you're required to manually invoke VALUE() yourself (which DOES work portably, everywhere)... which isn't ideal, but is probably rare in practice. Oh, why that particular semantic at level 3? One advantage of this particular semantic is that naive users either get what they were probably expecting, OR they get a warning message that there's a problem (instead of silently getting the wrong answer). There's no easy way to force SUM to directly convert text values in ANY implementation today, and the spec simply reflects that. Yes, you could abuse IMSUM to do this, but that'd be locale-dependent due to comma-vs-period issues, not everyone has IMSUM, and it's abuse anyway. If you REALLY want that, create a column/row that uses VALUE() to do the conversion... or even better, just create numbers in the first place if you want to use data as numbers. Generally it's better to make values numbers if you want to use them as numbers, so there seems to be no need to create a function to sum ranges of text. > There should also be formal definitions of what the functions compute, > and a reference algorithm should be provided, if possible. > A formal definition in terms of a mathematical expression would be good. I don't think we should mandate a particular algorithm, but mandating particular test case results are fair game, and including NON-normative references to "how to implement this well" would probably REALLY help implementors "get it right". The OpenFormula project did that with STDEV, for example.. the "obvious" way turns out to lead to square roots of negative numbers (due to numerical effects), so a reference to the "better way to do it" seemed in order. --- David A. Wheeler

**Follow-Ups**:**Re: [office-formula] Semantics***From:*"Tomas Mecir" <mecirt@gmail.com>

**Re: [office-formula] Semantics***From:*Daniel Carrera <daniel.carrera@zmsl.com>

**References**:**RE: [office-formula] Proposal: Work ONLY electronically after initial telecomm, except when we decide otherwise***From:*Mary McRae <marypmcrae@gmail.com>

**Re: [office-formula] Proposal: Work ONLY electronically after initialtelecomm, except when we decide otherwise***From:*Daniel Carrera <daniel.carrera@zmsl.com>

**Re: [office-formula] Proposal: Work ONLY electronically afterinitial telecomm, except when we decide otherwise***From:*"David A. Wheeler" <dwheeler@dwheeler.com>

**Key Issues***From:*"David A. Wheeler" <dwheeler@dwheeler.com>

**Re: [office-formula] Key Issues***From:*"Tomas Mecir" <mecirt@gmail.com>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]