OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

office-formula message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Analyzing the CONVERT function, in detail - comments?


CONVERT is a nasty function to REALLY define properly.  But I've dug in, and have some ideas (as well as a lot of draft text).  Does anyone object to the following?

I currently plan to define CONVERT so that it includes ALL of the unit names and prefixes supported by _EITHER_ Excel _OR_ OpenOffice.org, which will make it easy for EVERYONE to switch to OpenFormula.  I believe that will cover all other implementations as well; I specifically looked at Gnumeric's docs and didn't see anything that would be missed by a merge.  Does anyone have an objection to having CONVERT support the _merge_ of Excel and OpenOffice.org?  And will I miss anything?

I think we should allow "m/sec" as well as OpenOffice.org's "m/s", allow "m/hr" as well as OpenOffice.org's "m/h", and allow "^" to prefix power values (so "m^2" and "m^3" would be permitted as well as m2 and m3).  It's nice for consistency, and more importantly it sets up the ability for applications to support general powers, multipliers, and divisions in the future (like "ft*lbf/sec^2") if future applications developers want to do that.  I find that clean, general formats avoid a world of problems anyway.  Note that these are all based on OpenOffice.org capabilities; Excel can't handle velocities or areas at all, and it's very limited in its support for volumes (it can really only handle liquid volume measures + litre; there isn't even a cubic centimeter or cubic meter).

Should we also allow "L" for liter? NIST recommends it, though neither program supports it currently.  Currently everyone uses "l", which is really easy to misread as "1".  I think we should.

Should we allow "s" for second?  Pretty common in the field, but neither app does it (that would NOT introduce a name conflict from the prefixes... I checked).  I think we should.

Should we add "ly" for lightyear? If so, there another common abbreviation we should use instead? OOo has parsec but not lightyear, for reasons I can't begin to explain. Excel has neither (I guess they're consistent in ignoring astronomy, though that's not exactly impressive).  If we're going to have parsec, may as well have lightyear, though I don't know if "ly" is the best unit name.

OpenOffice.org (OOo) and Gnumeric support the SI prefixes Y, Z, z, and y. Excel and Ecma do not, but they should, so I intend to include them in CONVERT.

I've written documentation for optionally supporting B/byte and b/bit, as well as binary prefixes for them.  Currently in my draft text it's optional, but by documenting it, I hope that it'll encourage consistency between apps that DO support it.  But I'm rethinking that approach.  We already have to require changes in CONVERT implementors so that everyone can read everything; if we do a merge, no one app does everything today.  So if that's so, I think we may as well require support for these now; that'll be one less problem for interoperation between OpenFormula implementations.  Certainly the need for doing measurements of information is only going to increase over time.  Any protest?

I've done a programmatic comparison of the CONVERT units supported by OpenOffice.org (OOo)'s CONVERT_ADD and the ones supported by CONVERT in Excel 2003 plus Ecma (Excel/Ecma aren't the same in either direction so I merged them), to make sure there are no naming conflicts.   The program generates all possible names, assuming that you can prefix ANYTHING, and see if there's a duplicate. (That's extreme, we won't REQUIRE all prefixes everywhere, but it helps to analyze the most conservative case.) Currently the only real conflict it finds is between "hp" (is that horsepower or hectoPascals?).  This conflict exists in Ecma's spec, too, and as usual they didn't do any analysis so they don't deal with it. This could be resolved by this resolution rule: "in case of conflict, interpret the unit as if it has no prefix" (e.g., horsepower would be selected in this case).  In other words, if you spell out a unit's name, presume you meant to use that unit (and not some other unit with a prefix).  This seems to work very nicely; typically if you spell out a unit you mean to use that unit, and it's not like hectoPascals are a common unit.  If we allowed "h" for hour, we would have a similar problem ("hh" is both horsepower-hour and hectoHours)... but we probably won't allow "h" for hour, and the same approach solves that problem anyway (it would be determined to be horsepower-hour).  Note that since I plan to prohibit translation "between" groups (you can't convert meters to Celsius), this would be detected immediately.

Oh, pedants: I plan to claim that Mass and Energy are different unit systems, so CONVERT won't convert between them automatically (it'll be an Error instead).  I'm fully aware of E=mc^2, but in most cases, trying to convert between mass and energy is a mistake, so CONVERT should flag it.  You can easily convert to a mass, then multiply by c^2 in the appropriate unit system when you actually need to do that.

Volume measures are insane.  It's not CONVERT's fault, it's that the units themselves in English-speaking countries are insane.  There are several different teaspoons and tablespoons; the one that appears intended is the U.S. customary, historic unit (not the Imperial measure, nor the U.S. federal law's definition).  Changing that definition for teaspoons or tablespoons would silently cause nasty changes to some spreadsheets; if important I would prefer to define NEW units that had these values, rather than change existing widely-used ones.  (Ecma doesn't bother to define these units at all, sadly.)  There are also multiple quarts, gallons, etc.; there are U.S. liquid, U.S. dry, and Imperial (and yes, they're all different).  In these cases, it's clear that U.S. liquid is what was intended. I checked, and both OpenOffice.org and Excel use the same (U.S. liquid, traditional) definitions for qt, tbs, and tsp.  So we'll fix that, and actually state what these terms mean.  It'll be the first time anywhere, as far as I can tell, that exactly what measure is intended would be defined.

I plan to avoid U.S./U.K. dry measures; no spreadsheet supports them to my knowledge, they're typically specialized for different materials, and in many cases they're being abandoned anyway.

Not supported: angle measures (degrees, radians, grad).  We have some functions that do some of that, elsewhere.

In general, I want to make sure that existing documents can be converted INTO OpenFormula, and can be easily exchanged between OpenFormula implementations.  To do that well, all existing implementations will need to add some extensions to handle the entire range of possibilities.  I think that's normal when creating a standard that merges the best bits from various implementations. But making it an implementation nightmare is NOT okay; I want to ensure that anyone can easily implement this, easily, if they want to.  I think the above additions fit the bill.

Below are the differences in the unit names supported by OpenOffice.org's CONVERT_ADD and Excel plus some Ecma requirements.  Excel 2003 doesn't actually support everything per Ecma, and vice-versa.  OOo supports MANY more units than Excel; OpenOffice.org supports 91 units in total (including things Excel doesn't do at all, like velocity), while Excel plus new Ecma requirements support only 65 units. By merging, we get them all.

I've attached the analysis program I used to check for unit name conflicts, as well as some of the input I used to create it (namely, the list of units in Excel+Ecma and the list of units in OOo).  I don't know if attachments can get through the mailing list, I guess we'll find out :-).

Here are the units only supported by OOo:
Glass
HPh
Middy
Morgen
Nmi2
Nmi3
PS
Pica2
Pica3
Rank
Reau
Schooner
Torr
acre
admkn
ang2
ang3
ar
barrel
bushel
ell
ft2
ft3
grain
ha
hweight
in2
in3
kn
m/h
m/s
m2
m3
mi2
mi3
mph
parsec
pond
psi
pweight
regton
shweight
stone
ton
yd2
yd3

Here are the units only supported by Excel+Ecma:
H
Hph
P
at
btu
cel
dy
ev
fah
flb
hh
hp
kel
lbf
lt
p
uk_pt
us_pt
w
wh


Here's the same list, sorted alphabetically. The ones with "-" are ones that only Excel supports.  The ones with "+" are the ones only OOo supports:

-H
+Glass
-Hph
+HPh
+Middy
+Morgen
-P
+Nmi2
+Nmi3
+PS
+Pica2
+Pica3
+Rank
+Reau
+Schooner
+Torr
+acre
+admkn
-at
+ang2
+ang3
+ar
-btu
+barrel
+bushel
-cel
-dy
-ev
-fah
-flb
+ell
+ft2
+ft3
-hh
-hp
+grain
+ha
+hweight
-kel
+in2
+in3
+kn
-lbf
-lt
+m/h
+m/s
+m2
+m3
+mi2
+mi3
+mph
-p
+parsec
+pond
+psi
+pweight
+regton
+shweight
+stone
+ton
-uk_pt
-us_pt
-w
-wh
+yd2
+yd3



--- David A. Wheeler 

convertfunc-excel

convertfunc-ooo

check-convert-list.py



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]